Natural language processing for dialects of a language: A survey
State-of-the-art natural language processing (NLP) models are trained on massive training
corpora and report superlative performance on evaluation datasets. This survey delves …
Machine learning for ancient languages: A survey
Ancient languages preserve the cultures and histories of the past. However, their study is
fraught with difficulties, and experts must tackle a range of challenging text-based tasks, from …
Findings of the VarDial evaluation campaign 2023
This report presents the results of the shared tasks organized as part of the VarDial
Evaluation Campaign 2023. The campaign is part of the tenth workshop on Natural …
Findings of the VarDial evaluation campaign 2021
This paper describes the results of the shared tasks organized as part of the VarDial
Evaluation Campaign 2021. The campaign was part of the eighth workshop on Natural …
A report on the VarDial evaluation campaign 2020
This paper presents the results of the VarDial Evaluation Campaign 2020 organized as part
of the seventh workshop on Natural Language Processing (NLP) for Similar Languages …
Natural language processing for similar languages, varieties, and dialects: A survey
There has been a lot of recent interest in the natural language processing (NLP) community
in the computational processing of language varieties and dialects, with the aim to improve …
Language variety identification with true labels
Language identification is an important first step in many IR and NLP applications. Most
publicly available language identification datasets, however, are compiled under the …
Comparing approaches to Dravidian language identification
This paper describes the submissions by team HWR to the Dravidian Language
Identification (DLI) shared task organized at the VarDial 2021 workshop. The DLI training set …
Classifying cuneiform symbols using machine learning algorithms with unigram features on a balanced dataset
Problem: Recognizing written languages using symbols written in cuneiform is a tough
endeavor due to the lack of information and the challenge of the process of tokenization. The …
FreCDo: A large corpus for French cross-domain dialect identification
We present a novel corpus for French dialect identification comprising 413,522 French text
samples collected from public news websites in Belgium, Canada, France and Switzerland …