Natural language processing for dialects of a language: A survey

A Joshi, R Dabre, D Kanojia, Z Li, H Zhan… - ACM Computing …, 2024 - dl.acm.org
State-of-the-art natural language processing (NLP) models are trained on massive training
corpora, and report a superlative performance on evaluation datasets. This survey delves …

Machine learning for ancient languages: A survey

T Sommerschield, Y Assael, J Pavlopoulos… - Computational …, 2023 - direct.mit.edu
Ancient languages preserve the cultures and histories of the past. However, their study is
fraught with difficulties, and experts must tackle a range of challenging text-based tasks, from …

Findings of the VarDial evaluation campaign 2023

N Aepli, Ç Çöltekin, R Van Der Goot… - arXiv preprint arXiv …, 2023 - arxiv.org
This report presents the results of the shared tasks organized as part of the VarDial
Evaluation Campaign 2023. The campaign is part of the tenth workshop on Natural …

Findings of the VarDial evaluation campaign 2021

BR Chakravarthi, M Găman, RT Ionescu, H Jauhiainen… - EACL| VarDial, 2021 - orbilu.uni.lu
This paper describes the results of the shared tasks organized as part of the VarDial
Evaluation Campaign 2021. The campaign was part of the eighth workshop on Natural …

A report on the VarDial evaluation campaign 2020

M Gaman, D Hovy, RT Ionescu… - Proceedings of the …, 2020 - aclanthology.org
This paper presents the results of the VarDial Evaluation Campaign 2020 organized as part
of the seventh workshop on Natural Language Processing (NLP) for Similar Languages …

Natural language processing for similar languages, varieties, and dialects: A survey

M Zampieri, P Nakov, Y Scherrer - Natural Language Engineering, 2020 - cambridge.org
There has been a lot of recent interest in the natural language processing (NLP) community
in the computational processing of language varieties and dialects, with the aim to improve …

Language variety identification with true labels

M Zampieri, K North, T Jauhiainen, M Felice… - arXiv preprint arXiv …, 2023 - arxiv.org
Language identification is an important first step in many IR and NLP applications. Most
publicly available language identification datasets, however, are compiled under the …

Comparing approaches to Dravidian language identification

T Jauhiainen, T Ranasinghe, M Zampieri - arXiv preprint arXiv:2103.05552, 2021 - arxiv.org
This paper describes the submissions by team HWR to the Dravidian Language
Identification (DLI) shared task organized at VarDial 2021 workshop. The DLI training set …

Classifying cuneiform symbols using machine learning algorithms with unigram features on a balanced dataset

M Mahmood, FM Jasem, AA Mukhlif… - Journal of Intelligent …, 2023 - degruyter.com
Problem: Recognizing written languages using symbols written in cuneiform is a tough
endeavor due to the lack of information and the challenge of the process of tokenization. The …

FreCDo: A large corpus for French cross-domain dialect identification

M Găman, AG Chifu, W Domingues… - Procedia Computer …, 2023 - Elsevier
We present a novel corpus for French dialect identification comprising 413,522 French text
samples collected from public news websites in Belgium, Canada, France and Switzerland …