Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

Discriminating between similar languages and Arabic dialect identification: A report on the third DSL shared task

S Malmasi, M Zampieri, N Ljubešić… - Proceedings of the …, 2016 - aclanthology.org
We present the results of the third edition of the Discriminating between Similar Languages
(DSL) shared task, which was organized as part of the VarDial'2016 workshop at …

Findings of the VarDial evaluation campaign 2017

M Zampieri, S Malmasi, N Ljubešić… - Proceedings of the …, 2017 - aclanthology.org
We present the results of the VarDial Evaluation Campaign on Natural Language
Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part …

Findings of the VarDial evaluation campaign 2021

BR Chakravarthi, M Găman, RT Ionescu, H Jauhiainen… - EACL| VarDial, 2021 - orbilu.uni.lu
This paper describes the results of the shared tasks organized as part of the VarDial
Evaluation Campaign 2021. The campaign was part of the eighth workshop on Natural …

A report on the DSL shared task 2014

M Zampieri, L Tan, N Ljubešić… - Proceedings of the first …, 2014 - aclanthology.org
This paper summarizes the methods, results and findings of the Discriminating between
Similar Languages (DSL) shared task 2014. The shared task provided data from 13 different …

Language variety identification with true labels

M Zampieri, K North, T Jauhiainen, M Felice… - arXiv preprint arXiv …, 2023 - arxiv.org
Language identification is an important first step in many IR and NLP applications. Most
publicly available language identification datasets, however, are compiled under the …

Overview of the DSL shared task 2015

M Zampieri, L Tan, N Ljubešić… - Proceedings of the …, 2015 - aclanthology.org
We present the results of the 2nd edition of the Discriminating between Similar Languages
(DSL) shared task, which was organized as part of the LT4VarDial'2015 workshop and …

Arabic dialect identification in speech transcripts

S Malmasi, M Zampieri - Proceedings of the Third Workshop on …, 2016 - aclanthology.org
In this paper we describe a system developed to identify a set of four regional Arabic dialects
(Egyptian, Gulf, Levantine, North African) and Modern Standard Arabic (MSA) in a …

Discriminating similar languages: Evaluations and explorations

C Goutte, S Léger, S Malmasi, M Zampieri - arXiv preprint arXiv …, 2016 - arxiv.org
We present an analysis of the performance of machine learning classifiers on discriminating
between similar languages and language varieties. We carried out a number of experiments …

Exploring lexical and syntactic features for language variety identification

C van der Lee, A van den Bosch - … of the fourth workshop on NLP for …, 2017 - pure.knaw.nl
We present a method to discriminate between texts written in either the Netherlandic or the
Flemish variant of the Dutch language. The method draws on a feature bundle representing …