Automatic language identification in texts: A survey
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …
Discriminating between similar languages and Arabic dialect identification: A report on the third DSL shared task
We present the results of the third edition of the Discriminating between Similar Languages
(DSL) shared task, which was organized as part of the VarDial'2016 workshop at …
Findings of the VarDial evaluation campaign 2017
We present the results of the VarDial Evaluation Campaign on Natural Language
Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part …
Findings of the VarDial evaluation campaign 2021
This paper describes the results of the shared tasks organized as part of the VarDial
Evaluation Campaign 2021. The campaign was part of the eighth workshop on Natural …
A report on the DSL shared task 2014
This paper summarizes the methods, results and findings of the Discriminating between
Similar Languages (DSL) shared task 2014. The shared task provided data from 13 different …
Language variety identification with true labels
Language identification is an important first step in many IR and NLP applications. Most
publicly available language identification datasets, however, are compiled under the …
Overview of the DSL shared task 2015
We present the results of the 2nd edition of the Discriminating between Similar Languages
(DSL) shared task, which was organized as part of the LT4VarDial'2015 workshop and …
Arabic dialect identification in speech transcripts
In this paper we describe a system developed to identify a set of four regional Arabic dialects
(Egyptian, Gulf, Levantine, North African) and Modern Standard Arabic (MSA) in a …
Discriminating similar languages: Evaluations and explorations
We present an analysis of the performance of machine learning classifiers on discriminating
between similar languages and language varieties. We carried out a number of experiments …
Exploring lexical and syntactic features for language variety identification
We present a method to discriminate between texts written in either the Netherlandic or the
Flemish variant of the Dutch language. The method draws on a feature bundle representing …