Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

[PDF] Sentence level dialect identification in Arabic

H Elfardy, M Diab - Proceedings of the 51st Annual Meeting of the …, 2013 - aclanthology.org
This paper introduces a supervised approach for performing sentence level dialect
identification between Modern Standard Arabic and Egyptian Dialectal Arabic. We use token …

Estimating code-switching on twitter with a novel generalized word-level language detection technique

S Rijhwani, R Sequiera, M Choudhury… - Proceedings of the …, 2017 - aclanthology.org
Word-level language detection is necessary for analyzing code-switched text, where
multiple languages could be mixed within a sentence. Existing models are restricted to code …

Transformer based language identification for malayalam-english code-mixed text

S Thara, P Poornachandran - IEEE Access, 2021 - ieeexplore.ieee.org
Social media users have the proclivity to write majority of the data for under resourced
languages in code-mixed format. Code-mixing is defined as mixing of two or more …

[PDF] A Multi-Dialect, Multi-Genre Corpus of Informal Written Arabic.

R Cotterell, C Callison-Burch - LREC, 2014 - lrec-conf.org
This paper presents a multi-dialect, multi-genre, human annotated corpus of dialectal Arabic.
We collected utterances in five Arabic dialects: Levantine, Gulf, Egyptian, Iraqi and …

[PDF] Multilingual code-switching identification via lstm recurrent neural networks

Y Samih, S Maharjan, M Attia… - Proceedings of the …, 2016 - aclanthology.org
This paper describes the HHU-UH-G system submitted to the EMNLP 2016 Second
Workshop on Computational Approaches to Code Switching. Our system ranked first place …

[PDF] Transliteration of arabizi into arabic orthography: Developing a parallel annotated arabizi-arabic script sms/chat corpus

A Bies, Z Song, M Maamouri, S Grimes… - Proceedings of the …, 2014 - aclanthology.org
This paper describes the process of creating a novel resource, a parallel Arabizi-Arabic
script corpus of SMS/Chat data. The language used in social media expresses many …

[PDF] The NRC system for discriminating similar languages

C Goutte, S Léger, M Carpuat - … of the first workshop on applying …, 2014 - aclanthology.org
We describe the system built by the National Research Council Canada for the
"Discriminating between similar languages" (DSL) shared task. Our system uses various …

[PDF] Foreign words and the automatic processing of Arabic social media text written in Roman script

R Eskander, M Al-Badrashiny, N Habash… - Proceedings of The …, 2014 - aclanthology.org
Arabic on social media has all the properties of any language on social media that make it
tough for natural language processing, plus some specific problems. These include …

[PDF] Aida: Identifying code switching in informal arabic text

H Elfardy, M Al-Badrashiny, M Diab - Proceedings of The First …, 2014 - aclanthology.org
In this paper, we present the latest version of our system for identifying linguistic code
switching in Arabic text. The system relies on Language Models and a tool for morphological …