Automatic language identification in texts: A survey
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …
Sentence level dialect identification in Arabic
This paper introduces a supervised approach for performing sentence level dialect
identification between Modern Standard Arabic and Egyptian Dialectal Arabic. We use token …
Estimating code-switching on Twitter with a novel generalized word-level language detection technique
Word-level language detection is necessary for analyzing code-switched text, where
multiple languages could be mixed within a sentence. Existing models are restricted to code …
Transformer based language identification for Malayalam-English code-mixed text
S Thara, P Poornachandran - IEEE Access, 2021 - ieeexplore.ieee.org
Social media users have the proclivity to write majority of the data for under resourced
languages in code-mixed format. Code-mixing is defined as mixing of two or more …
A Multi-Dialect, Multi-Genre Corpus of Informal Written Arabic
This paper presents a multi-dialect, multi-genre, human annotated corpus of dialectal Arabic.
We collected utterances in five Arabic dialects: Levantine, Gulf, Egyptian, Iraqi and …
Multilingual code-switching identification via LSTM recurrent neural networks
This paper describes the HHU-UH-G system submitted to the EMNLP 2016 Second
Workshop on Computational Approaches to Code Switching. Our system ranked first place …
Transliteration of Arabizi into Arabic orthography: Developing a parallel annotated Arabizi-Arabic script SMS/chat corpus
This paper describes the process of creating a novel resource, a parallel Arabizi-Arabic
script corpus of SMS/Chat data. The language used in social media expresses many …
The NRC system for discriminating similar languages
We describe the system built by the National Research Council Canada for the
"Discriminating between similar languages" (DSL) shared task. Our system uses various …
Foreign words and the automatic processing of Arabic social media text written in Roman script
Arabic on social media has all the properties of any language on social media that make it
tough for natural language processing, plus some specific problems. These include …
Aida: Identifying code switching in informal Arabic text
In this paper, we present the latest version of our system for identifying linguistic code
switching in Arabic text. The system relies on Language Models and a tool for morphological …