A systematic review on language identification of code-mixed text: techniques, data availability, challenges, and framework development

AF Hidayatullah, A Qazi, DTC Lai, RA Apong - IEEE Access, 2022 - ieeexplore.ieee.org
The mix of native language with other languages (code-mixing) in social media has posed a
severe challenge for language identification (LID) systems. It has encouraged research on …

Code-mixing: A brief survey

S Thara, P Poornachandran - 2018 International conference on …, 2018 - ieeexplore.ieee.org
Indians and many other non-English speakers across the world, prefer not to use single
code in their messaging texts on social media platforms. They make use of transliteration …

Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

A survey of code-switched speech and language processing

S Sitaram, KR Chandu, SK Rallabandi… - arXiv preprint arXiv …, 2019 - arxiv.org
Code-switching, the alternation of languages within a conversation or utterance, is a
common communicative phenomenon that occurs in multilingual communities across the …

Overview for the second shared task on language identification in code-switched data

G Molina, F AlGhamdi, M Ghoneim, A Hawwari… - arXiv preprint arXiv …, 2019 - arxiv.org
We present an overview of the second shared task on language identification in code-switched
data. For the shared task, we had code-switched data from two different language …

Transformer based language identification for Malayalam-English code-mixed text

S Thara, P Poornachandran - IEEE Access, 2021 - ieeexplore.ieee.org
Social media users have the proclivity to write majority of the data for under resourced
languages in code-mixed format. Code-mixing is defined as mixing of two or more …

Language identification and named entity recognition in Hinglish code mixed tweets

K Singh, I Sen, P Kumaraguru - Proceedings of ACL 2018 …, 2018 - aclanthology.org
While growing code-mixed content on Online Social Networks (OSN) provides a fertile
ground for studying various aspects of code-mixing, the lack of automated text analysis tools …

Curriculum design for code-switching: Experiments with language identification and language modeling with deep neural networks

M Choudhury, K Bali, S Sitaram… - Proceedings of the 14th …, 2017 - aclanthology.org
Curriculum learning strategies are known to improve the accuracy, robustness and
convergence rate for various language learning tasks using deep architectures (Bengio et …

Hierarchical character-word models for language identification

A Jaech, G Mulcaire, S Hathi, M Ostendorf… - arXiv preprint arXiv …, 2016 - arxiv.org
Social media messages' brevity and unconventional spelling pose a challenge to language
identification. We introduce a hierarchical model that learns character and contextualized …

An annotated corpus of emerging anglicisms in Spanish newspaper headlines

EÁ Mellado - Proceedings of the 4th Workshop on Computational …, 2020 - aclanthology.org
The extraction of anglicisms (lexical borrowings from English) is relevant both for
lexicographic purposes and for NLP downstream tasks. We introduce a corpus of European …