A systematic review on language identification of code-mixed text: techniques, data availability, challenges, and framework development
The mix of native language with other languages (code-mixing) in social media has posed a
severe challenge for language identification (LID) systems. It has encouraged research on …
Code-mixing: A brief survey
S Thara, P Poornachandran - 2018 International conference on …, 2018 - ieeexplore.ieee.org
Indians and many other non-English speakers across the world, prefer not to use single
code in their messaging texts on social media platforms. They make use of transliteration …
Automatic language identification in texts: A survey
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …
A survey of code-switched speech and language processing
Code-switching, the alternation of languages within a conversation or utterance, is a
common communicative phenomenon that occurs in multilingual communities across the …
Overview for the second shared task on language identification in code-switched data
We present an overview of the second shared task on language identification in code-
switched data. For the shared task, we had code-switched data from two different language …
Transformer based language identification for malayalam-english code-mixed text
S Thara, P Poornachandran - IEEE Access, 2021 - ieeexplore.ieee.org
Social media users have the proclivity to write majority of the data for under resourced
languages in code-mixed format. Code-mixing is defined as mixing of two or more …
Language identification and named entity recognition in hinglish code mixed tweets
While growing code-mixed content on Online Social Networks (OSN) provides a fertile
ground for studying various aspects of code-mixing, the lack of automated text analysis tools …
Curriculum design for code-switching: Experiments with language identification and language modeling with deep neural networks
Curriculum learning strategies are known to improve the accuracy, robustness and
convergence rate for various language learning tasks using deep architectures (Bengio et …
Hierarchical character-word models for language identification
Social media messages' brevity and unconventional spelling pose a challenge to language
identification. We introduce a hierarchical model that learns character and contextualized …
An annotated corpus of emerging anglicisms in Spanish newspaper headlines
EÁ Mellado - Proceedings of the 4th Workshop on Computational …, 2020 - aclanthology.org
The extraction of anglicisms (lexical borrowings from English) is relevant both for
lexicographic purposes and for NLP downstream tasks. We introduce a corpus of European …