A systematic review on language identification of code-mixed text: techniques, data availability, challenges, and framework development

AF Hidayatullah, A Qazi, DTC Lai, RA Apong - IEEE Access, 2022 - ieeexplore.ieee.org
The mix of native language with other languages (code-mixing) in social media has posed a
severe challenge for language identification (LID) systems. It has encouraged research on …

Corpus creation and language identification for code-mixed Indonesian-Javanese-English Tweets

AF Hidayatullah, RA Apong, DTC Lai, A Qazi - PeerJ Computer Science, 2023 - peerj.com
With the massive use of social media today, mixing between languages in social media text
is prevalent. In linguistics, the phenomenon of mixing languages is known as code-mixing …

Code-switching input for machine translation: a case study of Vietnamese–English data

L Nguyen, O Mayeux, Z Yuan - International Journal of …, 2024 - Taylor & Francis
Multilingualism presents both a challenge and an opportunity for Natural Language
Processing, with code-switching representing a particularly interesting problem for …

Grammatical error correction for code-switched sentences by learners of English

KWH Chan, C Bryant, L Nguyen, A Caines… - arXiv preprint arXiv …, 2024 - arxiv.org
Code-switching (CSW) is a common phenomenon among multilingual speakers where
multiple languages are used in a single discourse or utterance. Mixed language utterances …

How effective is machine translation on low-resource code-switching? A case study comparing human and automatic metrics

L Nguyen, C Bryant, O Mayeux… - Findings of the …, 2023 - aclanthology.org
This paper presents an investigation into the differences between processing monolingual
input and code-switching (CSW) input in the context of machine translation (MT) …

How do LGBTQ+ library catalog users talk about subject searching?

H Moulaison-Sandy, B Dobreski, K Snow - Journal of Documentation, 2025 - emerald.com
Purpose: Subject searching in the library catalog is a challenge for any user, but may be
especially so for members of marginalized groups whose language diverges even further …

TongueSwitcher: Fine-Grained Identification of German-English Code-Switching

I Sterner, S Teufel - 2023 - repository.cam.ac.uk
This paper contributes to German–English code-switching research. We provide the largest
corpus of naturally occurring German–English code-switching, where English is included in …

Multilingual Identification of English Code-Switching

I Sterner - Proceedings of the Eleventh Workshop on NLP for …, 2024 - aclanthology.org
This work addresses the task of identifying English code-switching in multilingual text. We
train two token-level classifiers on data of high-resource language pairs. The first …

Large Scale, Multi-domain Language Identification

T Jauhiainen, M Zampieri, T Baldwin… - … Language Identification in …, 2024 - Springer
In general, the more recognizable languages there are, the more difficult it is to recognize
the language (Brown; Rodrigues; Jauhiainen et al.). It is intuitively easy to understand that if …

A Comprehensive Survey of Techniques Used for Part-of-Speech Tagging of Code-Mixed Social Media Text

S Sunita, A Kumar, N Neetika - 2023 - researchsquare.com
Part-of-speech tagging faces unique difficulties when dealing with code-mixed social media
text, which combines multiple languages in informal content created by users. In India, many …