Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

A survey of current datasets for code-switching research

N Jose, BR Chakravarthi… - 2020 6th …, 2020 - ieeexplore.ieee.org
Code switching is a prevalent phenomenon in the multilingual community and social media
interaction. In the past ten years, we have witnessed an explosion of code switched data in …

Computational sociolinguistics: A survey

D Nguyen, AS Doğruöz, CP Rosé… - Computational …, 2016 - direct.mit.edu
Abstract Language is a social phenomenon and variation is inherent to its social nature.
Recently, there has been a surge of interest within the computational linguistics (CL) …

A sentiment analysis dataset for code-mixed Malayalam-English

BR Chakravarthi, N Jose, S Suryawanshi… - arXiv preprint arXiv …, 2020 - arxiv.org
There is an increasing demand for sentiment analysis of text from social media which are
mostly code-mixed. Systems trained on monolingual data fail for code-mixed data due to the …

GLUECoS: An evaluation benchmark for code-switched NLP

S Khanuja, S Dandapat, A Srinivasan… - arXiv preprint arXiv …, 2020 - arxiv.org
Code-switching is the use of more than one language in the same conversation or utterance.
Recently, multilingual contextual embedding models, trained on multiple monolingual …

Language modeling for code-mixing: The role of linguistic theory based synthetic data

A Pratapa, G Bhat, M Choudhury… - Proceedings of the …, 2018 - aclanthology.org
Training language models for Code-mixed (CM) language is known to be a difficult problem
because of lack of data compounded by the increased confusability due to the presence of …