A systematic review on language identification of code-mixed text: techniques, data availability, challenges, and framework development

AF Hidayatullah, A Qazi, DTC Lai, RA Apong - IEEE Access, 2022 - ieeexplore.ieee.org
The mix of native language with other languages (code-mixing) in social media has posed a
severe challenge for language identification (LID) systems. It has encouraged research on …

Overview of CoLI-Kanglish: Word level language identification in code-mixed Kannada-English texts at ICON 2022

F Balouchzahi, S Butt, A Hegde, N Ashraf… - Proceedings of the …, 2022 - aclanthology.org
Abstract The task of Language Identification (LI) in text processing refers to automatically
identifying the languages used in a text document. LI task is usually been studied at the …

Code mixed question answering challenge using deep learning methods

S Thara, E Sampath, P Reddy - 2020 5th international …, 2020 - ieeexplore.ieee.org
In social media, code-mixed language questions (combination of two distinct languages) is
turning into the favored method of expression and communication. In Twitter people can go …

Overview of CoLI-Tunglish: Word-level Language Identification in Code-mixed Tulu Text at FIRE 2023.

A Hegde, F Balouchzahi, S Coelho… - FIRE (Working …, 2023 - researchgate.net
Abstract Word-level Language Identification (LI) aims to identify the language of individual
words within a given sentence. It is a preliminary step in processing code-mixed text in …

Sentiment Analysis of Code-Mixed Telugu-English Data Leveraging Syllable and Word Embeddings

UR Rayala, K Seshadri, NB Sristy - ACM Transactions on Asian and …, 2023 - dl.acm.org
Learning the inherent meaning of a word in Natural Language Processing (NLP) has
motivated researchers to represent a word at various levels of abstraction, namely character …

BharatBhasaNet-A unified framework to identify Indian code mix Languages

S Dey, S Thakur, A Kandwal, R Kumar… - IEEE …, 2024 - ieeexplore.ieee.org
In the rapidly globalizing digital communication sphere, the imperative for advanced
multilingual text recognition and identification is increasingly evident. Contrasting the …

Indo-Aryan Dialect Identification Using Deep Learning Ensemble Model

PM Subhash, CR Kavitha, D Gupta - Procedia Computer Science, 2024 - Elsevier
Abstract Language identification has become a critical challenge in NLP, particularly in
multilingual countries like India. This study addresses the identification of closely related …

Finding the duplicate questions in Stack Overflow using word embeddings

J Babu, S Thara - Procedia Computer Science, 2020 - Elsevier
Searching query in the web applications may or may not yield anticipated results,
constrained by the questions asked. User may not feel comfortable by seeing a bunch of …

Transformer Based Sentiment Analysis on Code Mixed Data

KK Sampath, M Supriya - Procedia Computer Science, 2024 - Elsevier
In India, a country known for its linguistic diversity, code mixing is a common practice, and it
has a profound impact on the way people communicate through various mediums, including …

On Importance of Code-Mixed Embeddings for Hate Speech Identification

S Jagdale, O Khade, G Takalikar, M Inamdar… - arXiv preprint arXiv …, 2024 - arxiv.org
Code-mixing is the practice of using two or more languages in a single sentence, which
often occurs in multilingual communities such as India where people commonly speak …