Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

TweetLID: a benchmark for tweet language identification

A Zubiaga, IS Vicente, P Gamallo, JR Pichel… - Language Resources …, 2016 - Springer
Language identification, as the task of determining the language a given text is
written in, has progressed substantially in recent decades. However, three main issues …

Characterising text mining: a systematic mapping review of the Portuguese language

E Souza, D Costa, DW Castro, D Vitório, I Teles… - IET …, 2018 - Wiley Online Library
Documents written in natural language constitute a major part of the artefacts produced
during the software engineering life cycle. Studies indicate that more than 80% of enterprise …

Arabic dialect identification in the context of bivalency and code-switching

M El-Haj, P Rayson, M Aboelezz - Proceedings of the 11th …, 2018 - eprints.bbk.ac.uk
In this paper we use a novel approach towards Arabic dialect identification using language
bivalency and written code-switching. Bivalency between languages or dialects is where a …

Overview of TweetLID: Tweet Language Identification at SEPLN 2014.

A Zubiaga, I San Vicente, P Gamallo, JRP Campos… - TweetLID …, 2014 - orai.eus
Overview of TweetLID: Tweet Language Identification at SEPLN 2014. Introducción a TweetLID: Tarea …

Smoothed n-gram based models for tweet language identification: A case study of the Brazilian and European Portuguese national varieties

DW Castro, E Souza, D Vitório, D Santos… - Applied Soft …, 2017 - Elsevier
Identifying the language of a text is an important step for several natural language
processing applications. State-of-the-art language identification (LID) systems perform very …

Mining multilingual and multiscript Twitter data: unleashing the language and script barrier

B Sarkar, N Sinhababu, M Roy… - … and Data Mining, 2020 - inderscienceonline.com
Micro-blogging sites like Twitter have become an opinion hub where views on diverse topics
are expressed. Interpreting, comprehending and analysing this emotion-rich information can …

Discriminating between Brazilian and European Portuguese national varieties on Twitter texts

D Castro, E Souza… - 2016 5th Brazilian …, 2016 - ieeexplore.ieee.org
Twitter is one of the most used social media with users generating about 1 million messages
per day. As a result of the expansion of this microblog, there is a diversity of languages used …

Factorized Recurrent Neural Network with Attention for Language Identification and Content Detection

BH Belay, GB Gebremeskel, BB Bezabih… - ACM Transactions on …, 2023 - dl.acm.org
Language identification and content detection are essential for ensuring effective digital
communication and content moderation. While extensive research has primarily focused on …

Effective language identification of forum texts based on statistical approaches

K Abainia, S Ouamour, H Sayoud - Information Processing & Management, 2016 - Elsevier
This investigation deals with the problem of language identification of noisy texts, which
could represent the primary step of many natural language processing or information …