Automatic language identification in texts: A survey
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …
TweetLID: a benchmark for tweet language identification
Language identification, as the task of determining the language a given text is
written in, has progressed substantially in recent decades. However, three main issues …
Characterising text mining: a systematic mapping review of the Portuguese language
Documents written in natural language constitute a major part of the artefacts produced
during the software engineering life cycle. Studies indicate that more than 80% of enterprise …
Arabic dialect identification in the context of bivalency and code-switching
In this paper we use a novel approach towards Arabic dialect identification using language
bivalency and written code-switching. Bivalency between languages or dialects is where a …
Overview of TweetLID: Tweet Language Identification at SEPLN 2014.
Overview of TweetLID: Tweet Language Identification at SEPLN 2014 (Introducción a TweetLID: Tarea …)
Smoothed n-gram based models for tweet language identification: A case study of the Brazilian and European Portuguese national varieties
Identifying the language of a text is an important step for several natural language
processing applications. State-of-the-art language identification (LID) systems perform very …
Mining multilingual and multiscript Twitter data: unleashing the language and script barrier
B Sarkar, N Sinhababu, M Roy… - … and Data Mining, 2020 - inderscienceonline.com
Micro-blogging sites like Twitter have become an opinion hub where views on diverse topics
are expressed. Interpreting, comprehending and analysing this emotion-rich information can …
Discriminating between Brazilian and European Portuguese national varieties on Twitter texts
Twitter is one of the most used social media with users generating about 1 million messages
per day. As a result of the expansion of this microblog, there is a diversity of languages used …
Factorized Recurrent Neural Network with Attention for Language Identification and Content Detection
BH Belay, GB Gebremeskel, BB Bezabih… - ACM Transactions on …, 2023 - dl.acm.org
Language identification and content detection are essential for ensuring effective digital
communication and content moderation. While extensive research has primarily focused on …
Effective language identification of forum texts based on statistical approaches
This investigation deals with the problem of language identification of noisy texts, which
could represent the primary step of many natural language processing or information …