Multilingual offensive language identification for low-resource languages
Offensive content is pervasive in social media and a reason for concern to companies and
government organizations. Several studies have been recently published investigating …
Findings of the VarDial evaluation campaign 2021
This paper describes the results of the shared tasks organized as part of the VarDial
Evaluation Campaign 2021. The campaign was part of the eighth workshop on Natural …
Bhasha-Abhijnaanam: Native-script and romanized language identification for 22 Indic languages
We create publicly available language identification (LID) datasets and models in all 22
Indian languages listed in the Indian constitution in both native-script and romanized text …
Can Multilingual Transformers Fight the COVID-19 Infodemic?
The massive spread of false information on social media has become a global risk
especially in a global pandemic situation like COVID-19. False information detection has …
DAAI at CASE 2021 task 1: Transformer-based multilingual socio-political and crisis event detection
Automatic socio-political and crisis event detection has been a challenge for natural
language processing as well as social and political science communities, due to the …
Language Identification of Hindi-English tweets using code-mixed BERT
Language identification of social media text has been an interesting problem of study in
recent years. Social media messages are predominantly code-mixed in non-English …
Geographically-informed language identification
J Dunn, L Edwards-Brown - arXiv preprint arXiv:2403.09892, 2024 - arxiv.org
This paper develops an approach to language identification in which the set of languages
considered by the model depends on the geographic origin of the text in question. Given that …
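The idea in this snippet, letting the geographic origin of a text decide which languages are even considered, can be sketched as a filter applied to an existing classifier's scores. This is a minimal illustration under stated assumptions, not the method of Dunn and Edwards-Brown: the `REGION_LANGUAGES` table, its language codes, the function `geo_restricted_lid`, and the toy scores are all hypothetical, and the per-language scores are assumed to come from some base LID model.

```python
from typing import Dict, Set

# Hypothetical region -> candidate-language table (illustrative codes only).
REGION_LANGUAGES: Dict[str, Set[str]] = {
    "ZA": {"eng", "afr", "zul", "xho"},
    "IN": {"eng", "hin", "ben", "tam"},
}


def geo_restricted_lid(scores: Dict[str, float], region: str) -> str:
    """Return the highest-scoring language among those allowed for `region`.

    `scores` maps language codes to scores from some base LID model (assumed;
    higher is better). Unknown regions fall back to every scored language.
    """
    allowed = REGION_LANGUAGES.get(region, set(scores))
    candidates = {lang: s for lang, s in scores.items() if lang in allowed}
    if not candidates:
        raise ValueError(f"no candidate languages for region {region!r}")
    return max(candidates, key=candidates.get)


# Toy scores in which Dutch narrowly beats Afrikaans without geographic context.
scores = {"nld": 0.45, "hin": 0.40, "afr": 0.35, "zul": 0.10}
print(geo_restricted_lid(scores, "ZA"))  # afr: Dutch is not a candidate for ZA
print(geo_restricted_lid(scores, "XX"))  # nld: unknown region, no restriction
```

Restricting the candidate set, rather than retraining a model per region, keeps a single base classifier and only changes which of its outputs are eligible.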
From N-grams to Pre-trained Multilingual Models For Language Identification
T Sindane, V Marivate - arXiv preprint arXiv:2410.08728, 2024 - arxiv.org
In this paper, we investigate the use of N-gram models and Large Pre-trained Multilingual
models for Language Identification (LID) across 11 South African languages. For N-gram …
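An N-gram approach to LID of the general kind named in this snippet can be sketched as a naive Bayes classifier over character trigrams with add-one smoothing and a uniform prior over languages. This is an illustrative assumption, not the configuration Sindane and Marivate report: the class `NgramLID`, the choice of n = 3, and the two-sentence English/Afrikaans training set are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Lower-case, pad with spaces, and return overlapping character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLID:
    """Hypothetical character n-gram naive Bayes LID (uniform prior, add-one smoothing)."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.counts: Dict[str, Counter] = {}
        self.totals: Dict[str, int] = {}
        self.vocab: Set[str] = set()

    def fit(self, samples: Dict[str, List[str]]) -> "NgramLID":
        # Count n-grams per language and record the shared vocabulary.
        for lang, texts in samples.items():
            grams = Counter(g for t in texts for g in char_ngrams(t, self.n))
            self.counts[lang] = grams
            self.totals[lang] = sum(grams.values())
            self.vocab.update(grams)
        return self

    def log_likelihood(self, text: str, lang: str) -> float:
        # Add-one smoothing; the extra slot reserves probability for unseen n-grams.
        grams, total = self.counts[lang], self.totals[lang]
        denom = total + len(self.vocab) + 1
        return sum(math.log((grams[g] + 1) / denom) for g in char_ngrams(text, self.n))

    def predict(self, text: str) -> str:
        # With a uniform prior, the language with the highest likelihood wins.
        return max(self.counts, key=lambda lang: self.log_likelihood(text, lang))


# Toy training data (hypothetical; real systems need far larger corpora).
lid = NgramLID(n=3).fit({
    "eng": ["the cat is on the mat", "this is the house"],
    "afr": ["die kat is op die mat", "dit is die huis"],
})
print(lid.predict("the house"))  # eng
print(lid.predict("die huis"))   # afr
```

Padding with spaces lets word-boundary trigrams such as `" th"` and `"ie "` act as features, which is one reason short character n-grams remain a common LID baseline.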
Classification of Indian media titles using deep learning techniques
Automatic speech recognition is being used everywhere these days. An essential part of this
is language identification. Our goal here is to identify the language of the media title, such as …
WLV-RIT at SemEval-2021 task 5: A neural transformer framework for detecting toxic spans
In recent years, the widespread use of social media has led to an increase in the generation
of toxic and offensive content on online platforms. In response, social media platforms have …