NusaCrowd: Open source initiative for Indonesian NLP resources

S Cahyawijaya, H Lovenia, AF Aji… - Findings of the …, 2023 - aclanthology.org
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for
Indonesian languages, including opening access to previously non-public resources …

Language varieties of Italy: Technology challenges and opportunities

A Ramponi - Transactions of the Association for Computational …, 2024 - direct.mit.edu
Italy is characterized by a one-of-a-kind linguistic diversity landscape in Europe, which
implicitly encodes local knowledge, cultural traditions, artistic expressions, and history of its …

Murreviikko - a dialectologically annotated and normalized dataset of Finnish tweets

O Kuparinen - Tenth Workshop on NLP for Similar Languages …, 2023 - aclanthology.org
This paper presents Murreviikko, a dataset of dialectal Finnish tweets which have been
dialectologically annotated and manually normalized to a standard form. The dataset can be …

SMTCE: A social media text classification evaluation benchmark and BERTology models for Vietnamese

LT Nguyen, K Van Nguyen, NLT Nguyen - arXiv preprint arXiv:2209.10482, 2022 - arxiv.org
Text classification is a typical natural language processing or computational linguistics task
with various interesting applications. As the number of users on social media platforms …

Addressing religious hate online: from taxonomy creation to automated detection

A Ramponi, B Testa, S Tonelli, E Jezek - PeerJ Computer Science, 2022 - peerj.com
Abusive language in online social media is a pervasive and harmful phenomenon which
calls for automatic computational approaches to be successfully contained. Previous studies …