NusaCrowd: Open source initiative for Indonesian NLP resources
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for
Indonesian languages, including opening access to previously non-public resources …
Language varieties of Italy: Technology challenges and opportunities
A Ramponi - Transactions of the Association for Computational …, 2024 - direct.mit.edu
Italy is characterized by a one-of-a-kind linguistic diversity landscape in Europe, which
implicitly encodes local knowledge, cultural traditions, artistic expressions, and history of its …
Murreviikko - a dialectologically annotated and normalized dataset of Finnish tweets
O Kuparinen - Tenth Workshop on NLP for Similar Languages …, 2023 - aclanthology.org
This paper presents Murreviikko, a dataset of dialectal Finnish tweets which have been
dialectologically annotated and manually normalized to a standard form. The dataset can be …
SMTCE: A social media text classification evaluation benchmark and BERTology models for Vietnamese
Text classification is a typical natural language processing or computational linguistics task
with various interesting applications. As the number of users on social media platforms …
Addressing religious hate online: from taxonomy creation to automated detection
Abusive language in online social media is a pervasive and harmful phenomenon which
calls for automatic computational approaches to be successfully contained. Previous studies …