Aya dataset: An open-access collection for multilingual instruction tuning

S Singh, F Vargus, D Dsouza, BF Karlsson… - arXiv preprint arXiv …, 2024 - arxiv.org
Datasets are foundational to many breakthroughs in modern artificial intelligence. Many
recent achievements in the space of natural language processing (NLP) can be attributed to …

Predicting the type and target of offensive posts in social media

M Zampieri, S Malmasi, P Nakov, S Rosenthal… - arXiv preprint arXiv …, 2019 - arxiv.org
As offensive content has become pervasive in social media, there has been much research
in identifying potentially offensive messages. However, previous work on this topic did not …

CAMeL Tools: An open source Python toolkit for Arabic natural language processing

O Obeid, N Zalmout, S Khalifa, D Taji… - Proceedings of the …, 2020 - aclanthology.org
We present CAMeL Tools, a collection of open-source tools for Arabic natural
language processing in Python. CAMeL Tools currently provides utilities for pre-processing …

Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

NADI 2022: The third nuanced Arabic dialect identification shared task

M Abdul-Mageed, C Zhang, AR Elmadany… - arXiv preprint arXiv …, 2022 - arxiv.org
We describe findings of the third Nuanced Arabic Dialect Identification Shared Task (NADI
2022). NADI aims at advancing state of the art Arabic NLP, including on Arabic dialects. It …

Findings of the VarDial evaluation campaign 2023

N Aepli, Ç Çöltekin, R Van Der Goot… - arXiv preprint arXiv …, 2023 - arxiv.org
This report presents the results of the shared tasks organized as part of the VarDial
Evaluation Campaign 2023. The campaign is part of the tenth workshop on Natural …

A panoramic survey of natural language processing in the Arab world

K Darwish, N Habash, M Abbas, H Al-Khalifa… - Communications of the …, 2021 - dl.acm.org
The term natural language refers to any system of symbolic communication (spoken,
signed, or written) that has evolved naturally in humans without intentional human planning …

The MADAR shared task on Arabic fine-grained dialect identification

H Bouamor, S Hassan, N Habash - Proceedings of the Fourth …, 2019 - aclanthology.org
In this paper, we present the results and findings of the MADAR Shared Task on Arabic Fine-
Grained Dialect Identification. This shared task was organized as part of The Fourth Arabic …

Fine-grained Arabic dialect identification

M Salameh, H Bouamor, N Habash - Proceedings of the 27th …, 2018 - aclanthology.org
Previous work on the problem of Arabic Dialect Identification typically targeted coarse-
grained five dialect classes plus Standard Arabic (6-way classification). This paper presents …

Speech recognition challenge in the wild: Arabic MGB-3

A Ali, S Vogel, S Renals - 2017 IEEE Automatic Speech …, 2017 - ieeexplore.ieee.org
This paper describes the Arabic MGB-3 Challenge: Arabic Speech Recognition in the Wild.
Unlike last year's Arabic MGB-2 Challenge, for which the recognition task was based on …