Natural language processing in Ethiopian languages: Current state, challenges, and opportunities

AL Tonja, TD Belay, IA Azime, AA Ayele… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey delves into the current state of natural language processing (NLP) for four
Ethiopian languages: Amharic, Afaan Oromo, Tigrinya, and Wolaytta. Through this paper, we …

Parallel corpus for indigenous language translation: Spanish-Mazatec and Spanish-Mixtec

AL Tonja, C Maldonado-Sifuentes, DAM Castillo… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we present a parallel Spanish-Mazatec and Spanish-Mixtec corpus for
machine translation (MT) tasks, where Mazatec and Mixtec are two indigenous Mexican …

Transformer-Based Amharic-to-English Machine Translation with Character Embedding and Combined Regularization Techniques

SH Asefa, Y Assabie - IEEE Access, 2024 - ieeexplore.ieee.org
Amharic is the working language of Ethiopia and, owing to its Semitic characteristics, the
language is known for its complex morphology. It is also an under-resourced language …

EthioMT: Parallel Corpus for Low-resource Ethiopian Languages

AL Tonja, O Kolesnikova, A Gelbukh… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent research in natural language processing (NLP) has achieved impressive
performance in tasks such as machine translation (MT), news classification, and question …

Automatic Translation of Hate Speech to Non-hate Speech in Social Media Texts

Y Kostiuk, AL Tonja, G Sidorov… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we investigate the issue of hate speech by presenting a novel task of
translating hate speech into non-hate speech text while preserving its meaning. As a case …

Reframing social media discourse: Converting hate speech to non-hate speech

Y Kostiuk, AL Tonja, G Sidorov… - Journal of Intelligent & … - content.iospress.com
In this paper, we investigate the issue of hate speech by presenting a novel task of
translating hate speech into non-hate speech text while preserving its meaning. As a case …

AmhEn: Amharic-English Large Parallel Corpus for Machine Translation

AL Tonja, TD Belay, O Kolesnikova, SM Yimam… - 2023 - researchsquare.com
Recently, using deep neural networks for machine translation (MT) tasks has received great
attention. In order for these networks to learn abstract representations of the input and store …