The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
One of the biggest challenges hindering progress in low-resource and multilingual machine
translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either …
State-of-the-art generalisation research in NLP: a taxonomy and review
The ability to generalise well is one of the primary desiderata of natural language
processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is …
XLM-V: Overcoming the vocabulary bottleneck in multilingual masked language models
Large multilingual language models typically rely on a single vocabulary shared across
100+ languages. As these models have increased in parameter count and depth …
Improving language plasticity via pretraining with active forgetting
Pretrained language models (PLMs) are today the primary model for natural language
processing. Despite their impressive downstream performance, it can be difficult to apply …
BLOOM+1: Adding language support to BLOOM for zero-shot prompting
The BLOOM model is a large publicly available multilingual language model, but its
pretraining was limited to 46 languages. To extend the benefits of BLOOM to other …
NusaX: Multilingual parallel sentiment dataset for 10 Indonesian local languages
Natural language processing (NLP) has a significant impact on society via technologies
such as machine translation and search engines. Despite its success, NLP technology is …
Findings of the AmericasNLP 2021 shared task on open machine translation for indigenous languages of the Americas
This paper presents the results of the 2021 Shared Task on Open Machine Translation for
Indigenous Languages of the Americas. The shared task featured two independent tracks …
Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems
Creating high-quality annotated data for task-oriented dialog (ToD) is known to be
notoriously difficult, and the challenges are amplified when the goal is to create equitable …
Breaking physical and linguistic borders: Multilingual federated prompt tuning for low-resource languages
Pretrained large language models (LLMs) have emerged as a cornerstone in modern
natural language processing, with their utility expanding to various applications and …
Cross-lingual few-shot learning on unseen languages
Large pre-trained language models (LMs) have demonstrated the ability to obtain good
performance on downstream tasks with limited examples in cross-lingual settings. However …