The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

N Goyal, C Gao, V Chaudhary, PJ Chen… - Transactions of the …, 2022 - direct.mit.edu
One of the biggest challenges hindering progress in low-resource and multilingual machine
translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either …

State-of-the-art generalisation research in NLP: a taxonomy and review

D Hupkes, M Giulianelli, V Dankers, M Artetxe… - arXiv preprint arXiv …, 2022 - arxiv.org
The ability to generalise well is one of the primary desiderata of natural language
processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is …

XLM-V: Overcoming the vocabulary bottleneck in multilingual masked language models

D Liang, H Gonen, Y Mao, R Hou, N Goyal… - arXiv preprint arXiv …, 2023 - arxiv.org
Large multilingual language models typically rely on a single vocabulary shared across
100+ languages. As these models have increased in parameter count and depth …

Improving language plasticity via pretraining with active forgetting

Y Chen, K Marchisio, R Raileanu… - Advances in …, 2023 - proceedings.neurips.cc
Pretrained language models (PLMs) are today the primary model for natural language
processing. Despite their impressive downstream performance, it can be difficult to apply …

BLOOM+1: Adding language support to BLOOM for zero-shot prompting

ZX Yong, H Schoelkopf, N Muennighoff, AF Aji… - arXiv preprint arXiv …, 2022 - arxiv.org
The BLOOM model is a large publicly available multilingual language model, but its
pretraining was limited to 46 languages. To extend the benefits of BLOOM to other …

NusaX: Multilingual parallel sentiment dataset for 10 Indonesian local languages

GI Winata, AF Aji, S Cahyawijaya, R Mahendra… - arXiv preprint arXiv …, 2022 - arxiv.org
Natural language processing (NLP) has a significant impact on society via technologies
such as machine translation and search engines. Despite its success, NLP technology is …

Findings of the AmericasNLP 2021 shared task on open machine translation for indigenous languages of the Americas

M Mager, A Oncevay, A Ebrahimi, J Ortega… - Proceedings of the …, 2021 - aclanthology.org
This paper presents the results of the 2021 Shared Task on Open Machine Translation for
Indigenous Languages of the Americas. The shared task featured two independent tracks …

Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems

S Hu, H Zhou, M Hergul, M Gritta, G Zhang… - Transactions of the …, 2023 - direct.mit.edu
Creating high-quality annotated data for task-oriented dialog (ToD) is known to be
notoriously difficult, and the challenges are amplified when the goal is to create equitable …

Breaking physical and linguistic borders: Multilingual federated prompt tuning for low-resource languages

W Zhao, Y Chen, R Lee, X Qiu, Y Gao… - The Twelfth …, 2024 - openreview.net
Pretrained large language models (LLMs) have emerged as a cornerstone in modern
natural language processing, with their utility expanding to various applications and …

Cross-lingual few-shot learning on unseen languages

G Winata, S Wu, M Kulkarni, T Solorio… - Proceedings of the …, 2022 - aclanthology.org
Large pre-trained language models (LMs) have demonstrated the ability to obtain good
performance on downstream tasks with limited examples in cross-lingual settings. However …