Survey of low-resource machine translation
We present a survey covering the state of the art in low-resource machine translation (MT)
research. There are currently around 7,000 languages spoken in the world and almost all …
Having beer after prayer? Measuring cultural bias in large language models
As the reach of large language models (LMs) expands globally, their ability to cater to
diverse cultural contexts becomes crucial. Despite advancements in multilingual …
BLOOM+1: Adding language support to BLOOM for zero-shot prompting
The BLOOM model is a large publicly available multilingual language model, but its
pretraining was limited to 46 languages. To extend the benefits of BLOOM to other …
A primer on pretrained multilingual language models
Multilingual Language Models (MLLMs) such as mBERT, XLM, XLM-R, etc. have
emerged as a viable option for bringing the power of pretraining to a large number of …
AmericasNLI: Evaluating zero-shot natural language understanding of pretrained multilingual models in truly low-resource languages
Pretrained multilingual models are able to perform cross-lingual transfer in a zero-shot
setting, even for languages unseen during pretraining. However, prior work evaluating …
Expanding pretrained models to thousands more languages via lexicon-based adaptation
The performance of multilingual pretrained models is highly dependent on the availability of
monolingual or parallel text present in a target language. Thus, the majority of the world's …
Do all languages cost the same? Tokenization in the era of commercial language models
Language models have graduated from being research prototypes to commercialized
products offered as web APIs, and recent works have highlighted the multilingual …
How to adapt your pretrained multilingual model to 1600 languages
Pretrained multilingual models (PMMs) enable zero-shot learning via cross-lingual transfer,
performing best for languages seen during pretraining. While methods exist to improve …
Some languages are more equal than others: Probing deeper into the linguistic disparity in the NLP world
Linguistic disparity in the NLP world is a problem that has been widely acknowledged
recently. However, different facets of this problem, or the reasons behind this disparity are …
MaLA-500: Massive language adaptation of large language models
Large language models have advanced the state of the art in natural language processing.
However, their predominant design for English or a limited set of languages creates a …