Neural machine translation for low-resource languages: A survey
Neural Machine Translation (NMT) has seen tremendous growth in less than ten years
and has already entered a mature phase. While considered the most widely …
Beyond English-centric multilingual machine translation
Existing work in translation demonstrated the potential of massively multilingual machine
translation by training a single model able to translate between any pair of languages …
Neural machine translation: A review of methods, resources, and tools
Machine translation (MT) is an important sub-field of natural language processing
that aims to translate natural languages using computers. In recent years, end-to-end neural …
Prompting large language model for machine translation: A case study
Research on prompting has shown excellent performance with little or even no supervised
training across many tasks. However, prompting for machine translation is still under …
The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
One of the biggest challenges hindering progress in low-resource and multilingual machine
translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either …
The BigScience ROOTS corpus: A 1.6 TB composite multilingual dataset
As language models grow ever larger, the need for large-scale high-quality text datasets has
never been more pressing, especially in multilingual settings. The BigScience workshop, a 1 …
Hallucinations in large multilingual translation models
Hallucinated translations can severely undermine user trust and raise safety issues when machine
translation systems are deployed in the wild. Previous research on the topic focused on …
MADLAD-400: A multilingual and document-level large audited dataset
We introduce MADLAD-400, a manually audited, general domain 3T token monolingual
dataset based on CommonCrawl, spanning 419 languages. We discuss the limitations …
Multilingual translation with extensible multilingual pretraining and finetuning
Recent work demonstrates the potential of multilingual pretraining for creating one model that
can be used for various tasks in different languages. Previous work in multilingual …
DeepNet: Scaling Transformers to 1,000 layers
In this paper, we propose a simple yet effective method to stabilize extremely deep
Transformers. Specifically, we introduce a new normalization function (DeepNorm) to modify …