Neural machine translation for low-resource languages: A survey

S Ranathunga, ESA Lee, M Prifti Skenduli… - ACM Computing …, 2023 - dl.acm.org
Neural Machine Translation (NMT) has seen tremendous growth in the last ten years since
the early 2000s and has already entered a mature phase. While considered the most widely …

Beyond English-centric multilingual machine translation

A Fan, S Bhosale, H Schwenk, Z Ma, A El-Kishky… - Journal of Machine …, 2021 - jmlr.org
Existing work in translation demonstrated the potential of massively multilingual machine
translation by training a single model able to translate between any pair of languages …

Language-agnostic BERT sentence embedding

F Feng, Y Yang, D Cer, N Arivazhagan… - arXiv preprint arXiv …, 2020 - arxiv.org
While BERT is an effective method for learning monolingual sentence embeddings for
semantic similarity and embedding based transfer learning (Reimers and Gurevych, 2019) …

Making monolingual sentence embeddings multilingual using knowledge distillation

N Reimers, I Gurevych - arXiv preprint arXiv:2004.09813, 2020 - arxiv.org
We present an easy and efficient method to extend existing sentence embedding models to
new languages. This allows to create multilingual versions from previously monolingual …

Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond

M Artetxe, H Schwenk - … of the Association for Computational Linguistics, 2019 - direct.mit.edu
We introduce an architecture to learn joint multilingual sentence representations for 93
languages, belonging to more than 30 different families and written in 28 different scripts …

MLQA: Evaluating cross-lingual extractive question answering

P Lewis, B Oğuz, R Rinott, S Riedel… - arXiv preprint arXiv …, 2019 - arxiv.org
Question answering (QA) models have shown rapid progress enabled by the availability of
large, high-quality benchmark datasets. Such annotated datasets are difficult and costly to …

Survey of low-resource machine translation

B Haddow, R Bawden, AVM Barone, J Helcl… - Computational …, 2022 - direct.mit.edu
We present a survey covering the state of the art in low-resource machine translation (MT)
research. There are currently around 7,000 languages spoken in the world and almost all …

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org
What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …

Scaling neural machine translation to 200 languages

NLLB Team - Nature, 2024 - pmc.ncbi.nlm.nih.gov
The development of neural techniques has opened up new avenues for research in
machine translation. Today, neural machine translation (NMT) systems can leverage highly …

WikiMatrix: Mining 135M parallel sentences in 1620 language pairs from Wikipedia

H Schwenk, V Chaudhary, S Sun, H Gong… - arXiv preprint arXiv …, 2019 - arxiv.org
We present an approach based on multilingual sentence embeddings to automatically
extract parallel sentences from the content of Wikipedia articles in 85 languages, including …