Neural machine translation for low-resource languages: A survey
Neural Machine Translation (NMT) has seen tremendous growth in the last ten years since
the early 2000s and has already entered a mature phase. While considered the most widely …
Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …
No language left behind: Scaling human-centered machine translation
Driven by the goal of eradicating language barriers on a global scale, machine translation
has solidified itself as a key focus of artificial intelligence research today. However, such …
Beyond English-centric multilingual machine translation
Existing work in translation demonstrated the potential of massively multilingual machine
translation by training a single model able to translate between any pair of languages …
Findings of the 2019 conference on machine translation (WMT19)
L Barrault, O Bojar, MR Costa-Jussa, C Federmann… - 2019 - zora.uzh.ch
This paper presents the results of the premier shared task organized alongside the
Conference on Machine Translation (WMT) 2019. Participants were asked to build machine …
Survey of low-resource machine translation
We present a survey covering the state of the art in low-resource machine translation (MT)
research. There are currently around 7,000 languages spoken in the world and almost all …
WikiMatrix: Mining 135M parallel sentences in 1620 language pairs from Wikipedia
We present an approach based on multilingual sentence embeddings to automatically
extract parallel sentences from the content of Wikipedia articles in 85 languages, including …
ParaCrawl: Web-scale acquisition of parallel corpora
M Bañón, P Chen, B Haddow, K Heafield, H Hoang… - 2020 - strathprints.strath.ac.uk
We report on methods to create the largest publicly available parallel corpora by crawling
the web, using open source software. We empirically compare alternative methods and …
Detecting hallucinated content in conditional neural sequence generation
Neural sequence models can generate highly fluent sentences, but recent studies have also
shown that they are also prone to hallucinate additional content not supported by the input …
CCMatrix: Mining billions of high-quality parallel sentences on the web
We show that margin-based bitext mining in a multilingual sentence space can be applied to
monolingual corpora of billions of sentences. We are using ten snapshots of a curated …