Survey of hallucination in natural language generation

Z Ji, N Lee, R Frieske, T Yu, D Su, Y Xu, E Ishii… - ACM computing …, 2023 - dl.acm.org
Natural Language Generation (NLG) has improved exponentially in recent years thanks to
the development of sequence-to-sequence deep learning technologies such as Transformer …

Neural machine translation: A review

F Stahlberg - Journal of Artificial Intelligence Research, 2020 - jair.org
The field of machine translation (MT), the automatic translation of written text from one
natural language into another, has experienced a major paradigm shift in recent years …

Findings of the 2019 conference on machine translation (WMT19)

L Barrault, O Bojar, MR Costa-Jussa, C Federmann… - 2019 - zora.uzh.ch
This paper presents the results of the premier shared task organized alongside the
Conference on Machine Translation (WMT) 2019. Participants were asked to build machine …

Survey of low-resource machine translation

B Haddow, R Bawden, AVM Barone, J Helcl… - Computational …, 2022 - direct.mit.edu
We present a survey covering the state of the art in low-resource machine translation (MT)
research. There are currently around 7,000 languages spoken in the world and almost all …

The curious case of hallucinations in neural machine translation

V Raunak, A Menezes, M Junczys-Dowmunt - arXiv preprint arXiv …, 2021 - arxiv.org
In this work, we study hallucinations in Neural Machine Translation (NMT), which lie at an
extreme end on the spectrum of NMT pathologies. Firstly, we connect the phenomenon of …

Unsupervised domain clusters in pretrained language models

R Aharoni, Y Goldberg - arXiv preprint arXiv:2004.02105, 2020 - arxiv.org
The notion of "in-domain data" in NLP is often over-simplistic and vague, as textual data
varies in many nuanced linguistic aspects such as topic, style or level of formality. In …

ParaCrawl: Web-scale acquisition of parallel corpora

M Bañón, P Chen, B Haddow, K Heafield, H Hoang… - 2020 - strathprints.strath.ac.uk
We report on methods to create the largest publicly available parallel corpora by crawling
the web, using open source software. We empirically compare alternative methods and …

Detecting hallucinated content in conditional neural sequence generation

C Zhou, G Neubig, J Gu, M Diab, P Guzman… - arXiv preprint arXiv …, 2020 - arxiv.org
Neural sequence models can generate highly fluent sentences, but recent studies have also
shown that they are also prone to hallucinate additional content not supported by the input …

Automatic machine translation evaluation in many languages via zero-shot paraphrasing

B Thompson, M Post - arXiv preprint arXiv:2004.14564, 2020 - arxiv.org
We frame the task of machine translation evaluation as one of scoring machine translation
output with a sequence-to-sequence paraphraser, conditioned on a human reference. We …

Domain adaptation and multi-domain adaptation for neural machine translation: A survey

D Saunders - Journal of Artificial Intelligence Research, 2022 - jair.org
The development of deep learning techniques has allowed Neural Machine Translation
(NMT) models to become extremely powerful, given sufficient training data and training time …