Are we learning yet? A meta review of evaluation failures across machine learning

T Liao, R Taori, ID Raji, L Schmidt - Thirty-fifth Conference on …, 2021 - openreview.net
Many subfields of machine learning share a common stumbling block: evaluation. Advances
in machine learning often evaporate under closer scrutiny or turn out to be less widely …

How good are GPT models at machine translation? A comprehensive evaluation

A Hendy, M Abdelrehim, A Sharaf, V Raunak… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative Pre-trained Transformer (GPT) models have shown remarkable capabilities for
natural language generation, but their performance for machine translation has not been …

The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

N Goyal, C Gao, V Chaudhary, PJ Chen… - Transactions of the …, 2022 - direct.mit.edu
One of the biggest challenges hindering progress in low-resource and multilingual machine
translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either …

Experts, errors, and context: A large-scale study of human evaluation for machine translation

M Freitag, G Foster, D Grangier, V Ratnakar… - Transactions of the …, 2021 - direct.mit.edu
Human evaluation of modern high-quality machine translation systems is a difficult problem,
and there is increasing evidence that inadequate evaluation procedures can lead to …

To ship or not to ship: An extensive evaluation of automatic metrics for machine translation

T Kocmi, C Federmann, R Grundkiewicz… - arXiv preprint arXiv …, 2021 - arxiv.org
Automatic metrics are commonly used as the exclusive tool for declaring the superiority of
one machine translation system's quality over another. The community choice of automatic …

IndicTrans2: Towards high-quality and accessible machine translation models for all 22 scheduled Indian languages

J Gala, PA Chitale, R AK, V Gumma… - arXiv preprint arXiv …, 2023 - arxiv.org
India has a rich linguistic landscape with languages from 4 major language families spoken
by over a billion people. 22 of these languages are listed in the Constitution of India …

Domain adaptation and multi-domain adaptation for neural machine translation: A survey

D Saunders - Journal of Artificial Intelligence Research, 2022 - jair.org
The development of deep learning techniques has allowed Neural Machine Translation
(NMT) models to become extremely powerful, given sufficient training data and training time …

BLEU might be guilty but references are not innocent

M Freitag, D Grangier, I Caswell - arXiv preprint arXiv:2004.06063, 2020 - arxiv.org
The quality of automatic metrics for machine translation has been increasingly called into
question, especially for high-quality systems. This paper demonstrates that, while choice of …

Facebook AI WMT21 news translation task submission

C Tran, S Bhosale, J Cross, P Koehn, S Edunov… - arXiv preprint arXiv …, 2021 - arxiv.org
We describe Facebook's multilingual model submission to the WMT2021 shared task on
news translation. We participate in 14 language directions: English to and from Czech …

Translation artifacts in cross-lingual transfer learning

M Artetxe, G Labaka, E Agirre - arXiv preprint arXiv:2004.04721, 2020 - arxiv.org
Both human and machine translation play a central role in cross-lingual transfer learning:
many multilingual datasets have been created through professional translation services, and …