How to evaluate machine translation: A review of automated and human metrics

E Chatzikoumi - Natural Language Engineering, 2020 - cambridge.org
This article presents the most up-to-date, influential automated, semiautomated and human
metrics used to evaluate the quality of machine translation (MT) output and provides the …

A comprehensive survey on various fully automatic machine translation evaluation metrics

S Chauhan, P Daniel - Neural Processing Letters, 2023 - Springer
The fast advancement in machine translation models necessitates the development of
accurate evaluation metrics that would allow researchers to track the progress in text …

Why we need new evaluation metrics for NLG

J Novikova, O Dušek, AC Curry, V Rieser - arXiv preprint arXiv …, 2017 - arxiv.org
The majority of NLG evaluation relies on automatic metrics, such as BLEU. In this paper, we
motivate the need for novel, system- and data-independent automatic evaluation methods …
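Since BLEU recurs throughout these surveys as the default automatic metric, a minimal sketch of what it computes may be useful: clipped n-gram precisions combined as a geometric mean, times a brevity penalty. This is a simplified single-reference, sentence-level illustration in pure Python (no smoothing, no multi-reference clipping), not the exact formulation any of the listed papers evaluates.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: clipped n-gram precisions
    (geometric mean over n = 1..max_n) times a brevity penalty.
    Smoothing and multiple references are omitted for clarity."""
    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if clipped == 0:
            return 0.0  # any zero precision drives the geometric mean to 0
        log_prec_sum += math.log(clipped / total)
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec_sum / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # identical → 1.0
```

The all-or-nothing behavior of the unsmoothed geometric mean (any unmatched n-gram order zeroes the score) is one reason segment-level BLEU correlates poorly with human judgments, a recurring theme in the metrics shared tasks listed below.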

Automatic machine translation evaluation in many languages via zero-shot paraphrasing

B Thompson, M Post - arXiv preprint arXiv:2004.14564, 2020 - arxiv.org
We frame the task of machine translation evaluation as one of scoring machine translation
output with a sequence-to-sequence paraphraser, conditioned on a human reference. We …

Results of the WMT19 metrics shared task: Segment-level and strong MT systems pose big challenges

Q Ma, JTZ Wei, O Bojar, Y Graham - 2019 - doras.dcu.ie
This paper presents the results of the WMT19 Metrics Shared Task. Participants were asked
to score the outputs of the translation systems competing in the WMT19 News Translation …

Translation quality assessment: A brief survey on manual and automatic methods

L Han, GJF Jones, AF Smeaton - arXiv preprint arXiv:2105.03311, 2021 - arxiv.org
To facilitate effective translation modeling and translation studies, one of the crucial
questions to address is how to assess translation quality. From the perspectives of accuracy …

A global analysis of metrics used for measuring performance in natural language processing

K Blagec, G Dorffner, M Moradi, S Ott… - arXiv preprint arXiv …, 2022 - arxiv.org
Measuring the performance of natural language processing models is challenging.
Traditionally used metrics, such as BLEU and ROUGE, originally devised for machine …

Adequacy–fluency metrics: Evaluating MT in the continuous space model framework

RE Banchs, LF D'Haro, H Li - IEEE/ACM Transactions on Audio …, 2015 - ieeexplore.ieee.org
This work extends and evaluates a two-dimensional automatic evaluation metric for machine
translation, which is designed to operate at the sentence level. The metric is based on the …

A critical analysis of metrics used for measuring progress in artificial intelligence

K Blagec, G Dorffner, M Moradi, M Samwald - arXiv preprint arXiv …, 2020 - arxiv.org
Comparing model performances on benchmark datasets is an integral part of measuring
and driving progress in artificial intelligence. A model's performance on a benchmark …

SBSim: A Sentence-BERT similarity-based evaluation metric for Indian language neural machine translation systems

K Mrinalini, P Vijayalakshmi… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
Machine translation (MT) outputs are widely scored using automatic evaluation metrics and
human evaluation scores. The automatic evaluation metrics are expected to be easily …