A survey of evaluation metrics used for NLG systems
In the last few years, a large number of automatic evaluation metrics have been proposed for
evaluating Natural Language Generation (NLG) systems. The rapid development and …
BERT: a review of applications in natural language processing and understanding
MV Koroteev - arXiv preprint arXiv:2103.11943, 2021 - arxiv.org
In this review, we describe the application of one of the most popular deep learning-based
language models, BERT. The paper describes the mechanism of operation of this model, the …
BLEURT: Learning robust metrics for text generation
Text generation has made significant advances in the last few years. Yet, evaluation metrics
have lagged behind, as the most popular choices (e.g., BLEU and ROUGE) may correlate …
BERTScore: Evaluating text generation with BERT
We propose BERTScore, an automatic evaluation metric for text generation. Analogously to
common metrics, BERTScore computes a similarity score for each token in the candidate …
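The token-matching idea in the snippet above can be sketched in a few lines. The following is a minimal illustration, not the reference implementation: it assumes the `torch` and `transformers` packages, uses `bert-base-uncased` as an arbitrary encoder, and omits BERTScore's IDF weighting and baseline rescaling.

```python
# Hedged sketch: greedy cosine matching between contextual token embeddings,
# in the spirit of BERTScore (precision/recall/F1 over best-match similarities).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(sentence: str) -> torch.Tensor:
    """Return L2-normalised contextual embeddings for each token (special tokens dropped)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (seq_len, dim)
    hidden = hidden[1:-1]                            # drop [CLS] and [SEP]
    return torch.nn.functional.normalize(hidden, dim=-1)

def bertscore_f1(candidate: str, reference: str) -> float:
    cand, ref = embed(candidate), embed(reference)
    sim = cand @ ref.T                               # cosine similarity matrix
    precision = sim.max(dim=1).values.mean()         # best match per candidate token
    recall = sim.max(dim=0).values.mean()            # best match per reference token
    return (2 * precision * recall / (precision + recall)).item()

print(bertscore_f1("the cat sat on the mat", "a cat was sitting on the mat"))
```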
Automatic machine translation evaluation in many languages via zero-shot paraphrasing
We frame the task of machine translation evaluation as one of scoring machine translation
output with a sequence-to-sequence paraphraser, conditioned on a human reference. We …
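A hedged sketch of the scoring recipe described above: force-decode the MT hypothesis with a sequence-to-sequence model conditioned on the human reference, and use the negated token-averaged cross-entropy as the score. The checkpoint `google/mt5-small` is only an illustrative stand-in; the approach in the paper relies on a purpose-built multilingual paraphraser rather than an off-the-shelf model.

```python
# Hedged sketch: score an MT hypothesis by its token-averaged log-probability under a
# seq2seq model that conditions on the human reference (the paraphraser role).
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "google/mt5-small"          # placeholder checkpoint, not the metric's own model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)
model.eval()

def paraphrase_score(hypothesis: str, reference: str) -> float:
    """Higher is better: negated mean token cross-entropy of the hypothesis given the reference."""
    enc = tokenizer(reference, return_tensors="pt")
    labels = tokenizer(hypothesis, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(**enc, labels=labels).loss      # mean cross-entropy per target token
    return -loss.item()

print(paraphrase_score("the cat sat on the mat", "a cat was sitting on the mat"))
```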
Are references really needed? Unbabel-IST 2021 submission for the metrics shared task
In this paper, we present the joint contribution of Unbabel and IST to the WMT 2021 Metrics
Shared Task. With this year's focus on Multidimensional Quality Metric (MQM) as the ground …
Results of the WMT16 metrics shared task
This paper presents the results of the WMT16 Metrics Shared Task. We asked participants of
this task to score the outputs of the MT systems involved in the WMT16 Shared Translation …
A survey on evaluation metrics for machine translation
The success of Transformer architecture has seen increased interest in machine translation
(MT). The translation quality of neural network-based MT transcends that of translations …
RUSE: Regressor using sentence embeddings for automatic machine translation evaluation
We introduce the RUSE metric for the WMT18 metrics shared task. Sentence embeddings
can capture global information that cannot be captured by local features based on character …
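The regression recipe can be illustrated with a toy sketch: build a feature vector from sentence embeddings of the MT output and the reference, then fit a regressor against human scores. Everything concrete below is an assumption made for illustration: the hashing-based embedding, the feature combination, and the tiny in-line data stand in for pretrained sentence encoders and WMT human judgements.

```python
# Hedged sketch of a RUSE-style metric: regress human scores from features built out of
# sentence embeddings of the MT output and the reference.
import zlib
import numpy as np
from sklearn.neural_network import MLPRegressor

def sent_embed(sentence: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in embedding: deterministic bag-of-hashed-words, L2-normalised."""
    vec = np.zeros(dim)
    for tok in sentence.lower().split():
        vec[zlib.crc32(tok.encode()) % dim] += 1.0
    return vec / max(1.0, np.linalg.norm(vec))

def features(hyp: str, ref: str) -> np.ndarray:
    h, r = sent_embed(hyp), sent_embed(ref)
    # Combine the two embeddings: concatenation, element-wise product, absolute difference.
    return np.concatenate([h, r, h * r, np.abs(h - r)])

# In practice, hyps/refs/human_scores would come from WMT human-evaluation data.
hyps = ["the cat sat on the mat", "a dog barks loudly"]
refs = ["a cat was sitting on the mat", "the dog is barking loudly"]
human_scores = [0.8, 0.6]

X = np.stack([features(h, r) for h, r in zip(hyps, refs)])
reg = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X, human_scores)
print(reg.predict(X))
```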
Automatic text evaluation through the lens of Wasserstein barycenters
A new metric, BaryScore, to evaluate text generation based on deep contextualized
embeddings (e.g., BERT, RoBERTa, ELMo) is introduced. This metric is motivated by a new …
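A rough, simplified sketch of the optimal-transport idea behind such metrics: treat the candidate and the reference as uniform discrete distributions over their contextual token embeddings and compare the two distributions with an entropy-regularised Wasserstein (Sinkhorn) cost. The random inputs below stand in for BERT embeddings, and BaryScore's aggregation of multiple encoder layers via a Wasserstein barycenter is omitted.

```python
# Hedged sketch: entropic optimal-transport cost between two sets of token embeddings,
# each treated as a uniform discrete distribution (squared Euclidean ground cost).
import numpy as np

def sinkhorn_distance(X: np.ndarray, Y: np.ndarray, reg: float = 0.1, n_iter: int = 200) -> float:
    a = np.full(len(X), 1.0 / len(X))                 # uniform weights over candidate tokens
    b = np.full(len(Y), 1.0 / len(Y))                 # uniform weights over reference tokens
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    C = C / C.max()                                   # normalise cost for numerical stability
    K = np.exp(-C / reg)                              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):                           # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]                   # transport plan
    return float((P * C).sum())

# X and Y would be contextual embeddings of the candidate and reference tokens.
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(6, 8)), rng.normal(size=(7, 8))
print(sinkhorn_distance(X, Y))
```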