Pre-trained language models for text generation: A survey
Text generation aims to produce plausible and readable text in human language from input
data. The resurgence of deep learning has greatly advanced this field, in particular with the …
Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text
S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …
COMET-22: Unbabel-IST 2022 submission for the metrics shared task
In this paper, we present the joint contribution of Unbabel and IST to the WMT 2022 Metrics
Shared Task. Our primary submission, dubbed COMET-22, is an ensemble between a …
COMET: A neural framework for MT evaluation
We present COMET, a neural framework for training multilingual machine translation
evaluation models which obtains new state-of-the-art levels of correlation with human …
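A minimal usage sketch of a COMET-style scorer follows; it assumes the unbabel-comet Python package and the "Unbabel/wmt22-comet-da" checkpoint name, neither of which is stated in the entry above.

    # Hedged sketch: scoring source/hypothesis/reference triples with a COMET checkpoint.
    # Assumes `pip install unbabel-comet`; the package and checkpoint names are assumptions.
    from comet import download_model, load_from_checkpoint

    model_path = download_model("Unbabel/wmt22-comet-da")
    model = load_from_checkpoint(model_path)

    data = [{
        "src": "Dem Feuer konnte Einhalt geboten werden.",
        "mt": "The fire could be stopped.",
        "ref": "They were able to control the fire.",
    }]

    # predict() returns segment-level scores plus a corpus-level average;
    # the exact return type varies across package versions.
    output = model.predict(data, batch_size=8, gpus=0)
    print(output)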
BLEURT: Learning robust metrics for text generation
Text generation has made significant advances in the last few years. Yet, evaluation metrics
have lagged behind, as the most popular choices (e.g., BLEU and ROUGE) may correlate …
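As a hedged illustration, BLEURT checkpoints are typically used as learned regression scorers; the sketch below assumes the bleurt package from google-research/bleurt and a locally downloaded "BLEURT-20" checkpoint, which the entry above does not mention.

    # Hedged sketch: scoring candidates against references with a BLEURT checkpoint.
    # The package name and checkpoint directory are assumptions, not from the entry above.
    from bleurt import score

    scorer = score.BleurtScorer("BLEURT-20")  # path to an assumed local checkpoint

    references = ["The cat sat on the mat."]
    candidates = ["A cat was sitting on the mat."]

    # Higher scores indicate candidates the learned metric judges closer to the references.
    scores = scorer.score(references=references, candidates=candidates)
    print(scores)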
How multilingual is multilingual BERT?
T Pires - arXiv preprint arXiv:1906.01502, 2019
In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2018) as
a single language model pre-trained from monolingual corpora in 104 languages, is …
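For illustration, the sketch below loads a multilingual BERT checkpoint and compares sentence representations across two languages; it assumes the Hugging Face transformers package and the "bert-base-multilingual-cased" model id, which the entry above does not name.

    # Hedged sketch: mean-pooled M-BERT embeddings for an English/German sentence pair.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = AutoModel.from_pretrained("bert-base-multilingual-cased")

    sentences = ["The weather is nice today.", "Das Wetter ist heute schön."]
    batch = tokenizer(sentences, padding=True, return_tensors="pt")

    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # contextual token embeddings
        mask = batch["attention_mask"].unsqueeze(-1)
        embeddings = (hidden * mask).sum(1) / mask.sum(1)  # crude mean pooling

    # Cosine similarity between the English and German sentence vectors.
    print(float(torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)))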
BERTScore: Evaluating text generation with BERT
We propose BERTScore, an automatic evaluation metric for text generation. Analogously to
common metrics, BERTScore computes a similarity score for each token in the candidate …
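A minimal usage sketch follows; it assumes the bert-score Python package, whose import path and arguments are not given in the entry above.

    # Hedged sketch: BERTScore matches each candidate token to its most similar
    # reference token (and vice versa) in contextual-embedding space, then
    # aggregates the similarities into precision, recall, and F1.
    from bert_score import score

    candidates = ["The cat sat on the mat."]
    references = ["A cat was sitting on the mat."]

    P, R, F1 = score(candidates, references, lang="en")
    print(float(F1[0]))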
MoverScore: Text generation evaluating with contextualized embeddings and earth mover distance
A robust evaluation metric has a profound impact on the development of text generation
systems. A desirable metric compares system output against references based on their …
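The earth mover's distance step can be sketched as an optimal-transport problem over token embeddings; the snippet below uses the POT package and random placeholder embeddings as stand-ins, since the actual MoverScore implementation (IDF weighting, n-gram pooling) is not described in the entry above.

    # Hedged sketch of a MoverScore-style earth mover's distance between
    # candidate and reference token embeddings (placeholders, not real BERT output).
    import numpy as np
    import ot  # Python Optimal Transport (pip install POT)

    rng = np.random.default_rng(0)
    cand_emb = rng.normal(size=(5, 768))  # stand-in candidate token embeddings
    ref_emb = rng.normal(size=(6, 768))   # stand-in reference token embeddings

    # Uniform token mass here; MoverScore weights tokens by IDF instead.
    a = np.full(len(cand_emb), 1.0 / len(cand_emb))
    b = np.full(len(ref_emb), 1.0 / len(ref_emb))

    cost = ot.dist(cand_emb, ref_emb, metric="euclidean")  # pairwise transport costs
    emd = ot.emd2(a, b, cost)  # minimal cost to move candidate mass onto the reference
    print(emd)  # smaller = closer token distributions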
To ship or not to ship: An extensive evaluation of automatic metrics for machine translation
Automatic metrics are commonly used as the exclusive tool for declaring the superiority of
one machine translation system's quality over another. The community choice of automatic …
Results of the WMT21 metrics shared task: Evaluating metrics with expert-based human evaluations on TED and news domain
This paper presents the results of the WMT21 Metrics Shared Task. Participants were asked
to score the outputs of the translation systems competing in the WMT21 News Translation …