BARTScore: Evaluating generated text as text generation
A wide variety of NLP applications, such as machine translation, summarization, and dialog,
involve text generation. One major challenge for these applications is how to evaluate …
SummEval: Re-evaluating summarization evaluation
The scarcity of comprehensive up-to-date studies on evaluation metrics for text
summarization and the lack of consensus regarding evaluation protocols continue to inhibit …
Leveraging large language models for NLG evaluation: Advances and challenges
In the rapidly evolving domain of Natural Language Generation (NLG) evaluation,
introducing Large Language Models (LLMs) has opened new avenues for assessing …
A survey on multi-modal summarization
The new era of technology has brought us to the point where it is convenient for people to
share their opinions over an abundance of platforms. These platforms have a provision for …
MoverScore: Text generation evaluating with contextualized embeddings and earth mover distance
A robust evaluation metric has a profound impact on the development of text generation
systems. A desirable metric compares system output against references based on their …
Bridging the gap: A survey on integrating (human) feedback for natural language generation
Natural language generation has witnessed significant advancements due to the training of
large language models on vast internet-scale datasets. Despite these advancements, there …
Re-evaluating evaluation in text summarization
Automated evaluation metrics as a stand-in for manual evaluation are an essential part of
the development of text-generation tasks such as text summarization. However, while the …
Leveraging large language models for NLG evaluation: A survey
In the rapidly evolving domain of Natural Language Generation (NLG) evaluation,
introducing Large Language Models (LLMs) has opened new avenues for assessing …
SUPERT: Towards new frontiers in unsupervised evaluation metrics for multi-document summarization
We study unsupervised multi-document summarization evaluation metrics, which require
neither human-written reference summaries nor human annotations (e.g., preferences …
What have we achieved on text summarization?
Deep learning has led to significant improvement in text summarization with various
methods investigated and improved ROUGE scores reported over the years. However, gaps …