Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Abstract Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

An empirical survey on long document summarization: Datasets, models, and metrics

HY Koh, J Ju, M Liu, S Pan - ACM Computing Surveys, 2022 - dl.acm.org
Long documents such as academic articles and business reports have been the standard
format to detail out important issues and complicated subjects that require extra attention. An …

SummEval: Re-evaluating summarization evaluation

AR Fabbri, W Kryściński, B McCann, C Xiong… - Transactions of the …, 2021 - direct.mit.edu
The scarcity of comprehensive up-to-date studies on evaluation metrics for text
summarization and the lack of consensus regarding evaluation protocols continue to inhibit …

Understanding factuality in abstractive summarization with FRANK: A benchmark for factuality metrics

A Pagnoni, V Balachandran, Y Tsvetkov - arXiv preprint arXiv:2104.13346, 2021 - arxiv.org
Modern summarization models generate highly fluent but often factually unreliable outputs.
This motivated a surge of metrics attempting to measure the factuality of automatically …

Text preprocessing for text mining in organizational research: Review and recommendations

L Hickman, S Thapa, L Tay, M Cao… - Organizational …, 2022 - journals.sagepub.com
Recent advances in text mining have provided new methods for capitalizing on the
voluminous natural language text data created by organizations, their employees, and their …

Neural text summarization: A critical evaluation

W Kryściński, NS Keskar, B McCann, C Xiong… - arXiv preprint arXiv …, 2019 - arxiv.org
Text summarization aims at compressing long documents into a shorter form that conveys
the most important parts of the original document. Despite increased interest in the …

On extractive and abstractive neural document summarization with transformer language models

J Pilault, R Li, S Subramanian… - Proceedings of the 2020 …, 2020 - aclanthology.org
We present a method to produce abstractive summaries of long documents that exceed
several thousand words via neural abstractive summarization. We perform a simple …

Recent automatic text summarization techniques: a survey

M Gambhir, V Gupta - Artificial Intelligence Review, 2017 - Springer
As information is available in abundance for every topic on internet, condensing the
important information in the form of summary would benefit a number of users. Hence, there …

Re-evaluating evaluation in text summarization

M Bhandari, P Gour, A Ashfaq, P Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Automated evaluation metrics as a stand-in for manual evaluation are an essential part of
the development of text-generation tasks such as text summarization. However, while the …

A structured review of the validity of BLEU

E Reiter - Computational Linguistics, 2018 - direct.mit.edu
The BLEU metric has been widely used in NLP for over 15 years to evaluate NLP systems,
especially in machine translation and natural language generation. I present a structured …