A systematic survey of text summarization: From statistical methods to large language models

H Zhang, PS Yu, J Zhang - arXiv preprint arXiv:2406.11289, 2024 - arxiv.org
Text summarization research has undergone several significant transformations with the
advent of deep neural networks, pre-trained language models (PLMs), and recent large …

MiniCheck: Efficient fact-checking of LLMs on grounding documents

L Tang, P Laban, G Durrett - arXiv preprint arXiv:2404.10774, 2024 - arxiv.org
Recognizing if LLM output can be grounded in evidence is central to many tasks in NLP:
retrieval-augmented generation, summarization, document-grounded dialogue, and more …

FineSurE: Fine-grained summarization evaluation using LLMs

H Song, H Su, I Shalyminov, J Cai… - arXiv preprint arXiv …, 2024 - arxiv.org
Automated evaluation is crucial for streamlining text summarization benchmarking and
model development, given the costly and time-consuming nature of human evaluation …

Factual consistency evaluation of summarization in the era of large language models

Z Luo, Q **e, S Ananiadou - Expert Systems with Applications, 2024 - Elsevier
Factual inconsistency with source documents in automatically generated summaries can
lead to misinformation or pose risks. Existing factual consistency (FC) metrics are …

TofuEval: Evaluating hallucinations of LLMs on topic-focused dialogue summarization

L Tang, I Shalyminov, AW Wong, J Burnsky… - arXiv preprint arXiv …, 2024 - arxiv.org
Single document news summarization has seen substantial progress on faithfulness in
recent years, driven by research on the evaluation of factual consistency, or hallucinations …

Instructing and prompting large language models for explainable cross-domain recommendations

A Petruzzelli, C Musto, L Laraspata, I Rinaldi… - Proceedings of the 18th …, 2024 - dl.acm.org
In this paper, we present a strategy to provide users with explainable cross-domain
recommendations (CDR) that exploits large language models (LLMs). Generally speaking …

CheckEval: Robust evaluation framework using large language model via checklist

Y Lee, J Kim, J Kim, H Cho, P Kang - arXiv preprint arXiv:2403.18771, 2024 - arxiv.org
We introduce CheckEval, a novel evaluation framework using Large Language Models,
addressing the challenges of ambiguity and inconsistency in current evaluation methods …

An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities

Z Yang, S Chen, C Gao, Z Li, X Hu, K Liu… - ACM Transactions on …, 2025 - dl.acm.org
Code generation aims to automatically generate code snippets of specific programming
language according to natural language descriptions. The continuous advancements in …

GenAudit: Fixing factual errors in language model outputs with evidence

K Krishna, S Ramprasad, P Gupta, BC Wallace… - arXiv preprint arXiv …, 2024 - arxiv.org
LLMs can generate factually incorrect statements even when provided access to reference
documents. Such errors can be dangerous in high-stakes applications (e.g., document …

Investigating hallucinations in pruned large language models for abstractive summarization

G Chrysostomou, Z Zhao, M Williams… - Transactions of the …, 2024 - direct.mit.edu
Despite the remarkable performance of generative large language models (LLMs) on
abstractive summarization, they face two significant challenges: their considerable size and …