Survey of hallucination in natural language generation
Natural Language Generation (NLG) has improved exponentially in recent years thanks to
the development of sequence-to-sequence deep learning technologies such as Transformer …
Evaluating large language models: A comprehensive survey
Large language models (LLMs) have demonstrated remarkable capabilities across a broad
spectrum of tasks. They have attracted significant attention and been deployed in numerous …
Benchmarking large language models for news summarization
Large language models (LLMs) have shown promise for automatic summarization but the
reasons behind their successes are poorly understood. By conducting a human evaluation …
Holistic evaluation of language models
Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …
FActScore: Fine-grained atomic evaluation of factual precision in long form text generation
Evaluating the factuality of long-form text generated by large language models (LMs) is non-
trivial because (1) generations often contain a mixture of supported and unsupported pieces …
A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions
The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …
News summarization and evaluation in the era of GPT-3
The recent success of zero- and few-shot prompting with models like GPT-3 has led to a
paradigm shift in NLP research. In this paper, we study its impact on text summarization …
ChatGPT as a factual inconsistency evaluator for text summarization
The performance of text summarization has been greatly boosted by pre-trained language
models. A main concern of existing methods is that most generated summaries are not …
Towards a unified multi-dimensional evaluator for text generation
Multi-dimensional evaluation is the dominant paradigm for human evaluation in Natural
Language Generation (NLG), i.e., evaluating the generated text from multiple explainable …
TRUE: Re-evaluating factual consistency evaluation
Grounded text generation systems often generate text that contains factual inconsistencies,
hindering their real-world applicability. Automatic factual consistency evaluation may help …