A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

Security and privacy challenges of large language models: A survey

BC Das, MH Amini, Y Wu - ACM Computing Surveys, 2025 - dl.acm.org
Large language models (LLMs) have demonstrated extraordinary capabilities and
contributed to multiple fields, such as generating and summarizing text, language …

[PDF][PDF] A survey of large language models

WX Zhao, K Zhou, J Li, T Tang… - arxiv preprint arxiv …, 2023 - paper-notes.zhjwpku.com
Ever since the Turing Test was proposed in the 1950s, humans have explored the mastering
of language intelligence by machine. Language is essentially a complex, intricate system of …

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions

L Huang, W Yu, W Ma, W Zhong, Z Feng… - ACM Transactions on …, 2025 - dl.acm.org
The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …

Siren's song in the AI ocean: a survey on hallucination in large language models

Y Zhang, Y Li, L Cui, D Cai, L Liu, T Fu… - arxiv preprint arxiv …, 2023 - arxiv.org
While large language models (LLMs) have demonstrated remarkable capabilities across a
range of downstream tasks, a significant concern revolves around their propensity to exhibit …

The rise and potential of large language model based agents: A survey

Z **, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer
For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

Wizardlm: Empowering large language models to follow complex instructions

C Xu, Q Sun, K Zheng, X Geng, P Zhao, J Feng… - arxiv preprint arxiv …, 2023 - arxiv.org
Training large language models (LLMs) with open-domain instruction following data brings
colossal success. However, manually creating such instruction data is very time-consuming …

Detecting hallucinations in large language models using semantic entropy

S Farquhar, J Kossen, L Kuhn, Y Gal - Nature, 2024 - nature.com
Large language model (LLM) systems, such as ChatGPT or Gemini, can show impressive
reasoning and question-answering capabilities but often 'hallucinate'false outputs and …

Factscore: Fine-grained atomic evaluation of factual precision in long form text generation

S Min, K Krishna, X Lyu, M Lewis, W Yih… - arxiv preprint arxiv …, 2023 - arxiv.org
Evaluating the factuality of long-form text generated by large language models (LMs) is non-
trivial because (1) generations often contain a mixture of supported and unsupported pieces …

Chain-of-verification reduces hallucination in large language models

S Dhuliawala, M Komeili, J Xu, R Raileanu, X Li… - arxiv preprint arxiv …, 2023 - arxiv.org
Generation of plausible yet incorrect factual information, termed hallucination, is an
unsolved issue in large language models. We study the ability of language models to …