A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects

MU Hadi, R Qureshi, A Shah, M Irfan, A Zafar… - Authorea …, 2023 - researchgate.net
Within the vast expanse of computerized language processing, a revolutionary entity known
as Large Language Models (LLMs) has emerged, wielding immense power in its capacity to …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang… - arXiv preprint arXiv …, 2023 - paper-notes.zhjwpku.com
Ever since the Turing Test was proposed in the 1950s, humans have explored the mastering
of language intelligence by machine. Language is essentially a complex, intricate system of …

AgentVerse: Facilitating multi-agent collaboration and exploring emergent behaviors in agents

W Chen, Y Su, J Zuo, C Yang… - arXiv preprint …, 2023 - … .itic-sci.com
Autonomous agents empowered by Large Language Models (LLMs) have undergone
significant improvements, enabling them to generalize across a broad spectrum of tasks …

Evaluating large language models at evaluating instruction following

Z Zeng, J Yu, T Gao, Y Meng, T Goyal… - arXiv preprint arXiv …, 2023 - arxiv.org
As research in large language models (LLMs) continues to accelerate, LLM-based
evaluation has emerged as a scalable and cost-effective alternative to human evaluations …

Datasets for large language models: A comprehensive survey

Y Liu, J Cao, C Liu, K Ding, L Jin - arXiv preprint arXiv:2402.18041, 2024 - arxiv.org
This paper embarks on an exploration into the Large Language Model (LLM) datasets,
which play a crucial role in the remarkable advancements of LLMs. The datasets serve as …

Generative judge for evaluating alignment

J Li, S Sun, W Yuan, RZ Fan, H Zhao, P Liu - arXiv preprint arXiv …, 2023 - arxiv.org
The rapid development of Large Language Models (LLMs) has substantially expanded the
range of tasks they can address. In the field of Natural Language Processing (NLP) …

LLM-based NLG evaluation: Current status and challenges

M Gao, X Hu, J Ruan, X Pu, X Wan - arXiv preprint arXiv:2402.01383, 2024 - arxiv.org
Evaluating natural language generation (NLG) is a vital but challenging problem in artificial
intelligence. Traditional evaluation metrics mainly capturing content (e.g., n-gram) overlap …
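The "n-gram overlap" that traditional metrics capture can be illustrated with a minimal sketch of clipped n-gram precision, the overlap statistic underlying BLEU-style metrics. This is not taken from the surveyed paper. It assumes simple whitespace tokenization and a single reference, and it omits the brevity penalty and smoothing that real metrics add.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate, reference, n=2):
    """Clipped n-gram precision: the fraction of candidate n-grams that also
    occur in the reference, where each n-gram is credited at most as many
    times as it appears in the reference (the clipping used by BLEU)."""
    cand = ngrams(candidate.split(), n)  # assumption: whitespace tokenization
    ref = ngrams(reference.split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


print(ngram_precision("the cat sat on the mat", "the cat is on the mat", n=2))
```

For these two sentences, 3 of the 5 candidate bigrams ("the cat", "on the", "the mat") appear in the reference, giving 0.6. A paraphrase with the same meaning but different wording would score low, which is the limitation of surface-overlap metrics that motivates LLM-based evaluation.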

Universal self-consistency for large language model generation

X Chen, R Aksitov, U Alon, J Ren, K Xiao, P Yin… - arXiv preprint arXiv …, 2023 - arxiv.org
Self-consistency with chain-of-thought prompting (CoT) has demonstrated remarkable
performance gains on various challenging tasks, by utilizing multiple reasoning paths …
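The standard self-consistency step this snippet builds on, aggregating the final answers of multiple sampled chain-of-thought reasoning paths, can be sketched as a majority vote. This is a minimal illustration of the baseline only, not the paper's universal variant. It assumes the final answers have already been extracted from each sampled path as comparable strings; the sampling and answer-extraction steps are omitted.

```python
from collections import Counter


def self_consistency(final_answers):
    """Majority vote over final answers extracted from sampled reasoning paths.

    Assumes answers are already normalized strings. Ties go to the answer
    encountered first, because Counter.most_common orders equal counts by
    first occurrence.
    """
    if not final_answers:
        raise ValueError("need at least one sampled answer")
    answer, _count = Counter(final_answers).most_common(1)[0]
    return answer


# Hypothetical answers from five sampled reasoning paths for one question.
print(self_consistency(["18", "18", "26", "18", "26"]))  # prints 18
```

The vote only works when answers can be matched exactly, as with numeric or multiple-choice outputs. For free-form generations no such exact match exists, which is the gap the universal variant in this entry addresses.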

Branch-solve-merge improves large language model evaluation and generation

S Saha, O Levy, A Celikyilmaz, M Bansal… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) are frequently used for multi-faceted language generation
and evaluation tasks that involve satisfying intricate user constraints or taking into account …