A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects

MU Hadi, R Qureshi, A Shah, M Irfan, A Zafar… - Authorea …, 2023 - techrxiv.org
Within the vast expanse of computerized language processing, a revolutionary entity known
as Large Language Models (LLMs) has emerged, wielding immense power in its capacity to …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arxiv preprint arxiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

[PDF][PDF] Agentverse: Facilitating multi-agent collaboration and exploring emergent behaviors in agents

W Chen, Y Su, J Zuo, C Yang… - arxiv preprint …, 2023 - … .itic-sci.com
Autonomous agents empowered by Large Language Models (LLMs) have undergone
significant improvements, enabling them to generalize across a broad spectrum of tasks …

Evaluating large language models at evaluating instruction following

Z Zeng, J Yu, T Gao, Y Meng, T Goyal… - arxiv preprint arxiv …, 2023 - arxiv.org
As research in large language models (LLMs) continues to accelerate, LLM-based
evaluation has emerged as a scalable and cost-effective alternative to human evaluations …

Generative judge for evaluating alignment

J Li, S Sun, W Yuan, RZ Fan, H Zhao, P Liu - arxiv preprint arxiv …, 2023 - arxiv.org
The rapid development of Large Language Models (LLMs) has substantially expanded the
range of tasks they can address. In the field of Natural Language Processing (NLP) …

Agentverse: Facilitating multi-agent collaboration and exploring emergent behaviors

W Chen, Y Su, J Zuo, C Yang, C Yuan… - The Twelfth …, 2023 - openreview.net
Autonomous agents empowered by Large Language Models (LLMs) have undergone
significant improvements, enabling them to generalize across a broad spectrum of tasks …

Leave no document behind: Benchmarking long-context llms with extended multi-doc qa

M Wang, L Chen, F Cheng, S Liao… - Proceedings of the …, 2024 - aclanthology.org
Long-context modeling capabilities of Large Language Models (LLMs) have garnered
widespread attention, leading to the emergence of LLMs with ultra-context windows …

Branch-solve-merge improves large language model evaluation and generation

S Saha, O Levy, A Celikyilmaz, M Bansal… - arxiv preprint arxiv …, 2023 - arxiv.org
Large Language Models (LLMs) are frequently used for multi-faceted language generation
and evaluation tasks that involve satisfying intricate user constraints or taking into account …

Optimization-based prompt injection attack to llm-as-a-judge

J Shi, Z Yuan, Y Liu, Y Huang, P Zhou, L Sun… - Proceedings of the …, 2024 - dl.acm.org
LLM-as-a-Judge uses a large language model (LLM) to select the best response from a set
of candidates for a given question. LLM-as-a-Judge has many applications such as LLM …