A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

A brief overview of ChatGPT: The history, status quo and potential future development

T Wu, S He, J Liu, S Sun, K Liu… - IEEE/CAA Journal of …, 2023 - ieeexplore.ieee.org
ChatGPT, an artificial intelligence generated content (AIGC) model developed by OpenAI,
has attracted world-wide attention for its capability of dealing with challenging language …

Direct preference optimization: Your language model is secretly a reward model

R Rafailov, A Sharma, E Mitchell… - Advances in …, 2023 - proceedings.neurips.cc
While large-scale unsupervised language models (LMs) learn broad world knowledge and
some reasoning skills, achieving precise control of their behavior is difficult due to the …

Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

L Zheng, WL Chiang, Y Sheng… - Advances in …, 2023 - proceedings.neurips.cc
Evaluating large language model (LLM) based chat assistants is challenging due to their
broad capabilities and the inadequacy of existing benchmarks in measuring human …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang… - arXiv preprint arXiv …, 2023 - paper-notes.zhjwpku.com
Ever since the Turing Test was proposed in the 1950s, humans have explored the mastering
of language intelligence by machine. Language is essentially a complex, intricate system of …

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions

L Huang, W Yu, W Ma, W Zhong, Z Feng… - ACM Transactions on …, 2025 - dl.acm.org
The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …

Generative AI at work

E Brynjolfsson, D Li, L Raymond - The Quarterly Journal of …, 2025 - academic.oup.com
We study the staggered introduction of a generative AI–based conversational assistant
using data from 5,172 customer-support agents. Access to AI assistance increases worker …

MMMU: A massive multi-discipline multimodal understanding and reasoning benchmark for expert AGI

X Yue, Y Ni, K Zhang, T Zheng, R Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce MMMU: a new benchmark designed to evaluate multimodal models on
massive multi-discipline tasks demanding college-level subject knowledge and deliberate …

Graph of thoughts: Solving elaborate problems with large language models

M Besta, N Blach, A Kubicek, R Gerstenberger… - Proceedings of the …, 2024 - ojs.aaai.org
We introduce Graph of Thoughts (GoT): a framework that advances prompting
capabilities in large language models (LLMs) beyond those offered by paradigms such as …

The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer
For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …