A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

SimPO: Simple preference optimization with a reference-free reward

Y Meng, M Xia, D Chen - Advances in Neural Information …, 2025 - proceedings.neurips.cc
Direct Preference Optimization (DPO) is a widely used offline preference
optimization algorithm that reparameterizes reward functions in reinforcement learning from …
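For context on the reparameterization this entry mentions (a sketch recalled from the DPO and SimPO papers, not from the snippet itself): DPO expresses the RLHF reward implicitly through the policy and a reference model, whereas SimPO drops the reference model and length-normalizes instead, with β a scaling hyperparameter and γ a target margin:

```latex
% DPO's implicit reward (Z(x) is a partition term that cancels in the loss):
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)} + \beta \log Z(x)

% SimPO's reference-free, length-normalized reward:
r_{\mathrm{SimPO}}(x, y) = \frac{\beta}{|y|} \log \pi_\theta(y \mid x)

% SimPO objective over preferred y_w and dispreferred y_l:
\mathcal{L}_{\mathrm{SimPO}} = -\mathbb{E}\left[ \log \sigma\!\left(
  \frac{\beta}{|y_w|} \log \pi_\theta(y_w \mid x)
  - \frac{\beta}{|y_l|} \log \pi_\theta(y_l \mid x) - \gamma \right) \right]
```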

Chatbot Arena: An open platform for evaluating LLMs by human preference

WL Chiang, L Zheng, Y Sheng… - … on Machine Learning, 2024 - openreview.net
Large Language Models (LLMs) have unlocked new capabilities and applications; however,
evaluating the alignment with human preferences still poses significant challenges. To …

Self-play fine-tuning converts weak language models to strong language models

Z Chen, Y Deng, H Yuan, K Ji, Q Gu - arXiv preprint arXiv:2401.01335, 2024 - arxiv.org
Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is
pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the …

Benchmarking large language models in retrieval-augmented generation

J Chen, H Lin, X Han, L Sun - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Retrieval-Augmented Generation (RAG) is a promising approach for mitigating the
hallucination of large language models (LLMs). However, existing research lacks rigorous …

ToolLLM: Facilitating large language models to master 16,000+ real-world APIs

Y Qin, S Liang, Y Ye, K Zhu, L Yan, Y Lu, Y Lin… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite the advancements of open-source large language models (LLMs), e.g., LLaMA, they
remain significantly limited in tool-use capabilities, i.e., using external tools (APIs) to fulfill …

H2O: Heavy-hitter oracle for efficient generative inference of large language models

Z Zhang, Y Sheng, T Zhou, T Chen… - Advances in …, 2023 - proceedings.neurips.cc
Large Language Models (LLMs), despite their recent impressive accomplishments,
are notably cost-prohibitive to deploy, particularly for applications involving long-content …