A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

[HTML][HTML] A survey of GPT-3 family large language models including ChatGPT and GPT-4

KS Kalyan - Natural Language Processing Journal, 2024 - Elsevier
Large language models (LLMs) are a special class of pretrained language models (PLMs)
obtained by scaling model size, pretraining corpus and computation. LLMs, because of their …

[PDF][PDF] A survey of large language models

WX Zhao, K Zhou, J Li, T Tang… - arxiv preprint arxiv …, 2023 - paper-notes.zhjwpku.com
Ever since the Turing Test was proposed in the 1950s, humans have explored the mastering
of language intelligence by machine. Language is essentially a complex, intricate system of …

Siren's song in the AI ocean: a survey on hallucination in large language models

Y Zhang, Y Li, L Cui, D Cai, L Liu, T Fu… - arxiv preprint arxiv …, 2023 - arxiv.org
While large language models (LLMs) have demonstrated remarkable capabilities across a
range of downstream tasks, a significant concern revolves around their propensity to exhibit …

Aligning large language models with human: A survey

Y Wang, W Zhong, L Li, F Mi, X Zeng, W Huang… - arxiv preprint arxiv …, 2023 - arxiv.org
Large Language Models (LLMs) trained on extensive textual corpora have emerged as
leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite …

Combating misinformation in the age of llms: Opportunities and challenges

C Chen, K Shu - AI Magazine, 2024 - Wiley Online Library
Misinformation such as fake news and rumors is a serious threat for information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …

Benchmarking foundation models with language-model-as-an-examiner

Y Bai, J Ying, Y Cao, X Lv, Y He… - Advances in …, 2023 - proceedings.neurips.cc
Numerous benchmarks have been established to assess the performance of foundation
models on open-ended question answering, which serves as a comprehensive test of a …

Longbench: A bilingual, multitask benchmark for long context understanding

Y Bai, X Lv, J Zhang, H Lyu, J Tang, Z Huang… - arxiv preprint arxiv …, 2023 - arxiv.org
Although large language models (LLMs) demonstrate impressive performance for many
language tasks, most of them can only handle texts a few thousand tokens long, limiting their …

When large language models meet personalization: Perspectives of challenges and opportunities

J Chen, Z Liu, X Huang, C Wu, Q Liu, G Jiang, Y Pu… - World Wide Web, 2024 - Springer
The advent of large language models marks a revolutionary breakthrough in artificial
intelligence. With the unprecedented scale of training and model parameters, the capability …

Cognitive mirage: A review of hallucinations in large language models

H Ye, T Liu, A Zhang, W Hua, W Jia - arxiv preprint arxiv:2309.06794, 2023 - arxiv.org
As large language models continue to develop in the field of AI, text generation systems are
susceptible to a worrisome phenomenon known as hallucination. In this study, we …