A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

Empowering biomedical discovery with AI agents

S Gao, A Fang, Y Huang, V Giunchiglia, A Noori… - Cell, 2024 - cell.com
We envision" AI scientists" as systems capable of skeptical learning and reasoning that
empower biomedical research through collaborative agents that integrate AI models and …

Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs

M Xiong, Z Hu, X Lu, Y Li, J Fu, J He, B Hooi - arXiv preprint arXiv …, 2023 - arxiv.org
Empowering large language models to accurately express confidence in their answers is
essential for trustworthy decision-making. Previous confidence elicitation methods, which …

Large legal fictions: Profiling legal hallucinations in large language models

M Dahl, V Magesh, M Suzgun… - Journal of Legal Analysis, 2024 - academic.oup.com
Do large language models (LLMs) know the law? LLMs are increasingly being used to
augment legal practice, education, and research, yet their revolutionary potential is …

" I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust

SSY Kim, QV Liao, M Vorvoreanu, S Ballard… - Proceedings of the …, 2024 - dl.acm.org
Widely deployed large language models (LLMs) can produce convincing yet incorrect
outputs, potentially misleading users who may rely on them as if they were correct. To …

Evaluation and analysis of hallucination in large vision-language models

J Wang, Y Zhou, G Xu, P Shi, C Zhao, H Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Vision-Language Models (LVLMs) have recently achieved remarkable success.
However, LVLMs are still plagued by the hallucination problem, which limits the practicality …

Does fine-tuning LLMs on new knowledge encourage hallucinations?

Z Gekhman, G Yona, R Aharoni, M Eyal… - arXiv preprint arXiv …, 2024 - arxiv.org
When large language models are aligned via supervised fine-tuning, they may encounter
new factual information that was not acquired through pre-training. It is often conjectured that …

Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration

S Feng, W Shi, Y Wang, W Ding… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite efforts to expand the knowledge of large language models (LLMs), knowledge gaps-
-missing or outdated information in LLMs--might always persist given the evolving nature of …

Alignment for honesty

Y Yang, E Chern, X Qiu, G Neubig, P Liu - arXiv preprint arXiv:2312.07000, 2023 - arxiv.org
Recent research has made significant strides in applying alignment techniques to enhance
the helpfulness and harmlessness of large language models (LLMs) in accordance with …

Label-free node classification on graphs with large language models (LLMs)

Z Chen, H Mao, H Wen, H Han, W Jin, H Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
In recent years, there have been remarkable advancements in node classification achieved
by Graph Neural Networks (GNNs). However, they necessitate abundant high-quality labels …