Unified hallucination detection for multimodal large language models

X Chen, C Wang, Y Xue, N Zhang, X Yang, Q Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs)
are plagued by the critical issue of hallucination. The reliable detection of such …

Unfamiliar finetuning examples control how language models hallucinate

K Kang, E Wallace, C Tomlin, A Kumar… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models are known to hallucinate when faced with unfamiliar queries, but
the underlying mechanisms that govern how models hallucinate are not yet fully understood …

Literature Review of AI Hallucination Research Since the Advent of ChatGPT: Focusing on Papers from arXiv

DM Park, HJ Lee - Informatization Policy, 2024 - koreascience.kr
Hallucination is a significant barrier to the utilization of large-scale language models or
multimodal models. In this study, we collected 654 computer science papers with …

Haloscope: Harnessing unlabeled LLM generations for hallucination detection

X Du, C Xiao, Y Li - arXiv preprint arXiv:2409.17504, 2024 - arxiv.org
The surge in applications of large language models (LLMs) has prompted concerns about
the generation of misleading or fabricated information, known as hallucinations. Therefore …

Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability

Z Lin, T Liang, J Xu, X Wang, R Luo, C Shi, S Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have exhibited remarkable performance on reasoning
tasks. They utilize autoregressive token generation to construct reasoning trajectories …

Improving factuality in large language models via decoding-time hallucinatory and truthful comparators

D Yang, D Xiao, J Wei, M Li, Z Chen, K Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite their remarkable capabilities, Large Language Models (LLMs) are prone to
generating responses that contradict verifiable facts, i.e., unfaithful hallucination content …

OLAPH: Improving Factuality in Biomedical Long-form Question Answering

M Jeong, H Hwang, C Yoon, T Lee, J Kang - arXiv preprint arXiv …, 2024 - arxiv.org
In the medical domain, numerous scenarios necessitate the long-form generation ability of
large language models (LLMs). Specifically, when addressing patients' questions, it is …

Open Problems in Machine Unlearning for AI Safety

F Barez, T Fu, A Prabhu, S Casper, A Sanyal… - arXiv preprint arXiv …, 2025 - arxiv.org
As AI systems become more capable, widely deployed, and increasingly autonomous in
critical areas such as cybersecurity, biological research, and healthcare, ensuring their …

Pensieve: Retrospect-then-compare mitigates visual hallucination

D Yang, B Cao, G Chen, C Jiang - arXiv preprint arXiv:2403.14401, 2024 - arxiv.org
Multi-modal Large Language Models (MLLMs) demonstrate remarkable success across
various vision-language tasks. However, they suffer from visual hallucination, where the …

Multi-group Uncertainty Quantification for Long-form Text Generation

T Liu, ZS Wu - arXiv preprint arXiv:2407.21057, 2024 - arxiv.org
While large language models are rapidly moving towards consumer-facing applications,
they are often still prone to factual errors and hallucinations. In order to reduce the potential …