Generative verifiers: Reward modeling as next-token prediction

L Zhang, A Hosseini, H Bansal, M Kazemi… - arXiv preprint arXiv…, 2024 - arxiv.org
Verifiers or reward models are often used to enhance the reasoning performance of large
language models (LLMs). A common approach is the Best-of-N method, where N candidate …

Internal consistency and self-feedback in large language models: A survey

X Liang, S Song, Z Zheng, H Wang, Q Yu, X Li… - arXiv preprint arXiv…, 2024 - arxiv.org
Large language models (LLMs) often exhibit deficient reasoning or generate hallucinations.
To address these, studies prefixed with "Self-", such as Self-Consistency, Self-Improve, and …

A survey on uncertainty quantification of large language models: Taxonomy, open research challenges, and future directions

O Shorinwa, Z Mei, J Lidard, AZ Ren… - arXiv preprint arXiv…, 2024 - arxiv.org
The remarkable performance of large language models (LLMs) in content generation,
coding, and common-sense reasoning has spurred widespread integration into many facets …

Training-Free Bayesianization for Low-Rank Adapters of Large Language Models

H Shi, Y Wang, L Han, H Zhang, H Wang - arXiv preprint arXiv:2412.05723, 2024 - arxiv.org
Estimating the uncertainty of responses of Large Language Models (LLMs) remains a
critical challenge. While recent Bayesian methods have demonstrated effectiveness in …

Efficient and effective uncertainty quantification for LLMs

M Xiong, A Santilli, M Kirchhof, A Golinski… - … Safe Generative AI …, 2024 - openreview.net
Uncertainty quantification (UQ) is crucial for ensuring the safe deployment of large language
models, particularly in high-stakes applications where hallucinations can be harmful …

Tokens, the oft-overlooked appetizer: Large language models, the distributional hypothesis, and meaning

JW Zimmerman, D Hudon, K Cramer, AJ Ruiz… - arXiv preprint arXiv…, 2024 - arxiv.org
Tokenization is a necessary component within the current architecture of many language
models, including the transformer-based large language models (LLMs) of Generative AI …

DeepRAG: Thinking to Retrieval Step by Step for Large Language Models

X Guan, J Zeng, F Meng, C Xin, Y Lu, H Lin… - arXiv preprint arXiv…, 2025 - arxiv.org
Large Language Models (LLMs) have shown remarkable potential in reasoning, while they
still suffer from severe factual hallucinations due to the timeliness, accuracy, and coverage of …

PredictaBoard: Benchmarking LLM Score Predictability

L Pacchiardi, K Voudouris, B Slater… - arXiv preprint arXiv…, 2025 - arxiv.org
Despite possessing impressive skills, Large Language Models (LLMs) often fail
unpredictably, demonstrating inconsistent success in even basic common sense reasoning …

ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos

T Hannan, MM Islam, J Gu, T Seidl… - arXiv preprint arXiv…, 2024 - arxiv.org
Large language models (LLMs) excel at retrieving information from lengthy text, but their
vision-language counterparts (VLMs) face difficulties with hour-long videos, especially for …

On a spurious interaction between uncertainty scores and answer evaluation metrics in generative QA tasks

A Santilli, M Xiong, M Kirchhof, P Rodriguez… - … Safe Generative AI …, 2024 - openreview.net
Knowing when a language model is uncertain about its generations is a key challenge for
enhancing LLMs' safety and reliability. An increasing issue in the field of Uncertainty …