A survey of language model confidence estimation and calibration

J Geng, F Cai, Y Wang, H Koeppl, P Nakov… - arXiv preprint, 2023 - arxiv.org

Reconfidencing LLMs from the grouping loss perspective

L Chen, A Perez-Lebel, F Suchanek… - The 2024 Conference …, 2024 - hal.science
Large Language Models (LLMs), such as GPT and LLaMA, are susceptible to generating
hallucinated answers in a confident tone. While previous efforts to elicit and calibrate …

Non-exchangeable conformal language generation with nearest neighbors

D Ulmer, C Zerva, AFT Martins - arXiv preprint arXiv:2402.00707, 2024 - arxiv.org
Quantifying uncertainty in automatically generated text is important for letting humans check
potential hallucinations and making systems more reliable. Conformal prediction is an …
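
The snippet breaks off before defining the method; as background, here is a minimal sketch of plain split conformal prediction for classification — the standard exchangeable setting that this paper relaxes, not its non-exchangeable, nearest-neighbor variant. Every name below is an illustrative assumption, not the authors' code:

import numpy as np

def conformal_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    # Split conformal prediction: build prediction sets with ~(1 - alpha)
    # coverage, assuming calibration and test points are exchangeable.
    n = len(cal_labels)
    # Nonconformity score: one minus the softmax probability of the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Conformal quantile with the finite-sample correction,
    # clipped to 1.0 for tiny calibration sets.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, q_level, method="higher")
    # A label joins the set when its nonconformity score stays under the threshold.
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

Non-exchangeable variants such as the one studied here instead weight the calibration scores, e.g. by nearest-neighbor distance to the test point, rather than treating all calibration examples equally.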

Prudent Silence or Foolish Babble? Examining Large Language Models' Responses to the Unknown

G Liu, X Wang, L Yuan, Y Chen, H Peng - arXiv preprint arXiv:2311.09731, 2023 - arxiv.org
Large Language Models (LLMs) often struggle when faced with situations where they lack
the prerequisite knowledge to generate a sensible response. In these cases, models tend to …

Calibrating Large Language Models Using Their Generations Only

D Ulmer, M Gubri, H Lee, S Yun, SJ Oh - arXiv preprint arXiv:2403.05973, 2024 - arxiv.org
As large language models (LLMs) are increasingly deployed in user-facing applications,
building trust and maintaining safety by accurately quantifying a model's confidence in its …
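
The snippet stops mid-sentence; for context on what "accurately quantifying a model's confidence" is scored against, here is a minimal sketch of expected calibration error (ECE), a standard evaluation metric in this literature — generic background, not the calibration method this paper proposes, and all names below are illustrative:

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # ECE: bin predictions by confidence, then average |accuracy - confidence|
    # across bins, weighted by the fraction of samples landing in each bin.
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    edges[0] -= 1e-12  # ensure confidence == 0.0 falls into the first bin
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

A perfectly calibrated model — one that is right 80% of the time on answers it gives with 0.8 confidence, and so on for every level — scores an ECE of zero.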

Introspective planning: Guiding language-enabled agents to refine their own uncertainty

K Liang, Z Zhang, J Fernández Fisac - arXiv e-prints, 2024 - ui.adsabs.harvard.edu
Large language models (LLMs) exhibit advanced reasoning skills, enabling robots to
comprehend natural language instructions and strategically plan high-level actions through …