Narrowing the knowledge evaluation gap: Open-domain question answering with multi-granularity answers

G Yona, R Aharoni, M Geva - arXiv preprint arXiv:2401.04695, 2024 - arxiv.org
Factual questions typically can be answered correctly at different levels of granularity. For
example, both "August 4, 1961" and "1961" are correct answers to the question "When was …

Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees

Y Gui, Y **, Z Ren - arXiv preprint arXiv:2405.10301, 2024 - arxiv.org
Before deploying outputs from foundation models in high-stakes tasks, it is imperative to
ensure that they align with human values. For instance, in radiology report generation …

LUQ: Long-text Uncertainty Quantification for LLMs

C Zhang, F Liu, M Basaldella, N Collier - arXiv preprint arXiv:2403.20279, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capability in a variety of
NLP tasks. Despite their effectiveness, these models are prone to generate nonfactual …

[PDF] Large Language Models as an active Bayesian filter: information acquisition and integration

S Patania, E Masiero, L Brini, V Piskovskyi… - Proceedings of the …, 2024 - researchgate.net
This study investigates Large Language Models (LLMs) as dynamic Bayesian filters
through question-asking experiments inspired by cognitive science. We analyse LLMs' …