Estimating LLM Uncertainty with Logits
H Ma, J Chen, G Wang, C Zhang - arXiv preprint arXiv:2502.00290, 2025 - arxiv.org
In recent years, Large Language Models (LLMs) have seen remarkable advancements and
have been extensively integrated across various fields. Despite their progress, LLMs are …
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
One of the most widely used methods to evaluate LLMs is Multiple Choice Question (MCQ)
tests. MCQ benchmarks enable the testing of LLM knowledge on almost any topic at scale …