- Academic Search

H Ma, J Chen, G Wang, C Zhang - arXiv preprint arXiv:2502.00290, 2025 - arxiv.org

In recent years, Large Language Models (LLMs) have seen remarkable advancements and
have been extensively integrated across various fields. Despite their progress, LLMs are …

Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong

T Fu, J Conde, G Martínez, M Grandury… - arXiv preprint arXiv …, 2025 - arxiv.org

One of the most widely used methods to evaluate LLMs are Multiple Choice Question (MCQ)
tests. MCQ benchmarks enable the testing of LLM knowledge on almost any topic at scale …

Efficient and effective uncertainty quantification for LLMs

Estimating LLM Uncertainty with Logits
