Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

C Lyu, S Gao, Y Gu, W Zhang, J Gao, K Liu… - arXiv preprint arXiv …, 2025 - arxiv.org
Reasoning abilities, especially those for solving complex math problems, are crucial
components of general intelligence. Recent advances by proprietary companies, such as o …

Scaling Test-Time Compute Without Verification or RL is Suboptimal

A Setlur, N Rajaraman, S Levine, A Kumar - arXiv preprint arXiv …, 2025 - arxiv.org
Despite substantial advances in scaling test-time compute, an ongoing debate in the
community is how it should be scaled up to enable continued and efficient improvements …

ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning

Y Yin, G Carenini - arXiv preprint arXiv:2502.04689, 2025 - arxiv.org
Large language models (LLMs) achieve remarkable performance on challenging
benchmarks that are often structured as multiple-choice question-answering (QA) tasks …