Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

F Xu, Q Hao, Z Zong, J Wang, Y Zhang, J Wang… - arXiv preprint arXiv …, 2025 - arxiv.org
Language has long been conceived as an essential tool for human reasoning. The
breakthrough of Large Language Models (LLMs) has sparked significant research interest in …

Imitate, explore, and self-improve: A reproduction report on slow-thinking reasoning systems

Y Min, Z Chen, J Jiang, J Chen, J Deng, Y Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, slow-thinking reasoning systems, such as o1, have demonstrated remarkable
capabilities in solving complex reasoning tasks. These systems typically engage in an …

Critic-V: VLM critics help catch VLM errors in multimodal reasoning

D Zhang, J Lei, J Li, X Wang, Y Liu, Z Yang, J Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language models~(VLMs) have shown remarkable advancements in multimodal
reasoning tasks. However, they still often generate inaccurate or irrelevant responses due to …

O1-CODER: an o1 replication for coding

Y Zhang, S Wu, Y Yang, J Shu, J Xiao, C Kong… - arXiv preprint arXiv …, 2024 - arxiv.org
The technical report introduces O1-CODER, an attempt to replicate OpenAI's o1 model with
a focus on coding tasks. It integrates reinforcement learning (RL) and Monte Carlo Tree …

Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective

Z Zeng, Q Cheng, Z Yin, B Wang, S Li, Y Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
OpenAI o1 represents a significant milestone in Artificial Intelligence, which achieves expert-
level performance on many challenging tasks that require strong reasoning ability. OpenAI …

Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision

Z Xi, D Yang, J Huang, J Tang, G Li, Y Ding… - arXiv preprint arXiv …, 2024 - arxiv.org
Training large language models (LLMs) to spend more time thinking and reflecting before
responding is crucial for effectively solving complex reasoning tasks in fields such as …

RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement

J Jiang, J Chen, J Li, R Ren, S Wang, WX Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing large language models (LLMs) show exceptional problem-solving capabilities but
might struggle with complex reasoning tasks. Despite the successes of chain-of-thought and …

Beyond examples: High-level automated reasoning paradigm in in-context learning via MCTS

J Wu, M Feng, S Zhang, F Che, Z Wen, J Tao - arXiv preprint arXiv …, 2024 - arxiv.org
In-context Learning (ICL) enables large language models (LLMs) to tackle downstream
tasks through sophisticated prompting and high-quality demonstrations. However, this …

SEED: Accelerating reasoning tree construction via scheduled speculative decoding

Z Wang, J Wu, Y Lai, C Zhang, D Zhou - arXiv preprint arXiv:2406.18200, 2024 - arxiv.org
Large Language Models (LLMs) demonstrate remarkable emergent abilities across various
tasks, yet fall short of complex reasoning and planning tasks. The tree-search-based …

PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models

M Song, Z Su, X Qu, J Zhou, Y Cheng - arXiv preprint arXiv:2501.03124, 2025 - arxiv.org
Process-level Reward Models (PRMs) are crucial for complex reasoning and decision-
making tasks, where each intermediate step plays an important role in the reasoning …