From Generation to Judgment: Opportunities and Challenges of LLM-as-a-Judge

D Li, B Jiang, L Huang, A Beigi, C Zhao, Z Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …

Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

F Xu, Q Hao, Z Zong, J Wang, Y Zhang, J Wang… - arXiv preprint arXiv …, 2025 - arxiv.org
Language has long been conceived as an essential tool for human reasoning. The
breakthrough of Large Language Models (LLMs) has sparked significant research interest in …

Scaling of Search and Learning: A Roadmap to Reproduce o1 from a Reinforcement Learning Perspective

Z Zeng, Q Cheng, Z Yin, B Wang, S Li, Y Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
OpenAI o1 represents a significant milestone in Artificial Intelligence, which achieves expert-
level performances on many challenging tasks that require strong reasoning ability. OpenAI …

Process reinforcement through implicit rewards

G Cui, L Yuan, Z Wang, H Wang, W Li, B He… - arXiv preprint arXiv …, 2025 - arxiv.org
Dense process rewards have proven a more effective alternative to the sparse outcome-
level rewards in the inference-time scaling of large language models (LLMs), particularly in …

AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling

Z Liu, Y Chen, M Shoeybi, B Catanzaro… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we introduce AceMath, a suite of frontier math models that excel in solving
complex math problems, along with highly effective reward models capable of evaluating …

Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision

Z Xi, D Yang, J Huang, J Tang, G Li, Y Ding… - arXiv preprint arXiv …, 2024 - arxiv.org
Training large language models (LLMs) to spend more time thinking and reflecting before
responding is crucial for effectively solving complex reasoning tasks in fields such as …

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

V Xiang, C Snell, K Gandhi, A Albalak, A Singh… - arXiv preprint arXiv …, 2025 - arxiv.org
We propose a novel framework, Meta Chain-of-Thought (Meta-CoT), which extends
traditional Chain-of-Thought (CoT) by explicitly modeling the underlying reasoning required …

RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement

J Jiang, J Chen, J Li, R Ren, S Wang, WX Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing large language models (LLMs) show exceptional problem-solving capabilities but
might struggle with complex reasoning tasks. Despite the successes of chain-of-thought and …

Progressive Multimodal Reasoning via Active Retrieval

G Dong, C Zhang, M Deng, Y Zhu, Z Dou… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-step multimodal reasoning tasks pose significant challenges for multimodal large
language models (MLLMs), and finding effective ways to enhance their performance in such …

S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning

R Ma, P Wang, C Liu, X Liu, J Chen, B Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org
Recent studies have demonstrated the effectiveness of LLM test-time scaling. However,
existing approaches to incentivize LLMs' deep thinking abilities generally require large …