Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

F Xu, Q Hao, Z Zong, J Wang, Y Zhang, J Wang… - arXiv preprint arXiv …, 2025 - arxiv.org
Language has long been conceived as an essential tool for human reasoning. The
breakthrough of Large Language Models (LLMs) has sparked significant research interest in …

Imitate, explore, and self-improve: A reproduction report on slow-thinking reasoning systems

Y Min, Z Chen, J Jiang, J Chen, J Deng, Y Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, slow-thinking reasoning systems, such as o1, have demonstrated remarkable
capabilities in solving complex reasoning tasks. These systems typically engage in an …

Critic-V: VLM critics help catch VLM errors in multimodal reasoning

D Zhang, J Lei, J Li, X Wang, Y Liu, Z Yang, J Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language models~(VLMs) have shown remarkable advancements in multimodal
reasoning tasks. However, they still often generate inaccurate or irrelevant responses due to …

O1-CODER: an o1 replication for coding

Y Zhang, S Wu, Y Yang, J Shu, J Xiao, C Kong… - arXiv preprint arXiv …, 2024 - arxiv.org
The technical report introduces O1-CODER, an attempt to replicate OpenAI's o1 model with
a focus on coding tasks. It integrates reinforcement learning (RL) and Monte Carlo Tree …

Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective

Z Zeng, Q Cheng, Z Yin, B Wang, S Li, Y Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
OpenAI o1 represents a significant milestone in Artificial Intelligence, which achieves expert-
level performance on many challenging tasks that require strong reasoning ability. OpenAI …

Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision

Z Xi, D Yang, J Huang, J Tang, G Li, Y Ding… - arXiv preprint arXiv …, 2024 - arxiv.org
Training large language models (LLMs) to spend more time thinking and reflecting before
responding is crucial for effectively solving complex reasoning tasks in fields such as …

RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement

J Jiang, J Chen, J Li, R Ren, S Wang, WX Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing large language models (LLMs) show exceptional problem-solving capabilities but
might struggle with complex reasoning tasks. Despite the successes of chain-of-thought and …

Beyond examples: High-level automated reasoning paradigm in in-context learning via MCTS

J Wu, M Feng, S Zhang, F Che, Z Wen, J Tao - arXiv preprint arXiv …, 2024 - arxiv.org
In-context Learning (ICL) enables large language models (LLMs) to tackle downstream
tasks through sophisticated prompting and high-quality demonstrations. However, this …

SEED: Accelerating reasoning tree construction via scheduled speculative decoding

Z Wang, J Wu, Y Lai, C Zhang, D Zhou - arXiv preprint arXiv:2406.18200, 2024 - arxiv.org
Large Language Models (LLMs) demonstrate remarkable emergent abilities across various
tasks, yet fall short of complex reasoning and planning tasks. The tree-search-based …

PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models

M Song, Z Su, X Qu, J Zhou, Y Cheng - arXiv preprint arXiv:2501.03124, 2025 - arxiv.org
Process-level Reward Models (PRMs) are crucial for complex reasoning and decision-
making tasks, where each intermediate step plays an important role in the reasoning …