From decoding to meta-generation: Inference-time algorithms for large language models
One of the most striking findings in modern research on large language models (LLMs) is
that scaling up compute during training leads to better results. However, less attention has …
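A minimal sketch of one such inference-time strategy, best-of-N sampling (not from the paper); `generate` and `score` are hypothetical stand-ins for a model's sampler and a verifier or reward model:

```python
from typing import Callable, List

def best_of_n(
    generate: Callable[[str], str],      # hypothetical: sample one candidate answer
    score: Callable[[str, str], float],  # hypothetical: verifier / reward-model score
    prompt: str,
    n: int = 8,
) -> str:
    """Best-of-N sampling: spend extra compute at inference time by
    drawing several candidates and keeping the highest-scoring one."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))
```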
Scaling of search and learning: A roadmap to reproduce o1 from reinforcement learning perspective
OpenAI o1 represents a significant milestone in Artificial Intelligence, achieving expert-
level performance on many challenging tasks that require strong reasoning ability. OpenAI …
Aligning large language models via self-steering optimization
Automated alignment develops alignment systems with minimal human intervention. The
key to automated alignment lies in providing learnable and accurate preference signals for …
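As a hedged illustration of where such preference signals could come from, a toy sketch that builds a (chosen, rejected) pair without human labels; `sample` and `judge` are hypothetical placeholders, not the paper's method:

```python
from typing import Callable, List, Tuple

def build_preference_pair(
    sample: Callable[[str], str],        # hypothetical: draw one response from the policy
    judge: Callable[[str, str], float],  # hypothetical: automated scalar preference signal
    prompt: str,
    n: int = 4,
) -> Tuple[str, str]:
    """Rank several sampled responses with an automated judge and pair
    the best against the worst, yielding a preference-optimization
    training example with no human annotation."""
    responses: List[str] = [sample(prompt) for _ in range(n)]
    ranked = sorted(responses, key=lambda r: judge(prompt, r), reverse=True)
    return ranked[0], ranked[-1]
```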
Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models
Synthetic data generation with Large Language Models is a promising paradigm for
augmenting natural data over a nearly infinite range of tasks. Given this variety, direct …
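A toy curation pass illustrating the quality and diversity axes the survey covers (not the paper's algorithm); `quality` and `too_similar` are hypothetical scoring helpers:

```python
from typing import Callable, List

def curate_synthetic_data(
    samples: List[str],
    quality: Callable[[str], float],                # hypothetical: per-sample quality score
    too_similar: Callable[[str, List[str]], bool],  # hypothetical: near-duplicate check
    min_quality: float = 0.5,
) -> List[str]:
    """Keep high-quality samples, then drop near-duplicates so the
    retained set stays diverse as well as clean."""
    kept: List[str] = []
    for s in sorted(samples, key=quality, reverse=True):
        if quality(s) < min_quality:
            break  # samples are sorted, so the rest score even lower
        if not too_similar(s, kept):
            kept.append(s)
    return kept
```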
Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models
The release of OpenAI's o1 marks a significant milestone in AI, achieving proficiency
comparable to PhD-level expertise in mathematics and coding. While o1 excels at solving …
Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement
The rapid advancement of large language models (LLMs) has significantly enhanced the
capabilities of AI-driven agents across various tasks. However, existing agentic systems …
Improving the Efficiency of Test-Time Search in LLMs with Backtracking
Solving reasoning problems is an iterative multi-step computation, where a reasoning agent
progresses through a sequence of steps, with each step logically building upon the previous …
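A minimal depth-first sketch of test-time search with backtracking, in the spirit of the abstract (the concrete functions `propose_steps`, `is_valid`, and `is_solution` are hypothetical):

```python
from typing import Callable, List, Optional

def backtracking_search(
    propose_steps: Callable[[List[str]], List[str]],  # hypothetical: candidate next steps
    is_valid: Callable[[List[str]], bool],            # hypothetical: step-level check
    is_solution: Callable[[List[str]], bool],         # hypothetical: goal test
    trace: Optional[List[str]] = None,
    max_depth: int = 10,
) -> Optional[List[str]]:
    """Extend the reasoning trace one step at a time; when a step fails
    the check, discard only that step and try a sibling, instead of
    regenerating the whole chain from scratch."""
    trace = trace if trace is not None else []
    if is_solution(trace):
        return trace
    if len(trace) >= max_depth:
        return None
    for step in propose_steps(trace):
        if not is_valid(trace + [step]):
            continue  # prune this step, try the next candidate
        result = backtracking_search(propose_steps, is_valid,
                                     is_solution, trace + [step], max_depth)
        if result is not None:
            return result
    return None  # every candidate failed: the caller backtracks further
```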
Improving Language Model Self-Correction Capability with Meta-Feedback
X Li, Y Zhang, L Wang - openreview.net
Large language models (LLMs) are capable of self-correcting their responses by generating
feedback and refining the initial output. However, their performance may sometimes decline …
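A minimal generate-critique-refine loop sketching the self-correction setting the abstract describes; all four callables are hypothetical placeholders, and the `accept` stop criterion guards against the performance decline the abstract mentions:

```python
from typing import Callable

def self_correct(
    draft: Callable[[str], str],             # hypothetical: initial answer
    critique: Callable[[str, str], str],     # hypothetical: model-written feedback
    revise: Callable[[str, str, str], str],  # hypothetical: refinement given feedback
    accept: Callable[[str, str], bool],      # hypothetical: stop criterion
    prompt: str,
    max_rounds: int = 3,
) -> str:
    """Iterate generate -> critique -> refine. Stopping once the answer
    is accepted matters: unbounded refinement can turn a correct answer
    into a wrong one."""
    answer = draft(prompt)
    for _ in range(max_rounds):
        if accept(prompt, answer):
            break
        feedback = critique(prompt, answer)
        answer = revise(prompt, answer, feedback)
    return answer
```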