From decoding to meta-generation: Inference-time algorithms for large language models

S Welleck, A Bertsch, M Finlayson… - arxiv preprint arxiv …, 2024 - arxiv.org
One of the most striking findings in modern research on large language models (LLMs) is
that scaling up compute during training leads to better results. However, less attention has …

Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective

Z Zeng, Q Cheng, Z Yin, B Wang, S Li, Y Zhou… - arxiv preprint arxiv …, 2024 - arxiv.org
OpenAI o1 represents a significant milestone in Artificial Intelligence, which achieves
expert-level performances on many challenging tasks that require strong reasoning ability. OpenAI …

Aligning large language models via self-steering optimization

H Xiang, B Yu, H Lin, K Lu, Y Lu, X Han, L Sun… - arxiv preprint arxiv …, 2024 - arxiv.org
Automated alignment develops alignment systems with minimal human intervention. The
key to automated alignment lies in providing learnable and accurate preference signals for …

Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models

A Havrilla, A Dai, L O'Mahony, K Oostermeijer… - arxiv preprint arxiv …, 2024 - arxiv.org
Synthetic data generation with Large Language Models is a promising paradigm for
augmenting natural data over a nearly infinite range of tasks. Given this variety, direct …

Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models

G Wang, S Zhang, T Zhan, Z Shen, J Li, X Hu, X Sun… - openreview.net
The release of OpenAI's o1 marks a significant milestone in AI, achieving proficiency
comparable to PhD-level expertise in mathematics and coding. While o1 excels at solving …

Improving Language Model Self-Correction Capability with Meta-Feedback

X Li, Y Zhang, L Wang - openreview.net
Large language models (LLMs) are capable of self-correcting their responses by generating
feedback and refining the initial output. However, their performance may sometimes decline …