From decoding to meta-generation: Inference-time algorithms for large language models
One of the most striking findings in modern research on large language models (LLMs) is
that scaling up compute during training leads to better results. However, less attention has …
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
OpenAI o1 represents a significant milestone in Artificial Intelligence, which achieves expert-
level performances on many challenging tasks that require strong reasoning ability. OpenAI …
Aligning large language models via self-steering optimization
Automated alignment develops alignment systems with minimal human intervention. The
key to automated alignment lies in providing learnable and accurate preference signals for …
Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models
Synthetic data generation with Large Language Models is a promising paradigm for
augmenting natural data over a nearly infinite range of tasks. Given this variety, direct …
Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models
The release of OpenAI's o1 marks a significant milestone in AI, achieving proficiency
comparable to PhD-level expertise in mathematics and coding. While o1 excels at solving …
Improving Language Model Self-Correction Capability with Meta-Feedback
X Li, Y Zhang, L Wang - openreview.net
Large language models (LLMs) are capable of self-correcting their responses by generating
feedback and refining the initial output. However, their performance may sometimes decline …