From decoding to meta-generation: Inference-time algorithms for large language models

S Welleck, A Bertsch, M Finlayson… - arxiv preprint arxiv …, 2024 - arxiv.org

One of the most striking findings in modern research on large language models (LLMs) is
that scaling up compute during training leads to better results. However, less attention has …

Cited by 14

Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective

Z Zeng, Q Cheng, Z Yin, B Wang, S Li, Y Zhou… - arxiv preprint arxiv …, 2024 - arxiv.org

OpenAI o1 represents a significant milestone in Artificial Intelligence, which achieves expert-level
performance on many challenging tasks that require strong reasoning ability. OpenAI …

Cited by 2

Aligning large language models via self-steering optimization

H Xiang, B Yu, H Lin, K Lu, Y Lu, X Han, L Sun… - arxiv preprint arxiv …, 2024 - arxiv.org

Automated alignment develops alignment systems with minimal human intervention. The
key to automated alignment lies in providing learnable and accurate preference signals for …

Cited by 1

Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models

A Havrilla, A Dai, L O'Mahony, K Oostermeijer… - arxiv preprint arxiv …, 2024 - arxiv.org

Synthetic data generation with Large Language Models is a promising paradigm for
augmenting natural data over a nearly infinite range of tasks. Given this variety, direct …

Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models

G Wang, S Zhang, T Zhan, Z Shen, J Li, X Hu, X Sun… - openreview.net

The release of OpenAI's o1 marks a significant milestone in AI, achieving proficiency
comparable to PhD-level expertise in mathematics and coding. While o1 excels at solving …

Improving Language Model Self-Correction Capability with Meta-Feedback

X Li, Y Zhang, L Wang - openreview.net

Large language models (LLMs) are capable of self-correcting their responses by generating
feedback and refining the initial output. However, their performance may sometimes decline …

Training language models to self-correct via reinforcement learning, 2024

From decoding to meta-generation: Inference-time algorithms for large language models
