A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …
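
A standard building block in the RLHF pipelines such surveys cover is fitting a reward model to pairwise human comparisons with a Bradley-Terry likelihood. A minimal sketch of that loss; the function name and toy tensors are illustrative, not taken from the survey:

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of preferences under Bradley-Terry:
    P(chosen > rejected) = sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: scalar rewards for a batch of 4 preference pairs.
r_chosen = torch.tensor([1.2, 0.3, 0.8, -0.1])
r_rejected = torch.tensor([0.4, 0.5, -0.2, -0.9])
loss = bradley_terry_loss(r_chosen, r_rejected)
```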

Reinforcement Learning: An Overview

K Murphy - arXiv preprint arXiv:2412.05265, 2024 - arxiv.org
This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement
learning and sequential decision making, covering value-based RL, policy-gradient …
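
As a concrete instance of the value-based family the overview covers, here is the standard tabular Q-learning update; the state/action counts and hyperparameters are arbitrary illustration:

```python
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99  # learning rate, discount factor

def q_update(s: int, a: int, r: float, s_next: int) -> None:
    """Tabular Q-learning: move Q(s, a) toward the bootstrapped target
    r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

q_update(s=0, a=2, r=1.0, s_next=1)
```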

Personalizing reinforcement learning from human feedback with variational preference learning

S Poddar, Y Wan, H Ivison, A Gupta… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) is a powerful paradigm for aligning
foundation models to human values and preferences. However, current RLHF techniques …
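
The personalization idea can be sketched as a reward model conditioned on a per-user latent variable inferred from that user's annotations; the toy architecture below is an assumption for illustration, not the paper's model:

```python
import torch
import torch.nn as nn

class LatentConditionedReward(nn.Module):
    """Reward r(x, z): a response embedding scored jointly with a user latent z.
    Sketch only: the actual method infers the latent variationally from a
    user's labeled comparisons."""
    def __init__(self, d_resp: int = 32, d_latent: int = 8):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(d_resp + d_latent, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, resp_emb: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        return self.score(torch.cat([resp_emb, z], dim=-1)).squeeze(-1)

model = LatentConditionedReward()
resp = torch.randn(4, 32)           # embeddings of 4 candidate responses
z = torch.randn(1, 8).expand(4, 8)  # one user's latent, shared across candidates
rewards = model(resp, z)            # different z => different per-user rankings
```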

Beyond preferences in ai alignment

T Zhi-Xuan, M Carroll, M Franklin, H Ashton - Philosophical Studies, 2024 - Springer
The dominant practice of AI alignment assumes (1) that preferences are an adequate
representation of human values, (2) that human rationality can be understood in terms of …

Self-consuming generative models with curated data provably optimize human preferences

D Ferbach, Q Bertrand, AJ Bose, G Gidel - arXiv preprint arXiv:2407.09499, 2024 - arxiv.org
The rapid progress in generative models has resulted in impressive leaps in generation
quality, blurring the lines between synthetic and real data. Web-scale datasets are now …
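
A toy version of the curated retraining loop the abstract alludes to: sample from the current model, keep the winner of a Bradley-Terry comparison, refit, repeat. The discrete model and reward values are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
reward = np.array([0.0, 1.0, 2.0])   # human preference over 3 possible outputs
p = np.ones(3) / 3                   # initial generative model (categorical)

for _ in range(50):
    # Sample pairs, then curate: keep the Bradley-Terry winner of each pair.
    a, b = rng.choice(3, size=(2, 2000), p=p)
    win_prob = 1 / (1 + np.exp(-(reward[a] - reward[b])))
    kept = np.where(rng.random(2000) < win_prob, a, b)
    # "Retrain": maximum-likelihood refit on the curated samples.
    p = np.bincount(kept, minlength=3) / kept.size

# p now concentrates on the highest-reward output, echoing the paper's claim
# that curated self-consuming loops implicitly optimize human preferences.
```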

Improving context-aware preference modeling for language models

S Pitis, Z Xiao, NL Roux, A Sordoni - arXiv preprint arXiv:2407.14916, 2024 - arxiv.org
While finetuning language models from pairwise preferences has proven remarkably
effective, the underspecified nature of natural language presents critical challenges. Direct …
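
One way to make an underspecified comparison well-posed is to condition the preference score on an explicit context; the scorer below is an assumed sketch, not the paper's model:

```python
import torch
import torch.nn as nn

class ContextAwareScorer(nn.Module):
    """Scores a response given a context embedding. Preference probability is
    sigmoid(score(ctx, a) - score(ctx, b)), so rankings can flip with context."""
    def __init__(self, d: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * d, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, ctx: torch.Tensor, resp: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([ctx, resp], dim=-1)).squeeze(-1)

scorer = ContextAwareScorer()
ctx, resp_a, resp_b = torch.randn(3, 32).unbind(0)
pref_a = torch.sigmoid(scorer(ctx, resp_a) - scorer(ctx, resp_b))
```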

On extending direct preference optimization to accommodate ties

J Chen, G Yang, W Lin, J Mei, B Byrne - arXiv preprint arXiv:2409.17431, 2024 - arxiv.org
We derive and investigate two DPO variants that explicitly model the possibility of declaring
a tie in pair-wise comparisons. We replace the Bradley-Terry model in DPO with two well …
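
A classical tie-aware replacement for Bradley-Terry is the Rao-Kupper model; whether this matches the paper's exact variants is an assumption. Taking delta as the usual DPO margin (beta times the difference of policy-vs-reference log-ratios for the two responses), a sketch of its three-outcome likelihood:

```python
import torch

def rao_kupper_logprobs(delta: torch.Tensor, tau: float):
    """Rao-Kupper extension of Bradley-Terry with a tie outcome.
    delta: preference margin; tau = ln(theta) >= 0 widens the tie band.
    Returns log P(win), log P(tie), log P(loss)."""
    p_win = torch.sigmoid(delta - tau)
    p_loss = torch.sigmoid(-delta - tau)
    p_tie = (1 - p_win - p_loss).clamp_min(1e-12)  # nonnegative when tau >= 0
    return p_win.log(), p_tie.log(), p_loss.log()

delta = torch.tensor([2.0, 0.1])
lw, lt, ll = rao_kupper_logprobs(delta, tau=0.5)
# Training would minimize -lw on clear preferences and -lt on annotated ties.
```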

Elo Ratings in the Presence of Intransitivity

AH Hamilton, M Roughan, A Kalenkova - arXiv preprint arXiv:2412.14427, 2024 - arxiv.org
Electronic Journal of Statistics, ISSN: 1935-7524 …
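
For reference, the standard Elo update the paper's analysis starts from; the K-factor and ratings below are illustrative. Because a single scalar rating determines every predicted win probability, Elo's predictions are necessarily transitive, which is exactly what intransitive data violates:

```python
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """One Elo step: expected score from the base-10 logistic model, then
    move both ratings toward the observed result.
    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for a loss."""
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

r_a, r_b = elo_update(1500.0, 1600.0, score_a=1.0)  # upset win boosts A
```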