A minimaximalist approach to reinforcement learning from human feedback
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement
learning from human feedback. Our approach is minimalist in that it does not require training …
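The snippet is truncated, but the self-play idea in the title can be illustrated with a minimal sketch: score each response sampled from the current policy by its win rate against other samples drawn from the same policy, and use that win rate as a reward signal. This is a hedged illustration in Python, not necessarily the paper's exact procedure; sample_response and prefer are hypothetical helpers.

def self_play_win_rates(sample_response, prefer, prompt, n=8):
    """Sketch of a self-play preference signal: draw n responses from the
    current policy and score each one by its win rate against the others.
    `sample_response(prompt)` and `prefer(prompt, a, b)` (True if a is
    preferred to b) are hypothetical helpers, not taken from the paper."""
    responses = [sample_response(prompt) for _ in range(n)]
    scored = []
    for i, a in enumerate(responses):
        wins = sum(prefer(prompt, a, b) for j, b in enumerate(responses) if j != i)
        scored.append((a, wins / (n - 1)))  # win rate in [0, 1]
    return scored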
BOND: Aligning LLMs with best-of-N distillation
Reinforcement learning from human feedback (RLHF) is a key driver of quality and safety in
state-of-the-art large language models. Yet, a surprisingly simple and strong inference-time …
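The inference-time strategy alluded to here is presumably best-of-N sampling, which the title suggests is distilled back into the policy. A minimal sketch of best-of-N under a reward model, assuming hypothetical generate and reward helpers:

def best_of_n(generate, reward, prompt, n=16):
    """Best-of-N sampling: draw n candidate responses and keep the one with
    the highest reward-model score. `generate(prompt)` and
    `reward(prompt, response)` are hypothetical helpers."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda response: reward(prompt, response))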
RLHF workflow: From reward modeling to online RLHF
In this technical report we present the workflow of Online Iterative Reinforcement Learning
from Human Feedback (RLHF), which is widely reported to outperform its offline counterpart …
Sharp analysis for KL-regularized contextual bandits and RLHF
Reverse-Kullback-Leibler (KL) regularization has emerged as a predominant technique
used to enhance policy optimization in reinforcement learning (RL) and reinforcement …
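For context, the reverse-KL-regularized objective referenced here is conventionally written (notation assumed: reward model r, reference policy \pi_{\mathrm{ref}}, regularization strength \beta) as

\max_{\pi}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot\mid x)}\big[r(x,y)\big] \;-\; \beta\,\mathrm{KL}\big(\pi(\cdot\mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big),

whose optimum has the standard closed form

\pi^{*}(y\mid x) \;=\; \frac{1}{Z(x)}\,\pi_{\mathrm{ref}}(y\mid x)\,\exp\!\big(r(x,y)/\beta\big), \qquad Z(x)=\sum_{y}\pi_{\mathrm{ref}}(y\mid x)\,\exp\!\big(r(x,y)/\beta\big).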
Accelerating Goal-Conditioned RL Algorithms and Research
Self-supervision has the potential to transform reinforcement learning (RL), paralleling the
breakthroughs it has enabled in other areas of machine learning. While self-supervised …
Jackpot! Alignment as a Maximal Lottery
Reinforcement Learning from Human Feedback (RLHF), the standard for aligning Large
Language Models (LLMs) with human values, is known to fail to satisfy properties that are …
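As background for the title: a maximal lottery, from probabilistic social choice, is roughly an optimal mixed strategy of the symmetric zero-sum game induced by pairwise preference margins. Under the standard definition, a lottery p over alternatives is maximal if

p^{\top} M q \;\ge\; 0 \quad \text{for every lottery } q, \qquad M_{ab} = \Pr[a \succ b] - \Pr[b \succ a],

where M is the skew-symmetric margin matrix of pairwise preferences. The connection to RLHF is the paper's contribution; the formula above is only the standard social-choice definition.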
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
Preference alignment in Large Language Models (LLMs) has significantly improved their
ability to adhere to human instructions and intentions. However, existing direct alignment …
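The direct alignment methods referenced here typically optimize a pairwise logistic loss on preference data rather than training an explicit reward model; the standard DPO-style objective (notation assumed: preferred response y_w, dispreferred response y_l, reference policy \pi_{\mathrm{ref}}, temperature \beta) is

\mathcal{L}(\theta) \;=\; -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_{\theta}(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} \;-\; \beta \log \frac{\pi_{\theta}(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right].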
Learning from Human Feedback: Ranking, Bandit, and Preference Optimization
Y Wu - 2024 - search.proquest.com
This dissertation investigates several challenges in artificial intelligence (AI) alignment and
reinforcement learning (RL), particularly focusing on applications when only preference …
Accelerating Goal-Conditioned Reinforcement Learning Algorithms and Research
M Bortkiewicz, W Pałucki, V Myers, T Dziarmaga… - people.eecs.berkeley.edu
Self-supervision has the potential to transform reinforcement learning (RL), paralleling the
breakthroughs it has enabled in other areas of machine learning. While self-supervised …