Open problems and fundamental limitations of reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …
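For context on what this fine-tuning step usually means: RLHF is commonly formalized as KL-regularized reward maximization, where a learned reward model $r_\phi$ scores outputs and a KL penalty keeps the tuned policy $\pi_\theta$ close to a reference model $\pi_{\mathrm{ref}}$ (standard notation, not drawn from this entry):

$$\max_{\pi_\theta}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}\big[r_\phi(x, y)\big] \;-\; \beta\,\mathrm{KL}\big(\pi_\theta(\cdot \mid x)\,\big\|\,\pi_{\mathrm{ref}}(\cdot \mid x)\big)$$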
Interactive imitation learning in robotics: A survey
Published in Foundations and Trends® in Robotics.
A survey of reinforcement learning from human feedback
T Kaufmann, P Weng, V Bengs… - arXiv
Mapping social choice theory to RLHF
Recent work on the limitations of using reinforcement learning from human feedback (RLHF)
to incorporate human preferences into model behavior often raises social choice theory as a …
Crowd-PrefRL: Preference-based reward learning from crowds
Preference-based reinforcement learning (RL) provides a framework to train agents using
human feedback through pairwise preferences over pairs of behaviors, enabling agents to …
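To make the pairwise-preference setup concrete, here is a minimal sketch of reward learning from pairwise comparisons under the Bradley-Terry model, the standard formulation behind preference-based RL. The linear reward, synthetic trajectory features, and hyperparameters are illustrative assumptions, not any surveyed paper's setup.

# A minimal sketch of preference-based reward learning: fit a reward model
# from pairwise comparisons by maximizing the Bradley-Terry log-likelihood.
# Everything below (linear features, synthetic annotator, learning rate)
# is an illustrative assumption, not a surveyed method's implementation.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "trajectory segments": feature vectors, with a hidden true reward.
n_pairs, dim = 500, 8
w_true = rng.normal(size=dim)
xa = rng.normal(size=(n_pairs, dim))  # features of segment A in each pair
xb = rng.normal(size=(n_pairs, dim))  # features of segment B in each pair

# Simulated annotator: prefers A with probability sigmoid(r(A) - r(B)).
p_prefer_a = 1.0 / (1.0 + np.exp(-(xa @ w_true - xb @ w_true)))
y = (rng.uniform(size=n_pairs) < p_prefer_a).astype(float)  # 1 if A preferred

# Fit a linear reward r(x) = w . x by gradient ascent on the mean
# Bradley-Terry log-likelihood of the observed preferences.
w = np.zeros(dim)
lr = 0.5
for _ in range(200):
    logits = (xa - xb) @ w
    p = 1.0 / (1.0 + np.exp(-logits))          # model's P(A preferred)
    grad = (xa - xb).T @ (y - p) / n_pairs     # gradient of the log-likelihood
    w += lr * grad

# The learned reward should rank pairs the same way the hidden reward does.
agreement = np.mean(((xa - xb) @ w > 0) == ((xa - xb) @ w_true > 0))
print(f"pairwise ranking agreement with true reward: {agreement:.2%}")

With enough comparisons the learned weights rank pairs almost exactly like the hidden reward, which is the property preference-based RL relies on before any policy optimization takes place.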