Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

Interactive imitation learning in robotics: A survey

C Celemin, R Pérez-Dattari, E Chisari… - … and Trends® in …, 2022 - nowpublishers.com

A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org

Mapping social choice theory to RLHF
J Dai, E Fleisig - arXiv preprint arXiv:2404.13038, 2024 - arxiv.org
Recent work on the limitations of using reinforcement learning from human feedback (RLHF)
to incorporate human preferences into model behavior often raises social choice theory as a …

Crowd-PrefRL: Preference-based reward learning from crowds

D Chhan, E Novoseller, VJ Lawhern - arXiv preprint arXiv:2401.10941, 2024 - arxiv.org
Preference-based reinforcement learning (RL) provides a framework to train agents using
human feedback through pairwise preferences over pairs of behaviors, enabling agents to …
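The pairwise-preference framework described in this last entry is commonly instantiated with a Bradley-Terry model: the probability that one behavior is preferred over another is a sigmoid of their reward difference, and the reward model is fit by maximizing the likelihood of the observed comparisons. The sketch below illustrates this on toy data; it assumes a linear reward over hand-crafted features, and all names (`bt_loss`, `fit_reward`) are illustrative rather than taken from the paper.

```python
# Minimal sketch of preference-based reward learning via a Bradley-Terry model.
# Assumes a linear reward r(x) = w . x over behavior features; illustrative only.
import numpy as np

def bt_loss(w, feats_preferred, feats_other):
    """Negative log-likelihood that the preferred behavior wins each comparison."""
    diff = (feats_preferred - feats_other) @ w
    # P(preferred > other) = sigmoid(r_pref - r_other)
    return -np.mean(np.log(1.0 / (1.0 + np.exp(-diff))))

def fit_reward(feats_preferred, feats_other, lr=0.1, steps=500):
    """Plain gradient descent on the Bradley-Terry loss."""
    d = feats_preferred - feats_other
    w = np.zeros(d.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(d @ w)))          # win probabilities
        grad = -((1.0 - p)[:, None] * d).mean(axis=0)
        w -= lr * grad
    return w

# Toy comparisons: the "annotator" prefers whichever behavior has the
# larger first feature (true_w = [1, 0]).
rng = np.random.default_rng(0)
a = rng.normal(size=(200, 2))
b = rng.normal(size=(200, 2))
true_w = np.array([1.0, 0.0])
pref_a = (a @ true_w) > (b @ true_w)
feats_pref = np.where(pref_a[:, None], a, b)   # winner of each pair
feats_other = np.where(pref_a[:, None], b, a)  # loser of each pair
w_hat = fit_reward(feats_pref, feats_other)
```

The learned weight vector should place most of its mass on the first feature, recovering the annotator's implicit reward up to scale; crowd-sourced variants like Crowd-PrefRL additionally have to aggregate comparisons from annotators of varying reliability.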