A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - researchgate.net
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …
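To make the reward-learning step such surveys cover concrete, here is a minimal sketch (not taken from the cited paper) of fitting a linear reward model from pairwise trajectory preferences under a Bradley-Terry choice model; the feature dimensions, synthetic data, and step size are all hypothetical.

```python
# Illustrative sketch only: preference-based reward modeling with a
# Bradley-Terry (logistic) likelihood over pairwise trajectory comparisons.
import numpy as np

rng = np.random.default_rng(0)

def fit_preference_reward(phi_a, phi_b, prefs, lr=0.1, steps=2000):
    """phi_a, phi_b: (N, d) trajectory feature sums; prefs[i] = 1 if a_i was preferred."""
    d = phi_a.shape[1]
    w = np.zeros(d)
    for _ in range(steps):
        # P(a preferred over b) = sigmoid(w . (phi_a - phi_b))
        diff = phi_a - phi_b
        p = 1.0 / (1.0 + np.exp(-diff @ w))
        grad = diff.T @ (prefs - p) / len(prefs)  # gradient of the log-likelihood
        w += lr * grad
    return w

# Synthetic example: a hidden "true" reward generates noisy preferences.
d, n = 4, 500
w_true = rng.normal(size=d)
phi_a, phi_b = rng.normal(size=(n, d)), rng.normal(size=(n, d))
p_true = 1.0 / (1.0 + np.exp(-(phi_a - phi_b) @ w_true))
prefs = (rng.random(n) < p_true).astype(float)
w_hat = fit_preference_reward(phi_a, phi_b, prefs)
print("cosine(w_true, w_hat) =",
      w_true @ w_hat / (np.linalg.norm(w_true) * np.linalg.norm(w_hat)))
```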

Reward-rational (implicit) choice: A unifying formalism for reward learning

HJ Jeon, S Milli, A Dragan - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
It is often difficult to hand-specify what the correct reward function is for a task, so
researchers have instead aimed to learn reward functions from human behavior or …

A survey on interactive reinforcement learning: Design principles and open challenges

C Arzate Cruz, T Igarashi - Proceedings of the 2020 ACM Designing Interactive Systems Conference, 2020 - dl.acm.org
Interactive reinforcement learning (RL) has been successfully used in various applications
across different fields, which has also motivated HCI researchers to contribute to this area. In this …

Scalable Bayesian inverse reinforcement learning

AJ Chan, M van der Schaar - arXiv preprint arXiv:2102.06483, 2021 - arxiv.org
Bayesian inference over the reward presents an ideal solution to the ill-posed nature of the
inverse reinforcement learning problem. Unfortunately, current methods generally do not …
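As an illustration of the Bayesian view this entry describes (not the cited paper's method), the following sketch draws posterior samples over linear reward weights with random-walk Metropolis, assuming a Boltzmann-rational demonstrator choosing among discrete options; all sizes, names, and hyperparameters are hypothetical.

```python
# Illustrative sketch only: random-walk Metropolis over linear reward weights,
# with a Boltzmann choice likelihood for observed demonstrations.
import numpy as np

rng = np.random.default_rng(1)
d, n_states, n_actions, beta = 3, 200, 5, 3.0

# Synthetic demonstrations generated from a hidden reward.
w_true = rng.normal(size=d)
feats = rng.normal(size=(n_states, n_actions, d))   # phi(s, a)
logits = beta * feats @ w_true                       # (n_states, n_actions)
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
demos = np.array([rng.choice(n_actions, p=p) for p in probs])

def log_posterior(w):
    """Gaussian prior plus Boltzmann choice likelihood of the demonstrations."""
    z = beta * feats @ w
    z -= z.max(axis=1, keepdims=True)
    log_lik = (z[np.arange(n_states), demos] - np.log(np.exp(z).sum(axis=1))).sum()
    return log_lik - 0.5 * w @ w

# Random-walk Metropolis sampling of the reward posterior.
w, lp = np.zeros(d), log_posterior(np.zeros(d))
samples = []
for t in range(5000):
    w_prop = w + 0.1 * rng.normal(size=d)
    lp_prop = log_posterior(w_prop)
    if np.log(rng.random()) < lp_prop - lp:   # accept/reject step
        w, lp = w_prop, lp_prop
    if t > 1000:                              # discard burn-in
        samples.append(w)

w_mean = np.mean(samples, axis=0)
print("posterior-mean vs. true reward (cosine):",
      w_mean @ w_true / (np.linalg.norm(w_mean) * np.linalg.norm(w_true)))
```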

Learning human objectives by evaluating hypothetical behavior

S Reddy, A Dragan, S Levine… - International Conference on Machine Learning, 2020 - proceedings.mlr.press
We seek to align agent behavior with a user's objectives in a reinforcement learning setting
with unknown dynamics, an unknown reward function, and unknown unsafe states. The user …

Direct behavior specification via constrained reinforcement learning

J Roy, R Girgis, J Romoff, PL Bacon, C Pal - arXiv preprint arXiv …, 2021 - arxiv.org
The standard formulation of Reinforcement Learning lacks a practical way of specifying which
behaviors are admissible and which are forbidden. Most often, practitioners go about the task of …
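To illustrate the constrained-RL framing of this entry (not the paper's algorithm), here is a toy Lagrangian sketch: a softmax policy over bandit arms is trained to maximize reward while a dual variable pushes expected cost below a hypothetical threshold; all rewards, costs, and step sizes are made up.

```python
# Illustrative sketch only: Lagrangian-style constrained policy optimization
# on a toy bandit (maximize reward subject to expected cost <= cost_limit).
import numpy as np

rng = np.random.default_rng(2)
n_arms = 4
reward = np.array([1.0, 0.8, 0.5, 0.2])   # expected reward per arm
cost = np.array([0.9, 0.5, 0.2, 0.0])     # expected cost per arm (e.g. rate of forbidden behavior)
cost_limit = 0.3                           # admissibility constraint

theta = np.zeros(n_arms)   # softmax policy parameters
lam = 0.0                  # Lagrange multiplier for the cost constraint

for step in range(5000):
    pi = np.exp(theta - theta.max()); pi /= pi.sum()
    a = rng.choice(n_arms, p=pi)
    r = reward[a] + 0.1 * rng.normal()
    c = cost[a] + 0.1 * rng.normal()
    # REINFORCE step on the Lagrangian objective r - lam * c.
    grad_logpi = -pi
    grad_logpi[a] += 1.0
    theta += 0.05 * (r - lam * c) * grad_logpi
    # Dual ascent: raise lam whenever the sampled cost exceeds the limit.
    lam = max(0.0, lam + 0.01 * (c - cost_limit))

pi = np.exp(theta - theta.max()); pi /= pi.sum()
print("final policy:", np.round(pi, 3),
      "expected cost:", round(float(pi @ cost), 3))
```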

Validating metrics for reward alignment in human-autonomy teaming

L Sanneman, JA Shah - Computers in Human Behavior, 2023 - Elsevier
Alignment of human and autonomous agent values and objectives is vital in human-autonomy
teaming settings that require collaborative action toward a common goal. In …

A Design Trajectory Map of Human-AI Collaborative Reinforcement Learning Systems: Survey and Taxonomy

Z Li - arXiv preprint arXiv:2405.10214, 2024 - arxiv.org
Driven by algorithmic advancements in reinforcement learning and the increasing
number of implementations of human-AI collaboration, Collaborative Reinforcement …

Active reward learning from multiple teachers

P Barnett, R Freedman, J Svegliato… - arXiv preprint arXiv …, 2023 - arxiv.org
Reward learning algorithms utilize human feedback to infer a reward function, which is then
used to train an AI system. This human feedback is often a preference comparison, in which …

Transparent value alignment

L Sanneman, J Shah - Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, 2023 - dl.acm.org
As robots become increasingly prevalent in our communities, aligning the values motivating
their behavior with human values is critical. However, it is often difficult or impossible for …