A survey of preference-based reinforcement learning methods

C Wirth, R Akrour, G Neumann, J Fürnkranz - Journal of Machine Learning …, 2017 - jmlr.org
Reinforcement learning (RL) techniques optimize the accumulated long-term reward of a
suitably chosen reward function. However, designing such a reward function often requires …

A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

Dueling posterior sampling for preference-based reinforcement learning

E Novoseller, Y Wei, Y Sui, Y Yue… - … on Uncertainty in …, 2020 - proceedings.mlr.press
In preference-based reinforcement learning (RL), an agent interacts with the environment
while receiving preferences instead of absolute feedback. While there is increasing research …

Learning state importance for preference-based reinforcement learning

G Zhang, H Kashima - Machine Learning, 2024 - Springer
Preference-based reinforcement learning (PbRL) develops agents using human
preferences. Due to its empirical success, it has prospect of benefiting human-centered …

Preference-based reinforcement learning: A preliminary survey

C Wirth, J Fürnkranz - Proceedings of the ECML/PKDD-13 …, 2013 - ke-tud.github.io
Preference-based reinforcement learning has gained significant popularity over the years,
but it is still unclear what exactly preference learning is and how it relates to other …

A Survey on Human Preference Learning for Large Language Models

R Jiang, K Chen, X Bai, Z He, J Li, M Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
The recent surge of versatile large language models (LLMs) largely depends on aligning
increasingly capable foundation models with human intentions by preference learning …

Task transfer by preference-based cost learning

M Jing, X Ma, W Huang, F Sun, H Liu - … of the AAAI Conference on Artificial …, 2019 - aaai.org
The goal of task transfer in reinforcement learning is migrating the action policy of an agent
to the target task from the source task. Given their successes on robotic action planning …

EPMC: Every visit preference Monte Carlo for reinforcement learning

C Wirth, J Fürnkranz - Asian Conference on Machine …, 2013 - proceedings.mlr.press
Reinforcement learning algorithms are usually hard to use for non expert users. It is required
to consider several aspects like the definition of state-, action- and reward-space as well as …

Adapting Robotic Systems to User Control

U Biswas - 2023 - search.proquest.com
In this work, I propose to bridge the gap between human users and adaptive control of
robotic systems. The goal is to enable robots to consider user feedback and adjust their …

What Do You Want Me to Do? Addressing Model Differences for Human-Aware Decision-Making from a Learning Perspective

Z Gong - 2022 - search.proquest.com
As intelligent agents become pervasive in our lives, they are expected to not only achieve
tasks alone but also engage in tasks with humans in the loop. In such cases, the human …