Google Scholar

A survey of preference-based reinforcement learning methods

C Wirth, R Akrour, G Neumann, J Fürnkranz - Journal of Machine Learning …, 2017 - jmlr.org

Reinforcement learning (RL) techniques optimize the accumulated long-term reward of a
suitably chosen reward function. However, designing such a reward function often requires …

Cited by 441

[PDF] arxiv.org

A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

Cited by 121

[PDF] mlr.press

Dueling posterior sampling for preference-based reinforcement learning

E Novoseller, Y Wei, Y Sui, Y Yue… - … on Uncertainty in …, 2020 - proceedings.mlr.press

In preference-based reinforcement learning (RL), an agent interacts with the environment
while receiving preferences instead of absolute feedback. While there is increasing research …

Cited by 66

[PDF] springer.com

Learning state importance for preference-based reinforcement learning

G Zhang, H Kashima - Machine Learning, 2024 - Springer

Preference-based reinforcement learning (PbRL) develops agents using human
preferences. Due to its empirical success, it has prospect of benefiting human-centered …

Cited by 11

[PDF] github.io

[PDF] Preference-based reinforcement learning: A preliminary survey

C Wirth, J Fürnkranz - Proceedings of the ECML/PKDD-13 …, 2013 - ke-tud.github.io

Preference-based reinforcement learning has gained significant popularity over the years,
but it is still unclear what exactly preference learning is and how it relates to other …

Cited by 27

[PDF] arxiv.org

A Survey on Human Preference Learning for Large Language Models

R Jiang, K Chen, X Bai, Z He, J Li, M Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

The recent surge of versatile large language models (LLMs) largely depends on aligning
increasingly capable foundation models with human intentions by preference learning …

Cited by 12

[PDF] aaai.org

Task transfer by preference-based cost learning

M Jing, X Ma, W Huang, F Sun, H Liu - … of the AAAI Conference on Artificial …, 2019 - aaai.org

The goal of task transfer in reinforcement learning is migrating the action policy of an agent
to the target task from the source task. Given their successes on robotic action planning …

Cited by 13

[PDF] mlr.press

EPMC: Every visit preference Monte Carlo for reinforcement learning

C Wirth, J Fürnkranz - Asian Conference on Machine …, 2013 - proceedings.mlr.press

Reinforcement learning algorithms are usually hard to use for non-expert users. It is required
to consider several aspects like the definition of state-, action- and reward-space as well as …

Cited by 18

Adapting Robotic Systems to User Control

U Biswas - 2023 - search.proquest.com

In this work, I propose to bridge the gap between human users and adaptive control of
robotic systems. The goal is to enable robots to consider user feedback and adjust their …


What Do You Want Me to Do? Addressing Model Differences for Human-Aware Decision-Making from a Learning Perspective

Z Gong - 2022 - search.proquest.com

As intelligent agents become pervasive in our lives, they are expected to not only achieve
tasks alone but also engage in tasks with humans in the loop. In such cases, the human …

Cited by 1

