Google Scholar

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

Cited by 443 - 6 versions

[PDF] arxiv.org

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Cited by 4730 - 2 versions

[PDF] acm.org

No, to the right: Online language corrections for robotic manipulation via shared autonomy

Y Cui, S Karamcheti, R Palleti, N Shivakumar… - Proceedings of the …, 2023 - dl.acm.org

Systems for language-guided human-robot interaction must satisfy two key desiderata for
broad adoption: adaptivity and learning efficiency. Unfortunately, existing instruction …

Cited by 73 - 8 versions

[PDF] arxiv.org

A survey on autonomous vehicle control in the era of mixed-autonomy: From physics-based to AI-guided driving policy learning

X Di, R Shi - Transportation research part C: emerging technologies, 2021 - Elsevier

This paper serves as an introduction and overview of the potentially useful models and
methodologies from artificial intelligence (AI) into the field of transportation engineering for …

Cited by 226 - 8 versions

[PDF] nowpublishers.com

Interactive imitation learning in robotics: A survey

C Celemin, R Pérez-Dattari, E Chisari… - … and Trends® in …, 2022 - nowpublishers.com

Cited by 53 - 8 versions

[PDF] arxiv.org

A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

Cited by 119 - 4 versions

[PDF] neurips.cc

Meta-reward-net: Implicitly differentiable reward learning for preference-based reinforcement learning

R Liu, F Bai, Y Du, Y Yang - Advances in Neural …, 2022 - proceedings.neurips.cc

Setting up a well-designed reward function has been challenging for many
reinforcement learning applications. Preference-based reinforcement learning (PbRL) …

Cited by 56 - 6 versions

[PDF] arxiv.org

B-pref: Benchmarking preference-based reinforcement learning

K Lee, L Smith, A Dragan, P Abbeel - arXiv preprint arXiv:2111.03026, 2021 - arxiv.org

Reinforcement learning (RL) requires access to a reward function that incentivizes the right
behavior, but these are notoriously hard to specify for complex tasks. Preference-based RL …

Cited by 125 - 7 versions

[PDF] arxiv.org

Recent advances in leveraging human guidance for sequential decision-making tasks

R Zhang, F Torabi, G Warnell, P Stone - Autonomous Agents and Multi …, 2021 - Springer

A longstanding goal of artificial intelligence is to create artificial agents capable of learning
to perform tasks that require sequential decision making. Importantly, while it is the artificial …

Cited by 36 - 6 versions

[PDF] arxiv.org

SURF: Semi-supervised reward learning with data augmentation for feedback-efficient preference-based reinforcement learning

J Park, Y Seo, J Shin, H Lee, P Abbeel… - arXiv preprint arXiv …, 2022 - arxiv.org

Preference-based reinforcement learning (RL) has shown potential for teaching agents to
perform the target tasks without a costly, pre-defined reward function by learning the reward …

Cited by 96 - 6 versions

Batch active preference-based learning of reward functions

Open problems and fundamental limitations of reinforcement learning from human feedback
