A review of uncertainty for deep reinforcement learning

O Lockwood, M Si - Proceedings of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Uncertainty is ubiquitous in games, both in the agents playing games and often in the games
themselves. Working with uncertainty is therefore an important component of successful …

COMBO: Conservative offline model-based policy optimization

T Yu, A Kumar, R Rafailov… - Advances in neural …, 2021 - proceedings.neurips.cc
Model-based reinforcement learning (RL) algorithms, which learn a dynamics
model from logged experience and perform conservative planning under the learned model …

RvS: What is essential for offline RL via supervised learning?

S Emmons, B Eysenbach, I Kostrikov… - ar…

Pessimistic bootstrapping for uncertainty-driven offline reinforcement learning

C Bai, L Wang, Z Yang, Z Deng, A Garg, P Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline Reinforcement Learning (RL) aims to learn policies from previously collected
datasets without exploring the environment. Directly applying off-policy algorithms to offline …

Reward model ensembles help mitigate overoptimization

T Coste, U Anwar, R Kirk, D Krueger - arXiv preprint arXiv:2310.02743, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a standard approach for fine-tuning
large language models to follow instructions. As part of this process, learned reward models …

Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint

W Xiong, H Dong, C Ye, Z Wang, H Zhong… - … on Machine Learning, 2024 - openreview.net
This paper studies the theoretical framework of the alignment process of generative models
with Reinforcement Learning from Human Feedback (RLHF). We consider a standard …

A policy-guided imitation approach for offline reinforcement learning

H Xu, L Jiang, L Jianxiong… - Advances in Neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) methods can generally be categorized into two types:
RL-based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution …

Conformal prediction for uncertainty-aware planning with diffusion dynamics model

J Sun, Y Jiang, J Qiu, P Nobel… - Advances in …, 2024 - proceedings.neurips.cc
Robotic applications often involve working in environments that are uncertain, dynamic, and
partially observable. Recently, diffusion models have been proposed for learning trajectory …