A review of uncertainty for deep reinforcement learning

O Lockwood, M Si - Proceedings of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Uncertainty is ubiquitous in games, both in the agents playing games and often in the games
themselves. Working with uncertainty is therefore an important component of successful …

Mildly conservative q-learning for offline reinforcement learning

J Lyu, X Ma, X Li, Z Lu - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) defines the task of learning from a static logged dataset
without continually interacting with the environment. The distribution shift between the …
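Several of the entries above revolve around the same idea: when learning from a static log, value estimates for actions the dataset does not cover should be treated conservatively. The sketch below is a generic, illustrative tabular example of that idea only; it is not the MCQ algorithm from the cited paper, and the dataset, constants, and unseen-action penalty are all assumptions made for illustration.

import numpy as np

# Illustrative sketch: conservative offline Q-learning on a toy tabular MDP.
# Not the algorithm from any cited paper; it only shows the shared idea of
# penalizing out-of-distribution (OOD) actions when learning from a fixed log.

n_states, n_actions = 5, 3
rng = np.random.default_rng(0)

# A static logged dataset of (s, a, r, s') transitions, e.g. from a behavior policy.
dataset = [
    (rng.integers(n_states), rng.integers(n_actions), rng.normal(), rng.integers(n_states))
    for _ in range(200)
]

Q = np.zeros((n_states, n_actions))
gamma, lr, penalty = 0.99, 0.1, 1.0   # penalty weight is an assumed hyperparameter

# Count which (s, a) pairs appear in the log, i.e. which actions are "in-distribution".
counts = np.zeros((n_states, n_actions))
for s, a, _, _ in dataset:
    counts[s, a] += 1

for _ in range(50):                   # sweeps over the fixed dataset, no new interaction
    for s, a, r, s_next in dataset:
        target = r + gamma * Q[s_next].max()
        Q[s, a] += lr * (target - Q[s, a])
    # Conservative step: push down Q-values of state-action pairs never seen in the log.
    Q[counts == 0] -= lr * penalty

greedy_policy = Q.argmax(axis=1)
print(greedy_policy)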

RORL: Robust offline reinforcement learning via conservative smoothing

R Yang, C Bai, X Ma, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) provides a promising direction to exploit massive amounts
of offline data for complex decision-making tasks. Due to the distribution shift issue, current …

A policy-guided imitation approach for offline reinforcement learning

H Xu, L Jiang, J Li… - Advances in Neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-
based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution …

Reinforcement learning applied to wastewater treatment process control optimization: Approaches, challenges, and path forward

HC Croll, K Ikuma, SK Ong, S Sarkar - Critical Reviews in …, 2023 - Taylor & Francis
Wastewater treatment process control optimization is a complex task in a highly nonlinear
environment. Reinforcement learning (RL) is a machine learning technique that stands out …

Corruption-robust offline reinforcement learning with general function approximation

C Ye, R Yang, Q Gu, T Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc
We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …

Reinforcement learning with human feedback: Learning dynamic choices via pessimism

Z Li, Z Yang, M Wang - arXiv preprint arXiv:2305.18438, 2023 - arxiv.org
In this paper, we study offline Reinforcement Learning with Human Feedback (RLHF) where
we aim to learn the human's underlying reward and the MDP's optimal policy from a set of …

Model-Bellman inconsistency for model-based offline reinforcement learning

Y Sun, J Zhang, C Jia, H Lin, J Ye… - … Conference on Machine …, 2023 - proceedings.mlr.press
For offline reinforcement learning (RL), model-based methods are expected to be data-
efficient as they incorporate dynamics models to generate more data. However, due to …

Design from policies: Conservative test-time adaptation for offline policy optimization

J Liu, H Zhang, Z Zhuang, Y Kang… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we decouple the iterative bi-level offline RL (value estimation and policy
extraction) from the offline training phase, forming a non-iterative bi-level paradigm and …

VRL3: A data-driven framework for visual deep reinforcement learning

C Wang, X Luo, K Ross, D Li - Advances in Neural …, 2022 - proceedings.neurips.cc
We propose VRL3, a powerful data-driven framework with a simple design for solving
challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major …