Conservative Q-learning for offline reinforcement learning

A Kumar, A Zhou, G Tucker… - Advances in Neural …, 2020 - proceedings.neurips.cc
Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is
a key challenge for large-scale real-world applications. Offline RL algorithms promise to …

RAMBO-RL: Robust adversarial model-based offline reinforcement learning

M Rigter, B Lacerda, N Hawes - Advances in neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) aims to find performant policies from logged data without
further environment interaction. Model-based algorithms, which learn a model of the …

Robust reinforcement learning using offline data

K Panaganti, Z Xu, D Kalathil… - Advances in neural …, 2022 - proceedings.neurips.cc
The goal of robust reinforcement learning (RL) is to learn a policy that is robust against the
uncertainty in model parameters. Parameter uncertainty commonly occurs in many real …

Emergent complexity and zero-shot transfer via unsupervised environment design

M Dennis, N Jaques, E Vinitsky… - Advances in neural …, 2020 - proceedings.neurips.cc
A wide range of reinforcement learning (RL) problems---including robustness, transfer
learning, unsupervised RL, and emergent complexity---require specifying a distribution of …

Challenges of real-world reinforcement learning: definitions, benchmarks and analysis

G Dulac-Arnold, N Levine, DJ Mankowitz, J Li… - Machine Learning, 2021 - Springer
Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is
beginning to show some successes in real-world scenarios. However, much of the research …

Adversarial policies: Attacking deep reinforcement learning

A Gleave, M Dennis, C Wild, N Kant, S Levine… - arXiv preprint arXiv …, 2019 - arxiv.org
Deep reinforcement learning (RL) policies are known to be vulnerable to adversarial
perturbations to their observations, similar to adversarial examples for classifiers. However …

Pessimistic model-based offline reinforcement learning under partial coverage

M Uehara, W Sun - arXiv preprint arXiv:2107.06226, 2021 - arxiv.org
We study model-based offline Reinforcement Learning with general function approximation
without a full coverage assumption on the offline data distribution. We present an algorithm …

Policy gradient method for robust reinforcement learning

Y Wang, S Zou - International conference on machine …, 2022 - proceedings.mlr.press
This paper develops the first policy gradient method with global optimality guarantee and
complexity analysis for robust reinforcement learning under model mismatch. Robust …

The curious price of distributional robustness in reinforcement learning with a generative model

L Shi, G Li, Y Wei, Y Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …

Online robust reinforcement learning with model uncertainty

Y Wang, S Zou - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …