Conservative Q-learning for offline reinforcement learning
Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is
a key challenge for large-scale real-world applications. Offline RL algorithms promise to …
RAMBO-RL: Robust adversarial model-based offline reinforcement learning
Offline reinforcement learning (RL) aims to find performant policies from logged data without
further environment interaction. Model-based algorithms, which learn a model of the …
Robust reinforcement learning using offline data
The goal of robust reinforcement learning (RL) is to learn a policy that is robust against the
uncertainty in model parameters. Parameter uncertainty commonly occurs in many real …
Emergent complexity and zero-shot transfer via unsupervised environment design
A wide range of reinforcement learning (RL) problems---including robustness, transfer
learning, unsupervised RL, and emergent complexity---require specifying a distribution of …
Challenges of real-world reinforcement learning: definitions, benchmarks and analysis
Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is
beginning to show some successes in real-world scenarios. However, much of the research …
Adversarial policies: Attacking deep reinforcement learning
Deep reinforcement learning (RL) policies are known to be vulnerable to adversarial
perturbations to their observations, similar to adversarial examples for classifiers. However …
Pessimistic model-based offline reinforcement learning under partial coverage
We study model-based offline Reinforcement Learning with general function approximation
without a full coverage assumption on the offline data distribution. We present an algorithm …
Policy gradient method for robust reinforcement learning
This paper develops the first policy gradient method with global optimality guarantee and
complexity analysis for robust reinforcement learning under model mismatch. Robust …
The curious price of distributional robustness in reinforcement learning with a generative model
This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …
Online robust reinforcement learning with model uncertainty
Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …