Towards continual reinforcement learning: A review and perspectives
In this article, we aim to provide a literature review of different formulations and approaches
to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We …
Non-stationary reinforcement learning without prior knowledge: An optimal black-box approach
We propose a black-box reduction that turns a certain reinforcement learning algorithm with
optimal regret in a (near-) stationary environment into another algorithm with optimal …
Corruption-robust exploration in episodic reinforcement learning
We initiate the study of episodic reinforcement learning under adversarial corruptions in both
the rewards and the transition probabilities of the underlying system, extending recent results …
Near-optimal model-free reinforcement learning in non-stationary episodic MDPs
We consider model-free reinforcement learning (RL) in non-stationary Markov decision
processes. Both the reward functions and the state transition functions are allowed to vary …
Provably efficient primal-dual reinforcement learning for CMDPs with non-stationary objectives and constraints
We consider primal-dual-based reinforcement learning (RL) in episodic constrained Markov
decision processes (CMDPs) with non-stationary objectives and constraints, which plays a …
Dynamic regret of online Markov decision processes
We investigate online Markov Decision Processes (MDPs) with adversarially
changing loss functions and known transitions. We choose dynamic regret as the …
Provably efficient model-free algorithms for non-stationary CMDPs
We study model-free reinforcement learning (RL) algorithms in episodic non-stationary
constrained Markov decision processes (CMDPs), in which an agent aims to maximize the …
Non-stationary reinforcement learning under general function approximation
General function approximation is a powerful tool to handle large state and action spaces in
a broad range of reinforcement learning (RL) scenarios. However, theoretical understanding …
Performative reinforcement learning
We introduce the framework of performative reinforcement learning where the policy chosen
by the learner affects the underlying reward and transition dynamics of the environment …
Efficient learning in non-stationary linear markov decision processes
We study episodic reinforcement learning in non-stationary linear (aka low-rank) Markov
Decision Processes (MDPs), i.e., both the reward and transition kernel are linear with respect …