Is pessimism provably efficient for offline rl?
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …
a dataset collected a priori. Due to the lack of further interactions with the environment …
Bridging offline reinforcement learning and imitation learning: A tale of pessimism
Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from
a fixed dataset without active data collection. Based on the composition of the offline dataset …
a fixed dataset without active data collection. Based on the composition of the offline dataset …
[KİTAP][B] Control systems and reinforcement learning
S Meyn - 2022 - books.google.com
A high school student can create deep Q-learning code to control her robot, without any
understanding of the meaning of'deep'or'Q', or why the code sometimes fails. This book is …
understanding of the meaning of'deep'or'Q', or why the code sometimes fails. This book is …
Jump-start reinforcement learning
Reinforcement learning (RL) provides a theoretical framework for continuously improving an
agent's behavior via trial and error. However, efficiently learning policies from scratch can be …
agent's behavior via trial and error. However, efficiently learning policies from scratch can be …
Provable benefits of actor-critic methods for offline reinforcement learning
Actor-critic methods are widely used in offline reinforcement learningpractice, but are not so
well-understood theoretically. We propose a newoffline actor-critic algorithm that naturally …
well-understood theoretically. We propose a newoffline actor-critic algorithm that naturally …
Natural policy gradient primal-dual method for constrained markov decision processes
We study sequential decision-making problems in which each agent aims to maximize the
expected total reward while satisfying a constraint on the expected total utility. We employ …
expected total reward while satisfying a constraint on the expected total utility. We employ …
On the convergence rates of policy gradient methods
L **ao - Journal of Machine Learning Research, 2022 - jmlr.org
We consider infinite-horizon discounted Markov decision problems with finite state and
action spaces and study the convergence rates of the projected policy gradient method and …
action spaces and study the convergence rates of the projected policy gradient method and …
Policy gradient method for robust reinforcement learning
This paper develops the first policy gradient method with global optimality guarantee and
complexity analysis for robust reinforcement learning under model mismatch. Robust …
complexity analysis for robust reinforcement learning under model mismatch. Robust …
Stochastic policy gradient methods: Improved sample complexity for fisher-non-degenerate policies
Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed
the development of their theoretical foundations. Despite the huge efforts directed at the …
the development of their theoretical foundations. Despite the huge efforts directed at the …
Provably efficient safe exploration via primal-dual policy optimization
We study the safe reinforcement learning problem using the constrained Markov decision
processes in which an agent aims to maximize the expected total reward subject to a safety …
processes in which an agent aims to maximize the expected total reward subject to a safety …