Nearly minimax optimal reinforcement learning for linear mixture Markov decision processes
We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …
Provable benefits of actor-critic methods for offline reinforcement learning
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so
well-understood theoretically. We propose a new offline actor-critic algorithm that naturally …
Nearly minimax optimal reinforcement learning for linear Markov decision processes
We study reinforcement learning (RL) with linear function approximation. For episodic time-
inhomogeneous linear Markov decision processes (linear MDPs) whose transition …
Learning near optimal policies with low inherent Bellman error
We study the exploration problem with approximate linear action-value functions in episodic
reinforcement learning under the notion of low inherent Bellman error, a condition normally …
Leveraging offline data in online reinforcement learning
Two central paradigms have emerged in the reinforcement learning (RL) community: online
RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …
Unpacking reward shaping: Understanding the benefits of reward engineering on sample complexity
The success of reinforcement learning in a variety of challenging sequential decision-
making problems has been much discussed, but often ignored in this discussion is the …
PC-PG: Policy cover directed exploration for provable policy gradient learning
Direct policy gradient methods for reinforcement learning are a successful approach for a
variety of reasons: they are model free, they directly optimize the performance metric of …
Reward-free RL is no harder than reward-aware RL in linear Markov decision processes
Reward-free reinforcement learning (RL) considers the setting where the agent does not
have access to a reward function during exploration, but must propose a near-optimal policy …
Optimism in reinforcement learning with generalized linear function approximation
We design a new provably efficient algorithm for episodic reinforcement learning with
generalized linear function approximation. We analyze the algorithm under a new …
Learning zero-sum simultaneous-move Markov games using function approximation and correlated equilibrium
In this work, we develop provably efficient reinforcement learning algorithms for two-player
zero-sum Markov games with simultaneous moves. We consider a family of Markov games …