The curious price of distributional robustness in reinforcement learning with a generative model
This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …
Settling the sample complexity of model-based offline reinforcement learning
The Annals of Statistics 2024, Vol. 52, No. 1, 233–260 https://doi.org/10.1214/23-AOS2342 © …
The efficacy of pessimism in asynchronous Q-learning
This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …
Settling the sample complexity of online reinforcement learning
A central issue lying at the heart of online reinforcement learning (RL) is data efficiency.
While a number of recent works achieved asymptotically minimal regret in online RL, the …
When is agnostic reinforcement learning statistically tractable?
We study the problem of agnostic PAC reinforcement learning (RL): given a policy class $\Pi$,
how many rounds of interaction with an unknown MDP (with a potentially large state and …
Reward-agnostic fine-tuning: Provable statistical benefits of hybrid reinforcement learning
This paper studies tabular reinforcement learning (RL) in the hybrid setting, which assumes
access to both an offline dataset and online interactions with the unknown environment. A …
Optimal treatment allocation for efficient policy evaluation in sequential decision making
A/B testing is critical for modern technological companies to evaluate the effectiveness of
newly developed products against standard baselines. This paper studies optimal designs …
Improved exploration–exploitation trade-off through adaptive prioritized experience replay
Experience replay is an indispensable part of deep reinforcement learning algorithms that
enables the agent to revisit and reuse its past and recent experiences to update the network …
Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning?
Inverse Reinforcement Learning (IRL), the problem of learning reward functions from
demonstrations of an expert policy, plays a critical role in developing intelligent …
Near-Optimal Reinforcement Learning with Self-Play under Adaptivity Constraints
We study the problem of multi-agent reinforcement learning (MARL) with adaptivity
constraints--a new problem motivated by real-world applications where deployments of new …