The curious price of distributional robustness in reinforcement learning with a generative model

L Shi, G Li, Y Wei, Y Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …

Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org
The Annals of Statistics 2024, Vol. 52, No. 1, 233–260 https://doi.org/10.1214/23-AOS2342 © …

The efficacy of pessimism in asynchronous Q-learning

Y Yan, G Li, Y Chen, J Fan - IEEE Transactions on Information …, 2023 - ieeexplore.ieee.org
This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …

Settling the sample complexity of online reinforcement learning

Z Zhang, Y Chen, JD Lee… - The Thirty Seventh Annual …, 2024 - proceedings.mlr.press
A central issue lying at the heart of online reinforcement learning (RL) is data efficiency.
While a number of recent works achieved asymptotically minimal regret in online RL, the …

When is agnostic reinforcement learning statistically tractable?

Z Jia, G Li, A Rakhlin, A Sekhari… - Advances in Neural …, 2024 - proceedings.neurips.cc
We study the problem of agnostic PAC reinforcement learning (RL): given a policy class $\Pi$,
how many rounds of interaction with an unknown MDP (with a potentially large state and …

Reward-agnostic fine-tuning: Provable statistical benefits of hybrid reinforcement learning

G Li, W Zhan, JD Lee, Y Chi… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper studies tabular reinforcement learning (RL) in the hybrid setting, which assumes
access to both an offline dataset and online interactions with the unknown environment. A …

Optimal treatment allocation for efficient policy evaluation in sequential decision making

T Li, C Shi, J Wang, F Zhou - Advances in Neural …, 2024 - proceedings.neurips.cc
A/B testing is critical for modern technological companies to evaluate the effectiveness of
newly developed products against standard baselines. This paper studies optimal designs …

Improved exploration–exploitation trade-off through adaptive prioritized experience replay

H Hassani, S Nikan, A Shami - Neurocomputing, 2025 - Elsevier
Experience replay is an indispensable part of deep reinforcement learning algorithms that
enables the agent to revisit and reuse its past and recent experiences to update the network …

Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning?

L Zhao, M Wang, Y Bai - 2023 - openreview.net
Inverse Reinforcement Learning (IRL), the problem of learning reward functions from
demonstrations of an expert policy, plays a critical role in developing intelligent …

Near-Optimal Reinforcement Learning with Self-Play under Adaptivity Constraints

D Qiao, YX Wang - arXiv preprint arXiv:2402.01111, 2024 - arxiv.org
We study the problem of multi-agent reinforcement learning (MARL) with adaptivity
constraints, a new problem motivated by real-world applications where deployments of new …