Policy finetuning: Bridging sample-efficient offline and online reinforcement learning

Unpacking reward shaping: Understanding the benefits of reward engineering on sample complexity

A Gupta, A Pacchiano, Y Zhai… - Advances in Neural …, 2022 - proceedings.neurips.cc
The success of reinforcement learning in a variety of challenging sequential decision-
making problems has been much discussed, but often ignored in this discussion is the …

The curious price of distributional robustness in reinforcement learning with a generative model

L Shi, G Li, Y Wei, Y Chen… - Advances in Neural …, 2023 - proceedings.neurips.cc
This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …

Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org
The Annals of Statistics 2024, Vol. 52, No. 1, 233–260. https://doi.org/10.1214/23-AOS2342 …

Provably efficient safe exploration via primal-dual policy optimization

D Ding, X Wei, Z Yang, Z Wang… - … conference on artificial …, 2021 - proceedings.mlr.press
We study the safe reinforcement learning problem using the constrained Markov decision
processes in which an agent aims to maximize the expected total reward subject to a safety …

Almost optimal model-free reinforcement learning via reference-advantage decomposition

Z Zhang, Y Zhou, X Ji - Advances in Neural Information …, 2020 - proceedings.neurips.cc
We study the reinforcement learning problem in the setting of finite-horizon episodic Markov
Decision Processes (MDPs) with S states, A actions, and episode length H. We propose a …

Deployment-efficient reinforcement learning via model-based offline optimization

T Matsushima, H Furuta, Y Matsuo, O Nachum… - arXiv preprint arXiv …, 2020 - arxiv.org
Most reinforcement learning (RL) algorithms assume online access to the environment, in
which one may readily interleave updates to the policy with experience collection using that …

Towards instance-optimal offline reinforcement learning with pessimism

M Yin, YX Wang - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We study the offline reinforcement learning (offline RL) problem, where the goal is to
learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) …

Understanding domain randomization for sim-to-real transfer

X Chen, J Hu, C Jin, L Li, L Wang - arXiv preprint arXiv:2110.03239, 2021 - arxiv.org
Reinforcement learning encounters many challenges when applied directly in the real world.
Sim-to-real transfer is widely used to transfer the knowledge learned from simulation to the …