A review of safe reinforcement learning: Methods, theory and applications

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Reinforcement Learning (RL) has achieved tremendous success in many complex
decision-making tasks. However, safety concerns are raised during deploying RL in real-world …

A review of safe reinforcement learning: Methods, theories and applications

S Gu, L Yang, Y Du, G Chen, F Walter… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Reinforcement Learning (RL) has achieved tremendous success in many complex
decision-making tasks. However, safety concerns are raised during deploying RL in real-world …

Nearly minimax optimal reinforcement learning for linear mixture Markov decision processes

D Zhou, Q Gu, C Szepesvari - Conference on Learning …, 2021 - proceedings.mlr.press
We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …

Nearly minimax optimal reinforcement learning for linear Markov decision processes

J He, H Zhao, D Zhou, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study reinforcement learning (RL) with linear function approximation. For episodic
time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition …

Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org
The Annals of Statistics 2024, Vol. 52, No. 1, 233–260. https://doi.org/10.1214/23-AOS2342

VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation

A Agarwal, Y Jin, T Zhang - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We study time-inhomogeneous episodic reinforcement learning (RL) under general function
approximation and sparse rewards. We design a new algorithm, Variance-weighted …

Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning

G Li, L Shi, Y Chen, Y Gu, Y Chi - Advances in Neural …, 2021 - proceedings.neurips.cc
Achieving sample efficiency in online episodic reinforcement learning (RL) requires
optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic …

Nearly minimax optimal reinforcement learning with linear function approximation

P Hu, Y Chen, L Huang - International Conference on …, 2022 - proceedings.mlr.press
We study reinforcement learning with linear function approximation where the transition
probability and reward functions are linear with respect to a feature map $\boldsymbol …

MADE: Exploration via maximizing deviation from explored regions

T Zhang, P Rashidinejad, J Jiao… - Advances in …, 2021 - proceedings.neurips.cc
In online reinforcement learning (RL), efficient exploration remains particularly challenging
in high-dimensional environments with sparse rewards. In low-dimensional environments …

Learning stochastic shortest path with linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2022 - proceedings.mlr.press
We study the stochastic shortest path (SSP) problem in reinforcement learning with linear
function approximation, where the transition kernel is represented as a linear mixture of …