A review of safe reinforcement learning: Methods, theory and applications

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Reinforcement Learning (RL) has achieved tremendous success in many complex decision-
making tasks. However, safety concerns are raised during deploying RL in real-world …

A review of safe reinforcement learning: Methods, theories and applications

S Gu, L Yang, Y Du, G Chen, F Walter… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Reinforcement Learning (RL) has achieved tremendous success in many complex decision-
making tasks. However, safety concerns are raised during deploying RL in real-world …

Constrained update projection approach to safe policy optimization

L Yang, J Ji, J Dai, L Zhang, B Zhou… - Advances in …, 2022 - proceedings.neurips.cc
Safe reinforcement learning (RL) studies problems where an intelligent agent has to not only
maximize reward but also avoid exploring unsafe areas. In this study, we propose CUP, a …

An off-policy trust region policy optimization method with monotonic improvement guarantee for deep reinforcement learning

W Meng, Q Zheng, Y Shi, G Pan - IEEE Transactions on Neural …, 2021 - ieeexplore.ieee.org
In deep reinforcement learning, off-policy data help reduce on-policy interaction with the
environment, and the trust region policy optimization (TRPO) method is efficient to stabilize …

Qualitative case-based reasoning and learning

TPD Homem, PE Santos, AHR Costa… - Artificial Intelligence, 2020 - Elsevier
The development of autonomous agents that perform tasks with the same dexterity as
performed by humans is one of the challenges of artificial intelligence and robotics. This …

Importance sampling in reinforcement learning with an estimated behavior policy

JP Hanna, S Niekum, P Stone - Machine Learning, 2021 - Springer
In reinforcement learning, importance sampling is a widely used method for evaluating an
expectation under the distribution of data of one policy when the data has in fact been …

Policy optimization with stochastic mirror descent

L Yang, Y Zhang, G Zheng, Q Zheng, P Li… - Proceedings of the …, 2022 - ojs.aaai.org
Improving sample efficiency has been a longstanding goal in reinforcement learning. This
paper proposes VRMPO algorithm: a sample efficient policy gradient method with stochastic …

CUP: A conservative update policy algorithm for safe reinforcement learning

L Yang, J Ji, J Dai, Y Zhang, P Li, G Pan - arXiv preprint arXiv:2202.07565, 2022 - arxiv.org
Safe reinforcement learning (RL) is still very challenging since it requires the agent to
consider both return maximization and safe exploration. In this paper, we propose CUP, a …

Sample complexity of policy gradient finding second-order stationary points

L Yang, Q Zheng, G Pan - Proceedings of the AAAI Conference on …, 2021 - ojs.aaai.org
The policy-based reinforcement learning (RL) can be considered as maximization of its
objective. However, due to the inherent non-concavity of its objective, the policy gradient …

Variance aware reward smoothing for deep reinforcement learning

Y Dong, S Zhang, X Liu, Y Zhang, T Shen - Neurocomputing, 2021 - Elsevier
A Reinforcement Learning (RL) agent interacts with the environment to learn a
policy with high accumulated rewards through attempts and failures. However, RL suffers …