DOPE: Doubly optimistic and pessimistic exploration for safe reinforcement learning

A Bura, A HasanzadeZonuzy… - Advances in neural …, 2022 - proceedings.neurips.cc
Safe reinforcement learning is extremely challenging: not only must the agent explore an
unknown environment, it must do so while ensuring no safety constraint violations. We …

A CMDP-within-online framework for meta-safe reinforcement learning

V Khattar, Y Ding, B Sel, J Lavaei, M Jin - arXiv preprint arXiv:2405.16601, 2024 - arxiv.org
Meta-reinforcement learning has widely been used as a learning-to-learn framework to
solve unseen tasks with limited experience. However, the aspect of constraint violations has …

Constrained reinforcement learning under model mismatch

Z Sun, S He, F Miao, S Zou - arXiv preprint arXiv:2405.01327, 2024 - arxiv.org
Existing studies on constrained reinforcement learning (RL) may obtain a well-performing
policy in the training environment. However, when deployed in a real environment, it may …

Robust constrained reinforcement learning

Y Wang, F Miao, S Zou - arXiv preprint arXiv:2209.06866, 2022 - arxiv.org
Constrained reinforcement learning is to maximize the expected reward subject to
constraints on utilities/costs. However, the training environment may not be the same as the …

Policy-based primal-dual methods for convex constrained Markov decision processes

D Ying, MA Guo, Y Ding, J Lavaei… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
We study convex Constrained Markov Decision Processes (CMDPs) in which the
objective is concave and the constraints are convex in the state-action occupancy measure …

Provably efficient generalized Lagrangian policy optimization for safe multi-agent reinforcement learning

D Ding, X Wei, Z Yang, Z Wang… - Learning for dynamics …, 2023 - proceedings.mlr.press
We examine online safe multi-agent reinforcement learning using constrained Markov
games in which agents compete by maximizing their expected total rewards under a …

Algorithm for constrained Markov decision process with linear convergence

E Gladin, M Lavrik-Karmazin… - International …, 2023 - proceedings.mlr.press
The problem of constrained Markov decision process is considered. An agent aims to
maximize the expected accumulated discounted reward subject to multiple constraints on its …

Achieving zero constraint violation for concave utility constrained reinforcement learning via primal-dual approach

Q Bai, AS Bedi, M Agarwal, A Koppel… - Journal of Artificial …, 2023 - jair.org
Reinforcement learning (RL) is widely used in applications where one needs to perform
sequential decision-making while interacting with the environment. The standard RL …

Policy gradient primal-dual mirror descent for constrained MDPs with large state spaces

D Ding, MR Jovanović - … IEEE 61st Conference on Decision and …, 2022 - ieeexplore.ieee.org
We study constrained sequential decision-making problems modeled by constrained
Markov decision processes with potentially infinite state spaces. We propose a Bregman …

Survey of convex optimization of Markov decision processes

VD Rudenko, NE Yudin, AA Vasin - Computer Research and …, 2023 - mathnet.ru
This article surveys both historical achievements and modern
results in the field of Markov decision processes (Markov Decision …