Policy learning with constraints in model-free reinforcement learning: A survey

Y Liu, A Halev, X Liu - The 30th international joint conference on artificial …, 2021 - par.nsf.gov
Reinforcement Learning (RL) algorithms have had tremendous success in simulated
domains. These algorithms, however, often cannot be directly applied to physical systems …

Natural policy gradient primal-dual method for constrained Markov decision processes

D Ding, K Zhang, T Basar… - Advances in Neural …, 2020 - proceedings.neurips.cc
We study sequential decision-making problems in which each agent aims to maximize the
expected total reward while satisfying a constraint on the expected total utility. We employ …
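The primal-dual idea named in this title can be shown on the smallest possible constrained problem: a single state with a few actions. The sketch below is a hypothetical toy, not the authors' algorithm (which handles general MDPs with advantage estimates). It assumes a softmax policy, for which a natural-policy-gradient step on the Lagrangian reduces to adding a step size times `r + lam * g` to the logits, paired with a projected gradient step on the multiplier `lam >= 0`. The names `primal_dual_bandit`, `eta_theta`, `eta_lam`, and `iters` are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def primal_dual_bandit(r, g, b, eta_theta=0.05, eta_lam=0.05, iters=20_000):
    """Toy primal-dual loop for a single-state CMDP (hypothetical example):

        maximize E_pi[r]  subject to  E_pi[g] >= b

    Returns the iterate-averaged reward value, the iterate-averaged utility
    value, and the final Lagrange multiplier.
    """
    theta = [0.0] * len(r)   # softmax policy logits
    lam = 0.0                # Lagrange multiplier for the utility constraint
    sum_vr = sum_vg = 0.0
    for _ in range(iters):
        pi = softmax(theta)
        v_r = sum(p * x for p, x in zip(pi, r))  # expected reward
        v_g = sum(p * y for p, y in zip(pi, g))  # expected utility
        sum_vr += v_r
        sum_vg += v_g
        # Primal step: for a single-state softmax policy, a natural policy
        # gradient step adds eta * (r + lam * g) to the logits (the baseline
        # subtracted in the advantage cancels inside the softmax).
        theta = [t + eta_theta * (x + lam * y) for t, x, y in zip(theta, r, g)]
        # Dual step: gradient descent on lam using the current policy's
        # constraint slack, projected back onto lam >= 0.
        lam = max(0.0, lam - eta_lam * (v_g - b))
    return sum_vr / iters, sum_vg / iters, lam


# Two actions: action 0 earns reward, action 1 earns utility; require E[g] >= 0.5.
avg_vr, avg_vg, lam = primal_dual_bandit([1.0, 0.0], [0.0, 1.0], 0.5)
print(f"avg reward {avg_vr:.3f}, avg utility {avg_vg:.3f}, lambda {lam:.3f}")
```

In this two-action example the best feasible policy splits its probability evenly, so both averaged values should come out close to 0.5. The individual iterates may oscillate around that point, which is why the sketch reports averages over iterations rather than the last policy.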

Provably efficient safe exploration via primal-dual policy optimization

D Ding, X Wei, Z Yang, Z Wang… - … conference on artificial …, 2021 - proceedings.mlr.press
We study the safe reinforcement learning problem using the constrained Markov decision
processes in which an agent aims to maximize the expected total reward subject to a safety …
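Most entries in this list study the same constrained-MDP template. One common discounted formulation and its Lagrangian are written below. The notation is assumed here rather than taken from any single paper: reward r, utility or safety signal g, threshold b, discount γ, and initial-state distribution ρ.

```latex
\begin{aligned}
&\max_{\pi}\ V_r^{\pi}(\rho) \quad \text{subject to} \quad V_g^{\pi}(\rho) \ge b, \\
&V_c^{\pi}(\rho) = \mathbb{E}_{\pi}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t} c(s_t, a_t) \,\Big|\, s_0 \sim \rho\Big], \quad c \in \{r, g\}, \\
&\mathcal{L}(\pi, \lambda) = V_r^{\pi}(\rho) + \lambda \big(V_g^{\pi}(\rho) - b\big), \qquad \lambda \ge 0.
\end{aligned}
```

Primal-dual methods, such as the natural policy gradient primal-dual method listed earlier, ascend the Lagrangian in π and descend it in λ.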

Trustworthy reinforcement learning against intrinsic vulnerabilities: Robustness, safety, and generalizability

M Xu, Z Liu, P Huang, W Ding, Z Cen, B Li… - arXiv preprint arXiv …, 2022 - arxiv.org
A trustworthy reinforcement learning algorithm should be competent in solving challenging
real-world problems, including robustly handling uncertainties, satisfying safety …

Long-term fairness with unknown dynamics

T Yin, R Raab, M Liu, Y Liu - Advances in Neural …, 2023 - proceedings.neurips.cc
While machine learning can myopically reinforce social inequalities, it may also be used to
dynamically seek equitable outcomes. In this paper, we formalize long-term fairness as an …

Provably efficient model-free constrained RL with linear function approximation

A Ghosh, X Zhou, N Shroff - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the constrained reinforcement learning problem, in which an agent aims to
maximize the expected cumulative reward subject to a constraint on the expected total value …

DOPE: Doubly optimistic and pessimistic exploration for safe reinforcement learning

A Bura, A HasanzadeZonuzy… - Advances in neural …, 2022 - proceedings.neurips.cc
Safe reinforcement learning is extremely challenging: not only must the agent explore an
unknown environment, it must do so while ensuring no safety constraint violations. We …

Constrained episodic reinforcement learning in concave-convex and knapsack settings

K Brantley, M Dudik, T Lykouris… - Advances in …, 2020 - proceedings.neurips.cc
We propose an algorithm for tabular episodic reinforcement learning with constraints. We
provide a modular analysis with strong theoretical guarantees for settings with concave …

A simple reward-free approach to constrained reinforcement learning

S Miryoosefi, C ** - International Conference on Machine …, 2022 - proceedings.mlr.press
In constrained reinforcement learning (RL), a learning agent seeks to not only optimize the
overall reward but also satisfy the additional safety, diversity, or budget constraints …

Towards achieving sub-linear regret and hard constraint violation in model-free RL

A Ghosh, X Zhou, N Shroff - International Conference on …, 2024 - proceedings.mlr.press
We study the constrained Markov decision processes (CMDPs), in which an agent aims to
maximize the expected cumulative reward subject to a constraint on the expected total value …