- Academic Search

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

Reinforcement Learning (RL) has achieved tremendous success in many complex
decision-making tasks. However, safety concerns are raised during deploying RL in real-world …

Cited by 301 - All 3 versions

[PDF] kcl.ac.uk

A review of safe reinforcement learning: Methods, theories and applications

S Gu, L Yang, Y Du, G Chen, F Walter… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Reinforcement Learning (RL) has achieved tremendous success in many complex
decision-making tasks. However, safety concerns are raised during deploying RL in real-world …

Cited by 20 - All 8 versions

[PDF] neurips.cc

Last-iterate convergent policy gradient primal-dual methods for constrained MDPs

D Ding, CY Wei, K Zhang… - Advances in Neural …, 2023 - proceedings.neurips.cc

We study the problem of computing an optimal policy of an infinite-horizon discounted
constrained Markov decision process (constrained MDP). Despite the popularity of …

Cited by 27 - All 8 versions

[PDF] neurips.cc

Long-term fairness with unknown dynamics

T Yin, R Raab, M Liu, Y Liu - Advances in Neural …, 2023 - proceedings.neurips.cc

While machine learning can myopically reinforce social inequalities, it may also be used to
dynamically seek equitable outcomes. In this paper, we formalize long-term fairness as an …

Cited by 24 - All 8 versions

[PDF] mlr.press

ReLOAD: Reinforcement learning with optimistic ascent-descent for last-iterate convergence in constrained MDPs

T Moskovitz, B O'Donoghue, V Veeriah… - International …, 2023 - proceedings.mlr.press

In recent years, reinforcement learning (RL) has been applied to real-world problems with
increasing success. Such applications often require to put constraints on the agent's …

Cited by 22 - All 8 versions

[PDF] neurips.cc

Provably efficient model-free constrained RL with linear function approximation

A Ghosh, X Zhou, N Shroff - Advances in Neural …, 2022 - proceedings.neurips.cc

We study the constrained reinforcement learning problem, in which an agent aims to
maximize the expected cumulative reward subject to a constraint on the expected total value …

Cited by 36 - All 7 versions

[PDF] neurips.cc

DOPE: Doubly optimistic and pessimistic exploration for safe reinforcement learning

A Bura, A HasanzadeZonuzy… - Advances in neural …, 2022 - proceedings.neurips.cc

Safe reinforcement learning is extremely challenging--not only must the agent explore an
unknown environment, it must do so while ensuring no safety constraint violations. We …

Cited by 39 - All 7 versions

[PDF] neurips.cc

On kernelized multi-armed bandits with constraints

X Zhou, B Ji - Advances in neural information processing …, 2022 - proceedings.neurips.cc

We study a stochastic bandit problem with a general unknown reward function and a
general unknown constraint function. Both functions can be non-linear (even non-convex) …

Cited by 34 - All 10 versions

[PDF] neurips.cc

Scalable primal-dual actor-critic method for safe multi-agent RL with general utilities

D Ying, Y Zhang, Y Ding, A Koppel… - Advances in Neural …, 2023 - proceedings.neurips.cc

We investigate safe multi-agent reinforcement learning, where agents seek to collectively
maximize an aggregate sum of local objectives while satisfying their own safety constraints …

Cited by 13 - All 8 versions

[PDF] mlr.press

Learning infinite-horizon average-reward Markov decision process with constraints

L Chen, R Jain, H Luo - International Conference on …, 2022 - proceedings.mlr.press

We study regret minimization for infinite-horizon average-reward Markov Decision
Processes (MDPs) under cost constraints. We start by designing a policy optimization …

Cited by 34 - All 5 versions

Learning policies with zero or bounded constraint violation for constrained MDPs

A review of safe reinforcement learning: Methods, theory and applications
