A review of safe reinforcement learning: Methods, theory and applications
Reinforcement Learning (RL) has achieved tremendous success in many complex decision-
making tasks. However, safety concerns are raised during deploying RL in real-world …
A review of safe reinforcement learning: Methods, theories and applications
Reinforcement Learning (RL) has achieved tremendous success in many complex decision-
making tasks. However, safety concerns are raised during deploying RL in real-world …
Last-iterate convergent policy gradient primal-dual methods for constrained MDPs
We study the problem of computing an optimal policy of an infinite-horizon discounted
constrained Markov decision process (constrained MDP). Despite the popularity of …
Long-term fairness with unknown dynamics
While machine learning can myopically reinforce social inequalities, it may also be used to
dynamically seek equitable outcomes. In this paper, we formalize long-term fairness as an …
ReLOAD: Reinforcement learning with optimistic ascent-descent for last-iterate convergence in constrained MDPs
In recent years, reinforcement learning (RL) has been applied to real-world problems with
increasing success. Such applications often require to put constraints on the agent's …
Provably efficient model-free constrained RL with linear function approximation
We study the constrained reinforcement learning problem, in which an agent aims to
maximize the expected cumulative reward subject to a constraint on the expected total value …
DOPE: Doubly optimistic and pessimistic exploration for safe reinforcement learning
Safe reinforcement learning is extremely challenging--not only must the agent explore an
unknown environment, it must do so while ensuring no safety constraint violations. We …
On kernelized multi-armed bandits with constraints
We study a stochastic bandit problem with a general unknown reward function and a
general unknown constraint function. Both functions can be non-linear (even non-convex) …
Scalable primal-dual actor-critic method for safe multi-agent RL with general utilities
We investigate safe multi-agent reinforcement learning, where agents seek to collectively
maximize an aggregate sum of local objectives while satisfying their own safety constraints …
Learning infinite-horizon average-reward Markov decision process with constraints
We study regret minimization for infinite-horizon average-reward Markov Decision
Processes (MDPs) under cost constraints. We start by designing a policy optimization …