Policy learning with constraints in model-free reinforcement learning: A survey
Reinforcement Learning (RL) algorithms have had tremendous success in simulated
domains. These algorithms, however, often cannot be directly applied to physical systems …
Natural policy gradient primal-dual method for constrained Markov decision processes
We study sequential decision-making problems in which each agent aims to maximize the
expected total reward while satisfying a constraint on the expected total utility. We employ …
Provably efficient safe exploration via primal-dual policy optimization
We study the safe reinforcement learning problem using the constrained Markov decision
processes in which an agent aims to maximize the expected total reward subject to a safety …
Trustworthy reinforcement learning against intrinsic vulnerabilities: Robustness, safety, and generalizability
A trustworthy reinforcement learning algorithm should be competent in solving challenging
real-world problems, including robustly handling uncertainties, satisfying safety …
Long-term fairness with unknown dynamics
While machine learning can myopically reinforce social inequalities, it may also be used to
dynamically seek equitable outcomes. In this paper, we formalize long-term fairness as an …
Provably efficient model-free constrained RL with linear function approximation
We study the constrained reinforcement learning problem, in which an agent aims to
maximize the expected cumulative reward subject to a constraint on the expected total value …
DOPE: Doubly optimistic and pessimistic exploration for safe reinforcement learning
Safe reinforcement learning is extremely challenging--not only must the agent explore an
unknown environment, it must do so while ensuring no safety constraint violations. We …
Constrained episodic reinforcement learning in concave-convex and knapsack settings
We propose an algorithm for tabular episodic reinforcement learning with constraints. We
provide a modular analysis with strong theoretical guarantees for settings with concave …
A simple reward-free approach to constrained reinforcement learning
In constrained reinforcement learning (RL), a learning agent seeks to not only optimize the
overall reward but also satisfy the additional safety, diversity, or budget constraints …
Towards achieving sub-linear regret and hard constraint violation in model-free RL
We study the constrained Markov decision processes (CMDPs), in which an agent aims to
maximize the expected cumulative reward subject to a constraint on the expected total value …