DOPE: Doubly optimistic and pessimistic exploration for safe reinforcement learning
Safe reinforcement learning is extremely challenging: not only must the agent explore an
unknown environment, it must do so while ensuring no safety constraint violations. We …
A CMDP-within-online framework for meta-safe reinforcement learning
Meta-reinforcement learning has widely been used as a learning-to-learn framework to
solve unseen tasks with limited experience. However, the aspect of constraint violations has …
Constrained reinforcement learning under model mismatch
Existing studies on constrained reinforcement learning (RL) may obtain a well-performing
policy in the training environment. However, when deployed in a real environment, it may …
Robust constrained reinforcement learning
Constrained reinforcement learning is to maximize the expected reward subject to
constraints on utilities/costs. However, the training environment may not be the same as the …
Policy-based primal-dual methods for convex constrained Markov decision processes
We study convex Constrained Markov Decision Processes (CMDPs) in which the
objective is concave and the constraints are convex in the state-action occupancy measure …
Provably efficient generalized Lagrangian policy optimization for safe multi-agent reinforcement learning
We examine online safe multi-agent reinforcement learning using constrained Markov
games in which agents compete by maximizing their expected total rewards under a …
Algorithm for constrained Markov decision process with linear convergence
E Gladin, M Lavrik-Karmazin… - International …, 2023 - proceedings.mlr.press
The problem of constrained Markov decision process is considered. An agent aims to
maximize the expected accumulated discounted reward subject to multiple constraints on its …
Achieving zero constraint violation for concave utility constrained reinforcement learning via primal-dual approach
Reinforcement learning (RL) is widely used in applications where one needs to perform
sequential decision-making while interacting with the environment. The standard RL …
Policy gradient primal-dual mirror descent for constrained MDPs with large state spaces
We study constrained sequential decision-making problems modeled by constrained
Markov decision processes with potentially infinite state spaces. We propose a Bregman …
A survey of convex optimization of Markov decision processes
VD Rudenko, NE Yudin, AA Vasin - Компьютерные исследования и …, 2023 - mathnet.ru
This article reviews both historical achievements and modern
results in the field of Markov decision processes (Markov Decision …