Policy finetuning: Bridging sample-efficient offline and online reinforcement learning
T **: Understanding the benefits of reward engineering on sample complexity
The success of reinforcement learning in a variety of challenging sequential decision-
making problems has been much discussed, but often ignored in this discussion is the …
The curious price of distributional robustness in reinforcement learning with a generative model
This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …
Settling the sample complexity of model-based offline reinforcement learning
The Annals of Statistics 2024, Vol. 52, No. 1, 233–260 https://doi.org/10.1214/23-AOS2342 © …
Provably efficient safe exploration via primal-dual policy optimization
We study the safe reinforcement learning problem using the constrained Markov decision
processes in which an agent aims to maximize the expected total reward subject to a safety …
Almost optimal model-free reinforcement learning via reference-advantage decomposition
We study the reinforcement learning problem in the setting of finite-horizon episodic Markov
Decision Processes (MDPs) with S states, A actions, and episode length H. We propose a …
Deployment-efficient reinforcement learning via model-based offline optimization
Most reinforcement learning (RL) algorithms assume online access to the environment, in
which one may readily interleave updates to the policy with experience collection using that …
Towards instance-optimal offline reinforcement learning with pessimism
We study the offline reinforcement learning (offline RL) problem, where the goal is to
learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) …
Understanding domain randomization for sim-to-real transfer
Reinforcement learning encounters many challenges when applied directly in the real world.
Sim-to-real transfer is widely used to transfer the knowledge learned from simulation to the …