Recent advances in reinforcement learning in finance
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …
Recent developments in machine learning methods for stochastic control and games
Stochastic optimal control and games have a wide range of applications, from finance and
economics to social sciences, robotics, and energy management. Many real-world …
On the convergence rates of policy gradient methods
L **ao - Journal of Machine Learning Research, 2022 - jmlr.org
We consider infinite-horizon discounted Markov decision problems with finite state and
action spaces and study the convergence rates of the projected policy gradient method and …
Natural policy gradient primal-dual method for constrained Markov decision processes
We study sequential decision-making problems in which each agent aims to maximize the
expected total reward while satisfying a constraint on the expected total utility. We employ …
Policy gradient method for robust reinforcement learning
This paper develops the first policy gradient method with global optimality guarantee and
complexity analysis for robust reinforcement learning under model mismatch. Robust …
Online robust reinforcement learning with model uncertainty
Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …
DPO meets PPO: Reinforced token optimization for RLHF
In the classical Reinforcement Learning from Human Feedback (RLHF) framework, Proximal
Policy Optimization (PPO) is employed to learn from sparse, sentence-level rewards--a …
Policy mirror descent for reinforcement learning: Linear convergence, new sampling complexity, and generalized problem classes
G Lan - Mathematical Programming, 2023 - Springer
We present new policy mirror descent (PMD) methods for solving reinforcement learning
(RL) problems with either strongly convex or general convex regularizers. By exploring the …
A finite-time analysis of two time-scale actor-critic methods
Actor-critic (AC) methods have exhibited great empirical success compared with other
reinforcement learning algorithms, where the actor uses the policy gradient to improve the …
CRPO: A new approach for safe reinforcement learning with convergence guarantee
In safe reinforcement learning (SRL) problems, an agent explores the environment to
maximize an expected total reward and meanwhile avoids violation of certain constraints on …