A theoretical analysis of optimistic proximal policy optimization in linear Markov decision processes
The proximal policy optimization (PPO) algorithm stands as one of the most prosperous
methods in the field of reinforcement learning (RL). Despite its success, the theoretical …
VOL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation
We study time-inhomogeneous episodic reinforcement learning (RL) under general function
approximation and sparse rewards. We design a new algorithm, Variance-weighted …
Nearly minimax optimal reinforcement learning with linear function approximation
We study reinforcement learning with linear function approximation where the transition
probability and reward functions are linear with respect to a feature mapping $\boldsymbol …
Distributionally robust off-dynamics reinforcement learning: Provable efficiency with linear function approximation
We study off-dynamics Reinforcement Learning (RL), where the policy is trained on a source
domain and deployed to a distinct target domain. We aim to solve this problem via online …
Variance-dependent regret bounds for linear bandits and reinforcement learning: Adaptivity and computational efficiency
Recently, several studies \citep{zhou2021nearly, zhang2021variance, kim2021improved,
zhou2022computationally} have provided variance-dependent regret bounds for linear …
Posterior sampling with delayed feedback for reinforcement learning with linear function approximation
Recent studies in reinforcement learning (RL) have made significant progress by leveraging
function approximation to alleviate the sample complexity hurdle for better performance …
Dynamic regret of adversarial linear mixture MDPs
We study reinforcement learning in episodic inhomogeneous MDPs with adversarial full-
information rewards and the unknown transition kernel. We consider the linear mixture …
Multiple greedy quasi-Newton methods for saddle point problems
M Xiao, S Bo, Z Wu - … on Data-driven Optimization of Complex …, 2024 - ieeexplore.ieee.org
This paper introduces the Multiple Greedy Quasi-Newton (MGSR1-SP) method, a novel
approach to solving strongly-convex-strongly-concave (SCSC) saddle point problems. Our …
Optimistic natural policy gradient: a simple efficient policy optimization framework for online RL
While policy optimization algorithms have played an important role in recent empirical
success of Reinforcement Learning (RL), the existing theoretical understanding of policy …
Provable and practical: Efficient exploration in reinforcement learning via Langevin Monte Carlo
We present a scalable and effective exploration strategy based on Thompson sampling for
reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling …