A theoretical analysis of optimistic proximal policy optimization in linear Markov decision processes

H Zhong, T Zhang - Advances in Neural Information …, 2024 - proceedings.neurips.cc
The proximal policy optimization (PPO) algorithm stands as one of the most prosperous
methods in the field of reinforcement learning (RL). Despite its success, the theoretical …

VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation

A Agarwal, Y Jin, T Zhang - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We study time-inhomogeneous episodic reinforcement learning (RL) under general function
approximation and sparse rewards. We design a new algorithm, Variance-weighted …

Nearly minimax optimal reinforcement learning with linear function approximation

P Hu, Y Chen, L Huang - International Conference on …, 2022 - proceedings.mlr.press
We study reinforcement learning with linear function approximation where the transition
probability and reward functions are linear with respect to a feature mapping $\boldsymbol …

Distributionally robust off-dynamics reinforcement learning: Provable efficiency with linear function approximation

Z Liu, P Xu - International Conference on Artificial …, 2024 - proceedings.mlr.press
We study off-dynamics Reinforcement Learning (RL), where the policy is trained on a source
domain and deployed to a distinct target domain. We aim to solve this problem via online …

Variance-dependent regret bounds for linear bandits and reinforcement learning: Adaptivity and computational efficiency

H Zhao, J He, D Zhou, T Zhang… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Recently, several studies \citep{zhou2021nearly, zhang2021variance, kim2021improved,
zhou2022computationally} have provided variance-dependent regret bounds for linear …

Posterior sampling with delayed feedback for reinforcement learning with linear function approximation

NL Kuang, M Yin, M Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent studies in reinforcement learning (RL) have made significant progress by leveraging
function approximation to alleviate the sample complexity hurdle for better performance …

Dynamic regret of adversarial linear mixture MDPs

LF Li, P Zhao, ZH Zhou - Advances in Neural Information …, 2024 - proceedings.neurips.cc
We study reinforcement learning in episodic inhomogeneous MDPs with adversarial full-
information rewards and the unknown transition kernel. We consider the linear mixture …

Multiple greedy quasi-Newton methods for saddle point problems

M Xiao, S Bo, Z Wu - … on Data-driven Optimization of Complex …, 2024 - ieeexplore.ieee.org
This paper introduces the Multiple Greedy Quasi-Newton (MGSR1-SP) method, a novel
approach to solving strongly-convex-strongly-concave (SCSC) saddle point problems. Our …

Optimistic natural policy gradient: a simple efficient policy optimization framework for online RL

Q Liu, G Weisz, A György, C Jin… - Advances in Neural …, 2024 - proceedings.neurips.cc
While policy optimization algorithms have played an important role in recent empirical
success of Reinforcement Learning (RL), the existing theoretical understanding of policy …

Provable and practical: Efficient exploration in reinforcement learning via Langevin Monte Carlo

H Ishfaq, Q Lan, P Xu, AR Mahmood, D Precup… - arXiv preprint arXiv …, 2023 - arxiv.org
We present a scalable and effective exploration strategy based on Thompson sampling for
reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling …