A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic

M Hong, HT Wai, Z Wang, Z Yang - SIAM Journal on Optimization, 2023 - SIAM
This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization.
Bilevel optimization is a class of problems that exhibit a two-level structure, and its goal is …

Online robust reinforcement learning with model uncertainty

Y Wang, S Zou - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …

A finite-time analysis of two time-scale actor-critic methods

YF Wu, W Zhang, P Xu, Q Gu - Advances in Neural …, 2020 - proceedings.neurips.cc
Actor-critic (AC) methods have exhibited great empirical success compared with other
reinforcement learning algorithms, where the actor uses the policy gradient to improve the …

Breaking the sample size barrier in model-based reinforcement learning with a generative model

G Li, Y Wei, Y Chi, Y Gu… - Advances in neural …, 2020 - proceedings.neurips.cc
We investigate the sample efficiency of reinforcement learning in a $\gamma$-discounted
infinite-horizon Markov decision process (MDP) with state space S and action space A …

Sample complexity of asynchronous Q-learning: Sharper analysis and variance reduction

G Li, Y Wei, Y Chi, Y Gu… - Advances in neural …, 2020 - proceedings.neurips.cc
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a
Markov decision process (MDP), based on a single trajectory of Markovian samples induced …

A single-timescale method for stochastic bilevel optimization

T Chen, Y Sun, Q Xiao, W Yin - International Conference on …, 2022 - proceedings.mlr.press
Stochastic bilevel optimization generalizes the classic stochastic optimization from the
minimization of a single objective to the minimization of an objective function that depends …

Finite-time analysis of Whittle index based Q-learning for restless multi-armed bandits with neural network function approximation

G Xiong, J Li - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
The Whittle index policy is a heuristic for the intractable restless multi-armed bandits (RMAB)
problem. Although it is provably asymptotically optimal, finding Whittle indices remains …

A Lyapunov theory for finite-sample guarantees of Markovian stochastic approximation

Z Chen, ST Maguluri, S Shakkottai… - Operations …, 2024 - pubsonline.informs.org
This paper develops a unified Lyapunov framework for finite-sample analysis of a Markovian
stochastic approximation (SA) algorithm under a contraction operator with respect to an …

Non-asymptotic convergence analysis of two time-scale (natural) actor-critic algorithms

T Xu, Z Wang, Y Liang - arXiv preprint arXiv:2005.03557, 2020 - arxiv.org
As an important type of reinforcement learning algorithms, actor-critic (AC) and natural actor-
critic (NAC) algorithms are often executed in two ways for finding optimal policies. In the first …