A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic
This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization.
Bilevel optimization is a class of problems which exhibits a two-level structure, and its goal is …
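The two-level structure mentioned in this abstract is commonly written as follows; this is a generic sketch of the setting, not notation taken from the paper:

```latex
\min_{x}\; F(x) \;=\; f\bigl(x,\, y^{*}(x)\bigr)
\qquad \text{s.t.} \qquad
y^{*}(x) \;\in\; \arg\min_{y}\; g(x, y)
```

A two-timescale scheme updates the lower-level variable $y$ with a faster stepsize $\beta_k$ and the upper-level variable $x$ with a slower stepsize $\alpha_k$, with $\alpha_k / \beta_k \to 0$; this is the same structure exploited by two-timescale actor-critic, where the critic plays the role of the lower-level problem.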
Online robust reinforcement learning with model uncertainty
Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …
A finite-time analysis of two time-scale actor-critic methods
Actor-critic (AC) methods have exhibited great empirical success compared with other
reinforcement learning algorithms, where the actor uses the policy gradient to improve the …
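The two-stepsize structure described above can be illustrated on a toy problem. The bandit, stepsizes, and update rules below are illustrative assumptions, not the algorithm analyzed in the paper: the critic runs on a faster stepsize than the actor, mirroring the two timescales.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-armed bandit: arm 0 pays 0.2 on average, arm 1 pays 0.8.
true_means = np.array([0.2, 0.8])

theta = np.zeros(2)   # actor: softmax preferences over the two arms
v = 0.0               # critic: scalar value estimate used as a baseline

alpha = 0.01          # slow (actor) stepsize
beta = 0.1            # fast (critic) stepsize

for t in range(5000):
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    a = rng.choice(2, p=probs)
    r = rng.normal(true_means[a], 0.1)

    # Critic: fast timescale, tracks the average reward of the current policy.
    v += beta * (r - v)

    # Actor: slow timescale, policy gradient with the critic as baseline.
    grad_log = -probs
    grad_log[a] += 1.0     # grad of log softmax: e_a - probs
    theta += alpha * (r - v) * grad_log

# After training, the policy should favor the better arm.
probs = np.exp(theta - theta.max())
probs /= probs.sum()
```

Because the critic moves faster, the actor effectively sees a near-converged value estimate at every step, which is the separation the finite-time analysis formalizes.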
Breaking the sample size barrier in model-based reinforcement learning with a generative model
We investigate the sample efficiency of reinforcement learning in a $\gamma$-discounted
infinite-horizon Markov decision process (MDP) with state space S and action space A …
Sample complexity of asynchronous Q-learning: Sharper analysis and variance reduction
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a
Markov decision process (MDP), based on a single trajectory of Markovian samples induced …
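A minimal sketch of the setting described here, using an invented two-state MDP and a uniform behavior policy (all constants are illustrative, not from the paper): the iterate follows one Markovian trajectory and only the visited state-action entry is updated at each step.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented 2-state, 2-action MDP: P[s, a] is the next-state distribution,
# R[s, a] the (deterministic) reward for taking action a in state s.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.8, 0.2], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9
lr = 0.05

Q = np.zeros((2, 2))
s = 0
for t in range(50_000):
    a = int(rng.integers(2))                # uniform behavior policy
    s_next = int(rng.choice(2, p=P[s, a]))  # single Markovian trajectory
    # Asynchronous update: only the visited (s, a) entry changes.
    Q[s, a] += lr * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
```

The analysis in this line of work bounds how many trajectory steps are needed before `Q` is close to the optimal action-value function, despite each entry being updated only when its state-action pair is visited.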
A single-timescale method for stochastic bilevel optimization
Stochastic bilevel optimization generalizes the classic stochastic optimization from the
minimization of a single objective to the minimization of an objective function that depends …
Finite-time analysis of whittle index based Q-learning for restless multi-armed bandits with neural network function approximation
The Whittle index policy is a heuristic for the intractable restless multi-armed bandits (RMAB)
problem. Although it is provably asymptotically optimal, finding Whittle indices remains …
A Lyapunov theory for finite-sample guarantees of Markovian stochastic approximation
Z Chen, ST Maguluri, S Shakkottai… - Operations …, 2024 - pubsonline.informs.org
This paper develops a unified Lyapunov framework for finite-sample analysis of a Markovian
stochastic approximation (SA) algorithm under a contraction operator with respect to an …
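A generic form of the recursion studied in this line of work (the notation is illustrative, not the paper's):

```latex
x_{k+1} \;=\; x_k + \alpha_k \bigl( F(x_k, Y_k) - x_k \bigr)
```

where $\{Y_k\}$ is an ergodic Markov chain with stationary distribution $\mu$, and the averaged operator $\bar{F}(x) = \mathbb{E}_{Y \sim \mu}[F(x, Y)]$ is assumed to be a $\gamma$-contraction with respect to a given norm. Q-learning and TD-learning are special cases of this template, which is why a unified Lyapunov analysis covers both.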
Non-asymptotic convergence analysis of two time-scale (natural) actor-critic algorithms
As an important type of reinforcement learning algorithms, actor-critic (AC) and natural actor-
critic (NAC) algorithms are often executed in two ways for finding optimal policies. In the first …