Sample complexity of asynchronous Q-learning: Sharper analysis and variance reduction

G Li, Y Wei, Y Chi, Y Gu… - Advances in neural …, 2020 - proceedings.neurips.cc
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a
Markov decision process (MDP), based on a single trajectory of Markovian samples induced …
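
A minimal sketch of the update this paper (and its journal version below) analyzes: asynchronous Q-learning along a single behavior trajectory, where only the visited (state, action) entry is updated at each step. The environment interface, constant step size, and epsilon-greedy behavior policy are illustrative assumptions, not details taken from the paper.

    import numpy as np

    def async_q_learning(env, num_states, num_actions, num_steps=100_000,
                         alpha=0.1, gamma=0.99, eps=0.1, seed=0):
        # Q-table over all (state, action) pairs; only the entry actually
        # visited on the trajectory is touched at each step.
        rng = np.random.default_rng(seed)
        Q = np.zeros((num_states, num_actions))
        s = env.reset()
        for _ in range(num_steps):
            # epsilon-greedy behavior policy induces the Markovian samples
            if rng.random() < eps:
                a = int(rng.integers(num_actions))
            else:
                a = int(Q[s].argmax())
            s_next, r, done = env.step(a)
            # asynchronous update: bootstrap off the current Q estimate
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = env.reset() if done else s_next
        return Q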

Sample complexity of asynchronous Q-learning: Sharper analysis and variance reduction

G Li, Y Wei, Y Chi, Y Gu, Y Chen - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a
Markov decision process (MDP), based on a single trajectory of Markovian samples induced …

Characterizing the exact behaviors of temporal difference learning algorithms using Markov jump linear system theory

B Hu, U Syed - Advances in neural information processing …, 2019 - proceedings.neurips.cc
In this paper, we provide a unified analysis of temporal difference learning algorithms with
linear function approximators by exploiting their connections to Markov jump linear systems …
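
For the concrete iterate behind this connection, a hedged sketch of TD(0) with linear function approximation: each update is an affine map of theta whose coefficients depend on the current Markov state, which is what makes the iteration a Markov jump linear system. The phi/transitions interface and step size are assumptions for illustration.

    import numpy as np

    def td0_linear(phi, transitions, dim=8, alpha=0.05, gamma=0.99):
        # TD(0) with a linear approximator V(s) ~ phi(s) @ theta.
        # Over a finite Markov chain, theta evolves by an affine map whose
        # coefficients jump with the state: a Markov jump linear system.
        theta = np.zeros(dim)
        for s, r, s_next in transitions:
            f, f_next = phi(s), phi(s_next)
            td_error = r + gamma * (f_next @ theta) - f @ theta
            theta = theta + alpha * td_error * f
        return theta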

Finite-time performance of distributed temporal-difference learning with linear function approximation

TT Doan, ST Maguluri, J Romberg - SIAM Journal on Mathematics of Data …, 2021 - SIAM
We study the policy evaluation problem in multi-agent reinforcement learning, modeled by a
Markov decision process. In this problem, the agents operate in a common environment …
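
A hedged sketch of one common consensus-based scheme for this setting (illustrative, not necessarily the paper's exact algorithm): agents observe a common transition but private rewards, mix parameters through a doubly stochastic matrix W, then take local TD(0) steps.

    import numpy as np

    def distributed_td0(W, phi, transitions, dim=8, alpha=0.05, gamma=0.99):
        # W: doubly stochastic consensus matrix over the agent network.
        # Each element of `transitions` is (s, rewards, s_next), where
        # rewards[i] is agent i's private reward for the common transition.
        n_agents = W.shape[0]
        thetas = np.zeros((n_agents, dim))
        for s, rewards, s_next in transitions:
            f, f_next = phi(s), phi(s_next)
            mixed = W @ thetas  # consensus: average with neighbors
            for i in range(n_agents):
                td_error = rewards[i] + gamma * (f_next @ mixed[i]) - f @ mixed[i]
                thetas[i] = mixed[i] + alpha * td_error * f
        return thetas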

Reanalysis of variance reduced temporal difference learning

T Xu, Z Wang, Y Zhou, Y Liang - arXiv preprint arXiv …, 2020 - arxiv.org

Target networks and over-parameterization stabilize off-policy bootstrapping with function approximation

F Che, C Xiao, J Mei, B Dai, R Gummadi… - arXiv preprint arXiv …, 2024 - arxiv.org
We prove that the combination of a target network and over-parameterized linear function
approximation establishes a weaker convergence condition for bootstrapped value …
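
A minimal sketch of the mechanism the snippet names, assuming linear features and a periodic hard sync (both illustrative): bootstrap targets are computed from a frozen copy of the weights rather than from the online ones.

    import numpy as np

    def td_with_target_network(phi, transitions, dim=8, alpha=0.05,
                               gamma=0.99, sync_every=100):
        # Online weights `theta` are trained against targets built from a
        # frozen copy `theta_target`, refreshed every `sync_every` steps.
        theta = np.zeros(dim)
        theta_target = theta.copy()
        for k, (s, r, s_next) in enumerate(transitions, start=1):
            f, f_next = phi(s), phi(s_next)
            target = r + gamma * (f_next @ theta_target)  # frozen bootstrap
            theta = theta + alpha * (target - f @ theta) * f
            if k % sync_every == 0:
                theta_target = theta.copy()
        return theta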

A unified switching system perspective and convergence analysis of Q-learning algorithms

D Lee, N He - Advances in neural information processing …, 2020 - proceedings.neurips.cc
This paper develops a novel and unified framework to analyze the convergence of a large
family of Q-learning algorithms from the switching system perspective. We show that the …
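
In generic notation (a reconstruction, not necessarily the paper's exact statement), the expected tabular Q-learning iterate can be written as

    Q_{k+1} = Q_k + \alpha \left( R + \gamma P \Pi_{Q_k} Q_k - Q_k \right),
    \qquad \Pi_{Q_k} \in \{\Pi_{\pi_1}, \ldots, \Pi_{\pi_M}\},

where \Pi_{Q_k} is the matrix of the policy that is greedy with respect to Q_k (weighting by the state-action visitation distribution is omitted here). Because only finitely many greedy-policy matrices exist, the iterate is an affine system whose dynamics matrix switches as Q_k crosses greedy-policy boundaries, which is the switching-system structure such analyses exploit.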

A multistep Lyapunov approach for finite-time analysis of biased stochastic approximation

G Wang, B Li, GB Giannakis - arXiv preprint arXiv:1909.04299, 2019 - arxiv.org
Motivated by the widespread use of temporal-difference (TD-) and Q-learning algorithms in
reinforcement learning, this paper studies a class of biased stochastic approximation (SA) …
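
In generic notation (assumed here, not quoted from the paper), the class of updates is

    \theta_{k+1} = \theta_k + \alpha_k \, H(\theta_k, X_k),

where \{X_k\} is an ergodic Markov chain rather than an i.i.d. sequence; the conditional mean of the increment then differs from the mean-field h(\theta) = \mathbb{E}_{X \sim \mu}[H(\theta, X)] by a bias that decays at the chain's mixing rate, and TD and Q-learning are instances of this recursion.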

Finite-time analysis and restarting scheme for linear two-time-scale stochastic approximation

TT Doan - SIAM Journal on Control and Optimization, 2021 - SIAM
Motivated by its broad applications in machine learning and reinforcement learning, we
study the linear two-time-scale stochastic approximation, an iterative method using two …
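
A hedged sketch of the linear two-time-scale recursion studied in this line of work, with illustrative polynomial step sizes and i.i.d. Gaussian noise standing in for the actual noise sequence; the matrices and offsets are user-supplied.

    import numpy as np

    def linear_two_time_scale(A11, A12, A21, A22, b1, b2, steps=10_000, seed=0):
        # x is the fast iterate, y the slow one; beta_k / alpha_k -> 0
        # enforces the separation of time scales.
        rng = np.random.default_rng(seed)
        x = np.zeros(b1.shape[0])
        y = np.zeros(b2.shape[0])
        for k in range(1, steps + 1):
            alpha = 1.0 / k ** 0.6  # fast step size
            beta = 1.0 / k          # slow step size, o(alpha_k)
            x = x + alpha * (A11 @ x + A12 @ y + b1 + rng.normal(size=x.shape))
            y = y + beta * (A21 @ x + A22 @ y + b2 + rng.normal(size=y.shape))
        return x, y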