Sample complexity of asynchronous Q-learning: Sharper analysis and variance reduction

G Li, Y Wei, Y Chi, Y Gu… - Advances in neural …, 2020 - proceedings.neurips.cc
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a
Markov decision process (MDP), based on a single trajectory of Markovian samples induced …
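
A minimal sketch of the update this paper (and its journal version below) analyzes: asynchronous Q-learning along a single behavior trajectory, where only the visited (state, action) entry is updated at each step. The environment interface, constant step size, and epsilon-greedy behavior policy are illustrative assumptions, not details taken from the paper.

    import numpy as np

    def async_q_learning(env, num_states, num_actions, num_steps=100_000,
                         alpha=0.1, gamma=0.99, eps=0.1, seed=0):
        # Q-table over all (state, action) pairs; only the entry actually
        # visited on the trajectory is touched at each step.
        rng = np.random.default_rng(seed)
        Q = np.zeros((num_states, num_actions))
        s = env.reset()
        for _ in range(num_steps):
            # epsilon-greedy behavior policy induces the Markovian samples
            if rng.random() < eps:
                a = int(rng.integers(num_actions))
            else:
                a = int(Q[s].argmax())
            s_next, r, done = env.step(a)
            # asynchronous update: bootstrap off the current Q estimate
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = env.reset() if done else s_next
        return Q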

Sample complexity of asynchronous Q-learning: Sharper analysis and variance reduction

G Li, Y Wei, Y Chi, Y Gu, Y Chen - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a
Markov decision process (MDP), based on a single trajectory of Markovian samples induced …

Characterizing the exact behaviors of temporal difference learning algorithms using Markov jump linear system theory

B Hu, U Syed - Advances in neural information processing …, 2019 - proceedings.neurips.cc
In this paper, we provide a unified analysis of temporal difference learning algorithms with
linear function approximators by exploiting their connections to Markov jump linear systems …
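
For the concrete iterate behind this connection, a hedged sketch of TD(0) with linear function approximation: each update is an affine map of theta whose coefficients depend on the current Markov state, which is what makes the iteration a Markov jump linear system. The phi/transitions interface and step size are assumptions for illustration.

    import numpy as np

    def td0_linear(phi, transitions, dim=8, alpha=0.05, gamma=0.99):
        # TD(0) with a linear approximator V(s) ~ phi(s) @ theta.
        # Over a finite Markov chain, theta evolves by an affine map whose
        # coefficients jump with the state: a Markov jump linear system.
        theta = np.zeros(dim)
        for s, r, s_next in transitions:
            f, f_next = phi(s), phi(s_next)
            td_error = r + gamma * (f_next @ theta) - f @ theta
            theta = theta + alpha * td_error * f
        return theta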

Finite-time performance of distributed temporal-difference learning with linear function approximation

TT Doan, ST Maguluri, J Romberg - SIAM Journal on Mathematics of Data …, 2021 - SIAM
We study the policy evaluation problem in multi-agent reinforcement learning, modeled by a
Markov decision process. In this problem, the agents operate in a common environment …
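
A hedged sketch of one common consensus-based scheme for this setting (illustrative, not necessarily the paper's exact algorithm): agents observe a common transition but private rewards, mix parameters through a doubly stochastic matrix W, then take local TD(0) steps.

    import numpy as np

    def distributed_td0(W, phi, transitions, dim=8, alpha=0.05, gamma=0.99):
        # W: doubly stochastic consensus matrix over the agent network.
        # Each element of `transitions` is (s, rewards, s_next), where
        # rewards[i] is agent i's private reward for the common transition.
        n_agents = W.shape[0]
        thetas = np.zeros((n_agents, dim))
        for s, rewards, s_next in transitions:
            f, f_next = phi(s), phi(s_next)
            mixed = W @ thetas  # consensus: average with neighbors
            for i in range(n_agents):
                td_error = rewards[i] + gamma * (f_next @ mixed[i]) - f @ mixed[i]
                thetas[i] = mixed[i] + alpha * td_error * f
        return thetas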

Reanalysis of variance reduced temporal difference learning

T Xu, Z Wang, Y Zhou, Y Liang - arXiv preprint arXiv …, 2020 - arxiv.org

Target networks and over-parameterization stabilize off-policy bootstrapping with function approximation

F Che, C Xiao, J Mei, B Dai, R Gummadi… - arXiv preprint arXiv …, 2024 - arxiv.org
We prove that the combination of a target network and over-parameterized linear function
approximation establishes a weaker convergence condition for bootstrapped value …
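
A minimal sketch of the mechanism the snippet names, assuming linear features and a periodic hard sync (both illustrative): bootstrap targets are computed from a frozen copy of the weights rather than from the online ones.

    import numpy as np

    def td_with_target_network(phi, transitions, dim=8, alpha=0.05,
                               gamma=0.99, sync_every=100):
        # Online weights `theta` are trained against targets built from a
        # frozen copy `theta_target`, refreshed every `sync_every` steps.
        theta = np.zeros(dim)
        theta_target = theta.copy()
        for k, (s, r, s_next) in enumerate(transitions, start=1):
            f, f_next = phi(s), phi(s_next)
            target = r + gamma * (f_next @ theta_target)  # frozen bootstrap
            theta = theta + alpha * (target - f @ theta) * f
            if k % sync_every == 0:
                theta_target = theta.copy()
        return theta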

A unified switching system perspective and convergence analysis of Q-learning algorithms

D Lee, N He - Advances in neural information processing …, 2020 - proceedings.neurips.cc
This paper develops a novel and unified framework to analyze the convergence of a large
family of Q-learning algorithms from the switching system perspective. We show that the …
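
In generic notation (a reconstruction, not necessarily the paper's exact statement), the expected tabular Q-learning iterate can be written as

    Q_{k+1} = Q_k + \alpha \left( R + \gamma P \Pi_{Q_k} Q_k - Q_k \right),
    \qquad \Pi_{Q_k} \in \{\Pi_{\pi_1}, \ldots, \Pi_{\pi_M}\},

where \Pi_{Q_k} is the matrix of the policy that is greedy with respect to Q_k (weighting by the state-action visitation distribution is omitted here). Because only finitely many greedy-policy matrices exist, the iterate is an affine system whose dynamics matrix switches as Q_k crosses greedy-policy boundaries, which is the switching-system structure such analyses exploit.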

A multistep Lyapunov approach for finite-time analysis of biased stochastic approximation

G Wang, B Li, GB Giannakis - arXiv preprint arXiv:1909.04299, 2019 - arxiv.org
Motivated by the widespread use of temporal-difference (TD-) and Q-learning algorithms in
reinforcement learning, this paper studies a class of biased stochastic approximation (SA) …
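
In generic notation (assumed here, not quoted from the paper), the class of updates is

    \theta_{k+1} = \theta_k + \alpha_k \, H(\theta_k, X_k),

where \{X_k\} is an ergodic Markov chain rather than an i.i.d. sequence; the conditional mean of the increment then differs from the mean-field h(\theta) = \mathbb{E}_{X \sim \mu}[H(\theta, X)] by a bias that decays at the chain's mixing rate, and TD and Q-learning are instances of this recursion.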

Finite-time analysis and restarting scheme for linear two-time-scale stochastic approximation

TT Doan - SIAM Journal on Control and Optimization, 2021 - SIAM
Motivated by its broad applications in machine learning and reinforcement learning, we
study the linear two-time-scale stochastic approximation, an iterative method using two …
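
A hedged sketch of the linear two-time-scale recursion studied in this line of work, with illustrative polynomial step sizes and i.i.d. Gaussian noise standing in for the actual noise sequence; the matrices and offsets are user-supplied.

    import numpy as np

    def linear_two_time_scale(A11, A12, A21, A22, b1, b2, steps=10_000, seed=0):
        # x is the fast iterate, y the slow one; beta_k / alpha_k -> 0
        # enforces the separation of time scales.
        rng = np.random.default_rng(seed)
        x = np.zeros(b1.shape[0])
        y = np.zeros(b2.shape[0])
        for k in range(1, steps + 1):
            alpha = 1.0 / k ** 0.6  # fast step size
            beta = 1.0 / k          # slow step size, o(alpha_k)
            x = x + alpha * (A11 @ x + A12 @ y + b1 + rng.normal(size=x.shape))
            y = y + beta * (A21 @ x + A22 @ y + b2 + rng.normal(size=y.shape))
        return x, y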