Google Scholar

Posterior sampling with delayed feedback for reinforcement learning with linear function approximation

NL Kuang, M Yin, M Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc

Recent studies in reinforcement learning (RL) have made significant progress by leveraging
function approximation to alleviate the sample complexity hurdle for better performance …

Cited by: 8 · All versions: 5

[PDF] mlr.press

Versatile dueling bandits: Best-of-both world analyses for learning from relative preferences

A Saha, P Gaillard - International Conference on Machine …, 2022 - proceedings.mlr.press

We study the problem of $K$-armed dueling bandit for both stochastic and adversarial
environments, where the goal of the learner is to aggregate information through relative …

Cited by: 23 · All versions: 2

[PDF] neurips.cc

Efficient RL with impaired observability: Learning to act with delayed and missing state observations

M Chen, Y Bai, HV Poor… - Advances in Neural …, 2024 - proceedings.neurips.cc

In real-world reinforcement learning (RL) systems, various forms of impaired
observability can complicate matters. These situations arise when an agent is unable to …

Cited by: 8 · All versions: 8

[PDF] mlr.press

Langevin Thompson sampling with logarithmic communication: bandits and reinforcement learning

A Karbasi, NL Kuang, Y Ma… - … Conference on Machine …, 2023 - proceedings.mlr.press

Thompson sampling (TS) is widely used in sequential decision making due to its ease of use
and appealing empirical performance. However, many existing analytical and empirical …

Cited by: 7 · All versions: 7
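
The snippet above mentions Thompson sampling (TS) for sequential decision making, and the papers on this page share the theme of delayed feedback. A minimal sketch of how the two combine, assuming a Beta-Bernoulli bandit and a fixed delay (the function name, arm means, and fixed-delay model are hypothetical illustrations, not the Langevin-based method of the cited paper): the learner keeps sampling from a posterior that only includes rewards that have already arrived.

```python
import random

def delayed_thompson_sampling(true_means, horizon, delay, seed=0):
    """Beta-Bernoulli Thompson sampling where each reward arrives `delay` rounds late.

    Hypothetical toy setup: `true_means`, `horizon` and the fixed `delay` are
    illustrative assumptions; the papers above study more general delays.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # Beta posterior parameter: 1 + observed successes
    beta = [1.0] * k   # Beta posterior parameter: 1 + observed failures
    pending = []       # (arrival_round, arm, reward) not yet revealed to the learner
    pulls = [0] * k
    for t in range(horizon):
        # Reveal every reward whose delay has elapsed, then update the posterior.
        arrived = [p for p in pending if p[0] <= t]
        pending = [p for p in pending if p[0] > t]
        for _, arm, reward in arrived:
            alpha[arm] += reward
            beta[arm] += 1 - reward
        # Draw one plausible mean per arm from the posterior and act greedily on the draws.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        pulls[arm] += 1
        # The reward is generated now but only observed `delay` rounds later.
        pending.append((t + delay, arm, reward))
    return pulls
```

For example, `delayed_thompson_sampling([0.1, 0.9], horizon=2000, delay=10)` should still concentrate most pulls on the second arm; the delay mainly costs extra exploration in the early rounds, before enough feedback has arrived.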

[PDF] mlr.press

Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling

A Adibi, N Dal Fabbro, L Schenato… - International …, 2024 - proceedings.mlr.press

Motivated by applications in large-scale and multi-agent reinforcement learning, we study
the non-asymptotic performance of stochastic approximation (SA) schemes with delayed …

Cited by: 7 · All versions: 4
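
To make "stochastic approximation with delayed updates under Markovian sampling" concrete, here is a toy sketch under stated assumptions (a hypothetical two-state Markov chain, a constant step size, and a fixed delay; not the scheme or analysis of the cited paper). The iterate tries to solve $\mathbb{E}_\pi[o(s) - x] = 0$, but each update is computed from a stale iterate $x_{t-\tau}$, and the samples $s_t$ come from a Markov chain rather than i.i.d. draws.

```python
import random

def delayed_sa_markov(steps=20000, step_size=0.02, delay=5, seed=0):
    """Constant-step stochastic approximation whose update uses a stale iterate.

    Hypothetical toy problem: estimate the stationary mean of o(s) for a
    two-state Markov chain. Update: x_{t+1} = x_t + step_size * (o(s_t) - x_{t - delay}).
    """
    rng = random.Random(seed)
    p01, p10 = 0.1, 0.3   # P(0 -> 1), P(1 -> 0); stationary P(s = 1) = 0.1 / 0.4 = 0.25
    obs = (0.0, 1.0)      # observation o(s) for states 0 and 1
    s = 0
    history = [0.0]       # history[t] = x_t
    for t in range(steps):
        # Stale iterate; for t < delay there is not enough history, so x_0 is used.
        stale_x = history[max(0, t - delay)]
        history.append(history[-1] + step_size * (obs[s] - stale_x))
        # Markovian sampling: the next state depends on the current one.
        if s == 0:
            s = 1 if rng.random() < p01 else 0
        else:
            s = 0 if rng.random() < p10 else 1
    tail = history[len(history) // 2:]
    return sum(tail) / len(tail)  # average of the second half of the iterates
```

With these assumed parameters the averaged iterate should land near the stationary mean 0.25. Larger delays force a smaller step size: the recursion $x_{t+1} = x_t - \alpha x_{t-\tau}$ becomes unstable once $\alpha$ is too large relative to $\tau$.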

[PDF] neurips.cc

Near-optimal regret for adversarial MDP with delayed bandit feedback

T Jin, T Lancewicki, H Luo… - Advances in Neural …, 2022 - proceedings.neurips.cc

The standard assumption in reinforcement learning (RL) is that agents observe feedback for
their actions immediately. However, in practice feedback is often observed in delay. This …

Cited by: 26 · All versions: 8

[PDF] mlr.press

Bayesian optimization under stochastic delayed feedback

A Verma, Z Dai, BKH Low - International Conference on …, 2022 - proceedings.mlr.press

Bayesian optimization (BO) is a widely-used sequential method for zeroth-order optimization
of complex and expensive-to-compute black-box functions. The existing BO methods …

Cited by: 15 · All versions: 7

[PDF] mlr.press

Banker online mirror descent: A universal approach for delayed online bandit learning

J Huang, Y Dai, L Huang - International Conference on …, 2023 - proceedings.mlr.press

We propose Banker Online Mirror Descent (Banker-OMD), a novel framework
generalizing the classical Online Mirror Descent (OMD) technique in the online learning …

Cited by: 6 · All versions: 6
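
Since Banker-OMD is described as generalizing classical Online Mirror Descent (OMD), a sketch of the classical baseline may help. The example below assumes the negative-entropy mirror map on the probability simplex (which reduces OMD to exponential weights), full-information losses, and a fixed delay; the function name and loss model are hypothetical, and this is not the Banker-OMD algorithm itself.

```python
import math

def delayed_entropic_omd(loss_vectors, eta, delay):
    """Online mirror descent on the simplex with the negative-entropy mirror map,
    where the loss vector of round t is revealed only at the start of round t + delay.

    Simplifying assumptions: fixed delay, full-information feedback.
    Returns (learner's total expected loss, loss of the best fixed action).
    """
    k = len(loss_vectors[0])
    cum_observed = [0.0] * k  # sum of the loss vectors revealed so far
    total_expected_loss = 0.0
    for t, loss in enumerate(loss_vectors):
        # The loss from round t - delay arrives now and is added to the running sum.
        if t - delay >= 0:
            arrived = loss_vectors[t - delay]
            for i in range(k):
                cum_observed[i] += arrived[i]
        # Entropic OMD from the uniform start has the closed form p_i ∝ exp(-eta * L_i);
        # subtracting the minimum keeps exp() numerically stable.
        m = min(cum_observed)
        w = [math.exp(-eta * (c - m)) for c in cum_observed]
        z = sum(w)
        p = [wi / z for wi in w]
        total_expected_loss += sum(pi * li for pi, li in zip(p, loss))
    best_fixed = min(sum(l[i] for l in loss_vectors) for i in range(k))
    return total_expected_loss, best_fixed
```

With losses `[[0.0, 1.0]] * 1000`, `eta=0.1` and `delay=5`, the learner plays uniformly for the first five rounds (no feedback has arrived yet) and then shifts weight to the first action, so its regret stays bounded by a small constant.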

[PDF] mlr.press

Rectified pessimistic-optimistic learning for stochastic continuum-armed bandit with constraints

H Guo, Z Qi, X Liu - Learning for Dynamics and Control …, 2023 - proceedings.mlr.press

This paper studies the problem of stochastic continuum-armed bandit with constraints
(SCBwC), where we optimize a black-box reward function $f(x)$ subject to a black-box …

Cited by: 14 · All versions: 3

[PDF] neurips.cc

Delay and cooperation in nonstochastic linear bandits

S Ito, D Hatano, H Sumita… - Advances in …, 2020 - proceedings.neurips.cc

This paper offers a nearly optimal algorithm for online linear optimization with delayed
bandit feedback. Online linear optimization with bandit feedback, or nonstochastic linear …

Cited by: 30 · All versions: 7
