Posterior sampling with delayed feedback for reinforcement learning with linear function approximation

NL Kuang, M Yin, M Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent studies in reinforcement learning (RL) have made significant progress by leveraging
function approximation to alleviate the sample complexity hurdle for better performance …

Versatile dueling bandits: Best-of-both world analyses for learning from relative preferences

A Saha, P Gaillard - International Conference on Machine …, 2022 - proceedings.mlr.press
We study the problem of $K$-armed dueling bandit for both stochastic and adversarial
environments, where the goal of the learner is to aggregate information through relative …

Efficient RL with impaired observability: Learning to act with delayed and missing state observations

M Chen, Y Bai, HV Poor… - Advances in Neural …, 2024 - proceedings.neurips.cc
In real-world reinforcement learning (RL) systems, various forms of impaired
observability can complicate matters. These situations arise when an agent is unable to …

Langevin Thompson sampling with logarithmic communication: bandits and reinforcement learning

A Karbasi, NL Kuang, Y Ma… - … Conference on Machine …, 2023 - proceedings.mlr.press
Thompson sampling (TS) is widely used in sequential decision making due to its ease of use
and appealing empirical performance. However, many existing analytical and empirical …
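The sketch below shows classical Beta-Bernoulli Thompson sampling, with a feedback delay added to match the theme of this list. It is not the Langevin or low-communication method of the paper above. The fixed delay, the function name `delayed_thompson_sampling`, and its parameters are hypothetical choices for illustration. In this sketch, the reward earned in round t enters the posterior only at round t + delay.

```python
import random
from collections import deque


def delayed_thompson_sampling(true_means, horizon, delay, seed=0):
    """Hypothetical sketch: Beta-Bernoulli Thompson sampling where the reward
    earned in round t only becomes usable for the posterior at round t + delay."""
    if delay < 1:
        raise ValueError("delay must be at least 1 round")
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # Beta(1, 1) uniform prior: pseudo-successes
    beta = [1.0] * k   # pseudo-failures
    pending = deque()  # (reveal_round, arm, reward); FIFO because the delay is fixed
    pulls = [0] * k
    for t in range(horizon):
        # Deliver every reward whose delay has elapsed and update that arm's posterior.
        while pending and pending[0][0] <= t:
            _, arm, reward = pending.popleft()
            alpha[arm] += reward
            beta[arm] += 1 - reward
        # Draw one plausible mean per arm and play the arm with the largest draw.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward; the learner does not observe it yet.
        reward = 1 if rng.random() < true_means[arm] else 0
        pending.append((t + delay, arm, reward))
        pulls[arm] += 1
    return pulls


pulls = delayed_thompson_sampling([0.2, 0.8], horizon=2000, delay=10)
print(pulls)  # most pulls should go to the second arm
```

Delay slows how quickly the posterior concentrates, because the learner keeps sampling from stale posteriors for `delay` rounds. With a clear gap between arms, the better arm still dominates.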

Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling

A Adibi, N Dal Fabbro, L Schenato… - International …, 2024 - proceedings.mlr.press
Motivated by applications in large-scale and multi-agent reinforcement learning, we study
the non-asymptotic performance of stochastic approximation (SA) schemes with delayed …
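To make "stochastic approximation with delayed updates" concrete, here is a minimal sketch. It applies the update $x_{t+1} = x_t - a\,g(x_{t-\tau})$ to $f(x) = \tfrac12 (x - c)^2$, where the noisy gradient is evaluated at an iterate that is $\tau$ steps stale. It assumes i.i.d. Gaussian noise, which is much simpler than the Markovian sampling analyzed in the paper. The function name and parameters are hypothetical.

```python
import random


def delayed_sgd(target, steps, delay, step_size=0.05, noise=0.1, seed=0):
    """Hypothetical sketch: x_{t+1} = x_t - step_size * g(x_{t-delay}) for
    f(x) = 0.5 * (x - target)**2, with i.i.d. Gaussian gradient noise."""
    rng = random.Random(seed)
    history = [0.0]  # history[t] is the iterate x_t; start at x_0 = 0
    for t in range(steps):
        stale = history[max(0, t - delay)]  # iterate from `delay` steps ago
        grad = (stale - target) + rng.gauss(0.0, noise)
        history.append(history[-1] - step_size * grad)
    return history[-1]


print(delayed_sgd(3.0, steps=2000, delay=5))  # should end close to 3.0
```

The step size is kept small relative to the delay on purpose. With stale gradients, a step size that is stable without delay can cause oscillation or divergence once the delay grows.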

Near-optimal regret for adversarial MDP with delayed bandit feedback

T **, T Lancewicki, H Luo… - Advances in Neural …, 2022 - proceedings.neurips.cc
The standard assumption in reinforcement learning (RL) is that agents observe feedback for
their actions immediately. However, in practice feedback is often observed in delay. This …

Bayesian optimization under stochastic delayed feedback

A Verma, Z Dai, BKH Low - International Conference on …, 2022 - proceedings.mlr.press
Bayesian optimization (BO) is a widely-used sequential method for zeroth-order optimization
of complex and expensive-to-compute black-box functions. The existing BO methods …

Banker online mirror descent: A universal approach for delayed online bandit learning

J Huang, Y Dai, L Huang - International Conference on …, 2023 - proceedings.mlr.press
We propose Banker Online Mirror Descent (Banker-OMD), a novel framework
generalizing the classical Online Mirror Descent (OMD) technique in the online learning …
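For reference, here is a sketch of the classical Online Mirror Descent that Banker-OMD generalizes. It covers only the standard full-information case on the probability simplex with the negative-entropy regularizer. It does not model delays, bandit feedback, or the Banker-OMD mechanism. With this regularizer, the mirror step followed by the KL projection onto the simplex reduces to the multiplicative-weights update $w_{t+1,i} \propto w_{t,i} \exp(-\eta\,\ell_{t,i})$. The function name is hypothetical.

```python
import math


def omd_entropic(loss_vectors, eta):
    """Hypothetical sketch: OMD on the simplex with the negative-entropy
    regularizer, which reduces to multiplicative (exponential) weights."""
    k = len(loss_vectors[0])
    weights = [1.0 / k] * k  # start from the uniform distribution
    plays = []
    for losses in loss_vectors:
        plays.append(list(weights))  # distribution played before seeing the losses
        # Mirror step: scale each coordinate by exp(-eta * loss).
        unnorm = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        # KL projection back onto the simplex is just normalization.
        z = sum(unnorm)
        weights = [u / z for u in unnorm]
    return plays, weights


plays, final = omd_entropic([[0.0, 1.0]] * 100, eta=0.1)
print(final)  # nearly all mass on the first (zero-loss) action
```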

Rectified pessimistic-optimistic learning for stochastic continuum-armed bandit with constraints

H Guo, Z Qi, X Liu - Learning for Dynamics and Control …, 2023 - proceedings.mlr.press
This paper studies the problem of stochastic continuum-armed bandit with constraints
(SCBwC), where we optimize a black-box reward function $f(x)$ subject to a black-box …

Delay and cooperation in nonstochastic linear bandits

S Ito, D Hatano, H Sumita… - Advances in …, 2020 - proceedings.neurips.cc
This paper offers a nearly optimal algorithm for online linear optimization with delayed
bandit feedback. Online linear optimization with bandit feedback, or nonstochastic linear …