Posterior sampling with delayed feedback for reinforcement learning with linear function approximation
Recent studies in reinforcement learning (RL) have made significant progress by leveraging
function approximation to alleviate the sample complexity hurdle for better performance …
Versatile dueling bandits: Best-of-both world analyses for learning from relative preferences
We study the problem of $K$-armed dueling bandit for both stochastic and adversarial
environments, where the goal of the learner is to aggregate information through relative …
Efficient RL with impaired observability: Learning to act with delayed and missing state observations
In real-world reinforcement learning (RL) systems, various forms of impaired
observability can complicate matters. These situations arise when an agent is unable to …
Langevin Thompson sampling with logarithmic communication: bandits and reinforcement learning
Thompson sampling (TS) is widely used in sequential decision making due to its ease of use
and appealing empirical performance. However, many existing analytical and empirical …
Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling
Motivated by applications in large-scale and multi-agent reinforcement learning, we study
the non-asymptotic performance of stochastic approximation (SA) schemes with delayed …
Near-optimal regret for adversarial MDP with delayed bandit feedback
The standard assumption in reinforcement learning (RL) is that agents observe feedback for
their actions immediately. However, in practice feedback is often observed in delay. This …
Bayesian optimization under stochastic delayed feedback
Bayesian optimization (BO) is a widely-used sequential method for zeroth-order optimization
of complex and expensive-to-compute black-box functions. The existing BO methods …
Banker online mirror descent: A universal approach for delayed online bandit learning
We propose Banker Online Mirror Descent (Banker-OMD), a novel framework
generalizing the classical Online Mirror Descent (OMD) technique in the online learning …
Rectified pessimistic-optimistic learning for stochastic continuum-armed bandit with constraints
This paper studies the problem of stochastic continuum-armed bandit with constraints
(SCBwC), where we optimize a black-box reward function $f(x)$ subject to a black-box …
Delay and cooperation in nonstochastic linear bandits
This paper offers a nearly optimal algorithm for online linear optimization with delayed
bandit feedback. Online linear optimization with bandit feedback, or nonstochastic linear …