The statistical complexity of interactive decision making
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …
[BOOK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Optimal best-arm identification in linear bandits
We study the problem of best-arm identification with fixed confidence in stochastic linear
bandits. The objective is to identify the best arm with a given level of certainty while …
Mixture martingales revisited with applications to sequential tests and confidence intervals
This paper presents new deviation inequalities that are valid uniformly in time under
adaptive sampling in a multi-armed bandit model. The deviations are measured using the …
High-dimensional sparse linear bandits
Stochastic linear bandits with high-dimensional sparse features are a practical model for a
variety of domains, such as personalized medicine and online advertising. We derive a …
Fast pure exploration via Frank-Wolfe
We study the problem of active pure exploration with fixed confidence in generic stochastic
bandit environments. The goal of the learner is to answer a query about the environment …
Achieving near instance-optimality and minimax-optimality in stochastic and adversarial linear bandits simultaneously
In this work, we develop linear bandit algorithms that automatically adapt to different
environments. By plugging a novel loss estimator into the optimization problem that …
Approximate allocation matching for structural causal bandits with unobserved confounders
The structural causal bandit provides a framework for online decision-making problems when
causal information is available. It models the stochastic environment with a structural causal …
Best arm identification with fixed budget: A large deviation perspective
We consider the problem of identifying the best arm in stochastic Multi-Armed Bandits
(MABs) using a fixed sampling budget. Characterizing the minimal instance-specific error …