[BOOK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Contextual decision processes with low Bellman rank are PAC-learnable
This paper studies systematic exploration for reinforcement learning (RL) with rich
observations and function approximation. We introduce contextual decision processes …
Beyond UCB: Optimal and efficient contextual bandits with regression oracles
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …
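To make the oracle-reduction idea concrete, here is a minimal, hypothetical sketch: an epsilon-greedy contextual bandit in which a tabular per-(context, action) mean estimator stands in for the regression oracle. This is illustration only; the paper's own algorithm is more refined (it uses inverse-gap weighting rather than epsilon-greedy), and all names and reward values below are made up.

```python
import random

def epsilon_greedy_contextual(contexts, reward_fn, n_actions, eps=0.1, seed=0):
    """Toy contextual bandit: with probability eps explore uniformly,
    otherwise play the action with the highest estimated reward for the
    observed context. A tabular mean estimator stands in for the
    regression oracle."""
    rng = random.Random(seed)
    sums, counts = {}, {}  # (context, action) -> reward total / pull count
    history = []
    for x in contexts:
        if rng.random() < eps:
            a = rng.randrange(n_actions)  # explore uniformly
        else:
            # exploit: highest estimated mean reward under context x
            a = max(range(n_actions),
                    key=lambda a: sums.get((x, a), 0.0) / counts.get((x, a), 1))
        r = reward_fn(x, a, rng)
        sums[(x, a)] = sums.get((x, a), 0.0) + r
        counts[(x, a)] = counts.get((x, a), 0) + 1
        history.append((x, a, r))
    return history

# Usage: two contexts, two actions; the action matching the context pays more.
plays = epsilon_greedy_contextual(
    [0, 1] * 200,
    lambda x, a, rng: float(rng.random() < (0.8 if a == x else 0.2)),
    n_actions=2,
)
```

The key computational point the entry makes: the only "learning" step is fitting a reward predictor from (context, action, reward) data, i.e. a supervised-regression call, so the per-round cost matches classical supervised learning.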
Regret analysis of stochastic and nonstochastic multi-armed bandit problems
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …
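The exploration–exploitation balance this entry refers to is often illustrated with the classic UCB1 index rule: play the arm maximizing empirical mean plus an optimism bonus that shrinks as the arm is sampled. A minimal sketch, with made-up Bernoulli arm payoffs:

```python
import math
import random

def ucb1(reward_fns, horizon, seed=0):
    """UCB1: pull each arm once, then pick the arm maximizing
    mean + sqrt(2 ln t / n_pulls). The bonus keeps under-sampled
    arms attractive (exploration) while the mean rewards staying
    with good options (exploitation)."""
    rng = random.Random(seed)
    k = len(reward_fns)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull each arm once
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        r = reward_fns[arm](rng)
        counts[arm] += 1
        sums[arm] += r
    return counts

# Usage: two Bernoulli arms with means 0.3 and 0.7; over 2000 rounds
# the better arm should receive the vast majority of pulls.
pulls = ucb1(
    [lambda rng: float(rng.random() < 0.3),
     lambda rng: float(rng.random() < 0.7)],
    horizon=2000,
)
```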
Taming the monster: A fast and simple algorithm for contextual bandits
We present a new algorithm for the contextual bandit learning problem, where the learner
repeatedly takes one of K actions in response to the observed context, and …
Bandits with knapsacks
Multi-armed bandit problems are the predominant theoretical model of exploration-
exploitation tradeoffs in learning, and they have countless applications ranging from medical …
Adaptive treatment assignment in experiments for policy choice
Standard experimental designs are geared toward point estimation and hypothesis testing,
while bandit algorithms are geared toward in-sample outcomes. Here, we instead consider …
Bypassing the simulator: Near-optimal adversarial linear contextual bandits
We consider the adversarial linear contextual bandit problem, where the loss vectors are
selected fully adversarially and the per-round action set (i.e., the context) is drawn from a fixed …
Adapting to misspecification in contextual bandits
A major research direction in contextual bandits is to develop algorithms that are
computationally efficient, yet support flexible, general-purpose function approximation …