[BOOK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Introduction to online convex optimization
E Hazan - Foundations and Trends® in Optimization, 2016 - nowpublishers.com
This monograph portrays optimization as a process. In many practical applications the
environment is so complex that it is infeasible to lay out a comprehensive theoretical model …
Regret analysis of stochastic and nonstochastic multi-armed bandit problems
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …
[BOOK][B] Optimization for machine learning
An up-to-date account of the interplay between optimization and machine learning,
accessible to students and researchers in both communities. The interplay between …
Bypassing the simulator: Near-optimal adversarial linear contextual bandits
We consider the adversarial linear contextual bandit problem, where the loss vectors are
selected fully adversarially and the per-round action set (i.e., the context) is drawn from a fixed …
Online learning with predictable sequences
We present methods for online linear optimization that take advantage of benign (as
opposed to worst-case) sequences. Specifically, if the sequence encountered by the learner …
[PDF][PDF] Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback.
Bandit convex optimization is a special case of online convex optimization with partial
information. In this setting, a player attempts to minimize a sequence of adversarially …
Combinatorial bandits
We study sequential prediction problems in which, at each time instance, the forecaster
chooses a vector from a given finite set S ⊆ R^d. At the same time, the opponent chooses a …
Modeling strong and human-like gameplay with KL-regularized search
We consider the task of accurately modeling strong human policies in multi-agent
decision-making problems, given examples of human behavior. Imitation learning is effective at …