Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Regret minimization with performative feedback

M Jagadeesan, T Zrnic… - … on Machine Learning, 2022 - proceedings.mlr.press
In performative prediction, the deployment of a predictive model triggers a shift in the data
distribution. As these shifts are typically unknown ahead of time, the learner needs to deploy …

Robust Lipschitz bandits to adversarial corruptions

Y Kang, CJ Hsieh, TCM Lee - Advances in Neural …, 2023 - proceedings.neurips.cc
Lipschitz bandit is a variant of stochastic bandits that deals with a continuous arm set
defined on a metric space, where the reward function is subject to a Lipschitz constraint. In …

Multiobjective Lipschitz bandits under lexicographic ordering

B Xue, J Cheng, F Liu, Y Wang, Q Zhang - Proceedings of the AAAI …, 2024 - ojs.aaai.org
This paper studies the multiobjective bandit problem under lexicographic ordering, wherein
the learner aims to simultaneously maximize $m$ objectives hierarchically. The only …

Zeroth-order non-convex learning via hierarchical dual averaging

A Héliou, M Martin, P Mertikopoulos… - … on Machine Learning, 2021 - proceedings.mlr.press
We propose a hierarchical version of dual averaging for zeroth-order online non-convex
optimization, i.e., learning processes where, at each stage, the optimizer is facing an …

Quantifying the Merits of Network-Assist Online Learning in Optimizing Network Protocols

X Dai, Z Wang, J Ye, JCS Lui - 2024 IEEE/ACM 32nd …, 2024 - ieeexplore.ieee.org
Optimizing network protocols is crucial for improving application performance. Recent
research works use multi-armed bandit (MAB) online learning methods to address network …

Intelligent informative frequency band searching assisted by a dynamic bandit tree method for machine fault diagnosis

Z Mo, Z Zhang, Q Miao, KL Tsui - IEEE/ASME Transactions on …, 2022 - ieeexplore.ieee.org
The fault informative frequency band searching is crucial to envelope analysis-based
machine fault diagnosis. Its success often depends on effective filters. However, existing …

Thompson sampling-based recursive block elimination for dynamic assignment under limited budget in pure-exploration

SP Parambath, C Anagnostopoulos… - Data Mining and …, 2025 - Springer
In this paper, we investigate Thompson sampling-based sequential block elimination
approaches for dynamic assignment problems in a pure-exploration Multi-Armed Bandit …

New Perspectives in Online Contract Design

S Zuo - arXiv preprint arXiv:2403.07143, 2024 - arxiv.org
This work studies the repeated principal-agent problem from an online learning perspective.
The principal's goal is to learn the optimal contract that maximizes her utility through …

Online Defense Strategies for Reinforcement Learning Against Adaptive Reward Poisoning

A Nika, A Singla, G Radanovic - 26th International Conference on …, 2023 - pure.mpg.de
We consider the problem of defense against reward-poisoning attacks in reinforcement
learning and formulate it as a game in T rounds between a defender and an adaptive …