Explore no more: Improved high-probability regret bounds for non-stochastic bandits

G Neu - Advances in Neural Information Processing …, 2015 - proceedings.neurips.cc
This work addresses the problem of regret minimization in non-stochastic multi-armed bandit
problems, focusing on performance guarantees that hold with high probability. Such results …

Online learning with feedback graphs: Beyond bandits

N Alon, N Cesa-Bianchi, O Dekel… - … on Learning Theory, 2015 - proceedings.mlr.press
We study a general class of online learning problems where the feedback is specified by a
graph. This class includes online prediction with expert advice and the multi-armed bandit …

Optimal algorithms for stochastic contextual preference bandits

A Saha - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
We consider the problem of preference bandits in the contextual setting. At each round, the
learner is presented with a context set of $K$ items, chosen randomly from a potentially …

Delay and cooperation in nonstochastic bandits

N Cesa-Bianchi, C Gentile… - … on Learning Theory, 2016 - proceedings.mlr.press
We study networks of communicating learning agents that cooperate to solve a common
nonstochastic bandit problem. Agents use an underlying communication network to get …

Nearly optimal best-of-both-worlds algorithms for online learning with feedback graphs

S Ito, T Tsuchiya, J Honda - Advances in Neural Information …, 2022 - proceedings.neurips.cc
This study considers online learning with general directed feedback graphs. For this
problem, we present best-of-both-worlds algorithms that achieve nearly tight regret bounds …

Repeated bilateral trade against a smoothed adversary

N Cesa-Bianchi, TR Cesari… - The Thirty Sixth …, 2023 - proceedings.mlr.press
We study repeated bilateral trade where an adaptive $\sigma$-smooth adversary generates
the valuations of sellers and buyers. We provide a complete characterization of the regret …

Versatile dueling bandits: Best-of-both world analyses for learning from relative preferences

A Saha, P Gaillard - International Conference on Machine …, 2022 - proceedings.mlr.press
We study the problem of $K$-armed dueling bandit for both stochastic and adversarial
environments, where the goal of the learner is to aggregate information through relative …

Knowledge-aware conversational preference elicitation with bandit feedback

C Zhao, T Yu, Z Xie, S Li - Proceedings of the ACM Web Conference …, 2022 - dl.acm.org
Conversational recommender systems (CRSs) have been proposed recently to mitigate the
cold-start problem suffered by the traditional recommender systems. By introducing …

Information directed sampling for linear partial monitoring

J Kirschner, T Lattimore… - Conference on Learning …, 2020 - proceedings.mlr.press
Partial monitoring is a rich framework for sequential decision making under uncertainty that
generalizes many well known bandit models, including linear, combinatorial and dueling …

No-regret learning in bilateral trade via global budget balance

M Bernasconi, M Castiglioni, A Celli… - Proceedings of the 56th …, 2024 - dl.acm.org
Bilateral trade models the problem of intermediating between two rational agents—a seller
and a buyer—both characterized by a private valuation for an item they want to trade. We …