Preference-based online learning with dueling bandits: A survey

V Bengs, R Busa-Fekete, A El Mesaoudi-Paul… - Journal of Machine …, 2021 - jmlr.org
In machine learning, the notion of multi-armed bandits refers to a class of online learning
problems, in which an agent is supposed to simultaneously explore and exploit a given set …

Efficient and optimal algorithms for contextual dueling bandits under realizability

A Saha, A Krishnamurthy - International Conference on …, 2022 - proceedings.mlr.press
We study the $K$-armed contextual dueling bandit problem, a sequential decision making
setting in which the learner uses contextual information to make two decisions, but only …

Think Before You Duel: Understanding Complexities of Preference Learning under Constrained Resources

R Deb, A Saha, A Banerjee - International Conference on …, 2024 - proceedings.mlr.press
We consider the problem of reward maximization in the dueling bandit setup along with
constraints on resource consumption. As in the classic dueling bandits, at each round the …

Choice bandits

A Agarwal, N Johnson… - Advances in neural …, 2020 - proceedings.neurips.cc
There has been much interest in recent years in the problem of dueling bandits, where on
each round the learner plays a pair of arms and receives as feedback the outcome of a …

Nested elimination: a simple algorithm for best-item identification from choice-based feedback

J Yang, Y Feng - International Conference on Machine …, 2023 - proceedings.mlr.press
We study the problem of best-item identification from choice-based feedback. In this
problem, a company sequentially and adaptively shows display sets to a population of …

Exploiting correlation to achieve faster learning rates in low-rank preference bandits

S Ghoshal, A Saha - ar…
… the sample complexity as low as
possible is a common task in the field of multi-armed bandits. In the multi-dueling variant of …

Optimal and efficient dynamic regret algorithms for non-stationary dueling bandits

A Saha, S Gupta - International Conference on Machine …, 2022 - proceedings.mlr.press
We study the problem of dynamic regret minimization in $K$-armed Dueling Bandits under
non-stationary or time-varying preferences. This is an online learning setup where the agent …

Exploiting correlation to achieve faster learning rates in low-rank preference bandits

A Saha, S Ghoshal - International Conference on Artificial …, 2022 - proceedings.mlr.press
We introduce the Correlated Preference Bandits problem with random utility-based
choice models (RUMs), where the goal is to identify the best item from a given pool of $ n …