Preference-based online learning with dueling bandits: A survey
In machine learning, the notion of multi-armed bandits refers to a class of online learning
problems, in which an agent is supposed to simultaneously explore and exploit a given set …
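Below is a minimal sketch of the pairwise-feedback protocol this survey covers: on each round the learner duels two arms and observes only which one won, drawn from an unknown preference matrix. The matrix P, the horizon, and the uniformly random pair selection are illustrative assumptions, not any algorithm from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative preference matrix: P[i, j] = probability that arm i beats arm j.
K = 4
P = np.array([[0.5, 0.6, 0.7, 0.8],
              [0.4, 0.5, 0.6, 0.7],
              [0.3, 0.4, 0.5, 0.6],
              [0.2, 0.3, 0.4, 0.5]])

wins = np.zeros((K, K))    # wins[i, j]: times arm i beat arm j
plays = np.zeros((K, K))   # plays[i, j]: times the pair (i, j) was duelled

for t in range(2000):
    # Naive strategy (illustrative only): duel a uniformly random pair.
    i, j = rng.choice(K, size=2, replace=False)
    i_wins = rng.random() < P[i, j]          # binary relative feedback only
    plays[i, j] += 1; plays[j, i] += 1
    if i_wins:
        wins[i, j] += 1
    else:
        wins[j, i] += 1

# Empirical pairwise win rates; the Condorcet winner (arm 0 here) beats every
# other arm with probability > 1/2, so its off-diagonal row should exceed 0.5.
with np.errstate(invalid="ignore"):
    p_hat = wins / plays
print(np.round(p_hat, 2))
```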
Efficient and optimal algorithms for contextual dueling bandits under realizability
We study the $K$-armed contextual dueling bandit problem, a sequential decision making
setting in which the learner uses contextual information to make two decisions, but only …
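A rough sketch of the contextual dueling setup under an assumed realizable model: preferences are taken to follow a linear Bradley-Terry model, i.e. the probability that arm a beats arm b in context x is sigmoid((phi(x, a) - phi(x, b)) @ theta*). The feature map features(x, a) is hypothetical, and the greedy pair choice with an online logistic update illustrates the interaction, not the paper's optimal algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

K, d = 5, 3
theta_star = rng.normal(size=d)              # unknown true parameter (realizability assumption)
features = lambda x, a: x * (a + 1) / K      # hypothetical feature map phi(x, a)

theta = np.zeros(d)
lr = 0.05

for t in range(3000):
    x = rng.normal(size=d)                   # observed context
    phis = np.stack([features(x, a) for a in range(K)])
    scores = phis @ theta
    # Greedy pair choice (illustrative): duel the two highest-scoring arms.
    a, b = np.argsort(scores)[-2:]
    # Preference feedback from the (assumed) Bradley-Terry model.
    p_a_beats_b = sigmoid((phis[a] - phis[b]) @ theta_star)
    y = float(rng.random() < p_a_beats_b)
    # Online logistic-regression step on the feature difference.
    diff = phis[a] - phis[b]
    theta += lr * (y - sigmoid(diff @ theta)) * diff

print("estimated theta:", np.round(theta, 2))
print("true theta*:    ", np.round(theta_star, 2))
```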
Think Before You Duel: Understanding Complexities of Preference Learning under Constrained Resources
We consider the problem of reward maximization in the dueling bandit setup along with
constraints on resource consumption. As in the classic dueling bandits, at each round the …
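To make the constrained setting concrete, here is a hedged sketch in which every duel consumes resources against a hard budget. The latent-utility preference model, per-arm costs, budget value, and random pair selection are all assumptions for illustration; the paper's algorithm and constraint model are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

K = 5
util = rng.normal(size=K)                 # latent utilities drive the duel outcomes
cost = rng.uniform(0.5, 2.0, size=K)      # resource consumed when an arm is played
budget = 500.0

wins = np.zeros(K)
plays = np.zeros(K)
spent, t = 0.0, 0

while True:
    i, j = rng.choice(K, size=2, replace=False)    # naive pair choice (illustrative)
    if spent + cost[i] + cost[j] > budget:
        break                                      # stop before violating the budget
    spent += cost[i] + cost[j]
    winner = i if rng.random() < sigmoid(util[i] - util[j]) else j
    wins[winner] += 1
    plays[i] += 1; plays[j] += 1
    t += 1

print(f"rounds played: {t}, resources spent: {spent:.1f} / {budget}")
print("empirical win fraction per arm:", np.round(wins / np.maximum(plays, 1), 2))
```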
Choice bandits
A Agarwal, N Johnson… - Advances in neural …, 2020 - proceedings.neurips.cc
There has been much interest in recent years in the problem of dueling bandits, where on
each round the learner plays a pair of arms and receives as feedback the outcome of a …
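The choice-bandit feedback can be sketched as follows: the learner plays a subset of arms (possibly larger than a pair) and observes only a single winner. The multinomial-logit winner model and the random-subset strategy below are illustrative assumptions, not the algorithm proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

K, subset_size = 6, 3
util = rng.normal(size=K)

def winner(subset):
    # Winner drawn from a multinomial-logit choice model (an assumption).
    logits = util[subset]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return subset[rng.choice(len(subset), p=probs)]

win_counts = np.zeros(K)
play_counts = np.zeros(K)

for t in range(5000):
    S = rng.choice(K, size=subset_size, replace=False)   # naive subset choice (illustrative)
    win_counts[winner(S)] += 1
    play_counts[S] += 1

rates = win_counts / np.maximum(play_counts, 1)
print("empirical win rate when shown:", np.round(rates, 2))
print("empirical best matches true best:", int(np.argmax(rates)) == int(np.argmax(util)))
```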
Nested elimination: a simple algorithm for best-item identification from choice-based feedback
We study the problem of best-item identification from choice-based feedback. In this
problem, a company sequentially and adaptively shows display sets to a population of …
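As a rough illustration of best-item identification from choice-based feedback, the sketch below runs a crude successive-elimination loop over display sets under an assumed multinomial-logit population. The elimination margin and batch schedule are arbitrary placeholders; this is not the Nested Elimination algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(4)

n, display_size = 8, 4
util = rng.normal(size=n)                 # assumed latent item utilities

active = list(range(n))                   # items still in contention
chosen = np.zeros(n)
shown = np.zeros(n)

for batch in range(200):
    pool = np.array(active)
    S = rng.choice(pool, size=min(display_size, len(pool)), replace=False)
    # Choice feedback: one item from the display set, multinomial-logit (assumption).
    logits = util[S]
    probs = np.exp(logits - logits.max()); probs /= probs.sum()
    pick = S[rng.choice(len(S), p=probs)]
    chosen[pick] += 1
    shown[S] += 1
    if batch % 20 == 19 and len(active) > 1:
        rates = chosen[active] / np.maximum(shown[active], 1)
        leader = max(rates)
        # Crude elimination rule with a fixed margin (an assumption, not a proven threshold).
        active = [i for i in active if chosen[i] / max(shown[i], 1) > leader - 0.25]

print("surviving items:", active, "| true best item:", int(np.argmax(util)))
```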
Optimal and efficient dynamic regret algorithms for non-stationary dueling bandits
We study the problem of dynamic regret minimization in $K$-armed Dueling Bandits under
non-stationary or time-varying preferences. This is an online learning setup where the agent …
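One way to picture the non-stationary setting: latent utilities drift (and may switch abruptly), so the best arm changes over time and only recent comparisons remain informative. The sliding-window tracker below is a simple illustration of why dynamic regret is the right yardstick, not the paper's algorithm; the drift model and window length are assumptions.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(5)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

K, T, window = 4, 4000, 300
util = rng.normal(size=K)                 # latent utilities that will drift over time

history = deque(maxlen=window)            # recent (i, j, i_won) comparison outcomes

for t in range(T):
    util += 0.01 * rng.normal(size=K)     # slow drift (illustrative non-stationarity)
    if t == T // 2:
        util = util[::-1].copy()          # abrupt preference switch halfway through

    i, j = rng.choice(K, size=2, replace=False)   # naive pair choice (illustrative)
    i_won = rng.random() < sigmoid(util[i] - util[j])
    history.append((i, j, i_won))

# Sliding-window estimate of each arm's recent win rate.
wins = np.zeros(K); plays = np.zeros(K)
for i, j, i_won in history:
    plays[i] += 1; plays[j] += 1
    wins[i if i_won else j] += 1
print("recent win rates:", np.round(wins / np.maximum(plays, 1), 2))
print("current best arm (by latent utility):", int(np.argmax(util)))
```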
Exploiting correlation to achieve faster learning rates in low-rank preference bandits
We introduce the Correlated Preference Bandits problem with random utility-based
choice models (RUMs), where the goal is to identify the best item from a given pool of $n$ …
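A hedged sketch of a random-utility choice model with correlation induced by low-rank structure: item utilities come from rank-r embeddings times a shared preference vector, and the winner of a displayed subset is the item with the highest realized (Gumbel-perturbed) utility. The embedding dimension, subset size, and sampling scheme are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(6)

n, r, subset_size = 10, 2, 4
item_emb = rng.normal(size=(n, r))        # low-rank item factors (correlation across items)
w = rng.normal(size=r)                    # shared latent preference direction
base_util = item_emb @ w

def rum_winner(subset):
    # Gumbel noise makes this RUM equivalent to a multinomial-logit choice.
    noise = rng.gumbel(size=len(subset))
    return subset[np.argmax(base_util[subset] + noise)]

wins = np.zeros(n); shown = np.zeros(n)
for t in range(8000):
    S = rng.choice(n, size=subset_size, replace=False)
    wins[rum_winner(S)] += 1
    shown[S] += 1

rates = wins / np.maximum(shown, 1)
print("top item by empirical choice rate:", int(np.argmax(rates)))
print("top item by true utility:         ", int(np.argmax(base_util)))
```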