Google Scholar

Explore no more: Improved high-probability regret bounds for non-stochastic bandits

G Neu - Advances in Neural Information Processing …, 2015 - proceedings.neurips.cc

This work addresses the problem of regret minimization in non-stochastic multi-armed bandit
problems, focusing on performance guarantees that hold with high probability. Such results …

Cited by 210; 15 versions

Online learning with feedback graphs: Beyond bandits

N Alon, N Cesa-Bianchi, O Dekel… - … on Learning Theory, 2015 - proceedings.mlr.press

We study a general class of online learning problems where the feedback is specified by a
graph. This class includes online prediction with expert advice and the multi-armed bandit …

Cited by 184; 14 versions

Optimal algorithms for stochastic contextual preference bandits

A Saha - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc

We consider the problem of preference bandits in the contextual setting. At each round, the
learner is presented with a context set of $K$ items, chosen randomly from a potentially …

Cited by 43; 5 versions

Delay and cooperation in nonstochastic bandits

N Cesa-Bianchi, C Gentile… - … on Learning Theory, 2016 - proceedings.mlr.press

We study networks of communicating learning agents that cooperate to solve a common
nonstochastic bandit problem. Agents use an underlying communication network to get …

Cited by 119; 17 versions

Nearly optimal best-of-both-worlds algorithms for online learning with feedback graphs

S Ito, T Tsuchiya, J Honda - Advances in Neural Information …, 2022 - proceedings.neurips.cc

This study considers online learning with general directed feedback graphs. For this
problem, we present best-of-both-worlds algorithms that achieve nearly tight regret bounds …

Cited by 29; 8 versions

Repeated bilateral trade against a smoothed adversary

N Cesa-Bianchi, TR Cesari… - The Thirty Sixth …, 2023 - proceedings.mlr.press

We study repeated bilateral trade where an adaptive $\sigma$-smooth adversary generates
the valuations of sellers and buyers. We provide a complete characterization of the regret …

Cited by 20; 8 versions

Versatile dueling bandits: Best-of-both world analyses for learning from relative preferences

A Saha, P Gaillard - International Conference on Machine …, 2022 - proceedings.mlr.press

We study the problem of $K$-armed dueling bandit for both stochastic and adversarial
environments, where the goal of the learner is to aggregate information through relative …

Cited by 23; 5 versions

Knowledge-aware conversational preference elicitation with bandit feedback

C Zhao, T Yu, Z Xie, S Li - Proceedings of the ACM Web Conference …, 2022 - dl.acm.org

Conversational recommender systems (CRSs) have been proposed recently to mitigate the
cold-start problem suffered by the traditional recommender systems. By introducing …

Cited by 27; 4 versions

Information directed sampling for linear partial monitoring

J Kirschner, T Lattimore… - Conference on Learning …, 2020 - proceedings.mlr.press

Partial monitoring is a rich framework for sequential decision making under uncertainty that
generalizes many well known bandit models, including linear, combinatorial and dueling …

Cited by 55; 5 versions

No-regret learning in bilateral trade via global budget balance

M Bernasconi, M Castiglioni, A Celli… - Proceedings of the 56th …, 2024 - dl.acm.org

Bilateral trade models the problem of intermediating between two rational agents—a seller
and a buyer—both characterized by a private valuation for an item they want to trade. We …

Cited by 13; 6 versions

Nonstochastic multi-armed bandits with graph-structured feedback

Explore no more: Improved high-probability regret bounds for non-stochastic bandits