Explore no more: Improved high-probability regret bounds for non-stochastic bandits
G Neu - Advances in Neural Information Processing …, 2015 - proceedings.neurips.cc
This work addresses the problem of regret minimization in non-stochastic multi-armed bandit
problems, focusing on performance guarantees that hold with high probability. Such results …
Online learning with feedback graphs: Beyond bandits
We study a general class of online learning problems where the feedback is specified by a
graph. This class includes online prediction with expert advice and the multi-armed bandit …
Optimal algorithms for stochastic contextual preference bandits
A Saha - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
We consider the problem of preference bandits in the contextual setting. At each round, the
learner is presented with a context set of $K$ items, chosen randomly from a potentially …
Delay and cooperation in nonstochastic bandits
We study networks of communicating learning agents that cooperate to solve a common
nonstochastic bandit problem. Agents use an underlying communication network to get …
Nearly optimal best-of-both-worlds algorithms for online learning with feedback graphs
This study considers online learning with general directed feedback graphs. For this
problem, we present best-of-both-worlds algorithms that achieve nearly tight regret bounds …
Repeated bilateral trade against a smoothed adversary
We study repeated bilateral trade where an adaptive $\sigma$-smooth adversary generates
the valuations of sellers and buyers. We provide a complete characterization of the regret …
Versatile dueling bandits: Best-of-both world analyses for learning from relative preferences
We study the problem of $K$-armed dueling bandit for both stochastic and adversarial
environments, where the goal of the learner is to aggregate information through relative …
Knowledge-aware conversational preference elicitation with bandit feedback
Conversational recommender systems (CRSs) have been proposed recently to mitigate the
cold-start problem suffered by the traditional recommender systems. By introducing …
Information directed sampling for linear partial monitoring
Partial monitoring is a rich framework for sequential decision making under uncertainty that
generalizes many well known bandit models, including linear, combinatorial and dueling …
No-regret learning in bilateral trade via global budget balance
Bilateral trade models the problem of intermediating between two rational agents—a seller
and a buyer—both characterized by a private valuation for an item they want to trade. We …