[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

[BOOK][B] Prediction, learning, and games

N Cesa-Bianchi, G Lugosi - 2006 - books.google.com
This important text and reference for researchers and students in machine learning, game
theory, statistics and information theory offers a comprehensive treatment of the problem of …

On upper-confidence bound policies for switching bandit problems

A Garivier, E Moulines - International conference on algorithmic learning …, 2011 - Springer
Many problems, such as cognitive radio, parameter control of a scanning tunnelling
microscope or internet advertisement, can be modelled as non-stationary bandit problems …

The k-armed dueling bandits problem

Y Yue, J Broder, R Kleinberg, T Joachims - Journal of Computer and …, 2012 - Elsevier
We study a partial-information online-learning problem where actions are restricted to noisy
comparisons between pairs of strategies (also known as bandits). In contrast to conventional …

[PDF][PDF] From external to internal regret.

A Blum, Y Mansour - Journal of Machine Learning Research, 2007 - jmlr.org
External regret compares the performance of an online algorithm, selecting among N
actions, to the performance of the best of those actions in hindsight. Internal regret compares …

On upper-confidence bound policies for non-stationary bandit problems

A Garivier, E Moulines - arXiv preprint arXiv:0805.3415, 2008 - arxiv.org
Multi-armed bandit problems are considered as a paradigm of the trade-off between
exploring the environment to find profitable actions and exploiting what is already known. In …

[PDF][PDF] Learning, regret minimization, and equilibria

A Blum, Y Mansour - 2007 - kilthub.cmu.edu
Many situations involve repeatedly making decisions in an uncertain environment: for
instance, deciding what route to drive to work each day, or repeated play of a game against …

Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters

M Khamassi, P Enel, PF Dominey, E Procyk - Progress in brain research, 2013 - Elsevier
Converging evidence suggests that the medial prefrontal cortex (MPFC) is involved in
feedback categorization, performance monitoring, and task monitoring, and may contribute …

Improved second-order bounds for prediction with expert advice

N Cesa-Bianchi, Y Mansour, G Stoltz - Machine Learning, 2007 - Springer
This work studies external regret in sequential prediction games with both positive and
negative payoffs. External regret measures the difference between the payoff obtained by …

[PDF][PDF] No-regret learning in bilateral trade via global budget balance

M Bernasconi, M Castiglioni, A Celli… - Proceedings of the 56th …, 2024 - dl.acm.org
Bilateral trade models the problem of intermediating between two rational agents—a seller
and a buyer—both characterized by a private valuation for an item they want to trade. We …