- Academic Search

[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3299 · All 9 versions

[PDF] researchgate.net

[BOOK][B] Prediction, learning, and games

N Cesa-Bianchi, G Lugosi - 2006 - books.google.com

This important text and reference for researchers and students in machine learning, game
theory, statistics and information theory offers a comprehensive treatment of the problem of …

Cited by 5177 · All 14 versions

[PDF] academia.edu

On upper-confidence bound policies for switching bandit problems

A Garivier, E Moulines - International conference on algorithmic learning …, 2011 - Springer

Many problems, such as cognitive radio, parameter control of a scanning tunnelling
microscope or internet advertisement, can be modelled as non-stationary bandit problems …

Cited by 657 · All 10 versions

[PDF] sciencedirect.com

The k-armed dueling bandits problem

Y Yue, J Broder, R Kleinberg, T Joachims - Journal of Computer and …, 2012 - Elsevier

We study a partial-information online-learning problem where actions are restricted to noisy
comparisons between pairs of strategies (also known as bandits). In contrast to conventional …

Cited by 418 · All 19 versions

[PDF] jmlr.org

[PDF] From external to internal regret

A Blum, Y Mansour - Journal of Machine Learning Research, 2007 - jmlr.org

External regret compares the performance of an online algorithm, selecting among N
actions, to the performance of the best of those actions in hindsight. Internal regret compares …

Cited by 348 · All 9 versions

[PDF] arxiv.org

On upper-confidence bound policies for non-stationary bandit problems

A Garivier, E Moulines - arXiv preprint arXiv:0805.3415, 2008 - arxiv.org

Multi-armed bandit problems are considered as a paradigm of the trade-off between
exploring the environment to find profitable actions and exploiting what is already known. In …

Cited by 302 · All 9 versions

[PDF] cmu.edu

[PDF] Learning, regret minimization, and equilibria

A Blum, Y Mansour - 2007 - kilthub.cmu.edu

Many situations involve repeatedly making decisions in an uncertain environment: for
instance, deciding what route to drive to work each day, or repeated play of a game against …

Cited by 271 · All 37 versions

[PDF] hal.science

Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters

M Khamassi, P Enel, PF Dominey, E Procyk - Progress in brain research, 2013 - Elsevier

Converging evidence suggests that the medial prefrontal cortex (MPFC) is involved in
feedback categorization, performance monitoring, and task monitoring, and may contribute …

Cited by 64 · All 14 versions

[PDF] springer.com

Improved second-order bounds for prediction with expert advice

N Cesa-Bianchi, Y Mansour, G Stoltz - Machine Learning, 2007 - Springer

This work studies external regret in sequential prediction games with both positive and
negative payoffs. External regret measures the difference between the payoff obtained by …

Cited by 257 · All 20 versions

[PDF] acm.org

[PDF] No-regret learning in bilateral trade via global budget balance

M Bernasconi, M Castiglioni, A Celli… - Proceedings of the 56th …, 2024 - dl.acm.org

Bilateral trade models the problem of intermediating between two rational agents—a seller
and a buyer—both characterized by a private valuation for an item they want to trade. We …

Cited by 11 · All 3 versions
