[BOOK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
[BOOK][B] Prediction, learning, and games
N Cesa-Bianchi, G Lugosi - 2006 - books.google.com
This important text and reference for researchers and students in machine learning, game
theory, statistics and information theory offers a comprehensive treatment of the problem of …
On upper-confidence bound policies for switching bandit problems
Many problems, such as cognitive radio, parameter control of a scanning tunnelling
microscope or internet advertisement, can be modelled as non-stationary bandit problems …
The k-armed dueling bandits problem
We study a partial-information online-learning problem where actions are restricted to noisy
comparisons between pairs of strategies (also known as bandits). In contrast to conventional …
[PDF] From external to internal regret
External regret compares the performance of an online algorithm, selecting among N
actions, to the performance of the best of those actions in hindsight. Internal regret compares …
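The external-regret comparison described in this entry (online algorithm versus the best single action in hindsight) can be computed directly when the full loss table is known. A minimal Python sketch, assuming losses (lower is better) are given as `losses[t][i]` for action `i` in round `t` and the algorithm's picks as `choices[t]`; the function name, argument names, and the loss convention are illustrative assumptions, not taken from the paper:

```python
from typing import Sequence


def external_regret(losses: Sequence[Sequence[float]], choices: Sequence[int]) -> float:
    """Hypothetical helper: external regret of a played action sequence.

    losses[t][i] is the loss of action i in round t (assumed convention);
    choices[t] is the index of the action the online algorithm picked.
    """
    if len(losses) != len(choices):
        raise ValueError("need exactly one choice per round")
    n_actions = len(losses[0])
    # Cumulative loss the online algorithm actually incurred.
    alg_loss = sum(row[a] for row, a in zip(losses, choices))
    # Cumulative loss each fixed action would have incurred over all rounds.
    fixed_losses = [sum(row[i] for row in losses) for i in range(n_actions)]
    # Compare against the best single action in hindsight.
    return alg_loss - min(fixed_losses)


# Three rounds, three actions (N = 3).
losses = [[1, 0, 0.5], [0, 1, 0.5], [1, 0, 0.5]]
print(external_regret(losses, [0, 1, 0]))  # 2: algorithm lost 3, action 1 alone lost 1
```

Internal regret, the other notion named in this entry, instead compares the algorithm with versions of itself in which every play of one action is replaced by another; it is not computed here.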
On upper-confidence bound policies for non-stationary bandit problems
Multi-armed bandit problems are considered as a paradigm of the trade-off between
exploring the environment to find profitable actions and exploiting what is already known. In …
[PDF] Learning, regret minimization, and equilibria
Many situations involve repeatedly making decisions in an uncertain environment: for
instance, deciding what route to drive to work each day, or repeated play of a game against …
Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters
Converging evidence suggests that the medial prefrontal cortex (MPFC) is involved in
feedback categorization, performance monitoring, and task monitoring, and may contribute …
Improved second-order bounds for prediction with expert advice
This work studies external regret in sequential prediction games with both positive and
negative payoffs. External regret measures the difference between the payoff obtained by …
[PDF] No-regret learning in bilateral trade via global budget balance
Bilateral trade models the problem of intermediating between two rational agents—a seller
and a buyer—both characterized by a private valuation for an item they want to trade. We …