- Academic Search

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3285 · All 9 versions

[PDF] nowpublishers.com

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

[PDF] nowpublishers.com

Introduction to online convex optimization

E Hazan - Foundations and Trends® in Optimization, 2016 - nowpublishers.com

This monograph portrays optimization as a process. In many practical applications the
environment is so complex that it is infeasible to lay out a comprehensive theoretical model …

[PDF] nowpublishers.com

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com

Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …

[PDF] mit.edu

[BOOK][B] Optimization for machine learning

S Sra, S Nowozin, SJ Wright - 2011 - books.google.com

An up-to-date account of the interplay between optimization and machine learning,
accessible to students and researchers in both communities. The interplay between …

Cited by 1042 · All 33 versions

[PDF] neurips.cc

Bypassing the simulator: Near-optimal adversarial linear contextual bandits

H Liu, CY Wei, J Zimmert - Advances in Neural Information …, 2024 - proceedings.neurips.cc

We consider the adversarial linear contextual bandit problem, where the loss vectors are
selected fully adversarially and the per-round action set (i.e., the context) is drawn from a fixed …

Cited by 13 · All 5 versions

[PDF] mlr.press

Online learning with predictable sequences

A Rakhlin, K Sridharan - Conference on Learning Theory, 2013 - proceedings.mlr.press

We present methods for online linear optimization that take advantage of benign (as
opposed to worst-case) sequences. Specifically, if the sequence encountered by the learner …

Cited by 395 · All 16 versions

[PDF] psu.edu

[PDF] Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback.

A Agarwal, O Dekel, L Xiao - COLT, 2010 - Citeseer

Bandit convex optimization is a special case of online convex optimization with partial
information. In this setting, a player attempts to minimize a sequence of adversarially …

Cited by 444 · All 7 versions

[PDF] sciencedirect.com

Combinatorial bandits

N Cesa-Bianchi, G Lugosi - Journal of Computer and System Sciences, 2012 - Elsevier

We study sequential prediction problems in which, at each time instance, the forecaster
chooses a vector from a given finite set S ⊆ R^d. At the same time, the opponent chooses a …

Cited by 539 · All 21 versions

[PDF] mlr.press

Modeling strong and human-like gameplay with KL-regularized search

AP Jacob, DJ Wu, G Farina, A Lerer… - International …, 2022 - proceedings.mlr.press

We consider the task of accurately modeling strong human policies in multi-agent decision-
making problems, given examples of human behavior. Imitation learning is effective at …

Cited by 59 · All 6 versions

Beating the adaptive bandit with high probability

[BOOK][B] Bandit algorithms
