Google Scholar

[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3287 - All 9 versions

[PDF] arxiv.org

A modern introduction to online learning

F Orabona - arxiv preprint arxiv:1912.13213, 2019 - arxiv.org

In this monograph, I introduce the basic concepts of Online Learning through a modern view
of Online Convex Optimization. Here, online learning refers to the framework of regret …

Cited by 418 - All 3 versions

[PDF] nowpublishers.com

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Cited by 1253 - All 7 versions

[PDF] nowpublishers.com

Introduction to online convex optimization

E Hazan - Foundations and Trends® in Optimization, 2016 - nowpublishers.com

This monograph portrays optimization as a process. In many practical applications the
environment is so complex that it is infeasible to lay out a comprehensive theoretical model …

[PDF] arxiv.org

A survey of learning in multiagent environments: Dealing with non-stationarity

P Hernandez-Leal, M Kaisers, T Baarslag… - arxiv preprint arxiv …, 2017 - arxiv.org

The key challenge in multiagent learning is learning a best response to the behaviour of
other agents, which may be non-stationary: if the other agents adapt their strategy as well …

Cited by 367 - All 5 versions

[PDF] nowpublishers.com

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com

Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …

[PDF] mlr.press

Online control with adversarial disturbances

N Agarwal, B Bullins, E Hazan… - International …, 2019 - proceedings.mlr.press

We study the control of linear dynamical systems with adversarial disturbances, as opposed
to statistical noise. We present an efficient algorithm that achieves nearly-tight regret bounds …

Cited by 271 - All 17 versions

[PDF] nsf.gov

Online learning algorithms

N Cesa-Bianchi, F Orabona - Annual review of statistics and its …, 2021 - annualreviews.org

Online learning is a framework for the design and analysis of algorithms that build predictive
models by processing data one at a time. Besides being computationally efficient, online …

Cited by 46 - All 6 versions

[PDF] neurips.cc

Logarithmic regret for online control

N Agarwal, E Hazan, K Singh - Advances in Neural …, 2019 - proceedings.neurips.cc

We study optimal regret bounds for control in linear dynamical systems under adversarially
changing strongly convex cost functions, given the knowledge of transition dynamics. This …

Cited by 126 - All 12 versions

[PDF] neurips.cc

Rotting bandits

N Levine, K Crammer… - Advances in neural …, 2017 - proceedings.neurips.cc

The Multi-Armed Bandits (MAB) framework highlights the trade-off between
acquiring new knowledge (Exploration) and leveraging available knowledge (Exploitation) …

Cited by 141 - All 6 versions

Online bandit learning against an adaptive adversary: from regret to policy regret

[BOOK] Bandit algorithms
