- Academic Search

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Cited by 1264; all 7 versions

[PDF] nowpublishers.com

Introduction to online convex optimization

E Hazan - Foundations and Trends® in Optimization, 2016 - nowpublishers.com

This monograph portrays optimization as a process. In many practical applications the
environment is so complex that it is infeasible to lay out a comprehensive theoretical model …

Cited by 2213; all 19 versions

[PDF] nowpublishers.com

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com

Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …

Cited by 3294; all 26 versions

[PDF] nowpublishers.com

Online learning and online convex optimization

S Shalev-Shwartz - Foundations and Trends® in Machine …, 2012 - nowpublishers.com

Online learning is a well established learning paradigm which has both theoretical and
practical appeals. The goal of online learning is to make a sequence of accurate predictions …

Cited by 2655; all 20 versions

[PDF] arxiv.org

Gaussian process optimization in the bandit setting: No regret and experimental design

N Srinivas, A Krause, SM Kakade, M Seeger - arXiv preprint arXiv …, 2009 - arxiv.org

Many applications require optimizing an unknown, noisy function that is expensive to
evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function …

Cited by 2908; all 32 versions

[PDF] jmlr.org

An information-theoretic analysis of Thompson sampling

D Russo, B Van Roy - Journal of Machine Learning Research, 2016 - jmlr.org

We provide an information-theoretic analysis of Thompson sampling that applies across a
broad range of online optimization problems in which a decision-maker must learn from …

Cited by 472; all 11 versions

[PDF] mit.edu

[BOOK][B] Optimization for machine learning

S Sra, S Nowozin, SJ Wright - 2011 - books.google.com

An up-to-date account of the interplay between optimization and machine learning,
accessible to students and researchers in both communities. The interplay between …

Cited by 1043; all 31 versions

[PDF] neurips.cc

Contextual Gaussian process bandit optimization

A Krause, C Ong - Advances in neural information …, 2011 - proceedings.neurips.cc

How should we design experiments to maximize performance of a complex system, taking
into account uncontrollable environmental conditions? How should we select relevant …

Cited by 514; all 19 versions

[PDF] upenn.edu

Stochastic linear optimization under bandit feedback

V Dani, TP Hayes, SM Kakade - 21st Annual Conference on …, 2008 - repository.upenn.edu

In the classical stochastic k-armed bandit problem, in each of a sequence of T rounds, a
decision maker chooses one of k arms and incurs a cost chosen from an unknown …

Cited by 977; all 12 versions

[PDF] neurips.cc

Off-policy evaluation for slate recommendation

A Swaminathan, A Krishnamurthy… - Advances in …, 2017 - proceedings.neurips.cc

This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a
ranking) based on some context---a common scenario in web search, ads, and …

Cited by 244; all 9 versions

The price of bandit information for online optimization