Google Scholar

Contextual information-directed sampling

B Hao, T Lattimore, C Qin - International Conference on …, 2022 - proceedings.mlr.press

Information-directed sampling (IDS) has recently demonstrated its potential as a
data-efficient reinforcement learning algorithm. However, it is still unclear what is the right …

Cited by 19 · All 4 versions

[PDF] mlr.press

Leveraging demonstrations to improve online learning: Quality matters

B Hao, R Jain, T Lattimore… - … on Machine Learning, 2023 - proceedings.mlr.press

We investigate the extent to which offline demonstration data can improve online learning. It
is natural to expect some improvement, but the question is how, and by how much? We …

Cited by 8 · All 6 versions

[PDF] aaai.org

Information directed sampling for stochastic bandits with graph feedback

F Liu, S Buccapatnam, N Shroff - … of the AAAI Conference on Artificial …, 2018 - ojs.aaai.org

We consider stochastic multi-armed bandit problems with graph feedback, where the
decision maker is allowed to observe the neighboring actions of the chosen action. We allow …

Cited by 50 · All 11 versions

[PDF] neurips.cc

Bandits with feedback graphs and switching costs

R Arora, TV Marinov, M Mohri - Advances in Neural …, 2019 - proceedings.neurips.cc

We study the adversarial multi-armed bandit problem where the learner is supplied with
partial observations modeled by a feedback graph and where shifting to a new …

Cited by 34 · All 8 versions

[PDF] mlr.press

Small-loss bounds for online learning with partial information

T Lykouris, K Sridharan… - Conference on Learning …, 2018 - proceedings.mlr.press

We consider the problem of adversarial (non-stochastic) online learning with partial
information feedback, where at each round, a decision maker selects an action from a finite …

Cited by 45 · All 8 versions

[PDF] arxiv.org

Satisficing in time-sensitive bandit learning

D Russo, B Van Roy - arXiv preprint arXiv:1803.02855, 2018 - arxiv.org

Much of the recent literature on bandit learning focuses on algorithms that aim to converge
on an optimal action. One shortcoming is that this orientation does not account for time …

Cited by 39 · All 2 versions

[PDF] mlr.press

Feedback graph regret bounds for Thompson sampling and UCB

T Lykouris, E Tardos, D Wali - Algorithmic Learning Theory, 2020 - proceedings.mlr.press

We study the stochastic multi-armed bandit problem with the graph-based feedback
structure introduced by Mannor and Shamir. We analyze the performance of the two most …

Cited by 30 · All 4 versions

[PDF] mlr.press

Simultaneously learning stochastic and adversarial bandits with general graph feedback

F Kong, Y Zhou, S Li - International Conference on Machine …, 2022 - proceedings.mlr.press

The problem of online learning with graph feedback has been extensively studied in the
literature due to its generality and potential to model various learning tasks. Existing works …

Cited by 10 · All 3 versions

[PDF] neurips.cc

Understanding bandits with graph feedback

H Chen, S Li, C Zhang - Advances in Neural Information …, 2021 - proceedings.neurips.cc

The bandit problem with graph feedback, proposed in [Mannor and Shamir, NeurIPS 2011],
is modeled by a directed graph $G=(V, E)$ where $V$ is the collection of bandit arms, and …

Cited by 15 · All 7 versions

[PDF] mlr.press

First-order Bayesian regret analysis of Thompson sampling

S Bubeck, M Sellke - Algorithmic Learning Theory, 2020 - proceedings.mlr.press

We address online combinatorial optimization when the player has a prior over the
adversary's sequence of losses. In this setting, Russo and Van Roy proposed an information …

Cited by 27 · All 4 versions

Thompson sampling for stochastic bandits with graph feedback

Contextual information-directed sampling
