Model selection in contextual stochastic bandit problems

A Pacchiano, M Phan… - Advances in …, 2020 - proceedings.neurips.cc
We study bandit model selection in stochastic environments. Our approach relies on a
master algorithm that selects between candidate base algorithms. We develop a master …
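Note: the snippet above names only the master/base interface, not the paper's actual selection scheme. As a generic illustration of that interface (and nothing more), the sketch below has a UCB-style master allocate rounds among candidate base learners; the epsilon-greedy base class, the environment callback, and all parameter values are invented for this example.

import numpy as np

rng = np.random.default_rng(0)

class EpsGreedyBase:
    # Toy stand-in for a candidate base algorithm: epsilon-greedy over n_arms arms.
    def __init__(self, n_arms, eps=0.1):
        self.n_arms, self.eps = n_arms, eps
        self.counts = np.zeros(n_arms)
        self.means = np.zeros(n_arms)

    def select(self):
        if rng.random() < self.eps:
            return int(rng.integers(self.n_arms))
        return int(np.argmax(self.means))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

def run_master(bases, env, horizon):
    # Master loop: score each base by a UCB index over the rewards it has earned,
    # let the chosen base pick the arm, and feed the reward back only to that base.
    m = len(bases)
    pulls, totals = np.ones(m), np.zeros(m)
    for t in range(1, horizon + 1):
        scores = totals / pulls + np.sqrt(2.0 * np.log(t + 1) / pulls)
        i = int(np.argmax(scores))       # master selects a base algorithm
        arm = bases[i].select()          # base selects an arm
        reward = env(arm)                # stochastic reward from the environment
        bases[i].update(arm, reward)
        pulls[i] += 1
        totals[i] += reward
    return totals.sum()

# Toy usage: three Bernoulli arms, two bases differing only in exploration rate.
means = np.array([0.2, 0.5, 0.7])
env = lambda arm: float(rng.random() < means[arm])
print(run_master([EpsGreedyBase(3, eps=0.05), EpsGreedyBase(3, eps=0.3)], env, 2000))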

Learning personalized decision support policies

U Bhatt, V Chen, KM Collins, P Kamalaruban… - arXiv preprint arXiv …, 2023 - arxiv.org
Individual human decision-makers may benefit from different forms of support to improve
decision outcomes, but when will each form of support yield better outcomes? In this work …

Tracking most significant shifts in nonparametric contextual bandits

J Suk, S Kpotufe - Advances in Neural Information …, 2023 - proceedings.neurips.cc
We study nonparametric contextual bandits where Lipschitz mean reward functions may
change over time. We first establish the minimax dynamic regret rate in this less understood …

Dynamic contextual pricing with doubly non-parametric random utility models

E Chen, X Chen, L Gao, J Li - arXiv preprint arXiv:2405.06866, 2024 - arxiv.org
In the evolving landscape of digital commerce, adaptive dynamic pricing strategies are
essential for gaining a competitive edge. This paper introduces novel doubly …

Unifying offline causal inference and online bandit learning for data driven decision

Y Li, H **e, Y Lin, JCS Lui - Proceedings of the Web Conference 2021, 2021 - dl.acm.org
A fundamental question for companies with large amount of logged data is: How to use such
logged data together with incoming streaming data to make good decisions? Many …

The role of contextual information in best arm identification

M Kato, K Ariu - arXiv preprint arXiv:2106.14077, 2021 - arxiv.org
We study the best-arm identification problem with fixed confidence when contextual
(covariate) information is available in stochastic bandits. Although we can use contextual …

Adversarial rewards in universal learning for contextual bandits

M Blanchard, S Hanneke, P Jaillet - arXiv preprint arXiv:2302.07186, 2023 - arxiv.org
We study the fundamental limits of learning in contextual bandits, where a learner's rewards
depend on their actions and a known context, which extends the canonical multi-armed …

Adaptive algorithm for multi-armed bandit problem with high-dimensional covariates

W Qian, CK Ing, J Liu - Journal of the American Statistical …, 2024 - Taylor & Francis
This article studies an important sequential decision making problem known as the multi-
armed stochastic bandit problem with covariates. Under a linear bandit framework with high …
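Note: as a rough illustration of the high-dimensional (sparse linear) covariate setting only, the sketch below fits a per-arm LASSO on accumulated data after a short forced-exploration phase. It assumes scikit-learn is available; the function names, the forced-exploration schedule, and the refit threshold are made up here and do not reproduce the article's adaptive algorithm.

import numpy as np
from sklearn.linear_model import Lasso

def sparse_greedy_bandit(contexts, get_reward, horizon, n_arms,
                         alpha=0.05, forced_rounds=20):
    # Per-arm sparse linear estimates: force a few pulls of every arm first,
    # then play greedily on the fitted LASSO predictions.
    feats = {a: [] for a in range(n_arms)}
    rews = {a: [] for a in range(n_arms)}
    models = {a: None for a in range(n_arms)}
    for t in range(horizon):
        x = contexts(t)                           # high-dimensional covariate vector
        if t < forced_rounds * n_arms:
            arm = t % n_arms                      # forced exploration phase
        else:
            preds = [models[a].predict(x[None, :])[0] if models[a] is not None else 0.0
                     for a in range(n_arms)]
            arm = int(np.argmax(preds))
        r = get_reward(t, arm, x)
        feats[arm].append(x)
        rews[arm].append(r)
        if len(rews[arm]) >= 10:                  # refit the chosen arm's sparse model
            models[arm] = Lasso(alpha=alpha).fit(np.array(feats[arm]), np.array(rews[arm]))
    return models

# Toy usage: 100 covariates, 3 arms, sparse true coefficients per arm.
gen = np.random.default_rng(1)
betas = gen.normal(size=(3, 100)) * (gen.random((3, 100)) < 0.05)
sparse_greedy_bandit(lambda t: gen.normal(size=100),
                     lambda t, a, x: float(x @ betas[a] + 0.1 * gen.normal()),
                     horizon=600, n_arms=3)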

Thompson sampling in partially observable contextual bandits

H Park, MKS Faradonbeh - arXiv preprint arXiv:2402.10289, 2024 - arxiv.org
Contextual bandits constitute a classical framework for decision-making under uncertainty.
In this setting, the goal is to learn the arms of highest reward subject to contextual …
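Note: the partially observed contexts are the point of this paper; the sketch below shows only the standard fully observed baseline, Thompson sampling for a linear contextual bandit with a Gaussian posterior over the shared reward parameter. The contexts/get_reward callbacks, noise level, and prior scale are invented for the example.

import numpy as np

def linear_thompson_sampling(contexts, get_reward, horizon, d, sigma2=0.25, lam=1.0, seed=0):
    # Maintain a Gaussian posterior N(mu, sigma2 * A^{-1}) over the reward
    # parameter theta, sample from it each round, and act greedily.
    rng = np.random.default_rng(seed)
    A = lam * np.eye(d)                     # regularized design matrix
    b = np.zeros(d)
    total = 0.0
    for t in range(horizon):
        X = contexts(t)                     # shape (n_arms, d): per-arm feature vectors
        mu = np.linalg.solve(A, b)
        theta = rng.multivariate_normal(mu, sigma2 * np.linalg.inv(A))
        arm = int(np.argmax(X @ theta))     # greedy action under the sampled parameter
        r = get_reward(t, arm, X[arm])
        A += np.outer(X[arm], X[arm])       # rank-one posterior update
        b += r * X[arm]
        total += r
    return total

# Toy usage: 4 arms, 5-dimensional contexts, linear rewards with Gaussian noise.
d, n_arms = 5, 4
theta_star = np.linspace(0.1, 0.5, d)
gen = np.random.default_rng(7)
print(linear_thompson_sampling(lambda t: gen.normal(size=(n_arms, d)),
                               lambda t, a, x: float(x @ theta_star + 0.1 * gen.normal()),
                               horizon=500, d=d))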

Self-tuning bandits over unknown covariate-shifts

J Suk, S Kpotufe - Algorithmic Learning Theory, 2021 - proceedings.mlr.press
Bandits with covariates, aka contextual bandits, address situations where optimal
actions (or arms) at a given time $t$ depend on a context $x_t$, e.g., a new …