Google Scholar

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Cited by 1273, 7 versions

[PDF] mlr.press

Regret minimization with performative feedback

M Jagadeesan, T Zrnic… - … on Machine Learning, 2022 - proceedings.mlr.press

In performative prediction, the deployment of a predictive model triggers a shift in the data
distribution. As these shifts are typically unknown ahead of time, the learner needs to deploy …

Cited by 45, 4 versions

[PDF] neurips.cc

Robust Lipschitz bandits to adversarial corruptions

Y Kang, CJ Hsieh, TCM Lee - Advances in Neural …, 2023 - proceedings.neurips.cc

Lipschitz bandit is a variant of stochastic bandits that deals with a continuous arm set
defined on a metric space, where the reward function is subject to a Lipschitz constraint. In …

Cited by 11, 8 versions

[PDF] aaai.org

Multiobjective Lipschitz bandits under lexicographic ordering

B Xue, J Cheng, F Liu, Y Wang, Q Zhang - Proceedings of the AAAI …, 2024 - ojs.aaai.org

This paper studies the multiobjective bandit problem under lexicographic ordering, wherein
the learner aims to simultaneously maximize $m$ objectives hierarchically. The only …

Cited by 2, 6 versions

[PDF] mlr.press

Zeroth-order non-convex learning via hierarchical dual averaging

A Héliou, M Martin, P Mertikopoulos… - … on Machine Learning, 2021 - proceedings.mlr.press

We propose a hierarchical version of dual averaging for zeroth-order online non-convex
optimization, i.e., learning processes where, at each stage, the optimizer is facing an …

Cited by 16, 10 versions

[PDF] github.io

Quantifying the Merits of Network-Assist Online Learning in Optimizing Network Protocols

X Dai, Z Wang, J Ye, JCS Lui - 2024 IEEE/ACM 32nd …, 2024 - ieeexplore.ieee.org

Optimizing network protocols is crucial for improving application performance. Recent
research works use multi-armed bandit (MAB) online learning methods to address network …

Cited by 2, 3 versions

Intelligent informative frequency band searching assisted by a dynamic bandit tree method for machine fault diagnosis

Z Mo, Z Zhang, Q Miao, KL Tsui - IEEE/ASME Transactions on …, 2022 - ieeexplore.ieee.org

The fault informative frequency band searching is crucial to envelope analysis-based
machine fault diagnosis. Its success often depends on effective filters. However, existing …

Cited by 6, 3 versions

[PDF] springer.com

Thompson sampling-based recursive block elimination for dynamic assignment under limited budget in pure-exploration

SP Parambath, C Anagnostopoulos… - Data Mining and …, 2025 - Springer

In this paper, we investigate Thompson sampling-based sequential block elimination
approaches for dynamic assignment problems in a pure-exploration Multi-Armed Bandit …

2 versions

[PDF] arxiv.org

New Perspectives in Online Contract Design

S Zuo - arXiv preprint arXiv:2403.07143, 2024 - arxiv.org

This work studies the repeated principal-agent problem from an online learning perspective.
The principal's goal is to learn the optimal contract that maximizes her utility through …

Cited by 1

[PDF] mpg.de

[PDF] Online Defense Strategies for Reinforcement Learning Against Adaptive Reward Poisoning

A Nika, A Singla, G Radanovic - 26th International Conference on …, 2023 - pure.mpg.de

We consider the problem of defense against reward-poisoning attacks in reinforcement
learning and formulate it as a game in T rounds between a defender and an adaptive …

Cited by 2, 3 versions

Adaptive discretization for adversarial Lipschitz bandits

Introduction to multi-armed bandits