- Academic Search

Trustworthy distributed AI systems: Robustness, privacy, and governance

W Wei, L Liu - ACM Computing Surveys, 2024 - dl.acm.org

Emerging Distributed AI systems are revolutionizing big data computing and data
processing capabilities with growing economic and societal impact. However, recent studies …

Cited by: 16 · All 4 versions

[PDF] arxiv.org

The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org

A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

Cited by: 205 · All 6 versions

[PDF] neurips.cc

Corruption-robust offline reinforcement learning with general function approximation

C Ye, R Yang, Q Gu, T Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc

We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …

Cited by: 18 · All 7 versions

[PDF] neurips.cc

Bypassing the simulator: Near-optimal adversarial linear contextual bandits

H Liu, CY Wei, J Zimmert - Advances in Neural Information …, 2024 - proceedings.neurips.cc

We consider the adversarial linear contextual bandit problem, where the loss vectors are
selected fully adversarially and the per-round action set (i.e., the context) is drawn from a fixed …

Cited by: 13 · All 5 versions

[PDF] arxiv.org

Feel-good Thompson sampling for contextual bandits and reinforcement learning

T Zhang - SIAM Journal on Mathematics of Data Science, 2022 - SIAM

Thompson sampling has been widely used for contextual bandit problems due to the
flexibility of its modeling power. However, a general theory for this class of methods in the …

Cited by: 70 · All 4 versions

[PDF] neurips.cc

Nearly optimal algorithms for linear contextual bandits with adversarial corruptions

J He, D Zhou, T Zhang, Q Gu - Advances in neural …, 2022 - proceedings.neurips.cc

We study the linear contextual bandit problem in the presence of adversarial corruption,
where the reward at each round is corrupted by an adversary, and the corruption level (i.e., …

Cited by: 51 · All 8 versions

[PDF] mlr.press

Corruption-robust algorithms with uncertainty weighting for nonlinear contextual bandits and Markov decision processes

C Ye, W Xiong, Q Gu, T Zhang - International Conference on …, 2023 - proceedings.mlr.press

Despite the significant interest and progress in reinforcement learning (RL) problems with
adversarial corruption, current works are either confined to the linear setting or lead to an …

Cited by: 27 · All 7 versions

[PDF] mlr.press

Contextual bandits with large action spaces: Made practical

Y Zhu, DJ Foster, J Langford… - … Conference on Machine …, 2022 - proceedings.mlr.press

A central problem in sequential decision making is to develop algorithms that are practical
and computationally efficient, yet support the use of flexible, general-purpose models …

Cited by: 39 · All 3 versions

[PDF] neurips.cc

Bayesian decision-making under misspecified priors with applications to meta-learning

M Simchowitz, C Tosh… - Advances in …, 2021 - proceedings.neurips.cc

Thompson sampling and other Bayesian sequential decision-making algorithms are among
the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The …

Cited by: 59 · All 7 versions

[PDF] mlr.press

A model selection approach for corruption robust reinforcement learning

CY Wei, C Dann, J Zimmert - International Conference on …, 2022 - proceedings.mlr.press

We develop a model selection approach to tackle reinforcement learning with adversarial
corruption in both transition and reward. For finite-horizon tabular MDPs, without prior …

Cited by: 61 · All 6 versions

Adapting to misspecification in contextual bandits

Trustworthy distributed ai systems: Robustness, privacy, and governance
