Efficient and robust sequential decision making algorithms

P Xu - AI Magazine, 2024 - Wiley Online Library
Sequential decision‐making involves making informed decisions based on continuous
interactions with a complex environment. This process is ubiquitous in various applications …

Kullback-Leibler Maillard sampling for multi-armed bandits with bounded rewards

H Qin, KS Jun, C Zhang - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
We study $K$-armed bandit problems where the reward distributions of the arms are all
supported on the $[0, 1]$ interval. Maillard sampling (Maillard, 2013), an …
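
The snippet suggests an arm-selection rule in the Maillard family, with the usual squared-gap exponent replaced by a KL divergence. Below is a minimal sketch under that assumption, using the Bernoulli KL as the divergence for [0, 1]-supported rewards; the function names and the toy loop are illustrative, not the paper's exact algorithm:

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), elementwise."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def kl_ms_probs(means, counts):
    """Maillard-style weights: arm a is drawn with probability proportional
    to exp(-N_a * kl(mu_a, mu_max)); the empirically best arm always gets
    weight 1, since its KL term is zero."""
    w = np.exp(-counts * bernoulli_kl(means, np.max(means)))
    return w / w.sum()

# toy run on three arms with rewards supported on [0, 1]
rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.6])
counts = np.ones(3)                              # one forced pull per arm
means = rng.binomial(1, true_means).astype(float)
for _ in range(10_000):
    arm = rng.choice(3, p=kl_ms_probs(means, counts))
    r = rng.binomial(1, true_means[arm])
    means[arm] += (r - means[arm]) / (counts[arm] + 1)
    counts[arm] += 1
print(counts)                                    # pulls concentrate on arm 2
```

Because the empirically best arm always has weight one, exploration of the other arms decays as their pull counts and estimated KL gaps grow.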

A general recipe for the analysis of randomized multi-armed bandit algorithms

D Baudry, K Suzuki, J Honda - arXiv preprint arXiv:2303.06058, 2023 - arxiv.org
In this paper we propose a general methodology for deriving regret bounds for randomized
multi-armed bandit algorithms. It consists of checking a set of sufficient conditions on the …
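
The sufficient conditions themselves are elided in the snippet, but regret bounds of this kind conventionally start from the standard decomposition over suboptimal arms, so such conditions only need to control the expected pull counts:

```latex
% Standard regret decomposition for K-armed bandits (background; the
% paper's specific sufficient conditions are elided in the snippet).
\[
  \mathrm{Reg}(T)
  = T\mu^{*} - \mathbb{E}\Bigl[\sum_{t=1}^{T} \mu_{A_t}\Bigr]
  = \sum_{a:\,\Delta_a>0} \Delta_a\, \mathbb{E}\bigl[N_a(T)\bigr],
  \qquad \Delta_a := \mu^{*} - \mu_a,
\]
% where N_a(T) counts pulls of arm a up to round T, so any bound on
% E[N_a(T)] for each suboptimal arm yields a regret bound.
```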

Monte-Carlo tree search with uncertainty propagation via optimal transport

T Dam, P Stenger, L Schneider, J Pajarinen… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces a novel backup strategy for Monte-Carlo Tree Search (MCTS)
designed for highly stochastic and partially observable Markov decision processes. We …
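
For contrast with the distributional backup the paper proposes, the standard MCTS backup propagates a scalar return up the visited path as an incremental mean; a minimal sketch (the Node fields and the backup signature are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    visits: int = 0
    value: float = 0.0                      # running mean of backed-up returns
    children: dict = field(default_factory=dict)

def backup(path, ret):
    """Classic Monte-Carlo backup: push the simulated return up the visited
    path as an incremental mean. Distributional backups (e.g. via optimal
    transport, as in the paper above) replace this scalar update with an
    update on value distributions, so uncertainty propagates up the tree too."""
    for node in reversed(path):
        node.visits += 1
        node.value += (ret - node.value) / node.visits
```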

Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning

HL Hsu, W Wang, M Pajic, P Xu - arXiv preprint arXiv:2404.10728, 2024 - arxiv.org
We present the first study on provably efficient randomized exploration in cooperative multi-
agent reinforcement learning (MARL). We propose a unified algorithm framework for …

Thompson Sampling for Non-Stationary Bandit Problems

H Qi, F Guo, L Zhu - Entropy, 2025 - mdpi.com
Non-stationary multi-armed bandit (MAB) problems have recently attracted extensive
attention. We focus on the abruptly changing scenario where reward distributions remain …
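
The snippet does not reveal the paper's mechanism, but a common baseline for abruptly changing Bernoulli bandits is discounted Thompson sampling, which decays the Beta posteriors so the sampler can re-explore after a change point. A hedged sketch (gamma, the floor, and the toy change schedule are illustrative choices, not the paper's algorithm):

```python
import numpy as np

def discounted_ts(mu_schedule, gamma=0.95, floor=0.1, seed=0):
    """Discounted Thompson sampling for abruptly changing Bernoulli bandits:
    Beta posteriors are decayed by gamma each round so old evidence fades
    and the sampler re-explores after a change point. The floor keeps the
    posteriors proper."""
    rng = np.random.default_rng(seed)
    n_arms = mu_schedule.shape[1]
    alpha = np.ones(n_arms)
    beta = np.ones(n_arms)
    total = 0
    for mu in mu_schedule:                       # row t = true means at round t
        arm = int(np.argmax(rng.beta(alpha, beta)))
        r = rng.binomial(1, mu[arm])
        alpha = np.maximum(alpha * gamma, floor)  # forget old evidence
        beta = np.maximum(beta * gamma, floor)
        alpha[arm] += r
        beta[arm] += 1 - r
        total += r
    return total

# toy run: the best arm switches abruptly halfway through
mus = np.vstack([np.tile([0.8, 0.2], (500, 1)),
                 np.tile([0.2, 0.8], (500, 1))])
print(discounted_ts(mus))
```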

Zero-Inflated Bandits

H Wei, R Wan, L Shi, R Song - arXiv preprint arXiv:2312.15595, 2023 - arxiv.org
Many real applications of bandits have sparse non-zero rewards, leading to slow learning
rates. Careful distribution modeling that exploits problem-specific structure is known as …
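
The title suggests factoring the reward into a Bernoulli "is the reward nonzero?" indicator times a nonzero magnitude. Below is a sketch of Thompson sampling under that zero-inflated model, assuming nonnegative rewards, a Beta posterior on the nonzero probability, and a unit-variance Gaussian posterior on the magnitude mean; all of these modeling choices are assumptions for illustration, not the paper's estimator:

```python
import numpy as np

class ZeroInflatedTS:
    """Thompson sampling under a zero-inflated reward model:
    reward = B * M, with B ~ Bernoulli(p_a) indicating a nonzero reward
    and M its magnitude. Beta posterior on p_a; Gaussian posterior with
    known unit variance on the magnitude mean. Assumes rewards >= 0."""

    def __init__(self, n_arms, seed=0):
        self.rng = np.random.default_rng(seed)
        self.alpha = np.ones(n_arms)   # Beta posterior on P(reward != 0)
        self.beta = np.ones(n_arms)
        self.n = np.zeros(n_arms)      # number of nonzero observations
        self.s = np.zeros(n_arms)      # sum of nonzero rewards

    def select(self):
        p = self.rng.beta(self.alpha, self.beta)
        # N(0, 1) prior on the magnitude mean -> posterior N(s/(n+1), 1/(n+1))
        m = self.rng.normal(self.s / (self.n + 1), 1.0 / np.sqrt(self.n + 1))
        return int(np.argmax(p * m))   # posterior sample of E[reward]

    def update(self, arm, reward):
        nonzero = float(reward != 0)
        self.alpha[arm] += nonzero
        self.beta[arm] += 1.0 - nonzero
        if nonzero:
            self.n[arm] += 1
            self.s[arm] += reward
```

The point of the factorization is that the frequent zero observations only update the Beta posterior, so the magnitude estimate is not dragged down by reward sparsity.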