Efficient and robust sequential decision making algorithms
P Xu - AI Magazine, 2024 - Wiley Online Library
Sequential decision‐making involves making informed decisions based on continuous
interactions with a complex environment. This process is ubiquitous in various applications …
Kullback-Leibler Maillard sampling for multi-armed bandits with bounded rewards
We study $K$-armed bandit problems where the reward distributions of the arms are all
supported on the $[0,1]$ interval. Maillard sampling \cite{maillard13apprentissage}, an …
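As a rough illustration of the Maillard-sampling idea behind this line of work: each arm is played with probability that decays exponentially in its pull count times the (binary) KL divergence from the empirical best arm. The exact weight formula and helper names below are an assumption for this sketch, not the paper's algorithm verbatim.

```python
import math
import random

def kl_bernoulli(p, q, eps=1e-12):
    """Binary relative entropy KL(p || q), clipped away from {0, 1} for safety."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ms_probs(means, counts):
    """Sampling distribution: arm a gets weight exp(-N_a * KL(mu_a, mu_max)),
    so well-explored, clearly suboptimal arms are picked exponentially rarely."""
    mu_max = max(means)
    weights = [math.exp(-n * kl_bernoulli(m, mu_max))
               for m, n in zip(means, counts)]
    total = sum(weights)
    return [w / total for w in weights]

def pull(probs):
    """Draw the next arm index from the sampling distribution."""
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

Note that the empirical best arm always has weight 1 (its KL term is zero), so it is never starved.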
A general recipe for the analysis of randomized multi-armed bandit algorithms
In this paper we propose a general methodology to derive regret bounds for randomized
multi-armed bandit algorithms. It consists in checking a set of sufficient conditions on the …
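For reference, the quantity such analyses bound is the standard expected (pseudo-)regret; this is the textbook definition, not a formula taken from the paper:

```latex
% Regret after T rounds of a policy pulling arm A_t at round t,
% where \mu_a is the mean reward of arm a and \mu^* = \max_a \mu_a:
R_T \;=\; T\mu^* \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} \mu_{A_t}\Big]
```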
Monte-Carlo tree search with uncertainty propagation via optimal transport
This paper introduces a novel backup strategy for Monte-Carlo Tree Search (MCTS)
designed for highly stochastic and partially observable Markov decision processes. We …
Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning
We present the first study on provably efficient randomized exploration in cooperative multi-
agent reinforcement learning (MARL). We propose a unified algorithm framework for …
Thompson Sampling for Non-Stationary Bandit Problems
H Qi, F Guo, L Zhu - Entropy, 2025 - mdpi.com
Non-stationary multi-armed bandit (MAB) problems have recently attracted extensive
attention. We focus on the abruptly changing scenario where reward distributions remain …
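One common way to handle the abruptly changing setting is to restrict the Thompson-sampling posterior to a sliding window of recent observations, so estimates can re-adapt after a change point. The sketch below uses this window mechanism as an illustrative assumption; the paper's own change-handling scheme may differ.

```python
import random
from collections import deque

class SlidingWindowTS:
    """Bernoulli Thompson sampling over the last `window` pulls only,
    so Beta posteriors forget data from before an abrupt reward change."""

    def __init__(self, n_arms, window=200):
        self.n_arms = n_arms
        self.history = deque(maxlen=window)  # recent (arm, reward) pairs

    def select(self):
        # Build Beta(1 + successes, 1 + failures) posteriors from the window.
        succ = [0] * self.n_arms
        fail = [0] * self.n_arms
        for arm, r in self.history:
            if r:
                succ[arm] += 1
            else:
                fail[arm] += 1
        samples = [random.betavariate(1 + succ[a], 1 + fail[a])
                   for a in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda a: samples[a])

    def update(self, arm, reward):
        self.history.append((arm, reward))
```

Because `deque(maxlen=window)` silently drops the oldest pull on every append, stale pre-change rewards leave the posterior automatically.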
Zero-Inflated Bandits
Many real applications of bandits have sparse non-zero rewards, leading to slow learning
rates. A careful distribution modeling that utilizes problem-specific structures is known as …
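To make the zero-inflation structure concrete: one simple model treats the reward as $R = Z \cdot Y$, where $Z$ indicates whether the reward is nonzero and $Y$ is its value when it is. The sketch below tracks these two components separately in a Thompson-style rule; this decomposition and the optimistic default for unseen arms are assumptions for illustration, not the paper's estimator.

```python
import random

class ZeroInflatedTS:
    """Thompson-style sampling for rewards R = Z * Y, with Z ~ Bernoulli(p)
    (reward nonzero?) and Y in (0, 1] the value when nonzero. Keeps a Beta
    posterior on p and a running mean of the nonzero values per arm."""

    def __init__(self, n_arms):
        self.alpha = [1.0] * n_arms   # Beta prior: nonzero-reward probability
        self.beta = [1.0] * n_arms
        self.y_sum = [0.0] * n_arms   # running stats of nonzero rewards only
        self.y_cnt = [0] * n_arms

    def select(self):
        def score(a):
            p = random.betavariate(self.alpha[a], self.beta[a])
            # Optimistic value 1.0 until a nonzero reward has been seen.
            y = self.y_sum[a] / self.y_cnt[a] if self.y_cnt[a] else 1.0
            return p * y
        return max(range(len(self.alpha)), key=score)

    def update(self, arm, reward):
        if reward > 0:
            self.alpha[arm] += 1
            self.y_sum[arm] += reward
            self.y_cnt[arm] += 1
        else:
            self.beta[arm] += 1
```

Modeling the zero-indicator separately concentrates learning on the sparse informative pulls instead of averaging them away against the many zeros.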