Multi-agent best arm identification with private communications

A Rio, M Barlier, I Colin… - … Conference on Machine …, 2023 - proceedings.mlr.press
We address multi-agent best arm identification with privacy guarantees. In this setting,
agents collaborate by communicating to find the optimal arm. To avoid leaking sensitive data …

On-demand communication for asynchronous multi-agent bandits

YZJ Chen, L Yang, X Wang, X Liu… - International …, 2023 - proceedings.mlr.press
This paper studies a cooperative multi-agent multi-armed stochastic bandit problem where
agents operate asynchronously: agent pull times and rates are unknown, irregular, and …

Multitask bandit learning through heterogeneous feedback aggregation

Z Wang, C Zhang, MK Singh, L Riek… - International …, 2021 - proceedings.mlr.press
In many real-world applications, multiple agents seek to learn how to perform highly related
yet slightly different tasks in an online bandit learning protocol. We formulate this problem as …

Safe policy improvement with an estimated baseline policy

TD Simão, R Laroche, RT Combes - arXiv preprint arXiv:1909.05236, 2019 - arxiv.org
Previous work has shown the unreliability of existing algorithms in the batch Reinforcement
Learning setting, and proposed the theoretically-grounded Safe Policy Improvement with …

Heterogeneous explore-exploit strategies on multi-star networks

U Madhushani, NE Leonard - 2021 American Control …, 2021 - ieeexplore.ieee.org
We investigate the benefits of heterogeneity in multi-agent explore-exploit decision making
where the goal of the agents is to maximize cumulative group reward. To do so we study a …

Cooperative multi-agent bandits: Distributed algorithms with optimal individual regret and constant communication costs

L Yang, X Wang, M Hajiesmaili, L Zhang, J Lui… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, there has been extensive study of cooperative multi-agent multi-armed bandits
where a set of distributed agents cooperatively play the same multi-armed bandit game. The …

Optimal Learning Policies for Differential Privacy in Multi-armed Bandits

S Wang, J Zhu - Journal of Machine Learning Research, 2024 - jmlr.org
This paper studies the multi-armed bandit problem with a requirement of differential privacy
guarantee or global differential privacy guarantee. We first prove that the lower bound for …

Massive multi-player multi-armed bandits for IoT networks: An application on LoRa networks

H Dakdouk, R Féraud, N Varsier, P Maillé, R Laroche - Ad Hoc Networks, 2023 - Elsevier
More and more manufacturers, as part of the transition towards Industry 4.0, are using
Internet of Things (IoT) networks for more efficient production. The wide and extensive …

Online learning for cooperative multi-player multi-armed bandits

W Chang, M Jafarnia-Jahromi… - 2022 IEEE 61st …, 2022 - ieeexplore.ieee.org
We introduce a framework for decentralized on-line learning for multi-armed bandits (MAB)
with multiple cooperative players, where the reward obtained by the players each round …

Secure Protocols for Best Arm Identification in Federated Stochastic Multi-Armed Bandits

R Ciucanu, A Delabrouille… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
The stochastic multi-armed bandit is a classical reinforcement learning model, where a
learning agent sequentially chooses an action (pull a bandit arm) and the environment …