A simple and provably efficient algorithm for asynchronous federated contextual linear bandits

J He, T Wang, Y Min, Q Gu - Advances in neural information …, 2022 - proceedings.neurips.cc
We study federated contextual linear bandits, where $ M $ agents cooperate with each other
to solve a global contextual linear bandit problem with the help of a central server. We …

Contextual bandits with large action spaces: Made practical

Y Zhu, DJ Foster, J Langford… - … Conference on Machine …, 2022 - proceedings.mlr.press
A central problem in sequential decision making is to develop algorithms that are practical
and computationally efficient, yet support the use of flexible, general-purpose models …

Policy finetuning in reinforcement learning via design of experiments using offline data

R Zhang, A Zanette - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In some applications of reinforcement learning, a dataset of pre-collected experience is
already available but it is also possible to acquire some additional online data to help …

An exponential lower bound for linearly realizable MDP with constant suboptimality gap

Y Wang, R Wang, S Kakade - Advances in Neural …, 2021 - proceedings.neurips.cc
A fundamental question in the theory of reinforcement learning is: suppose the optimal $ Q $-function
lies in the linear span of a given $ d $ dimensional feature map, is sample …

Provably efficient reinforcement learning with linear function approximation under adaptivity constraints

T Wang, D Zhou, Q Gu - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We study reinforcement learning (RL) with linear function approximation under the adaptivity
constraint. We consider two popular limited adaptivity models: the batch learning model and …

Impact of representation learning in linear bandits

J Yang, W Hu, JD Lee, SS Du - arXiv preprint arXiv:2010.06531, 2020 - arxiv.org
We study how representation learning can improve the efficiency of bandit problems. We
study the setting where we play $ T $ linear bandits with dimension $ d $ concurrently, and …

Near-optimal regret bounds for multi-batch reinforcement learning

Z Zhang, Y Jiang, Y Zhou, X Ji - Advances in Neural …, 2022 - proceedings.neurips.cc
In this paper, we study the episodic reinforcement learning (RL) problem modeled by finite-
horizon Markov Decision Processes (MDPs) with constraint on the number of batches. The …

Efficient batched algorithm for contextual linear bandits with large action space via soft elimination

O Hanna, L Yang, C Fragouli - Advances in Neural …, 2024 - proceedings.neurips.cc
In this paper, we provide the first efficient batched algorithm for contextual linear bandits with
large action spaces. Unlike existing batched algorithms that rely on action elimination, which …

Experiment planning with function approximation

A Pacchiano, J Lee, E Brunskill - Advances in Neural …, 2024 - proceedings.neurips.cc
We study the problem of experiment planning with function approximation in contextual
bandit problems. In settings where there is a significant overhead to deploying adaptive …

Cooperative multi-agent reinforcement learning: Asynchronous communication and linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study multi-agent reinforcement learning in the setting of episodic Markov decision
processes, where many agents cooperate via communication through a central server. We …