A simple and provably efficient algorithm for asynchronous federated contextual linear bandits
We study federated contextual linear bandits, where $ M $ agents cooperate with each other
to solve a global contextual linear bandit problem with the help of a central server. We …
Contextual bandits with large action spaces: Made practical
A central problem in sequential decision making is to develop algorithms that are practical
and computationally efficient, yet support the use of flexible, general-purpose models …
Policy finetuning in reinforcement learning via design of experiments using offline data
In some applications of reinforcement learning, a dataset of pre-collected experience is
already available but it is also possible to acquire some additional online data to help …
An exponential lower bound for linearly realizable MDP with constant suboptimality gap
A fundamental question in the theory of reinforcement learning is: suppose the optimal $ Q $-
function lies in the linear span of a given $ d $ dimensional feature map, is sample …
Provably efficient reinforcement learning with linear function approximation under adaptivity constraints
We study reinforcement learning (RL) with linear function approximation under the adaptivity
constraint. We consider two popular limited adaptivity models: the batch learning model and …
Impact of representation learning in linear bandits
We study how representation learning can improve the efficiency of bandit problems. We
study the setting where we play $ T $ linear bandits with dimension $ d $ concurrently, and …
Near-optimal regret bounds for multi-batch reinforcement learning
In this paper, we study the episodic reinforcement learning (RL) problem modeled by finite-
horizon Markov Decision Processes (MDPs) with constraint on the number of batches. The …
Efficient batched algorithm for contextual linear bandits with large action space via soft elimination
In this paper, we provide the first efficient batched algorithm for contextual linear bandits with
large action spaces. Unlike existing batched algorithms that rely on action elimination, which …
Experiment planning with function approximation
We study the problem of experiment planning with function approximation in contextual
bandit problems. In settings where there is a significant overhead to deploying adaptive …
Cooperative multi-agent reinforcement learning: Asynchronous communication and linear function approximation
We study multi-agent reinforcement learning in the setting of episodic Markov decision
processes, where many agents cooperate via communication through a central server. We …