Feel-good Thompson sampling for contextual bandits and reinforcement learning
T Zhang - SIAM Journal on Mathematics of Data Science, 2022 - SIAM
Thompson sampling has been widely used for contextual bandit problems due to the
flexibility of its modeling power. However, a general theory for this class of methods in the …
Making RL with preference-based feedback efficient via randomization
Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be
efficient in terms of statistical complexity, computational complexity, and query complexity. In …
A self-play posterior sampling algorithm for zero-sum Markov games
Existing studies on provably efficient algorithms for Markov games (MGs) almost exclusively
build on the “optimism in the face of uncertainty” (OFU) principle. This work focuses on a …
A provably efficient model-free posterior sampling method for episodic reinforcement learning
Thompson Sampling is one of the most effective methods for contextual bandits and has
been generalized to posterior sampling for certain MDP settings. However, existing posterior …
Posterior sampling with delayed feedback for reinforcement learning with linear function approximation
Recent studies in reinforcement learning (RL) have made significant progress by leveraging
function approximation to alleviate the sample complexity hurdle for better performance …
Randomized exploration in reinforcement learning with general value function approximation
We propose a model-free reinforcement learning algorithm inspired by the popular
randomized least squares value iteration (RLSVI) algorithm as well as the optimism …
Towards deployment-efficient reinforcement learning: Lower bound and optimality
Deployment efficiency is an important criterion for many real-world applications of
reinforcement learning (RL). Despite the community's increasing interest, there lacks a …
Nonstationary reinforcement learning with linear function approximation
We consider reinforcement learning (RL) in episodic Markov decision processes (MDPs)
with linear function approximation under drifting environment. Specifically, both the reward …
Optimistic Thompson sampling-based algorithms for episodic reinforcement learning
We propose two Thompson Sampling-like, model-based learning algorithms for
episodic Markov decision processes (MDPs) with a finite time horizon. Our proposed …
Dyadic Reinforcement Learning
Mobile health aims to enhance health outcomes by delivering interventions to individuals as
they go about their daily life. The involvement of care partners and social support networks …