- Academic Search

Feel-good Thompson sampling for contextual bandits and reinforcement learning

T Zhang - SIAM Journal on Mathematics of Data Science, 2022 - SIAM

Thompson sampling has been widely used for contextual bandit problems due to the
flexibility of its modeling power. However, a general theory for this class of methods in the …

Cited by 72

Making RL with preference-based feedback efficient via randomization

R Wu, W Sun - arXiv preprint arXiv:2310.14554, 2023 - arxiv.org

Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be
efficient in terms of statistical complexity, computational complexity, and query complexity. In …

Cited by 23

A self-play posterior sampling algorithm for zero-sum Markov games

W Xiong, H Zhong, C Shi, C Shen… - … on Machine Learning, 2022 - proceedings.mlr.press

Existing studies on provably efficient algorithms for Markov games (MGs) almost exclusively
build on the “optimism in the face of uncertainty” (OFU) principle. This work focuses on a …

Cited by 28

A provably efficient model-free posterior sampling method for episodic reinforcement learning

C Dann, M Mohri, T Zhang… - Advances in Neural …, 2021 - proceedings.neurips.cc

Thompson Sampling is one of the most effective methods for contextual bandits and has
been generalized to posterior sampling for certain MDP settings. However, existing posterior …

Cited by 44

Posterior sampling with delayed feedback for reinforcement learning with linear function approximation

NL Kuang, M Yin, M Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc

Recent studies in reinforcement learning (RL) have made significant progress by leveraging
function approximation to alleviate the sample complexity hurdle for better performance …

Cited by 8

Randomized exploration in reinforcement learning with general value function approximation

H Ishfaq, Q Cui, V Nguyen, A Ayoub… - International …, 2021 - proceedings.mlr.press

We propose a model-free reinforcement learning algorithm inspired by the popular
randomized least squares value iteration (RLSVI) algorithm as well as the optimism …

Cited by 46

Towards deployment-efficient reinforcement learning: Lower bound and optimality

J Huang, J Chen, L Zhao, T Qin, N Jiang… - arXiv preprint arXiv …, 2022 - arxiv.org

Deployment efficiency is an important criterion for many real-world applications of
reinforcement learning (RL). Despite the community's increasing interest, there lacks a …

Cited by 27

Nonstationary reinforcement learning with linear function approximation

H Zhou, J Chen, LR Varshney, A Jagmohan - arXiv preprint arXiv …, 2020 - arxiv.org

We consider reinforcement learning (RL) in episodic Markov decision processes (MDPs)
with linear function approximation under drifting environment. Specifically, both the reward …

Cited by 45

Optimistic Thompson sampling-based algorithms for episodic reinforcement learning

B Hu, TH Zhang, N Hegde… - Uncertainty in Artificial …, 2023 - proceedings.mlr.press

We propose two Thompson Sampling-like, model-based learning algorithms for
episodic Markov decision processes (MDPs) with a finite time horizon. Our proposed …

Cited by 4

Dyadic Reinforcement Learning

S Li, LS Niell, SW Choi, I Nahum-Shani… - arXiv preprint arXiv …, 2023 - arxiv.org

Mobile health aims to enhance health outcomes by delivering interventions to individuals as
they go about their daily life. The involvement of care partners and social support networks …

Cited by 2
