- Academic Search

P Ménard, OD Domingues, X Shang… - … on Machine Learning, 2021 - proceedings.mlr.press

Abstract We propose UCBMQ, Upper Confidence Bound Momentum Q-learning, a new
algorithm for reinforcement learning in tabular and possibly stage-dependent, episodic …

Cited by 49

Optimistic posterior sampling for reinforcement learning with few samples and tight guarantees

D Tiapkin, D Belomestny… - Advances in …, 2022 - proceedings.neurips.cc

We consider reinforcement learning in an environment modeled by an episodic, tabular,
step-dependent Markov decision process of horizon $H$ with $S$ states, and $ A …

Cited by 9

Near instance-optimal PAC reinforcement learning for deterministic MDPs

A Tirinzoni, A Al Marjani… - Advances in neural …, 2022 - proceedings.neurips.cc

In probably approximately correct (PAC) reinforcement learning (RL), an agent is required to
identify an $\epsilon$-optimal policy with probability $1-\delta$. While minimax optimal …

Cited by 20

Model-based uncertainty in value functions

CE Luis, AG Bottero, J Vinogradska… - International …, 2023 - proceedings.mlr.press

We consider the problem of quantifying uncertainty over expected cumulative rewards in
model-based reinforcement learning. In particular, we focus on characterizing the variance …

Cited by 13

Model-free posterior sampling via learning rate randomization

D Tiapkin, D Belomestny… - Advances in …, 2024 - proceedings.neurips.cc

In this paper, we introduce Randomized Q-learning (RandQL), a novel randomized model-free
algorithm for regret minimization in episodic Markov Decision Processes (MDPs). To the …

Cited by 3

Value-distributional model-based reinforcement learning

CE Luis, AG Bottero, J Vinogradska… - Journal of Machine …, 2024 - jmlr.org

Quantifying uncertainty about a policy's long-term performance is important to solve
sequential decision-making tasks. We study the problem from a model-based Bayesian …

Cited by 6

Online policy optimization for robust MDP

J Dong, J Li, B Wang, J Zhang - arXiv preprint arXiv:2209.13841, 2022 - arxiv.org

Reinforcement learning (RL) has exceeded human performance in many synthetic settings
such as video games and Go. However, real-world deployment of end-to-end RL models is …

Cited by 17

Bandits corrupted by nature: Lower bounds on regret and robust optimistic algorithm

D Basu, OA Maillard, T Mathieu - arXiv preprint arXiv:2203.03186, 2022 - arxiv.org

We study the corrupted bandit problem, i.e., a stochastic multi-armed bandit problem with $k$
unknown reward distributions, which are heavy-tailed and corrupted by a history …

Cited by 8

Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency from Shifted-Dynamics Data

C Qu, L Shi, K Panaganti, P You, A Wierman - arXiv preprint arXiv …, 2024 - arxiv.org

Online reinforcement learning (RL) typically requires high-stakes online interaction data to
learn a policy for a target task. This prompts interest in leveraging historical data to improve …

Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization

CE Luis, AG Bottero, J Vinogradska… - arXiv preprint arXiv …, 2023 - arxiv.org

We consider the problem of quantifying uncertainty over expected cumulative rewards in
model-based reinforcement learning. In particular, we focus on characterizing the variance …

Cited by 2

rlberry - A Reinforcement Learning Library for Research and Education

UCB momentum Q-learning: Correcting the bias without forgetting