Provable and practical: Efficient exploration in reinforcement learning via Langevin Monte Carlo

H Ishfaq, Q Lan, P Xu, AR Mahmood, D Precup… - arXiv preprint arXiv …, 2023 - arxiv.org
We present a scalable and effective exploration strategy based on Thompson sampling for
reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling …
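
The snippet names Thompson sampling, and the title names Langevin Monte Carlo (LMC) as the sampling mechanism. Below is a minimal, generic sketch of how LMC can draw approximate posterior samples for Thompson-sampling-style exploration; it is not the paper's specific algorithm, and `lmc_sample`, the step size, temperature, and the Gaussian toy posterior are all assumptions for illustration.

```python
import numpy as np

def lmc_sample(grad_log_post, theta0, step=1e-3, temp=1.0, n_steps=2000, rng=None):
    """One LMC chain: theta <- theta + step * grad_log_post(theta) + sqrt(2*step*temp) * noise."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        theta = theta + step * grad_log_post(theta) + np.sqrt(2.0 * step * temp) * noise
    return theta  # approximate posterior sample of value-function parameters

# Toy usage: log-posterior is Gaussian N(mu, I), so grad log p(theta) = -(theta - mu).
mu = np.array([1.0, -2.0])
sample = lmc_sample(lambda th: -(th - mu), theta0=np.zeros(2))
# Acting greedily w.r.t. the Q-function induced by `sample` gives Thompson-style exploration.
```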

Optimistic posterior sampling for reinforcement learning with few samples and tight guarantees

D Tiapkin, D Belomestny… - Advances in …, 2022 - proceedings.neurips.cc
We consider reinforcement learning in an environment modeled by an episodic, tabular,
step-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ …

Model-based uncertainty in value functions

CE Luis, AG Bottero, J Vinogradska… - International …, 2023 - proceedings.mlr.press
We consider the problem of quantifying uncertainty over expected cumulative rewards in
model-based reinforcement learning. In particular, we focus on characterizing the variance …

Model-free posterior sampling via learning rate randomization

D Tiapkin, D Belomestny… - Advances in …, 2024 - proceedings.neurips.cc
In this paper, we introduce Randomized Q-learning (RandQL), a novel randomized
model-free algorithm for regret minimization in episodic Markov Decision Processes (MDPs). To the …
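
As a loose illustration of the "learning rate randomization" idea named in the title, here is a minimal tabular Q-learning update with a Beta-distributed step size. The exact distributions, optimism or prior terms, and schedules used by RandQL may differ; `randomized_q_update`, the visit-count bookkeeping, and the toy numbers are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.99
Q = np.zeros((n_states, n_actions))
visits = np.ones((n_states, n_actions))  # hypothetical visit counts

def randomized_q_update(s, a, r, s_next):
    """One Q-learning step with a random (Beta-distributed) learning rate."""
    alpha = rng.beta(1.0, visits[s, a])      # random step size, concentrating as visits grow
    target = r + gamma * Q[s_next].max()     # standard one-step bootstrap target
    Q[s, a] = (1.0 - alpha) * Q[s, a] + alpha * target
    visits[s, a] += 1

# Example transition: state 0, action 1, reward 1.0, next state 2 (toy numbers).
randomized_q_update(0, 1, 1.0, 2)
```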

Posterior sampling for deep reinforcement learning

R Sasso, M Conserva… - … Conference on Machine …, 2023 - proceedings.mlr.press
Despite remarkable successes, deep reinforcement learning algorithms remain sample
inefficient: they require an enormous amount of trial and error to find good policies. Model …

Optimistic Thompson sampling-based algorithms for episodic reinforcement learning

B Hu, TH Zhang, N Hegde… - Uncertainty in Artificial …, 2023 - proceedings.mlr.press
We propose two Thompson Sampling-like, model-based learning algorithms for
episodic Markov decision processes (MDPs) with a finite time horizon. Our proposed …

MinMaxMin Q-learning

N Soffair, S Mannor - arXiv preprint arXiv:2402.05951, 2024 - arxiv.org
MinMaxMin Q-learning is a novel optimistic Actor-Critic algorithm that
addresses the problem of overestimation bias (Q-estimations are overestimating …

A general recipe for the analysis of randomized multi-armed bandit algorithms

D Baudry, K Suzuki, J Honda - arXiv preprint arXiv:2303.06058, 2023 - arxiv.org
In this paper we propose a general methodology to derive regret bounds for randomized
multi-armed bandit algorithms. It consists in checking a set of sufficient conditions on the …

A survey of convex optimization for Markov decision processes

VD Rudenko, NE Yudin, AA Vasin - Computer Research and …, 2023 - mathnet.ru
This article surveys both historical achievements and recent results in the field
of Markov decision processes (Markov Decision …

Efficient and stable deep reinforcement learning: selective priority timing entropy

L Huo, J Mao, H San, S Zhang, R Li, L Fu - Applied Intelligence, 2024 - Springer
Deep reinforcement learning (DRL) has made significant strides in addressing tasks with
high-dimensional continuous action spaces. However, the field still faces the challenges of …