Exploration in deep reinforcement learning: A survey

P Ladosz, L Weng, M Kim, H Oh - Information Fusion, 2022 - Elsevier
This paper reviews exploration techniques in deep reinforcement learning. Exploration
techniques are of primary importance when solving sparse reward problems. In sparse …

A tutorial on Thompson sampling

DJ Russo, B Van Roy, A Kazerouni… - … and Trends® in …, 2018 - nowpublishers.com
Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …

Provably efficient reinforcement learning with linear function approximation

C Jin, Z Yang, Z Wang… - Conference on learning …, 2020 - proceedings.mlr.press
Modern Reinforcement Learning (RL) is commonly applied to practical problems
with an enormous number of states, where function approximation must be deployed …

Model-based reinforcement learning with value-targeted regression

A Ayoub, Z Jia, C Szepesvari… - … on Machine Learning, 2020 - proceedings.mlr.press
This paper studies model-based reinforcement learning (RL) for regret minimization. We
focus on finite-horizon episodic RL where the transition model $P$ belongs to a known …

Is Q-learning provably efficient?

C Jin, Z Allen-Zhu, S Bubeck… - Advances in neural …, 2018 - proceedings.neurips.cc
Model-free reinforcement learning (RL) algorithms directly parameterize and
update value functions or policies, bypassing the modeling of the environment. They are …

Noisy networks for exploration

M Fortunato, MG Azar, B Piot, J Menick… - arXiv preprint arXiv …, 2017 - arxiv.org
We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to
its weights, and show that the induced stochasticity of the agent's policy can be used to aid …

SUNRISE: A simple unified framework for ensemble learning in deep reinforcement learning

K Lee, M Laskin, A Srinivas… - … Conference on Machine …, 2021 - proceedings.mlr.press
Off-policy deep reinforcement learning (RL) has been successful in a range of challenging
domains. However, standard off-policy RL algorithms can suffer from several issues, such as …

Provably efficient exploration in policy optimization

Q Cai, Z Yang, C Jin, Z Wang - International Conference on …, 2020 - proceedings.mlr.press
While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …

Parameter space noise for exploration

M Plappert, R Houthooft, P Dhariwal, S Sidor… - arXiv preprint arXiv …, 2017 - arxiv.org
Deep reinforcement learning (RL) methods generally engage in exploratory behavior
through noise injection in the action space. An alternative is to add noise directly to the …

Randomized prior functions for deep reinforcement learning

I Osband, J Aslanides… - Advances in neural …, 2018 - proceedings.neurips.cc
Dealing with uncertainty is essential for efficient reinforcement learning. There is a growing
literature on uncertainty estimation for deep learning from fixed datasets, but many of the …