MADE: Exploration via maximizing deviation from explored regions

T Zhang, P Rashidinejad, J Jiao… - Advances in …, 2021 - proceedings.neurips.cc
In online reinforcement learning (RL), efficient exploration remains particularly challenging
in high-dimensional environments with sparse rewards. In low-dimensional environments …

Diversity policy gradient for sample efficient quality-diversity optimization

T Pierrot, V Macé, F Chalumeau, A Flajolet… - Proceedings of the …, 2022 - dl.acm.org
A fascinating aspect of nature lies in its ability to produce a large and diverse collection of
organisms that are all high-performing in their niche. By contrast, most AI algorithms focus …

Geometric entropic exploration

ZD Guo, MG Azar, A Saade, S Thakoor, B Piot… - arXiv preprint arXiv …, 2021 - arxiv.org
Exploration is essential for solving complex Reinforcement Learning (RL) tasks. Maximum
State-Visitation Entropy (MSVE) formulates the exploration problem as a well-defined policy …

Seizing serendipity: Exploiting the value of past success in off-policy actor-critic

T Ji, Y Luo, F Sun, X Zhan, J Zhang, H Xu - arXiv preprint arXiv …, 2023 - arxiv.org
Learning high-quality Q-value functions plays a key role in the success of many modern
off-policy deep reinforcement learning (RL) algorithms. Previous works primarily focus on …

QD-RL: Efficient mixing of quality and diversity in reinforcement learning

G Cideron, T Pierrot, N Perrin, K Beguir… - arXiv preprint arXiv …, 2020 - researchgate.net
We propose a novel reinforcement learning algorithm, QD-RL, that incorporates the
strengths of off-policy RL algorithms into Quality Diversity (QD) approaches. Quality-Diversity …

Reinforcement learning by guided safe exploration

Q Yang, TD Simão, N Jansen, SH Tindemans… - ECAI 2023, 2023 - ebooks.iospress.nl
Safety is critical to broadening the application of reinforcement learning (RL). Often, we train
RL agents in a controlled environment, such as a laboratory, before deploying them in the …

Variational curriculum reinforcement learning for unsupervised discovery of skills

S Kim, K Lee, J Choi - arXiv preprint arXiv:2310.19424, 2023 - arxiv.org
Mutual information-based reinforcement learning (RL) has been proposed as a promising
framework for retrieving complex skills autonomously without a task-oriented reward function …

Policy gradient algorithms implicitly optimize by continuation

A Bolland, G Louppe, D Ernst - arXiv preprint arXiv:2305.06851, 2023 - arxiv.org
Direct policy optimization in reinforcement learning is usually solved with policy-gradient
algorithms, which optimize policy parameters via stochastic gradient ascent. This paper …

Training and transferring safe policies in reinforcement learning

Q Yang, T Simão, N Jansen, S Tindemans, M Spaan - 2022 - repository.ubn.ru.nl
Safety is critical to broadening the application of reinforcement learning (RL). Often, RL
agents are trained in a controlled environment, such as a laboratory, before being deployed …

Behind the myth of exploration in policy gradients

A Bolland, G Lambrechts, D Ernst - arXiv preprint arXiv:2402.00162, 2024 - arxiv.org
Policy-gradient algorithms are effective reinforcement learning methods for solving control
problems with continuous state and action spaces. To compute near-optimal policies, it is …