Google Scholar

MADE: Exploration via maximizing deviation from explored regions

T Zhang, P Rashidinejad, J Jiao… - Advances in …, 2021 - proceedings.neurips.cc

In online reinforcement learning (RL), efficient exploration remains particularly challenging
in high-dimensional environments with sparse rewards. In low-dimensional environments …

Cited by 51 · Related articles · All 7 versions · View as HTML


[PDF] arxiv.org

Diversity policy gradient for sample efficient quality-diversity optimization

T Pierrot, V Macé, F Chalumeau, A Flajolet… - Proceedings of the …, 2022 - dl.acm.org

A fascinating aspect of nature lies in its ability to produce a large and diverse collection of
organisms that are all high-performing in their niche. By contrast, most AI algorithms focus …

Cited by 62 · Related articles · All 21 versions


[PDF] arxiv.org

Geometric entropic exploration

ZD Guo, MG Azar, A Saade, S Thakoor, B Piot… - arXiv preprint arXiv …, 2021 - arxiv.org

Exploration is essential for solving complex Reinforcement Learning (RL) tasks. Maximum
State-Visitation Entropy (MSVE) formulates the exploration problem as a well-defined policy …

Cited by 43 · Related articles · All 2 versions · View as HTML
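The MSVE objective named in the snippet above rewards a policy for spreading its state visits. As a minimal sketch, assuming a finite state space and a plug-in (histogram) estimate of the visitation distribution d(s), the entropy H(d) = -Σ_s d(s) log d(s) of a batch of visited states can be computed as follows. This shows only the standard Shannon-entropy objective, not the geometric variant the paper proposes; the function name and example data are hypothetical.

```python
from collections import Counter
import math

def state_visitation_entropy(states):
    """Plug-in estimate (in nats) of the Shannon entropy of the empirical
    state-visitation distribution d(s) built from a list of visited states.
    Hypothetical helper for illustration only."""
    if not states:
        raise ValueError("need at least one visited state")
    counts = Counter(states)
    n = len(states)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# A policy that keeps revisiting one state vs. one that spreads its visits.
narrow = [0, 0, 0, 0, 0, 0, 0, 1]
broad = [0, 1, 2, 3, 4, 5, 6, 7]
print(state_visitation_entropy(narrow))
print(state_visitation_entropy(broad))  # equals log(8) for 8 distinct states
```

A uniform spread over 8 states attains the maximum entropy log(8); concentrating visits lowers it, which is why maximizing this quantity encourages exploration.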


[PDF] arxiv.org

Seizing serendipity: Exploiting the value of past success in off-policy actor-critic

T Ji, Y Luo, F Sun, X Zhan, J Zhang, H Xu - arXiv preprint arXiv …, 2023 - arxiv.org

Learning high-quality Q-value functions plays a key role in the success of many modern
off-policy deep reinforcement learning (RL) algorithms. Previous works primarily focus on …

Cited by 14 · Related articles · All 6 versions · View as HTML
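The snippet stresses learned Q-value functions in off-policy RL. For background only, a minimal tabular sketch of the one-step off-policy Bellman (Q-learning) update that such deep methods generalize is shown below. It is not the paper's method; the table size, step size `alpha`, and discount `gamma` are illustrative assumptions.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q[s, a] toward the off-policy
    Bellman target r + gamma * max_a' Q[s_next, a'].
    At terminal transitions (done=True) the target is just r."""
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

Q = np.zeros((3, 2))   # hypothetical 3-state, 2-action table
Q[1] = [0.5, 2.0]      # pretend the next state already has value estimates
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, done=False)
print(Q[0, 1])  # ≈ 0.298, i.e. 0.1 * (1.0 + 0.99 * 2.0)
```

The max over next-state actions is what makes the target off-policy: it does not depend on which action the behavior policy actually takes next.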


[PDF] researchgate.net

[PDF] QD-RL: Efficient mixing of quality and diversity in reinforcement learning

G Cideron, T Pierrot, N Perrin, K Beguir… - arXiv preprint arXiv …, 2020 - researchgate.net

We propose a novel reinforcement learning algorithm, QD-RL, that incorporates the
strengths of off-policy RL algorithms into Quality Diversity (QD) approaches. Quality-Diversity …

Cited by 33 · Related articles · All 2 versions · View as HTML


[PDF] iospress.nl

Reinforcement learning by guided safe exploration

Q Yang, TD Simão, N Jansen, SH Tindemans… - ECAI 2023, 2023 - ebooks.iospress.nl

Safety is critical to broadening the application of reinforcement learning (RL). Often, we train
RL agents in a controlled environment, such as a laboratory, before deploying them in the …

Cited by 10 · Related articles · All 9 versions


[PDF] arxiv.org

Variational curriculum reinforcement learning for unsupervised discovery of skills

S Kim, K Lee, J Choi - arXiv preprint arXiv:2310.19424, 2023 - arxiv.org

Mutual information-based reinforcement learning (RL) has been proposed as a promising
framework for retrieving complex skills autonomously without a task-oriented reward function …

Cited by 7 · Related articles · All 7 versions · View as HTML
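Mutual-information skill discovery commonly maximizes I(S; Z) between states S and a latent skill Z through the variational bound I(S; Z) ≥ E[log q(z|s) − log p(z)], which yields a per-state reward without any task reward. A minimal sketch of that reward, assuming a uniform skill prior and a hypothetical learned discriminator that outputs one logit per skill, follows. This is the common DIAYN-style form, not necessarily the curriculum objective of Kim et al.

```python
import numpy as np

def skill_reward(discriminator_logits, skill):
    """Variational mutual-information reward log q(z|s) - log p(z).

    discriminator_logits: unnormalized scores from a hypothetical skill
    discriminator q(z|s) at a single state s, one entry per skill z.
    The skill prior p(z) is assumed uniform over the same skills.
    """
    logits = np.asarray(discriminator_logits, dtype=float)
    # numerically stable log-softmax: log q(z|s) for every skill z
    shifted = logits - logits.max()
    log_q = shifted - np.log(np.exp(shifted).sum())
    log_p = -np.log(len(logits))  # uniform prior p(z) = 1 / K
    return log_q[skill] - log_p

# Equal logits: the state carries no information about the skill, so the reward is 0.
print(skill_reward([0.0, 0.0, 0.0, 0.0], skill=2))
```

The reward is positive when the discriminator can recognize the active skill from the state and is bounded above by log K, so skills are pushed toward visiting distinguishable states.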


[PDF] arxiv.org

Policy gradient algorithms implicitly optimize by continuation

A Bolland, G Louppe, D Ernst - arXiv preprint arXiv:2305.06851, 2023 - arxiv.org

Direct policy optimization in reinforcement learning is usually solved with policy-gradient
algorithms, which optimize policy parameters via stochastic gradient ascent. This paper …

Cited by 6 · Related articles · All 6 versions · View as HTML
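The snippet describes policy-gradient methods as stochastic gradient ascent on policy parameters. A minimal sketch of that idea, using textbook REINFORCE with a softmax policy on a hypothetical three-armed bandit, is below. The arm means, noise level, learning rate, and running-mean baseline are all assumptions, and the sketch does not reproduce the continuation analysis of Bolland et al.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())  # shift for numerical stability
    return z / z.sum()

def reinforce_bandit(true_means, steps=2000, lr=0.1, noise=0.1, seed=0):
    """REINFORCE (score-function policy gradient) with a softmax policy on a
    hypothetical multi-armed bandit; returns the final action probabilities."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(true_means))
    baseline = 0.0  # running mean reward, used only to reduce variance
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choice(len(theta), p=probs)
        r = true_means[a] + rng.normal(scale=noise)
        # gradient of log pi(a | theta) for a softmax policy: one_hot(a) - probs
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0
        # stochastic gradient *ascent* step on expected reward
        theta += lr * (r - baseline) * grad_log_pi
        baseline += 0.05 * (r - baseline)
    return softmax(theta)

probs = reinforce_bandit(true_means=[0.0, 0.0, 1.0])
print(probs.round(3))  # most of the mass should sit on the last (best) arm
```

Each update uses a single sampled action, so it is an unbiased but noisy estimate of the true gradient; subtracting the baseline leaves that estimate unbiased while shrinking its variance.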


[PDF] ru.nl

[PDF] Training and transferring safe policies in reinforcement learning

Q Yang, T Simão, N Jansen, S Tindemans, M Spaan - 2022 - repository.ubn.ru.nl

Safety is critical to broadening the application of reinforcement learning (RL). Often, RL
agents are trained in a controlled environment, such as a laboratory, before being deployed …

Cited by 9 · Related articles · All 9 versions · View as HTML


[PDF] arxiv.org

Behind the myth of exploration in policy gradients

A Bolland, G Lambrechts, D Ernst - arXiv preprint arXiv:2402.00162, 2024 - arxiv.org

Policy-gradient algorithms are effective reinforcement learning methods for solving control
problems with continuous state and action spaces. To compute near-optimal policies, it is …

Cited by 2 · Related articles · All 5 versions · View as HTML

Saved to "My library"

Marginalized state distribution entropy regularization in policy optimization

MADE: Exploration via maximizing deviation from explored regions

Diversity policy gradient for sample efficient quality-diversity optimization

Geometric entropic exploration

Seizing serendipity: Exploiting the value of past success in off-policy actor-critic

[PDF] QD-RL: Efficient mixing of quality and diversity in reinforcement learning

Reinforcement learning by guided safe exploration

Variational curriculum reinforcement learning for unsupervised discovery of skills

Policy gradient algorithms implicitly optimize by continuation

[PDF] Training and transferring safe policies in reinforcement learning

Behind the myth of exploration in policy gradients