A survey on policy search algorithms for learning robot controllers in a handful of trials

K Chatzilygeroudis, V Vassiliades… - IEEE Transactions …, 2019 - ieeexplore.ieee.org
Most policy search (PS) algorithms require thousands of training episodes to find an
effective policy, which is often infeasible with a physical robot. This survey article focuses on …

MOPO: Model-based offline policy optimization

T Yu, G Thomas, L Yu, S Ermon… - Advances in …, 2020 - proceedings.neurips.cc
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a
batch of previously collected data. This problem setting is compelling, because it offers the …

RvS: What is essential for offline RL via supervised learning?

S Emmons, B Eysenbach, I Kostrikov… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent work has shown that supervised learning alone, without temporal difference (TD)
learning, can be remarkably effective for offline RL. When does this hold true, and which …

When to trust your model: Model-based policy optimization

M Janner, J Fu, M Zhang… - Advances in neural …, 2019 - proceedings.neurips.cc
Designing effective model-based reinforcement learning algorithms is difficult because the
ease of data generation must be weighed against the bias of model-generated data. In this …

Model-based reinforcement learning: A survey

TM Moerland, J Broekens, A Plaat… - … and Trends® in …, 2023 - nowpublishers.com
Sequential decision making, commonly formalized as Markov Decision Process (MDP)
optimization, is an important challenge in artificial intelligence. Two key approaches to this …

Recurrent world models facilitate policy evolution

D Ha, J Schmidhuber - Advances in neural information …, 2018 - proceedings.neurips.cc
A generative recurrent neural network is quickly trained in an unsupervised manner to
model popular reinforcement learning environments through compressed spatio-temporal …

Deep reinforcement learning in a handful of trials using probabilistic dynamics models

K Chua, R Calandra, R McAllister… - Advances in neural …, 2018 - proceedings.neurips.cc
Model-based reinforcement learning (RL) algorithms can attain excellent sample
efficiency, but often lag behind the best model-free algorithms in terms of asymptotic …

Uncertainty in deep learning

Y Gal - 2016 - 106.54.215.74
PowerPoint presentation: Uncertainty in Deep Learning, Yarin Gal, 2018.7.29. Different
uncertainties: two main types of uncertainty, often confused by practitioners, but …

Model-ensemble trust-region policy optimization

T Kurutach, I Clavera, Y Duan, A Tamar… - arXiv preprint arXiv …, 2018 - arxiv.org
Model-free reinforcement learning (RL) methods are succeeding in a growing number of
tasks, aided by recent advances in deep learning. However, they tend to suffer from high …

Sample-efficient reinforcement learning with stochastic ensemble value expansion

J Buckman, D Hafner, G Tucker… - Advances in neural …, 2018 - proceedings.neurips.cc
There is growing interest in combining model-free and model-based approaches in
reinforcement learning with the goal of achieving the high performance of model-free …