A survey on policy search algorithms for learning robot controllers in a handful of trials
Most policy search (PS) algorithms require thousands of training episodes to find an
effective policy, which is often infeasible with a physical robot. This survey article focuses on …
MOPO: Model-based offline policy optimization
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a
batch of previously collected data. This problem setting is compelling, because it offers the …
RvS: What is essential for offline RL via supervised learning?
Recent work has shown that supervised learning alone, without temporal difference (TD)
learning, can be remarkably effective for offline RL. When does this hold true, and which …
When to trust your model: Model-based policy optimization
Designing effective model-based reinforcement learning algorithms is difficult because the
ease of data generation must be weighed against the bias of model-generated data. In this …
Model-based reinforcement learning: A survey
Sequential decision making, commonly formalized as Markov Decision Process (MDP)
optimization, is an important challenge in artificial intelligence. Two key approaches to this …
Recurrent world models facilitate policy evolution
A generative recurrent neural network is quickly trained in an unsupervised manner to
model popular reinforcement learning environments through compressed spatio-temporal …
Deep reinforcement learning in a handful of trials using probabilistic dynamics models
Model-based reinforcement learning (RL) algorithms can attain excellent sample
efficiency, but often lag behind the best model-free algorithms in terms of asymptotic …
[PDF] Uncertainty in deep learning
Y Gal - 2016 - 106.54.215.74
PowerPoint presentation. Page 1: Uncertainty in Deep Learning, Yarin Gal, 2018.7.29. Page 2. Page 3:
Different Uncertainties. Two main types of uncertainty, often confused by practitioners, but …
Model-ensemble trust-region policy optimization
Model-free reinforcement learning (RL) methods are succeeding in a growing number of
tasks, aided by recent advances in deep learning. However, they tend to suffer from high …
Sample-efficient reinforcement learning with stochastic ensemble value expansion
There is growing interest in combining model-free and model-based approaches in
reinforcement learning with the goal of achieving the high performance of model-free …