A tutorial on sparse Gaussian processes and variational inference
Gaussian processes (GPs) provide a framework for Bayesian inference that can offer
principled uncertainty estimates for a large range of problems. For example, if we consider …
Bridging the gap between value and policy based reinforcement learning
We establish a new connection between value and policy based reinforcement learning
(RL) based on a relationship between softmax temporal value consistency and policy …
A unified view of entropy-regularized Markov decision processes
We propose a general framework for entropy-regularized average-reward reinforcement
learning in Markov decision processes (MDPs). Our approach is based on extending the …
Deep reinforcement learning with smooth policy update: Application to robotic cloth manipulation
Deep Reinforcement Learning (DRL), which can learn complex policies with high-dimensional
observations as inputs, e.g., images, has been successfully applied to various …
On stochastic optimal control and reinforcement learning by approximate inference
We present a reformulation of the stochastic optimal control problem in terms of KL
divergence minimisation, not only providing a unifying perspective of previous approaches …
Cautiously optimistic policy optimization and exploration with linear function approximation
Policy optimization methods are popular reinforcement learning (RL) algorithms, because
their incremental and on-policy nature makes them more stable than the value-based …
Trust-PCL: An off-policy trust region method for continuous control
Trust region methods, such as TRPO, are often used to stabilize policy optimization
algorithms in reinforcement learning (RL). While current trust region strategies are effective …
Scalable reinforcement learning for plant-wide control of vinyl acetate monomer process
This paper explores a reinforcement learning (RL) approach that designs automatic control
strategies in a large-scale chemical process control scenario as the first step for leveraging …
Goal-aware generative adversarial imitation learning from imperfect demonstration for robotic cloth manipulation
Generative Adversarial Imitation Learning (GAIL) can learn policies without
explicitly defining the reward function from demonstrations. GAIL has the potential to learn …
A unified Bellman optimality principle combining reward maximization and empowerment
F Leibfried, S Pascual-Diaz… - Advances in Neural …, 2019 - proceedings.neurips.cc
Empowerment is an information-theoretic method that can be used to intrinsically motivate
learning agents. It attempts to maximize an agent's control over the environment by …