A tutorial on sparse Gaussian processes and variational inference

F Leibfried, V Dutordoir, ST John… - arXiv preprint arXiv …, 2020 - arxiv.org
Gaussian processes (GPs) provide a framework for Bayesian inference that can offer
principled uncertainty estimates for a large range of problems. For example, if we consider …
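A sketch of the sparse variational GP bound that such tutorials build towards, with notation assumed here rather than taken from the truncated snippet (inducing variables u with prior p(u), variational posterior q(u), observations (x_n, y_n)):
\mathcal{L} = \sum_{n=1}^{N} \mathbb{E}_{q(f(x_n))}\big[ \log p(y_n \mid f(x_n)) \big] - \mathrm{KL}\big( q(u) \,\|\, p(u) \big), \qquad q(f) = \int p(f \mid u)\, q(u)\, \mathrm{d}u.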

Bridging the gap between value and policy based reinforcement learning

O Nachum, M Norouzi, K Xu… - Advances in neural …, 2017 - proceedings.neurips.cc
We establish a new connection between value and policy based reinforcement learning
(RL) based on a relationship between softmax temporal value consistency and policy …
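The softmax temporal value consistency mentioned in the snippet can be sketched, for an entropy-regularized objective with temperature \tau and deterministic transitions s' = f(s, a) (notation assumed), as a path consistency linking the optimal soft value and the optimal policy:
V^*(s) - \gamma V^*(s') = r(s, a) - \tau \log \pi^*(a \mid s) \quad \text{for all } a, \qquad \pi^*(a \mid s) \propto \exp\!\big( (r(s, a) + \gamma V^*(s')) / \tau \big).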

A unified view of entropy-regularized Markov decision processes

G Neu, A Jonsson, V Gómez - arXiv preprint arXiv:1705.07798, 2017 - arxiv.org
We propose a general framework for entropy-regularized average-reward reinforcement
learning in Markov decision processes (MDPs). Our approach is based on extending the …
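One common special case of such a framework, sketched here with Shannon entropy and assumed notation (the paper itself treats the average-reward setting in greater generality), regularizes the long-run reward with the policy's conditional entropy:
\max_{\pi} \; \liminf_{T \to \infty} \frac{1}{T}\, \mathbb{E}_{\pi}\!\left[ \sum_{t=1}^{T} \big( r(s_t, a_t) + \tau\, \mathcal{H}(\pi(\cdot \mid s_t)) \big) \right].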

Deep reinforcement learning with smooth policy update: Application to robotic cloth manipulation

Y Tsurumine, Y Cui, E Uchibe, T Matsubara - Robotics and Autonomous …, 2019 - Elsevier
Deep Reinforcement Learning (DRL), which can learn complex policies with high-
dimensional observations as inputs, e.g., images, has been successfully applied to various …

On stochastic optimal control and reinforcement learning by approximate inference

K Rawlik, M Toussaint, S Vijayakumar - 2013 - direct.mit.edu
We present a reformulation of the stochastic optimal control problem in terms of KL
divergence minimisation, not only providing a unifying perspective of previous approaches …
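A minimal sketch of the KL-minimisation view (control as inference), with assumed notation: introduce binary optimality variables \mathcal{O}_t with p(\mathcal{O}_t = 1 \mid s_t, a_t) \propto \exp\big( r(s_t, a_t) \big); the control problem is then recast as fitting the controlled trajectory distribution q to the posterior over optimal trajectories,
\min_{q} \; \mathrm{KL}\big( q(s_{1:T}, a_{1:T}) \,\big\|\, p(s_{1:T}, a_{1:T} \mid \mathcal{O}_{1:T} = 1) \big).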

Cautiously optimistic policy optimization and exploration with linear function approximation

A Zanette, CA Cheng… - Conference on Learning …, 2021 - proceedings.mlr.press
Policy optimization methods are popular reinforcement learning (RL) algorithms, because
their incremental and on-policy nature makes them more stable than the value-based …

Trust-PCL: An off-policy trust region method for continuous control

O Nachum, M Norouzi, K Xu, D Schuurmans - arXiv preprint arXiv …, 2017 - arxiv.org
Trust region methods, such as TRPO, are often used to stabilize policy optimization
algorithms in reinforcement learning (RL). While current trust region strategies are effective …
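The TRPO-style constrained update that the snippet refers to can be sketched as follows (notation assumed; Trust-PCL itself works off-policy with a relative-entropy regularizer rather than this hard constraint):
\max_{\theta} \; \mathbb{E}_{s, a \sim \pi_{\theta_{\text{old}}}}\!\left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\, A^{\pi_{\theta_{\text{old}}}}(s, a) \right] \quad \text{s.t.} \quad \mathbb{E}_{s}\!\left[ \mathrm{KL}\big( \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_{\theta}(\cdot \mid s) \big) \right] \le \delta.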

Scalable reinforcement learning for plant-wide control of vinyl acetate monomer process

L Zhu, Y Cui, G Takami, H Kanokogi… - Control Engineering …, 2020 - Elsevier
This paper explores a reinforcement learning (RL) approach that designs automatic control
strategies in a large-scale chemical process control scenario as the first step for leveraging …

Goal-aware generative adversarial imitation learning from imperfect demonstration for robotic cloth manipulation

Y Tsurumine, T Matsubara - Robotics and Autonomous Systems, 2022 - Elsevier
Generative Adversarial Imitation Learning (GAIL) can learn policies without
explicitly defining the reward function from demonstrations. GAIL has the potential to learn …
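For reference, the standard GAIL objective that this work builds on can be sketched as a minimax game between a policy \pi and a discriminator D, with expert policy \pi_E, causal entropy \mathcal{H}, and weight \lambda (notation assumed):
\min_{\pi} \max_{D} \; \mathbb{E}_{\pi}\big[ \log D(s, a) \big] + \mathbb{E}_{\pi_E}\big[ \log\big( 1 - D(s, a) \big) \big] - \lambda\, \mathcal{H}(\pi).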

A unified Bellman optimality principle combining reward maximization and empowerment

F Leibfried, S Pascual-Diaz… - Advances in Neural …, 2019 - proceedings.neurips.cc
Empowerment is an information-theoretic method that can be used to intrinsically motivate
learning agents. It attempts to maximize an agent's control over the environment by …
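Empowerment as described in the snippet is usually formalised as a channel capacity, sketched here with assumed notation: the maximal mutual information between an action (or action sequence) a, drawn from a source distribution \omega, and the resulting successor state s',
\mathcal{E}(s) = \max_{\omega(a \mid s)} \; I(a ; s' \mid s).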