Fusion dynamical systems with machine learning in imitation learning: A comprehensive overview

Y Hu, FJ Abu-Dakka, F Chen, X Luo, Z Li, A Knoll… - Information …, 2024 - Elsevier
Imitation Learning (IL), also referred to as Learning from Demonstration (LfD), holds
significant promise for capturing expert motor skills through efficient imitation, facilitating …

One pixel attack for fooling deep neural networks

J Su, DV Vargas, K Sakurai - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Recent research has revealed that the output of deep neural networks (DNNs) can be easily
altered by adding relatively small perturbations to the input vector. In this paper, we analyze …
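The snippet says only that small input perturbations can change a DNN's output. Here is a minimal sketch that assumes two things: an image is stored as a nested list of RGB tuples, and a hypothetical black-box function `true_label_prob` returns the model's probability for the correct class. Under those assumptions, a one-pixel perturbation can be encoded as a tuple (x, y, r, g, b). The paper searches this space with differential evolution. To stay short, this sketch swaps in plain random search, so it illustrates the encoding rather than the authors' optimiser.

```python
import random
from typing import Callable, List, Tuple

Image = List[List[Tuple[int, int, int]]]      # H x W grid of RGB pixels
Perturbation = Tuple[int, int, int, int, int]  # (x, y, r, g, b)


def apply_perturbation(image: Image, p: Perturbation) -> Image:
    """Return a copy of `image` with the single pixel (x, y) set to (r, g, b)."""
    x, y, r, g, b = p
    out = [row[:] for row in image]  # copy rows so the input stays untouched
    out[y][x] = (r, g, b)
    return out


def one_pixel_search(
    image: Image,
    true_label_prob: Callable[[Image], float],  # hypothetical black-box model
    iterations: int = 200,
    seed: int = 0,
) -> Tuple[Perturbation, float]:
    """Random search (a stand-in for differential evolution) that tries to
    minimise the true-class probability by changing exactly one pixel."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    best_p: Perturbation = (0, 0, *image[0][0])  # identity perturbation
    best_score = true_label_prob(image)
    for _ in range(iterations):
        cand = (
            rng.randrange(width),
            rng.randrange(height),
            rng.randrange(256),
            rng.randrange(256),
            rng.randrange(256),
        )
        score = true_label_prob(apply_perturbation(image, cand))
        if score < best_score:
            best_p, best_score = cand, score
    return best_p, best_score
```

Only the model's output is queried, never its gradients. That is why the attack counts as black-box.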

Maximum a posteriori policy optimisation

A Abdolmaleki, JT Springenberg, Y Tassa… - arXiv preprint arXiv …, 2018 - arxiv.org
We introduce a new algorithm for reinforcement learning called Maximum a posteriori Policy
Optimisation (MPO) based on coordinate ascent on a relative entropy objective. We show …
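MPO alternates two steps. In the E-step, it reweights actions sampled from the current policy in proportion to exp(Q(s, a)/η). In the M-step, it fits the parametric policy to those weights. The sketch below is a simplified, hypothetical illustration for a single state with one-dimensional actions. The temperature η is a fixed argument here, whereas MPO obtains it by optimising a dual function under a KL constraint. The M-step is reduced to the weighted maximum-likelihood mean of a Gaussian, which ignores MPO's trust-region constraint.

```python
import math
from typing import List


def estep_weights(q_values: List[float], temperature: float) -> List[float]:
    """Normalised weights exp(Q/eta) / sum(exp(Q/eta)) over sampled actions."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [q / temperature for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mstep_gaussian_mean(actions: List[float], weights: List[float]) -> float:
    """Weighted maximum-likelihood mean of a 1-D Gaussian policy
    (no KL trust region; a simplification of MPO's M-step)."""
    return sum(w * a for w, a in zip(weights, actions))
```

A small η concentrates the weight on the best-scoring actions. A large η spreads it almost uniformly, so η controls how greedily the policy moves.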

V-MPO: On-policy maximum a posteriori policy optimization for discrete and continuous control

HF Song, A Abdolmaleki, JT Springenberg… - arXiv preprint arXiv …, 2019 - arxiv.org
Some of the most successful applications of deep reinforcement learning to challenging
domains in discrete and continuous control have used policy gradient methods in the on …

Evolution strategies for continuous optimization: A survey of the state-of-the-art

Z Li, X Lin, Q Zhang, H Liu - Swarm and Evolutionary Computation, 2020 - Elsevier
Evolution strategies are a class of evolutionary algorithms for black-box optimization and
achieve state-of-the-art performance on many benchmarks and real-world applications …
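"Black-box optimization" means the search sees only objective values, never gradients. Below is a minimal (1+1)-ES, one of the simplest members of the family. Its step size is adapted with a 1/5th success rule, which grows σ after an improvement and shrinks it after a failure so that about one in five mutations succeeds. The function names, the sphere test function, and the update constants exp(1/3) and exp(-1/12) are illustrative choices, not taken from the survey.

```python
import math
import random
from typing import Callable, List, Tuple


def one_plus_one_es(
    f: Callable[[List[float]], float],
    x0: List[float],
    sigma: float = 1.0,
    iterations: int = 500,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimise f with a (1+1)-ES and a 1/5th success rule for sigma."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    for _ in range(iterations):
        # Mutate every coordinate with isotropic Gaussian noise.
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fy = f(y)
        if fy <= fx:
            x, fx = y, fy
            sigma *= math.exp(1.0 / 3.0)    # success: enlarge the step
        else:
            sigma *= math.exp(-1.0 / 12.0)  # failure: shrink the step
        # The factors balance when 1 in 5 mutations succeeds (1/3 = 4 * 1/12).
    return x, fx


def sphere(x: List[float]) -> float:
    return sum(xi * xi for xi in x)
```

For example, `one_plus_one_es(sphere, [3.0] * 5)` drives the sphere value from 45 toward 0 using function evaluations alone.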

Variational inference MPC for Bayesian model-based reinforcement learning

M Okada, T Taniguchi - Conference on Robot Learning, 2020 - proceedings.mlr.press
In recent studies on model-based reinforcement learning (MBRL), incorporating uncertainty
in forward dynamics is a state-of-the-art strategy to enhance learning performance, making …

PPO-CMA: Proximal policy optimization with covariance matrix adaptation

P Hämäläinen, A Babadi, X Ma… - 2020 IEEE 30th …, 2020 - ieeexplore.ieee.org
Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning
(RL) approach. However, we observe that in a continuous action space, PPO can …
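For reference, baseline PPO maximises the clipped surrogate L = E[min(r A, clip(r, 1 - ε, 1 + ε) A)]. Here r is the probability ratio between the new and old policies and A is the advantage estimate. The sketch below computes the sample mean of this objective. It covers plain PPO only and does not implement the covariance matrix adaptation that PPO-CMA adds. The value ε = 0.2 is an assumed default.

```python
from typing import Sequence


def ppo_clip_objective(
    ratios: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,  # assumed clipping range
) -> float:
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over samples."""
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        # Taking the min makes the objective a pessimistic bound, removing
        # any incentive to push r far outside [1 - eps, 1 + eps].
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)
```

With r = 1.5 and A = 1, the term is capped at 1.2. With r = 1.5 and A = -1, the unclipped -1.5 is kept, so harmful large updates are still penalised in full.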

Relative entropy regularized policy iteration

A Abdolmaleki, JT Springenberg, J Degrave… - arXiv preprint arXiv …, 2018 - arxiv.org
We present an off-policy actor-critic algorithm for Reinforcement Learning (RL) that
combines ideas from gradient-free optimization via stochastic search with learned action …

Entropic risk measure in policy search

D Nass, B Belousov, J Peters - 2019 IEEE/RSJ International …, 2019 - ieeexplore.ieee.org
With the increasing pace of automation, modern robotic systems need to act in stochastic,
non-stationary, partially observable environments. A range of algorithms for finding …
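The entropic risk measure of a random return R with risk parameter β is commonly defined as ρ_β(R) = (1/β) log E[exp(β R)]. As β → 0, it recovers the expected return. The sketch below estimates it from sampled returns, using log-sum-exp for numerical stability. Its sign convention is an assumption and may differ from the paper's notation: when R is a reward, β < 0 is risk-averse and β > 0 is risk-seeking.

```python
import math
from typing import Sequence


def entropic_risk(returns: Sequence[float], beta: float) -> float:
    """Sample estimate of (1/beta) * log(mean(exp(beta * R)))."""
    n = len(returns)
    if beta == 0.0:
        return sum(returns) / n  # limit beta -> 0 is the plain mean
    scaled = [beta * r for r in returns]
    m = max(scaled)  # log-sum-exp trick avoids overflow in exp
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled)) - math.log(n)
    return log_mean_exp / beta
```

By Jensen's inequality, a risk-averse β < 0 gives a value at or below the sample mean, which penalises high-variance returns.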

High acceleration reinforcement learning for real-world juggling with binary rewards

K Ploeger, M Lutter, J Peters - Conference on Robot …, 2021 - proceedings.mlr.press
Robots that can learn in the physical world will be important to enable robots to escape their
stiff and pre-programmed movements. For dynamic high-acceleration tasks, such as …