Deep learning in neural networks: An overview

J Schmidhuber - Neural networks, 2015 - Elsevier
In recent years, deep artificial neural networks (including recurrent ones) have won
numerous contests in pattern recognition and machine learning. This historical survey …

Policy gradient methods for reinforcement learning with function approximation

RS Sutton, D McAllester, S Singh… - Advances in neural …, 1999 - proceedings.neurips.cc
Function approximation is essential to reinforcement learning, but the standard approach of
approximating a value function and determining a policy from it has so far proven …

Fully decentralized multi-agent reinforcement learning with networked agents

K Zhang, Z Yang, H Liu, T Zhang… - … conference on machine …, 2018 - proceedings.mlr.press
We consider the fully decentralized multi-agent reinforcement learning (MARL) problem,
where the agents are connected via a time-varying and possibly sparse communication …

Survey of model-based reinforcement learning: Applications on robotics

AS Polydoros, L Nalpantidis - Journal of Intelligent & Robotic Systems, 2017 - Springer
Reinforcement learning is an appealing approach for allowing robots to learn new tasks.
Relevant literature reveals a plethora of methods, but at the same time makes clear the lack …

Provably efficient exploration in policy optimization

Q Cai, Z Yang, C Jin, Z Wang - International Conference on …, 2020 - proceedings.mlr.press
While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …

A natural policy gradient

SM Kakade - Advances in neural information processing …, 2001 - proceedings.neurips.cc
We provide a natural gradient method that represents the steepest descent direction based
on the underlying structure of the parameter space. Although gradient methods …

A tutorial on linear function approximators for dynamic programming and reinforcement learning

A Geramifard, TJ Walsh, S Tellex… - … and Trends® in …, 2013 - nowpublishers.com
A Markov Decision Process (MDP) is a natural framework for formulating sequential
decision-making problems under uncertainty. In recent years, researchers have greatly …

Neural policy gradient methods: Global optimality and rates of convergence

L Wang, Q Cai, Z Yang, Z Wang - arXiv preprint arXiv:1909.01150, 2019 - arxiv.org
Policy gradient methods with actor-critic schemes demonstrate tremendous empirical
successes, especially when the actors and critics are parameterized by neural networks …

Decoupled neural interfaces using synthetic gradients

M Jaderberg, WM Czarnecki… - International …, 2017 - proceedings.mlr.press
Training directed neural networks typically requires forward-propagating data through a
computation graph, followed by backpropagating error signal, to produce weight updates. All …

Policy gradient methods for robotics

J Peters, S Schaal - 2006 IEEE/RSJ international conference …, 2006 - ieeexplore.ieee.org
The acquisition and improvement of motor skills and control policies for robotics from trial
and error is of essential importance if robots should ever leave precisely pre-structured …