Deep learning in neural networks: An overview
J Schmidhuber - Neural networks, 2015 - Elsevier
In recent years, deep artificial neural networks (including recurrent ones) have won
numerous contests in pattern recognition and machine learning. This historical survey …
Policy gradient methods for reinforcement learning with function approximation
Function approximation is essential to reinforcement learning, but the standard approach of
approximating a value function and determining a policy from it has so far proven …
Fully decentralized multi-agent reinforcement learning with networked agents
We consider the fully decentralized multi-agent reinforcement learning (MARL) problem,
where the agents are connected via a time-varying and possibly sparse communication …
Survey of model-based reinforcement learning: Applications on robotics
Reinforcement learning is an appealing approach for allowing robots to learn new tasks.
Relevant literature reveals a plethora of methods, but at the same time makes clear the lack …
Provably efficient exploration in policy optimization
While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …
A natural policy gradient
SM Kakade - Advances in neural information processing …, 2001 - proceedings.neurips.cc
We provide a natural gradient method that represents the steepest descent direction based
on the underlying structure of the parameter space. Although gradient methods …
A tutorial on linear function approximators for dynamic programming and reinforcement learning
A Markov Decision Process (MDP) is a natural framework for formulating sequential
decision-making problems under uncertainty. In recent years, researchers have greatly …
Neural policy gradient methods: Global optimality and rates of convergence
Policy gradient methods with actor-critic schemes demonstrate tremendous empirical
successes, especially when the actors and critics are parameterized by neural networks …
Decoupled neural interfaces using synthetic gradients
Training directed neural networks typically requires forward-propagating data through a
computation graph, followed by backpropagating error signal, to produce weight updates. All …
Policy gradient methods for robotics
The acquisition and improvement of motor skills and control policies for robotics from trial
and error is of essential importance if robots should ever leave precisely pre-structured …