- Academic Search

Deep learning in neural networks: An overview

J Schmidhuber - Neural networks, 2015 - Elsevier

In recent years, deep artificial neural networks (including recurrent ones) have won
numerous contests in pattern recognition and machine learning. This historical survey …

Cited by 24074

Policy gradient methods for reinforcement learning with function approximation

RS Sutton, D McAllester, S Singh… - Advances in neural …, 1999 - proceedings.neurips.cc

Function approximation is essential to reinforcement learning, but the standard approach of
approximating a value function and determining a policy from it has so far proven …

Cited by 9165

Fully decentralized multi-agent reinforcement learning with networked agents

K Zhang, Z Yang, H Liu, T Zhang… - … conference on machine …, 2018 - proceedings.mlr.press

We consider the fully decentralized multi-agent reinforcement learning (MARL) problem,
where the agents are connected via a time-varying and possibly sparse communication …

Cited by 739

Survey of model-based reinforcement learning: Applications on robotics

AS Polydoros, L Nalpantidis - Journal of Intelligent & Robotic Systems, 2017 - Springer

Reinforcement learning is an appealing approach for allowing robots to learn new tasks.
Relevant literature reveals a plethora of methods, but at the same time makes clear the lack …

Cited by 696

Provably efficient exploration in policy optimization

Q Cai, Z Yang, C Jin, Z Wang - International Conference on …, 2020 - proceedings.mlr.press

While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …

Cited by 321

A natural policy gradient

SM Kakade - Advances in neural information processing …, 2001 - proceedings.neurips.cc

We provide a natural gradient method that represents the steepest descent direction based
on the underlying structure of the parameter space. Although gradient methods …

Cited by 1647

A tutorial on linear function approximators for dynamic programming and reinforcement learning

A Geramifard, TJ Walsh, S Tellex… - … and Trends® in …, 2013 - nowpublishers.com

A Markov Decision Process (MDP) is a natural framework for formulating sequential
decision-making problems under uncertainty. In recent years, researchers have greatly …

Cited by 168

Neural policy gradient methods: Global optimality and rates of convergence

L Wang, Q Cai, Z Yang, Z Wang - arXiv preprint arXiv:1909.01150, 2019 - arxiv.org

Policy gradient methods with actor-critic schemes demonstrate tremendous empirical
successes, especially when the actors and critics are parameterized by neural networks …

Cited by 270

Decoupled neural interfaces using synthetic gradients

M Jaderberg, WM Czarnecki… - International …, 2017 - proceedings.mlr.press

Training directed neural networks typically requires forward-propagating data through a
computation graph, followed by backpropagating error signal, to produce weight updates. All …

Cited by 446

Policy gradient methods for robotics

J Peters, S Schaal - 2006 IEEE/RSJ international conference …, 2006 - ieeexplore.ieee.org

The acquisition and improvement of motor skills and control policies for robotics from trial
and error is of essential importance if robots should ever leave precisely pre-structured …

Cited by 782

Direct gradient-based reinforcement learning

Deep learning in neural networks: An overview
