High-dimensional continuous control using generalized advantage estimation

J Schulman, P Moritz, S Levine, M Jordan… - ar… - [PDF] researchgate.net
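The title above refers to generalized advantage estimation (GAE). As a rough illustration, and not code taken from the paper, the sketch below computes GAE(γ, λ) advantages for a single trajectory. It builds the TD residuals δ_t = r_t + γV(s_{t+1}) − V(s_t) and sums them with weights (γλ)^l. The function name `gae_advantages`, its argument layout (one extra bootstrap value at the end of `values`), and the default γ and λ are assumptions chosen for illustration.

```python
from typing import List, Sequence


def gae_advantages(rewards: Sequence[float], values: Sequence[float],
                   gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Return GAE(gamma, lam) advantages for one trajectory (illustrative sketch).

    `values` holds V(s_0), ..., V(s_T): one more entry than `rewards`,
    the last being the bootstrap value of the final state (0.0 if terminal).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so that A_t = delta_t + gamma * lam * A_{t+1}.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Example: two unit rewards, zero value estimates, no discounting.
print(gae_advantages([1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0))  # [2.0, 1.0]
```

Setting `lam=0` reduces each advantage to the one-step TD residual. Setting `lam=1` gives the discounted return minus the value baseline. Values in between trade bias against variance.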

Trust Region Policy Optimization

J Schulman - arXiv preprint arXiv:1502.05477, 2015 - researchgate.net
In this article, we describe a method for optimizing control policies, with guaranteed
monotonic improvement. By making several approximations to the theoretically-justified …
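The snippet mentions approximations to a theoretically justified procedure. One commonly described practical form maximizes an importance-weighted surrogate objective subject to a limit on the KL divergence from the old policy, and enforces that limit with a backtracking line search. The sketch below shows only that acceptance test, on a hypothetical single-state softmax policy. The search direction is taken as given; the paper computes it with a natural-gradient (conjugate gradient) step, which is omitted here. The names `trust_region_step`, `max_kl`, and `backtracks` are our own, and this toy does not by itself carry the monotonic-improvement guarantee.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_categorical(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two categorical distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def surrogate(new_probs: Sequence[float], old_probs: Sequence[float],
              samples: Sequence[Tuple[int, float]]) -> float:
    """Mean of pi_new(a) / pi_old(a) * A over sampled (action, advantage) pairs."""
    return sum(new_probs[a] / old_probs[a] * adv for a, adv in samples) / len(samples)


def trust_region_step(old_logits: Sequence[float], direction: Sequence[float],
                      samples: Sequence[Tuple[int, float]],
                      max_kl: float = 0.01, backtracks: int = 10) -> List[float]:
    """Backtrack along `direction` until the KL limit holds and the surrogate improves."""
    old_probs = softmax(old_logits)
    baseline = surrogate(old_probs, old_probs, samples)
    frac = 1.0
    for _ in range(backtracks):
        candidate = [x + frac * d for x, d in zip(old_logits, direction)]
        new_probs = softmax(candidate)
        within_region = kl_categorical(old_probs, new_probs) <= max_kl
        improves = surrogate(new_probs, old_probs, samples) > baseline
        if within_region and improves:
            return candidate
        frac *= 0.5  # shrink the step and try again
    return list(old_logits)  # no acceptable step: keep the old policy


# Hypothetical data: action 0 had positive advantage, action 1 negative.
samples = [(0, 1.0), (1, -1.0)]
print(trust_region_step([0.0, 0.0], [1.0, -1.0], samples, max_kl=0.01))  # [0.125, -0.125]
```

Halving the step until the KL limit holds keeps each update inside the trust region. If no step fraction both satisfies the limit and improves the surrogate, the old parameters are returned unchanged.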

Deep spatial autoencoders for visuomotor learning

C Finn, XY Tan, Y Duan, T Darrell… - … on Robotics and …, 2016 - ieeexplore.ieee.org
Reinforcement learning provides a powerful and flexible framework for automated
acquisition of robotic motion skills. However, applying reinforcement learning requires a …

Learning deep control policies for autonomous aerial vehicles with mpc-guided policy search

T Zhang, G Kahn, S Levine… - 2016 IEEE international …, 2016 - ieeexplore.ieee.org
Model predictive control (MPC) is an effective method for controlling robotic systems,
particularly autonomous aerial vehicles such as quadcopters. However, application of MPC …

Reinforcement learning in robotics: A survey

J Kober, JA Bagnell, J Peters - The International Journal of …, 2013 - journals.sagepub.com
Reinforcement learning offers to robotics a framework and set of tools for the design of
sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic …

Guided policy search

S Levine, V Koltun - International conference on machine …, 2013 - proceedings.mlr.press
Direct policy search can effectively scale to high-dimensional systems, but complex policies
with hundreds of parameters often present a challenge for such methods, requiring …

The optimal sample complexity of PAC learning

S Hanneke - Journal of Machine Learning Research, 2016 - jmlr.org
Policy search methods can allow robots to learn control policies for a wide range of tasks,
but practical applications of policy search often require hand-engineered components for …

Deep reinforcement learning for tensegrity robot locomotion

M Zhang, X Geng, J Bruce, K Caluwaerts… - … on robotics and …, 2017 - ieeexplore.ieee.org
Tensegrity robots, composed of rigid rods connected by elastic cables, have a number of
unique properties that make them appealing for use as planetary exploration rovers …

Optimizing expectations: From deep reinforcement learning to stochastic computation graphs

J Schulman - 2016 - escholarship.org
This thesis is mostly focused on reinforcement learning, which is viewed as an optimization
problem: maximize the expected total reward with respect to the parameters of the policy …
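The objective in this snippet is maximizing J(θ) = E[R] with respect to policy parameters θ. One standard way to optimize it is the score-function (likelihood-ratio) estimator ∇J = E[R ∇ log π_θ(a)]. Below is a minimal sketch on a hypothetical one-step problem with a softmax policy, checked against the closed-form gradient π_k (r_k − E[R]). The reward table, sample count, seed, and function names are assumptions for illustration, not taken from the thesis.

```python
import math
import random
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_function_gradient(logits: Sequence[float], reward_fn: Callable[[int], float],
                            num_samples: int, rng: random.Random) -> List[float]:
    """Monte Carlo estimate of grad E[R] as the mean of R * grad log pi(a)."""
    probs = softmax(logits)
    actions = range(len(probs))
    grad = [0.0] * len(logits)
    for _ in range(num_samples):
        a = rng.choices(actions, weights=probs)[0]  # sample a ~ pi(.)
        r = reward_fn(a)
        # For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi(k).
        for k in range(len(logits)):
            grad[k] += r * ((1.0 if k == a else 0.0) - probs[k])
    return [g / num_samples for g in grad]


def exact_gradient(logits: Sequence[float], rewards: Sequence[float]) -> List[float]:
    """Closed form for a one-step problem: d E[R] / d logit_k = pi(k) * (r_k - E[R])."""
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - expected) for p, r in zip(probs, rewards)]


rewards = [1.0, 0.0]  # hypothetical deterministic reward per action
estimate = score_function_gradient([0.0, 0.0], lambda a: rewards[a], 20000, random.Random(0))
print(exact_gradient([0.0, 0.0], rewards))  # [0.25, -0.25]
print(estimate)  # close to the exact gradient; exact values depend on the seed
```

The estimator needs only samples of actions and rewards plus the gradient of log π. That independence from any model of the reward is why it extends to the multi-step, high-dimensional settings discussed in the listing.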