High-dimensional continuous control using generalized advantage estimation
J Schulman, P Moritz, S Levine, M Jordan…
Trust Region Policy Optimization
J Schulman - arXiv preprint arXiv:1502.05477, 2015 - researchgate.net
In this article, we describe a method for optimizing control policies, with guaranteed
monotonic improvement. By making several approximations to the theoretically-justified …
Deep spatial autoencoders for visuomotor learning
Reinforcement learning provides a powerful and flexible framework for automated
acquisition of robotic motion skills. However, applying reinforcement learning requires a …
Learning deep control policies for autonomous aerial vehicles with mpc-guided policy search
Model predictive control (MPC) is an effective method for controlling robotic systems,
particularly autonomous aerial vehicles such as quadcopters. However, application of MPC …
Reinforcement learning in robotics: A survey
Reinforcement learning offers to robotics a framework and set of tools for the design of
sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic …
Guided policy search
Direct policy search can effectively scale to high-dimensional systems, but complex policies
with hundreds of parameters often present a challenge for such methods, requiring …
The optimal sample complexity of PAC learning
S Hanneke - Journal of Machine Learning Research, 2016 - jmlr.org
Policy search methods can allow robots to learn control policies for a wide range of tasks,
but practical applications of policy search often require hand-engineered components for …
Deep reinforcement learning for tensegrity robot locomotion
Tensegrity robots, composed of rigid rods connected by elastic cables, have a number of
unique properties that make them appealing for use as planetary exploration rovers …
Optimizing expectations: From deep reinforcement learning to stochastic computation graphs
J Schulman - 2016 - escholarship.org
This thesis is mostly focused on reinforcement learning, which is viewed as an optimization
problem: maximize the expected total reward with respect to the parameters of the policy …