Crossing the reality gap: A survey on sim-to-real transferability of robot controllers in reinforcement learning
The growing demand for robots able to act autonomously in complex scenarios has widely
accelerated the introduction of Reinforcement Learning (RL) in robots control applications …
A tour of reinforcement learning: The view from continuous control
B Recht - Annual Review of Control, Robotics, and Autonomous …, 2019 - annualreviews.org
This article surveys reinforcement learning from the perspective of optimization and control,
with a focus on continuous control applications. It reviews the general formulation …
Global convergence of policy gradient methods for the linear quadratic regulator
Direct policy gradient methods for reinforcement learning and continuous control problems
are a popular approach for a variety of reasons: 1) they are easy to implement without …
A finite time analysis of temporal difference learning with linear function approximation
Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …
Projection-based model reduction: Formulations for physics-based machine learning
This paper considers the creation of parametric surrogate models for applications in science
and engineering where the goal is to predict high-dimensional output quantities of interest …
Simple random search provides a competitive approach to reinforcement learning
A common belief in model-free reinforcement learning is that methods based on random
search in the parameter space of policies exhibit significantly worse sample complexity than …
Simple random search of static linear policies is competitive for reinforcement learning
Model-free reinforcement learning aims to offer off-the-shelf solutions for controlling
dynamical systems without requiring models of the system dynamics. We introduce a model …
Naive exploration is optimal for online LQR
We consider the problem of online adaptive control of the linear quadratic regulator, where
the true system parameters are unknown. We prove new upper and lower bounds …
Regret bounds for robust adaptive control of the linear quadratic regulator
We consider adaptive control of the Linear Quadratic Regulator (LQR), where an unknown
linear system is controlled subject to quadratic costs. Leveraging recent developments in the …
Derivative-free methods for policy optimization: Guarantees for linear quadratic systems
We study derivative-free methods for policy optimization over the class of linear policies. We
focus on characterizing the convergence rate of these methods when applied to linear …