Crossing the reality gap: A survey on sim-to-real transferability of robot controllers in reinforcement learning

E Salvato, G Fenu, E Medvet, FA Pellegrino - IEEE Access, 2021 - ieeexplore.ieee.org
The growing demand for robots able to act autonomously in complex scenarios has widely
accelerated the introduction of Reinforcement Learning (RL) in robots control applications …

A tour of reinforcement learning: The view from continuous control

B Recht - Annual Review of Control, Robotics, and Autonomous …, 2019 - annualreviews.org
This article surveys reinforcement learning from the perspective of optimization and control,
with a focus on continuous control applications. It reviews the general formulation …

Global convergence of policy gradient methods for the linear quadratic regulator

M Fazel, R Ge, S Kakade… - … conference on machine …, 2018 - proceedings.mlr.press
Direct policy gradient methods for reinforcement learning and continuous control problems
are a popular approach for a variety of reasons: 1) they are easy to implement without …

A finite time analysis of temporal difference learning with linear function approximation

J Bhandari, D Russo, R Singal - Conference on learning …, 2018 - proceedings.mlr.press
Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …

Projection-based model reduction: Formulations for physics-based machine learning

R Swischuk, L Mainini, B Peherstorfer, K Willcox - Computers & Fluids, 2019 - Elsevier
This paper considers the creation of parametric surrogate models for applications in science
and engineering where the goal is to predict high-dimensional output quantities of interest …

Simple random search provides a competitive approach to reinforcement learning

H Mania, A Guy, B Recht - arXiv preprint arXiv:1803.07055, 2018 - arxiv.org
A common belief in model-free reinforcement learning is that methods based on random
search in the parameter space of policies exhibit significantly worse sample complexity than …

Simple random search of static linear policies is competitive for reinforcement learning

H Mania, A Guy, B Recht - Advances in neural information …, 2018 - proceedings.neurips.cc
Model-free reinforcement learning aims to offer off-the-shelf solutions for controlling
dynamical systems without requiring models of the system dynamics. We introduce a model …

Naive exploration is optimal for online LQR

M Simchowitz, D Foster - International Conference on …, 2020 - proceedings.mlr.press
We consider the problem of online adaptive control of the linear quadratic regulator, where
the true system parameters are unknown. We prove new upper and lower bounds …

Regret bounds for robust adaptive control of the linear quadratic regulator

S Dean, H Mania, N Matni… - Advances in Neural …, 2018 - proceedings.neurips.cc
We consider adaptive control of the Linear Quadratic Regulator (LQR), where an unknown
linear system is controlled subject to quadratic costs. Leveraging recent developments in the …

Derivative-free methods for policy optimization: Guarantees for linear quadratic systems

D Malik, A Pananjady, K Bhatia, K Khamaru… - Journal of Machine …, 2020 - jmlr.org
We study derivative-free methods for policy optimization over the class of linear policies. We
focus on characterizing the convergence rate of these methods when applied to linear …