Statistical learning theory for control: A finite-sample perspective

A Tsiamis, I Ziemann, N Matni… - IEEE Control Systems …, 2023 - ieeexplore.ieee.org
Learning algorithms have become an integral component to modern engineering solutions.
Examples range from self-driving cars and recommender systems to finance and even …

On the sample complexity of the linear quadratic regulator

S Dean, H Mania, N Matni, B Recht, S Tu - Foundations of Computational …, 2020 - Springer
This paper addresses the optimal control problem known as the linear quadratic regulator in
the case when the dynamics are unknown. We propose a multistage procedure, called …

Naive exploration is optimal for online LQR

M Simchowitz, D Foster - International Conference on …, 2020 - proceedings.mlr.press
We consider the problem of online adaptive control of the linear quadratic regulator, where
the true system parameters are unknown. We prove new upper and lower bounds …

Certainty equivalence is efficient for linear quadratic control

H Mania, S Tu, B Recht - Advances in Neural Information …, 2019 - proceedings.neurips.cc
We study the performance of the certainty equivalent controller on Linear Quadratic (LQ)
control problems with unknown transition dynamics. We show that for both the fully and …

Reinforcement learning in continuous time and space: A stochastic control approach

H Wang, T Zariphopoulou, XY Zhou - Journal of Machine Learning …, 2020 - jmlr.org
We consider reinforcement learning (RL) in continuous time with continuous feature and
action spaces. We motivate and devise an exploratory formulation for the feature dynamics …

Derivative-free methods for policy optimization: Guarantees for linear quadratic systems

D Malik, A Pananjady, K Bhatia, K Khamaru… - Journal of Machine …, 2020 - jmlr.org
We study derivative-free methods for policy optimization over the class of linear policies. We
focus on characterizing the convergence rate of these methods when applied to linear …

Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret

A Cohen, T Koren, Y Mansour - International Conference on …, 2019 - proceedings.mlr.press
We present the first computationally-efficient algorithm with $\widetilde{O}(\sqrt{T})$ regret
for learning in Linear Quadratic Control systems with unknown dynamics. By that, we resolve …

The gap between model-based and model-free methods on the linear quadratic regulator: An asymptotic viewpoint

S Tu, B Recht - Conference on Learning Theory, 2019 - proceedings.mlr.press
The effectiveness of model-based versus model-free methods is a long-standing question in
reinforcement learning (RL). Motivated by recent empirical success of RL on continuous …

Logarithmic regret bound in partially observable linear dynamical systems

S Lale, K Azizzadenesheli, B Hassibi… - Advances in Neural …, 2020 - proceedings.neurips.cc
We study the problem of system identification and adaptive control in partially observable
linear dynamical systems. Adaptive and closed-loop system identification is a challenging …

Feel-good Thompson sampling for contextual bandits and reinforcement learning

T Zhang - SIAM Journal on Mathematics of Data Science, 2022 - SIAM
Thompson sampling has been widely used for contextual bandit problems due to the
flexibility of its modeling power. However, a general theory for this class of methods in the …