Statistical learning theory for control: A finite-sample perspective
Learning algorithms have become an integral component to modern engineering solutions.
Examples range from self-driving cars and recommender systems to finance and even …
On the sample complexity of the linear quadratic regulator
This paper addresses the optimal control problem known as the linear quadratic regulator in
the case when the dynamics are unknown. We propose a multistage procedure, called …
Naive exploration is optimal for online LQR
M Simchowitz, D Foster - International Conference on …, 2020 - proceedings.mlr.press
We consider the problem of online adaptive control of the linear quadratic regulator, where
the true system parameters are unknown. We prove new upper and lower bounds …
Certainty equivalence is efficient for linear quadratic control
We study the performance of the certainty equivalent controller on Linear Quadratic (LQ)
control problems with unknown transition dynamics. We show that for both the fully and …
Reinforcement learning in continuous time and space: A stochastic control approach
We consider reinforcement learning (RL) in continuous time with continuous feature and
action spaces. We motivate and devise an exploratory formulation for the feature dynamics …
Derivative-free methods for policy optimization: Guarantees for linear quadratic systems
We study derivative-free methods for policy optimization over the class of linear policies. We
focus on characterizing the convergence rate of these methods when applied to linear …
Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret
We present the first computationally-efficient algorithm with $\widetilde{O}(\sqrt{T})$ regret
for learning in Linear Quadratic Control systems with unknown dynamics. By that, we resolve …
The gap between model-based and model-free methods on the linear quadratic regulator: An asymptotic viewpoint
The effectiveness of model-based versus model-free methods is a long-standing question in
reinforcement learning (RL). Motivated by recent empirical success of RL on continuous …
Logarithmic regret bound in partially observable linear dynamical systems
We study the problem of system identification and adaptive control in partially observable
linear dynamical systems. Adaptive and closed-loop system identification is a challenging …
Feel-good Thompson sampling for contextual bandits and reinforcement learning
T Zhang - SIAM Journal on Mathematics of Data Science, 2022 - SIAM
Thompson sampling has been widely used for contextual bandit problems due to the
flexibility of its modeling power. However, a general theory for this class of methods in the …