Toward a theoretical foundation of policy optimization for learning control policies

B Hu, K Zhang, N Li, M Mesbahi… - Annual Review of Control, Robotics, and Autonomous Systems, 2023 - annualreviews.org
Gradient-based methods have been widely used for system design and optimization in
diverse application domains. Recently, there has been a renewed interest in studying …

Convergence and sample complexity of gradient methods for the model-free linear–quadratic regulator problem

H Mohammadi, A Zare, M Soltanolkotabi… - IEEE Transactions on Automatic Control, 2021 - ieeexplore.ieee.org
Model-free reinforcement learning attempts to find an optimal control action for an unknown
dynamical system by directly searching over the parameter space of controllers. The …
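
To make "directly searching over the parameter space of controllers" concrete, here is a minimal sketch of the two-point zeroth-order scheme this line of work analyzes, run on a toy discrete-time LQR instance. The system matrices, horizon, and step sizes are illustrative assumptions, not the paper's code.

import numpy as np

def lqr_cost(K, A, B, Q, R, x0s, T=100):
    # Finite-horizon rollout approximation of the LQR cost, averaged over
    # initial states -- only simulation access is needed, no model estimate.
    total = 0.0
    for x0 in x0s:
        x = np.array(x0, dtype=float)
        for _ in range(T):
            u = -K @ x
            total += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u
    return total / len(x0s)

def two_point_gradient(K, cost, r=0.05):
    # Two-point zeroth-order estimate: probe the cost along a random
    # direction U and difference the two evaluations.
    U = np.random.randn(*K.shape)
    U /= np.linalg.norm(U)
    return K.size * (cost(K + r * U) - cost(K - r * U)) / (2 * r) * U

# Illustrative run on a small stable system: plain gradient descent on the
# gain K using only cost evaluations.
A = np.array([[0.9, 0.1], [0.0, 0.8]]); B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
x0s = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
K = np.zeros((1, 2))
for _ in range(200):
    K -= 1e-3 * two_point_gradient(K, lambda M: lqr_cost(M, A, B, Q, R, x0s))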

DeeP-LCC: Data-enabled predictive leading cruise control in mixed traffic flow

J Wang, Y Zheng, K Li, Q Xu - IEEE Transactions on Control Systems Technology, 2023 - ieeexplore.ieee.org
For the control of connected and autonomous vehicles (CAVs), most existing methods focus
on model-based strategies. They require explicit knowledge of car-following dynamics of …
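
A core ingredient of data-enabled predictive methods such as DeeP-LCC is replacing the identified car-following model with recorded trajectory data arranged in a block-Hankel matrix (via Willems' fundamental lemma). A minimal sketch of that construction, with illustrative names:

import numpy as np

def block_hankel(w, L):
    # Block-Hankel matrix of depth L from a recorded trajectory w of shape
    # (T, q): column i stacks the window w[i], ..., w[i+L-1].
    T, q = w.shape
    return np.column_stack([w[i:i + L].reshape(-1) for i in range(T - L + 1)])

In the DeePC template that DeeP-LCC builds on, this matrix is partitioned into "past" and "future" block rows, and predicted trajectories are constrained to lie in its column space rather than being generated by an explicit model.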

Optimizing static linear feedback: Gradient method

I Fatkhullin, B Polyak - SIAM Journal on Control and Optimization, 2021 - SIAM
The linear quadratic regulator is the fundamental problem of optimal control. Its state
feedback version was posed and solved in the early 1960s. However, the static output feedback …
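
The gradient method studied here rests on the classical fact that the LQR cost of a static gain has a closed-form gradient computable from two Lyapunov equations. A sketch for the state-feedback special case (u = -Kx, continuous time); the static output feedback version the paper treats has the same structure with the output matrix entering the gradient expression. Names and the scipy solver choice are illustrative.

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def lqr_gradient(K, A, B, Q, R, Sigma0):
    # Exact gradient of J(K) = tr(P_K Sigma0) for u = -Kx, valid whenever
    # A - BK is Hurwitz.
    Acl = A - B @ K
    # Cost-to-go P_K: Acl' P + P Acl + Q + K' R K = 0.
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    # State covariance X_K: Acl X + X Acl' + Sigma0 = 0.
    X = solve_continuous_lyapunov(Acl, -Sigma0)
    return 2.0 * (R @ K - B.T @ P) @ X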

Sample complexity of linear quadratic Gaussian (LQG) control for output feedback systems

Y Zheng, L Furieri, M Kamgarpour… - Learning for Dynamics and Control, 2021 - proceedings.mlr.press
This paper studies a class of partially observed Linear Quadratic Gaussian (LQG) problems
with unknown dynamics. We establish an end-to-end sample complexity bound on learning …

On the optimization landscape of dynamic output feedback linear quadratic control

J Duan, W Cao, Y Zheng, L Zhao - IEEE Transactions on Automatic Control, 2023 - ieeexplore.ieee.org
The convergence of policy gradient algorithms hinges on the optimization landscape of the
underlying optimal control problem. Theoretical insights into these algorithms can often be …

Derivative-free policy optimization for linear risk-sensitive and robust control design: Implicit regularization and sample complexity

K Zhang, X Zhang, B Hu… - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
Direct policy search serves as one of the workhorses in modern reinforcement learning (RL),
and its applications in continuous control tasks have recently attracted increasing attention …

On the stability and convergence of robust adversarial reinforcement learning: A case study on linear quadratic systems

K Zhang, B Hu, T Başar - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
Reinforcement learning (RL) algorithms can fail to generalize due to the gap between the
simulation and the real world. One standard remedy is to use robust adversarial RL (RARL) …
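
On linear quadratic systems, RARL instantiates as a zero-sum dynamic game between a protagonist gain and an adversary gain, trained by alternating gradient updates. A generic sketch of that descent-ascent structure; the gradient oracles and step size are placeholders, not the paper's algorithm.

def gda_step(K, L, grad_K, grad_L, eta=1e-3):
    # One descent (protagonist K) / ascent (adversary L) step on a zero-sum
    # objective J(K, L), e.g. u = -Kx, w = Lx with stage cost
    # x'Qx + u'Ru - gamma^2 w'w. The gradients may be exact model-based
    # expressions or zeroth-order estimates as in the LQR sketch above.
    K_next = K - eta * grad_K(K, L)
    L_next = L + eta * grad_L(K_next, L)
    return K_next, L_next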

Learning the Kalman filter with fine-grained sample complexity

X Zhang, B Hu, T Başar - 2023 American Control Conference (ACC), 2023 - ieeexplore.ieee.org
We establish the first end-to-end sample complexity bound for model-free policy gradient (PG)
methods in discrete-time infinite-horizon Kalman filtering. Specifically, we introduce the …
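
Here the "policy" is the filter gain rather than a control gain: as we read the setup, the filter is run in predictor form and the gain is optimized against the steady-state innovation error, which can be estimated from simulation alone. A minimal sketch of that objective; the matrices and noise covariances are illustrative assumptions.

import numpy as np

def innovation_mse(L, A, C, W, V, T=5000, seed=0):
    # Empirical one-step prediction error of a predictor-form filter
    # xhat_{t+1} = A xhat_t + L (y_t - C xhat_t), estimated from a single
    # simulated run of the system.
    rng = np.random.default_rng(seed)
    n, p = A.shape[0], C.shape[0]
    x, xhat, err = np.zeros(n), np.zeros(n), 0.0
    for _ in range(T):
        y = C @ x + rng.multivariate_normal(np.zeros(p), V)
        innov = y - C @ xhat
        err += innov @ innov
        xhat = A @ xhat + L @ innov
        x = A @ x + rng.multivariate_normal(np.zeros(n), W)
    return err / T

Under standard observability and noise assumptions, minimizing this error over the gain recovers the steady-state Kalman gain, so the estimate can be fed to a derivative-free optimizer such as the two-point scheme sketched earlier.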

Global convergence of policy gradient primal–dual methods for risk-constrained LQRs

F Zhao, K You, T Başar - IEEE Transactions on Automatic Control, 2023 - ieeexplore.ieee.org
While the techniques in optimal control theory are often model-based, the policy optimization
(PO) approach directly optimizes the performance metric of interest. Even though it has been …
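
The primal-dual structure referenced in the title alternates a policy-gradient step on a Lagrangian with projected ascent on the multiplier. A generic sketch of that loop for min J(K) subject to Jc(K) <= b; the function names and step sizes are placeholders, not the paper's algorithm.

def primal_dual(K, grad_j, grad_jc, jc, b, eta=1e-3, rho=1e-2, iters=500):
    # min_K J(K) subject to Jc(K) <= b: descend the Lagrangian
    # J + lam * (Jc - b) in K, then take a projected ascent step in lam >= 0.
    lam = 0.0
    for _ in range(iters):
        K = K - eta * (grad_j(K) + lam * grad_jc(K))
        lam = max(0.0, lam + rho * (jc(K) - b))
    return K, lam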