Toward a theoretical foundation of policy optimization for learning control policies

B Hu, K Zhang, N Li, M Mesbahi… - Annual Review of …, 2023 - annualreviews.org
Gradient-based methods have been widely used for system design and optimization in
diverse application domains. Recently, there has been a renewed interest in studying …

Constrained-cost adaptive dynamic programming for optimal control of discrete-time nonlinear systems

Q Wei, T Li - IEEE Transactions on Neural Networks and …, 2023 - ieeexplore.ieee.org
For discrete-time nonlinear systems, this research is concerned with optimal control
problems (OCPs) with constrained cost, and a novel value iteration with constrained cost …

On the optimization landscape of dynamic output feedback linear quadratic control

J Duan, W Cao, Y Zheng, L Zhao - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The convergence of policy gradient algorithms hinges on the optimization landscape of the
underlying optimal control problem. Theoretical insights into these algorithms can often be …

Global Convergence of Direct Policy Search for State-Feedback Robust Control: A Revisit of Nonsmooth Synthesis with Goldstein Subdifferential

X Guo, B Hu - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
Direct policy search has been widely applied in modern reinforcement learning and
continuous control. However, the theoretical properties of direct policy search on nonsmooth …

Complexity of Derivative-Free Policy Optimization for Structured Control

X Guo, D Keivan, G Dullerud… - Advances in Neural …, 2023 - proceedings.neurips.cc
The applications of direct policy search in reinforcement learning and continuous control
have received increasing attention. In this work, we present novel theoretical results on the …

Provably efficient generalized Lagrangian policy optimization for safe multi-agent reinforcement learning

D Ding, X Wei, Z Yang, Z Wang… - Learning for dynamics …, 2023 - proceedings.mlr.press
We examine online safe multi-agent reinforcement learning using constrained Markov
games in which agents compete by maximizing their expected total rewards under a …

Infinite-horizon risk-constrained linear quadratic regulator with average cost

F Zhao, K You, T Başar - 2021 60th IEEE Conference on …, 2021 - ieeexplore.ieee.org
The behaviour of a stochastic dynamical system may be largely influenced by those low-
probability, yet extreme events. To address such occurrences, this paper proposes an …

Reinforcement learning for linear exponential quadratic Gaussian problem

J Lai, J Xiong - Systems & Control Letters, 2024 - Elsevier
This paper addresses the infinite-horizon linear exponential quadratic Gaussian problem for
a class of stochastic systems with additive noise. A model-free generalized policy iteration …

Controlgym: Large-scale control environments for benchmarking reinforcement learning algorithms

X Zhang, W Mao, S Mowlavi… - 6th Annual Learning …, 2024 - proceedings.mlr.press
We introduce controlgym, a library of thirty-six industrial control settings, and ten infinite-
dimensional partial differential equation (PDE)-based control problems. Integrated within the …

Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs

S Rozada, D Ding, AG Marques, A Ribeiro - arXiv preprint arXiv …, 2024 - arxiv.org
We study the problem of computing deterministic optimal policies for constrained Markov
decision processes (MDPs) with continuous state and action spaces, which are widely …