Revisiting LQR control from the perspective of receding-horizon policy gradient

X Zhang, T Başar - IEEE Control Systems Letters, 2023 - ieeexplore.ieee.org
We revisit in this letter the discrete-time linear quadratic regulator (LQR) problem from the
perspective of receding-horizon policy gradient (RHPG), a newly developed model-free …

Controlgym: Large-scale safety-critical control environments for benchmarking reinforcement learning algorithms

X Zhang, W Mao, S Mowlavi, M Benosman… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce controlgym, a library of thirty-six safety-critical industrial control settings, and
ten infinite-dimensional partial differential equation (PDE)-based control problems …

Controlgym: Large-scale control environments for benchmarking reinforcement learning algorithms

X Zhang, W Mao, S Mowlavi… - 6th Annual Learning …, 2024 - proceedings.mlr.press
We introduce controlgym, a library of thirty-six industrial control settings, and ten
infinite-dimensional partial differential equation (PDE)-based control problems. Integrated within the …

Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods

S Klein, S Weissmann, L Döring - arXiv preprint arXiv:2310.02671, 2023 - arxiv.org
Markov Decision Processes (MDPs) are a formal framework for modeling and solving
sequential decision-making problems. In finite-time horizons such problems are relevant for …

Structure Matters: Dynamic Policy Gradient

S Klein, X Zhang, T Başar, S Weissmann… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we study $\gamma$-discounted infinite-horizon tabular Markov decision
processes (MDPs) and introduce a framework called dynamic policy gradient (DynPG). The …

Decision Transformer as a Foundation Model for Partially Observable Continuous Control

X Zhang, W Mao, H Qiu, T Başar - arXiv preprint arXiv:2404.02407, 2024 - arxiv.org
Closed-loop control of nonlinear dynamical systems with partial-state observability demands
expert knowledge of a diverse, less standardized set of theoretical tools. Moreover, it …

Policy Optimization for PDE Control with a Warm Start

X Zhang, S Mowlavi, M Benosman, T Başar - arXiv preprint arXiv …, 2024 - arxiv.org
Dimensionality reduction is crucial for controlling nonlinear partial differential equations
(PDE) through a "reduce-then-design" strategy, which identifies a reduced-order model and …

Dynamic approaches for stochastic gradient methods in reinforcement learning

S Klein - 2024 - madoc.bib.uni-mannheim.de
This work addresses the convergence behaviour of first-order optimization methods in the
context of reinforcement learning. Specifically, we analyse the vanilla Policy Gradient (PG) …