Stochastic policy gradient methods: Improved sample complexity for Fisher-non-degenerate policies
Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed
the development of their theoretical foundations. Despite the huge efforts directed at the …
A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces
We study the global convergence of a Fisher-Rao policy gradient flow for infinite-horizon
entropy-regularised Markov decision processes with Polish state and action space. The flow …
Geometry and convergence of natural policy gradient methods
We study the convergence of several natural policy gradient (NPG) methods in infinite-
horizon discounted Markov decision processes with regular policy parametrizations. For a …
On the global convergence of fitted Q-iteration with two-layer neural network parametrization
Deep Q-learning based algorithms have been applied successfully in many decision making
problems, while their theoretical foundations are not as well understood. In this paper, we …
Convex Regularization and Convergence of Policy Gradient Flows under Safety Constraints
This paper studies reinforcement learning (RL) in infinite-horizon dynamic decision
processes with almost-sure safety constraints. Such safety-constrained decision processes …
Geometry of Optimization in Markov Decision Processes and Neural Network-Based PDE Solvers
J Müller - 2023 - ul.qucosa.de
Abstract (EN) This thesis is divided into two parts dealing with the optimization problems in
Markov decision processes (MDPs) and different neural network-based numerical solvers …
Geometry and convergence of natural policy gradient methods
G Montúfar, J Müller - 2022 - mis.mpg.de
We study the convergence of several natural policy gradient (NPG) methods in infinite-
horizon discounted Markov decision processes with regular policy parametrizations. For a …
[BOOK] The development of data-driven methods for modelling and optimisation of chemical process systems
M Mowbray - 2022 - search.proquest.com
In this thesis, data driven approaches to sequential decision making problems within
process systems engineering (PSE) are developed. Specifically, the use of model-free …