Stochastic policy gradient methods: Improved sample complexity for Fisher-non-degenerate policies

I Fatkhullin, A Barakat, A Kireeva… - … Conference on Machine …, 2023 - proceedings.mlr.press
Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed
the development of their theoretical foundations. Despite the huge efforts directed at the …

A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces

B Kerimkulov, JM Leahy, D Siska, L Szpruch… - arXiv preprint arXiv …, 2023 - arxiv.org
We study the global convergence of a Fisher-Rao policy gradient flow for infinite-horizon
entropy-regularised Markov decision processes with Polish state and action space. The flow …

Geometry and convergence of natural policy gradient methods

J Müller, G Montúfar - Information Geometry, 2024 - Springer
We study the convergence of several natural policy gradient (NPG) methods in infinite-
horizon discounted Markov decision processes with regular policy parametrizations. For a …

On the global convergence of fitted Q-iteration with two-layer neural network parametrization

M Gaur, V Aggarwal, M Agarwal - … Conference on Machine …, 2023 - proceedings.mlr.press
Deep Q-learning based algorithms have been applied successfully in many decision making
problems, while their theoretical foundations are not as well understood. In this paper, we …

Convex Regularization and Convergence of Policy Gradient Flows under Safety Constraints

P Malo, L Viitasaari, A Suominen, E Vilkkumaa… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper studies reinforcement learning (RL) in infinite-horizon dynamic decision
processes with almost-sure safety constraints. Such safety-constrained decision processes …

Geometry of Optimization in Markov Decision Processes and Neural Network-Based PDE Solvers

J Müller - 2023 - ul.qucosa.de
This thesis is divided into two parts dealing with the optimization problems in
Markov decision processes (MDPs) and different neural network-based numerical solvers …

Geometry and convergence of natural policy gradient methods

G Montúfar, J Müller - 2022 - mis.mpg.de
We study the convergence of several natural policy gradient (NPG) methods in infinite-
horizon discounted Markov decision processes with regular policy parametrizations. For a …

The development of data-driven methods for modelling and optimisation of chemical process systems

M Mowbray - 2022 - search.proquest.com
In this thesis, data-driven approaches to sequential decision-making problems within
process systems engineering (PSE) are developed. Specifically, the use of model-free …