A practitioner's guide to MDP model checking algorithms

A Hartmanns, S Junges, T Quatmann… - … Conference on Tools …, 2023 - Springer
Model checking undiscounted reachability and expected-reward properties on
Markov decision processes (MDPs) is key for the verification of systems that act under …

Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor

TD Hansen, PB Miltersen, U Zwick - Journal of the ACM (JACM), 2013 - dl.acm.org
Ye [2011] showed recently that the simplex method with Dantzig's pivoting rule, as well as
Howard's policy iteration algorithm, solve discounted Markov decision processes (MDPs) …

Optimal convergence rate for exact policy mirror descent in discounted Markov decision processes

E Johnson, C Pike-Burke… - Advances in Neural …, 2023 - proceedings.neurips.cc
Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide
range of novel and fundamental methods in reinforcement learning. Motivated by the …

The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate

Y Ye - Mathematics of Operations Research, 2011 - pubsonline.informs.org
We prove that the classic policy-iteration method [Howard, RA 1960. Dynamic Programming
and Markov Processes. MIT, Cambridge] and the original simplex method with the most …

Improved and generalized upper bounds on the complexity of policy iteration

B Scherrer - Advances in Neural Information Processing …, 2013 - proceedings.neurips.cc
Given a Markov Decision Process (MDP) with $n$ states and $m$ actions per
state, we study the number of iterations needed by Policy Iteration (PI) algorithms to …
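Several of the entries above analyze Howard's policy iteration for discounted MDPs. As an illustrative aside (a minimal NumPy sketch, not code from any of the cited papers), the method alternates exact policy evaluation with greedy improvement until the policy is stable:

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Howard's policy iteration for a finite discounted MDP (illustrative sketch).

    P: transition tensor, shape (A, S, S), P[a, s, t] = Pr(t | s, a)
    R: expected rewards, shape (A, S)
    gamma: discount factor in [0, 1)
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)          # start from an arbitrary policy
    while True:
        # Policy evaluation: solve the linear system (I - gamma * P_pi) v = r_pi
        P_pi = P[policy, np.arange(S), :]    # (S, S) transition rows under policy
        r_pi = R[policy, np.arange(S)]
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Policy improvement: greedy one-step lookahead over all actions
        Q = R + gamma * (P @ v)              # (A, S) action values
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v                 # stable policy => optimal
        policy = new_policy
```

Since exact evaluation is used and each improvement step switches every improvable state at once, the loop terminates in finitely many iterations for any finite MDP; the iteration bounds studied in the papers above (Ye 2011; Hansen, Miltersen, Zwick 2013; Scherrer 2013) concern exactly how many such iterations are needed.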

Acting in delayed environments with non-stationary Markov policies

E Derman, G Dalal, S Mannor - arXiv preprint arXiv:2101.11992, 2021 - arxiv.org
The standard Markov Decision Process (MDP) formulation hinges on the assumption that an
action is executed immediately after it was chosen. However, this assumption is often unrealistic …

Universal trees grow inside separating automata: Quasi-polynomial lower bounds for parity games

W Czerwiński, L Daviaud, N Fijalkow, M Jurdziński… - Proceedings of the …, 2019 - SIAM
Several distinct techniques have been proposed to design quasi-polynomial algorithms for
solving parity games since the breakthrough result of Calude, Jain, Khoussainov, Li, and …

Subexponential lower bounds for randomized pivoting rules for the simplex algorithm

O Friedmann, TD Hansen, U Zwick - … of the forty-third annual ACM …, 2011 - dl.acm.org
The simplex algorithm is among the most widely used algorithms for solving linear programs
in practice. With essentially all deterministic pivoting rules it is known, however, to require an …

Parity games: Zielonka's algorithm in quasi-polynomial time

P Parys - arXiv preprint arXiv:1904.12446, 2019 - arxiv.org
Calude, Jain, Khoussainov, Li, and Stephan (2017) proposed a quasi-polynomial-time
algorithm solving parity games. After this breakthrough result, a few other quasi-polynomial …

A recursive approach to solving parity games in quasipolynomial time

K Lehtinen, P Parys, S Schewe… - Logical Methods in …, 2022 - lmcs.episciences.org
Zielonka's classic recursive algorithm for solving parity games is perhaps the simplest
among the many existing parity game algorithms. However, its complexity is exponential …