Improved and generalized upper bounds on the complexity of policy iteration

B Scherrer - Advances in Neural Information Processing …, 2013 - proceedings.neurips.cc
Given a Markov Decision Process (MDP) with $n$ states and $m$ actions per
state, we study the number of iterations needed by Policy Iteration (PI) algorithms to …
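Several entries below study variants of Policy Iteration; for reference, a minimal sketch of Howard's PI for a discounted MDP (all array shapes and names here are illustrative, not taken from any of the listed papers):

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Howard's Policy Iteration for a discounted MDP.

    P: transition tensor, shape (m, n, n); P[a, s, s'] = Pr(s' | s, a).
    R: reward matrix, shape (n, m); R[s, a] = expected one-step reward.
    Returns an optimal deterministic policy (length-n int array) and its value.
    """
    m, n, _ = P.shape
    pi = np.zeros(n, dtype=int)  # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[pi, np.arange(n)]          # (n, n): row s is P[pi[s], s, :]
        R_pi = R[np.arange(n), pi]          # (n,)
        V = np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)
        # Greedy improvement: switch every improvable state (max-gain rule).
        Q = R + gamma * np.einsum('aij,j->ia', P, V)   # (n, m) action values
        pi_new = Q.argmax(axis=1)
        if np.array_equal(pi_new, pi):      # no improvable state: optimal
            return pi, V
        pi = pi_new
```

The iteration counts studied in these papers bound how many times the improvement step above can fire before the policy stabilizes.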

Multi-gear bandits, partial conservation laws, and indexability

J Niño-Mora - Mathematics, 2022 - mdpi.com
This paper considers what we propose to call multi-gear bandits, which are Markov decision
processes modeling a generic dynamic and stochastic project fueled by a single resource …

The Smoothed Complexity of Policy Iteration for Markov Decision Processes

M Christ, M Yannakakis - Proceedings of the 55th Annual ACM …, 2023 - dl.acm.org
We show subexponential lower bounds (i.e., $2^{\Omega(n^c)}$) on the smoothed complexity of the
classical Howard's Policy Iteration algorithm for Markov Decision Processes. The bounds …

Geometric policy iteration for Markov decision processes

Y Wu, JA De Loera - Proceedings of the 28th ACM SIGKDD Conference …, 2022 - dl.acm.org
Recently discovered polyhedral structures of the value function for finite discounted Markov
decision processes (MDP) shed light on understanding the success of reinforcement …

Randomised procedures for initialising and switching actions in policy iteration

S Kalyanakrishnan, N Misra, A Gopalan - Proceedings of the AAAI …, 2016 - ojs.aaai.org
Policy Iteration (PI) (Howard 1960) is a classical method for computing an optimal
policy for a finite Markov Decision Problem (MDP). The method is conceptually simple …

[HTML] A complexity analysis of Policy Iteration through combinatorial matrices arising from Unique Sink Orientations

B Gerencsér, R Hollanders, JC Delvenne… - Journal of Discrete …, 2017 - Elsevier
Unique Sink Orientations (USOs) are an appealing abstraction of several major
optimization problems of applied mathematics such as Linear Programming (LP), Markov …

Upper Bounds for All and Max-gain Policy Iteration Algorithms on Deterministic MDPs

R Goenka, E Gupta, S Khyalia, P Agarwal… - arXiv preprint arXiv …, 2022 - arxiv.org
Policy Iteration (PI) is a widely used family of algorithms to compute optimal policies for
Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on …

A low-rank approximation for MDPs via moment coupling

ABZ Zhang, I Gurvich - Operations Research, 2024 - pubsonline.informs.org
We introduce a framework to approximate Markov decision processes (MDPs) that stands on
two pillars: (i) state aggregation, as the algorithmic infrastructure, and (ii) central-limit …

[BOOK] Exploiting Model Smoothness in Dynamic Decisions

ABZ Zhang - 2022 - search.proquest.com
Utilizing structure in mathematical modeling is instrumental for better model design, creation,
and solution. In this dissertation, we explore smoothness-based structure for problems …

[PDF] Theoretical Analysis of Policy Iteration

S Kalyanakrishnan - 2017 - cse.iitb.ac.in
Theoretical Analysis of Policy Iteration.
Shivaram Kalyanakrishnan, Department of Computer Science and Engineering, Indian Institute …