Improved and generalized upper bounds on the complexity of policy iteration
B Scherrer - Advances in Neural Information Processing …, 2013 - proceedings.neurips.cc
Abstract Given a Markov Decision Process (MDP) with $n$ states and $m$ actions per
state, we study the number of iterations needed by Policy Iteration (PI) algorithms to …
Multi-gear bandits, partial conservation laws, and indexability
J Niño-Mora - Mathematics, 2022 - mdpi.com
This paper considers what we propose to call multi-gear bandits, which are Markov decision
processes modeling a generic dynamic and stochastic project fueled by a single resource …
The Smoothed Complexity of Policy Iteration for Markov Decision Processes
We show subexponential lower bounds (i.e., $2^{\Omega(n^c)}$) on the smoothed complexity of the
classical Howard's Policy Iteration algorithm for Markov Decision Processes. The bounds …
Geometric policy iteration for Markov decision processes
Recently discovered polyhedral structures of the value function for finite discounted Markov
decision processes (MDP) shed light on understanding the success of reinforcement …
Randomised procedures for initialising and switching actions in policy iteration
Abstract Policy Iteration (PI) (Howard 1960) is a classical method for computing an optimal
policy for a finite Markov Decision Problem (MDP). The method is conceptually simple …
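The method this entry describes, Howard's Policy Iteration, alternates exact policy evaluation with greedy policy improvement until the policy stops changing. A minimal sketch in Python; the 2-state, 2-action MDP at the bottom is an illustrative example of my own, not taken from any of the cited works:

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Howard's PI. P: (A, S, S) transition probabilities; R: (A, S) expected rewards."""
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)          # arbitrary initial policy
    while True:
        # Evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[policy, np.arange(S)]       # (S, S) rows selected by the policy
        r_pi = R[policy, np.arange(S)]       # (S,)
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Improvement: switch greedily at every improvable state.
        Q = R + gamma * (P @ v)              # (A, S) action values
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v                 # greedy w.r.t. its own value: optimal
        policy = new_policy

# Toy MDP: action 1 always pays reward 1 and leads to state 1; action 0 pays 0.
P = np.array([[[1., 0.], [1., 0.]],         # action 0: go to state 0
              [[0., 1.], [0., 1.]]])        # action 1: go to state 1
R = np.array([[0., 0.], [1., 1.]])
pi_star, v_star = policy_iteration(P, R)    # pi_star = [1, 1], v_star = [10, 10]
```

The complexity questions studied in these papers concern how many passes of this improvement loop can occur before termination, as a function of $n$ states and $m$ actions.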
[HTML] A complexity analysis of Policy Iteration through combinatorial matrices arising from Unique Sink Orientations
Abstract Unique Sink Orientations (USOs) are an appealing abstraction of several major
optimization problems of applied mathematics such as Linear Programming (LP), Markov …
Upper Bounds for All and Max-gain Policy Iteration Algorithms on Deterministic MDPs
R Goenka, E Gupta, S Khyalia, P Agarwal… - arXiv preprint arXiv …, 2022 - arxiv.org
Policy Iteration (PI) is a widely used family of algorithms to compute optimal policies for
Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on …
A low-rank approximation for MDPs via moment coupling
ABZ Zhang, I Gurvich - Operations Research, 2024 - pubsonline.informs.org
We introduce a framework to approximate Markov decision processes (MDPs) that stands on
two pillars: (i) state aggregation, as the algorithmic infrastructure, and (ii) central-limit …
[BOOK] Exploiting Model Smoothness in Dynamic Decisions
ABZ Zhang - 2022 - search.proquest.com
Utilizing structure in mathematical modeling is instrumental for better model design, creation,
and solution. In this dissertation, we explore smoothness-based structure for problems …
[PDF] Theoretical Analysis of Policy Iteration
S Kalyanakrishnan - 2017 - cse.iitb.ac.in
Theoretical Analysis of Policy Iteration
Shivaram Kalyanakrishnan, Department of Computer Science and Engineering, Indian Institute …