Reinforcement learning for logistics and supply chain management: Methodologies, state of the art, and future opportunities

Y Yan, AHF Chow, CP Ho, YH Kuo, Q Wu… - … Research Part E …, 2022 - Elsevier
With advances in technologies, data science techniques, and computing equipment, there
has been rapidly increasing interest in the applications of reinforcement learning (RL) to …

Toward theoretical understandings of robust Markov decision processes: Sample complexity and asymptotics

W Yang, L Zhang, Z Zhang - The Annals of Statistics, 2022 - projecteuclid.org
The Annals of Statistics, 2022, Vol. 50, No. 6, 3223–3248 …

Policy gradient in robust MDPs with global convergence guarantee

Q Wang, CP Ho, M Petrik - International Conference on …, 2023 - proceedings.mlr.press
Abstract Robust Markov decision processes (RMDPs) provide a promising framework for
computing reliable policies in the face of model errors. Many successful reinforcement …

Partial policy iteration for L1-robust Markov decision processes

CP Ho, M Petrik, W Wiesemann - Journal of Machine Learning Research, 2021 - jmlr.org
Robust Markov decision processes (MDPs) compute reliable solutions for dynamic decision
problems with partially-known transition probabilities. Unfortunately, accounting for …

Model-based offline reinforcement learning with pessimism-modulated dynamics belief

K Guo, Y Shao, Y Geng - Advances in Neural …, 2022 - proceedings.neurips.cc
Abstract Model-based offline reinforcement learning (RL) aims to find a highly rewarding
policy by leveraging a previously collected static dataset and a dynamics model. While the …

Fast Algorithms for L∞-constrained S-rectangular Robust MDPs

B Behzadian, M Petrik, CP Ho - Advances in Neural …, 2021 - proceedings.neurips.cc
Abstract Robust Markov decision processes (RMDPs) are a useful building block of robust
reinforcement learning algorithms but can be hard to solve. This paper proposes a fast …

Value-distributional model-based reinforcement learning

CE Luis, AG Bottero, J Vinogradska… - Journal of Machine …, 2024 - jmlr.org
Quantifying uncertainty about a policy's long-term performance is important for solving
sequential decision-making tasks. We study the problem from a model-based Bayesian …

Solving multi-model MDPs by coordinate ascent and dynamic programming

X Su, M Petrik - Uncertainty in Artificial Intelligence, 2023 - proceedings.mlr.press
Multi-model Markov decision processes (MMDPs) are a promising framework for computing
policies that are robust to parameter uncertainty in MDPs. MMDPs aim to find a policy that …

Robust satisficing MDPs

H Ruan, S Zhou, Z Chen, CP Ho - … Conference on Machine …, 2023 - proceedings.mlr.press
Despite being a fundamental building block for reinforcement learning, Markov decision
processes (MDPs) often suffer from ambiguity in model parameters. Robust MDPs are …

Percentile criterion optimization in offline reinforcement learning

C Cousins, E Lobo, M Petrik… - Advances in Neural …, 2023 - proceedings.neurips.cc
In reinforcement learning, robust policies for high-stakes decision-making problems with
limited data are usually computed by optimizing the percentile criterion. The percentile …