Policy gradient for rectangular robust Markov decision processes

N Kumar, E Derman, M Geist… - Advances in Neural …, 2023 - proceedings.neurips.cc
Policy gradient methods have become a standard for training reinforcement learning agents
in a scalable and efficient manner. However, they do not account for transition uncertainty …

Soft robust MDPs and risk-sensitive MDPs: Equivalence, policy gradient, and sample complexity

R Zhang, Y Hu, N Li - arXiv … - arxiv.org

Roping in uncertainty: Robustness and regularization in Markov games

J McMahan, G Artiglio, Q Xie - arXiv preprint arXiv:2406.08847, 2024 - arxiv.org
We study robust Markov games (RMG) with $s$-rectangular uncertainty. We show a
general equivalence between computing a robust Nash equilibrium (RNE) of a $s$ …

Bridging distributionally robust learning and offline RL: An approach to mitigate distribution shift and partial data coverage

K Panaganti, Z Xu, D Kalathil… - arXiv preprint arXiv …, 2023 - arxiv.org
The goal of an offline reinforcement learning (RL) algorithm is to learn optimal policies using
historical (offline) data, without access to the environment for online exploration. One of the …

Bring your own (non-robust) algorithm to solve robust MDPs by estimating the worst kernel

U Gadot, K Wang, N Kumar, KY Levy… - Forty-first International …, 2024 - openreview.net
Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-
making that is robust to perturbations on the transition kernel. However, current RMDP …
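The worst-kernel idea described in this entry — find an adversarial transition kernel inside a rectangular uncertainty set, then hand that kernel to an ordinary, non-robust solver — can be sketched as follows. This is a generic illustration with an assumed (s,a)-rectangular L1-ball uncertainty set around a nominal kernel, not the paper's actual estimation procedure:

```python
import numpy as np

def worst_kernel_l1(P_nom, v, radius):
    """For each (s, a), find the worst transition distribution within an
    L1 ball of `radius` around the nominal row P_nom[s, a], i.e. the one
    minimizing expected next-state value under v. The (s,a)-rectangular
    assumption lets each row be perturbed independently."""
    S, A, _ = P_nom.shape
    P_worst = P_nom.copy()
    worst_next = np.argmin(v)          # push mass toward the lowest-value state
    for s in range(S):
        for a in range(A):
            p = P_worst[s, a]
            budget = radius / 2.0      # at most r/2 mass can be relocated
            # take mass away from the highest-value states first
            for s2 in np.argsort(v)[::-1]:
                if budget <= 0 or s2 == worst_next:
                    continue
                take = min(p[s2], budget)
                p[s2] -= take
                p[worst_next] += take
                budget -= take
    return P_worst

def robust_value_iteration(P_nom, R, gamma, radius, iters=500):
    """Alternate (1) computing the worst kernel for the current value
    estimate and (2) one standard, non-robust Bellman optimality backup
    on that fixed kernel — the 'bring your own algorithm' pattern."""
    S, A, _ = P_nom.shape
    v = np.zeros(S)
    for _ in range(iters):
        P = worst_kernel_l1(P_nom, v, radius)
        q = R + gamma * P @ v          # shape (S, A): ordinary backup
        v = q.max(axis=1)
    return v
```

With `radius=0` this reduces to standard value iteration; a positive radius can only lower the returned values, since the inner step pessimizes each row of the kernel before the standard backup runs.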

Imprecise probabilities meet partial observability: Game semantics for robust POMDPs

EM Bovy, M Suilen, S Junges, N Jansen - arXiv preprint arXiv:2405.04941, 2024 - arxiv.org
Partially observable Markov decision processes (POMDPs) rely on the key assumption that
probability distributions are precisely known. Robust POMDPs (RPOMDPs) alleviate this …

Robust Markov decision processes: A place where AI and formal methods meet

M Suilen, T Badings, EM Bovy, D Parker… - Principles of Verification …, 2024 - Springer
Markov decision processes (MDPs) are a standard model for sequential decision-making
problems and are widely used across many scientific areas, including formal methods and …