Policy gradient for rectangular robust Markov decision processes

N Kumar, E Derman, M Geist… - Advances in Neural …, 2023 - proceedings.neurips.cc
Policy gradient methods have become a standard for training reinforcement learning agents
in a scalable and efficient manner. However, they do not account for transition uncertainty …
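
As a reference point for the "rectangular" qualifier and the transition uncertainty mentioned above, a rectangular robust MDP objective is commonly written as below; the notation ($\mathcal{U}$, $\mathcal{U}_{s,a}$, $\gamma$) is assumed for illustration and is not taken from this paper.

```latex
% Sketch of an (s,a)-rectangular robust MDP objective (illustrative notation):
% the policy maximizes the worst-case discounted return over an uncertainty set
% that factorizes across state-action pairs (the rectangularity assumption).
\[
  \max_{\pi} \; \min_{P \in \mathcal{U}} \;
  \mathbb{E}_{\pi, P}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right],
  \qquad
  \mathcal{U} \;=\; \prod_{(s,a)} \mathcal{U}_{s,a}
  \quad \text{(Cartesian product over state-action pairs).}
\]
```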

Solving non-rectangular reward-robust MDPs via frequency regularization

U Gadot, E Derman, N Kumar, MM Elfatihi… - Proceedings of the …, 2024 - ojs.aaai.org
In robust Markov decision processes (RMDPs), it is assumed that the reward and the
transition dynamics lie in a given uncertainty set. By targeting maximal return under the most …
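
For orientation, the "frequency regularization" in the title can be read through the discounted state-action occupancy measure; the sketch below uses assumed notation ($d^{\pi}$, nominal reward $r_0$, reward uncertainty set $\mathcal{B}$) and is not the paper's exact statement.

```latex
% Illustrative sketch (assumed notation): with reward uncertainty r = r_0 + b,
% b in a set B, the worst-case return of a policy is linear in its discounted
% state-action occupancy measure d^pi, so robustness becomes a penalty on d^pi.
\[
  \min_{b \in \mathcal{B}} \, \langle d^{\pi},\, r_0 + b \rangle
  \;=\;
  \langle d^{\pi},\, r_0 \rangle \;-\; \max_{b \in \mathcal{B}} \langle d^{\pi},\, -b \rangle
  \;=\;
  \langle d^{\pi},\, r_0 \rangle \;-\; \sigma_{-\mathcal{B}}\!\left(d^{\pi}\right),
\]
% where sigma_{-B} is the support function of -B; for a symmetric set (e.g. a
% norm ball) this reduces to a dual-norm regularizer on the occupancy measure.
```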

Roping in Uncertainty: Robustness and Regularization in Markov Games

J McMahan, G Artiglio, Q Xie - arXiv preprint arXiv:2406.08847, 2024 - arxiv.org
We study robust Markov games (RMG) with $s$-rectangular uncertainty. We show a
general equivalence between computing a robust Nash equilibrium (RNE) of a $s$…

Entropy regularization for population estimation

B Chugg, P Henderson, J Goldin, DE Ho - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Entropy regularization is known to improve exploration in sequential decision-making
problems. We show that this same mechanism can also lead to nearly unbiased and lower …
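
For context on the mechanism referred to here, an entropy bonus is typically added to the policy objective as sketched below; the snippet is illustrative only (PyTorch, a toy categorical policy, and the coefficient `beta` are assumptions, not the paper's estimator).

```python
# Illustrative sketch of an entropy-regularized policy objective (assumed setup:
# PyTorch, a small categorical policy over discrete actions; not the paper's code).
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))  # toy 4-dim state, 2 actions
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
beta = 0.01  # entropy coefficient (hypothetical value)

states = torch.randn(8, 4)           # placeholder batch of states
actions = torch.randint(0, 2, (8,))  # placeholder actions taken in those states
advantages = torch.randn(8)          # placeholder advantage estimates

dist = torch.distributions.Categorical(logits=policy(states))
# Policy-gradient surrogate plus an entropy bonus: the bonus keeps the action
# distribution spread out, which is the exploration mechanism the entry refers to.
loss = -(dist.log_prob(actions) * advantages + beta * dist.entropy()).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The coefficient `beta` trades off reward maximization against keeping the action distribution diffuse; annealing it toward zero recovers the unregularized objective.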

Robust reinforcement learning in continuous control tasks with uncertainty set regularization

Y Zhang, J Wang, J Boedecker - Conference on Robot …, 2023 - proceedings.mlr.press
Reinforcement learning (RL) is recognized as lacking generalization and robustness under
environmental perturbations, which excessively restricts its application for real-world …

Policy gradient algorithms implicitly optimize by continuation

A Bolland, G Louppe, D Ernst - arXiv preprint arXiv:2305.06851, 2023 - arxiv.org
Direct policy optimization in reinforcement learning is usually solved with policy-gradient
algorithms, which optimize policy parameters via stochastic gradient ascent. This paper …
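
As background for the stochastic-gradient-ascent step mentioned here, a minimal score-function (REINFORCE-style) update on a toy bandit is sketched below; NumPy, the softmax parameterization, and the reward values are assumptions for illustration, separate from the paper's continuation argument.

```python
# Minimal REINFORCE-style sketch on a 3-armed bandit (illustrative only).
# The policy is a softmax over logits theta, and the update is stochastic
# gradient ascent on the expected reward.
import numpy as np

rng = np.random.default_rng(0)
true_rewards = np.array([0.2, 0.5, 0.8])  # hypothetical mean rewards per arm
theta = np.zeros(3)                       # policy logits
lr = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)
    r = true_rewards[a] + 0.1 * rng.standard_normal()
    # Score-function gradient: grad log pi(a) = one_hot(a) - probs for a softmax policy.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += lr * r * grad_log_pi  # stochastic gradient ascent on E[r]

print("final action probabilities:", softmax(theta))  # should concentrate on arm 2
```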

Robust reinforcement learning with distributional risk-averse formulation

P Clavier, S Allassonière, EL Pennec - arXiv preprint arXiv:2206.06841, 2022 - arxiv.org
Robust Reinforcement Learning tries to make predictions more robust to changes in the
dynamics or rewards of the system. This problem is particularly important when the …

Twice regularized Markov decision processes: The equivalence between robustness and regularization

E Derman, Y Men, M Geist, S Mannor - arXiv preprint arXiv:2303.06654, 2023 - arxiv.org
Robust Markov decision processes (MDPs) aim to handle changing or partially known
system dynamics. To solve them, one typically resorts to robust optimization methods …
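
To indicate the flavor of equivalence meant by the title (with assumed notation, not the paper's theorem): when the reward is only known up to a ball around a nominal value, the inner minimization collapses to a norm penalty, so robust evaluation reads as regularized evaluation.

```latex
% Illustrative sketch (assumed notation): with a reward uncertainty ball
% r(s,.) in r_0(s,.) + alpha * B around a nominal reward, the per-state robust
% evaluation of a policy pi reduces to a penalized (regularized) evaluation.
\[
  \min_{r(s,\cdot) \,\in\, r_0(s,\cdot) + \alpha \mathcal{B}}
  \; \langle \pi_s,\, r(s,\cdot) \rangle
  \;=\;
  \langle \pi_s,\, r_0(s,\cdot) \rangle \;-\; \alpha\, \|\pi_s\|_{*},
\]
% where B is the unit ball of a norm and ||.||_* is its dual norm.
```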

Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures

A Bolland, G Lambrechts, D Ernst - arXiv preprint arXiv:2412.06655, 2024 - arxiv.org
We introduce a new maximum entropy reinforcement learning framework based on the
distribution of states and actions visited by a policy. More precisely, an intrinsic reward …
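
One way to write an objective of this kind (illustrative notation, not necessarily the authors' exact definition) adds the entropy of the discounted state-action visitation distribution to the return, rather than a per-step policy entropy term.

```latex
% Illustrative objective of this flavor (assumed notation): maximize return plus
% the entropy of the discounted state-action visitation distribution d^pi.
\[
  J(\pi) \;=\;
  \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right]
  \;+\; \lambda\, \mathcal{H}\!\left(d^{\pi}\right),
  \qquad
  \mathcal{H}(d^{\pi}) = -\sum_{s,a} d^{\pi}(s,a) \log d^{\pi}(s,a).
\]
```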

Behind the Myth of Exploration in Policy Gradients

A Bolland, G Lambrechts, D Ernst - arXiv preprint arXiv:2402.00162, 2024 - arxiv.org
Policy-gradient algorithms are effective reinforcement learning methods for solving control
problems with continuous state and action spaces. To compute near-optimal policies, it is …