Policy gradient for rectangular robust Markov decision processes
Policy gradient methods have become a standard for training reinforcement learning agents
in a scalable and efficient manner. However, they do not account for transition uncertainty …
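For reference, the "standard" policy gradient that these robust variants build on is the likelihood-ratio (REINFORCE) estimator, which samples trajectories under the nominal dynamics only. A minimal sketch on a toy tabular MDP; all sizes, the step size, and the horizon are illustrative assumptions, not taken from the paper:

```python
# Minimal REINFORCE sketch: the standard (non-robust) policy gradient,
# which rolls out trajectories under the nominal dynamics P only.
# The toy MDP, learning rate, and horizon are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, horizon = 4, 2, 0.95, 20
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # nominal dynamics
R = rng.uniform(size=(n_states, n_actions))                       # rewards

theta = np.zeros((n_states, n_actions))  # softmax policy parameters

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

for episode in range(2000):
    s, traj = 0, []
    for _ in range(horizon):                  # roll out one trajectory
        probs = policy(s)
        a = rng.choice(n_actions, p=probs)
        traj.append((s, a, R[s, a]))
        s = rng.choice(n_states, p=P[s, a])
    G, lr = 0.0, 0.01
    for s, a, r in reversed(traj):            # returns-to-go
        G = r + gamma * G
        grad = -policy(s)                     # d/dtheta log pi(a|s) = onehot(a) - pi
        grad[a] += 1.0
        theta[s] += lr * G * grad             # REINFORCE ascent step
```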
Solving non-rectangular reward-robust MDPs via frequency regularization
In robust Markov decision processes (RMDPs), it is assumed that the reward and the
transition dynamics lie in a given uncertainty set. By targeting maximal return under the most …
Roping in Uncertainty: Robustness and Regularization in Markov Games
We study robust Markov games (RMG) with $s$-rectangular uncertainty. We show a
general equivalence between computing a robust Nash equilibrium (RNE) of an $s$ …
Entropy regularization for population estimation
Entropy regularization is known to improve exploration in sequential decision-making
problems. We show that this same mechanism can also lead to nearly unbiased and lower …
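As a concrete illustration of the mechanism this abstract refers to, one common form of entropy regularization adds a bonus $\beta H(\pi(\cdot \mid s))$ to the per-step policy-gradient objective. A minimal sketch, assuming a softmax policy; `logits`, `advantage`, and `beta` are illustrative names, not the paper's estimator:

```python
# Entropy-regularized policy-gradient surrogate: the entropy bonus pushes
# the policy toward higher-entropy (more exploratory) action distributions.
# Sketch only; names and the value of beta are assumptions.
import numpy as np

def entropy_regularized_loss(logits, action, advantage, beta=0.01):
    """Negative (to minimize) policy-gradient surrogate plus entropy bonus."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    probs = np.exp(log_probs)
    entropy = -(probs * log_probs).sum()      # H(pi(.|s))
    pg_term = log_probs[action] * advantage   # REINFORCE surrogate
    return -(pg_term + beta * entropy)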
Robust reinforcement learning in continuous control tasks with uncertainty set regularization
Reinforcement learning (RL) is recognized as lacking generalization and robustness under
environmental perturbations, which excessively restricts its application for real-world …
Policy gradient algorithms implicitly optimize by continuation
Direct policy optimization in reinforcement learning is usually solved with policy-gradient
algorithms, which optimize policy parameters via stochastic gradient ascent. This paper …
Robust reinforcement learning with distributional risk-averse formulation
Robust Reinforcement Learning tries to make predictions more robust to changes in the
dynamics or rewards of the system. This problem is particularly important when the …
Twice regularized Markov decision processes: The equivalence between robustness and regularization
Robust Markov decision processes (MDPs) aim to handle changing or partially known
system dynamics. To solve them, one typically resorts to robust optimization methods …
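The flavor of this robustness-regularization equivalence is easiest to see for reward uncertainty alone: minimizing the reward over an $s$-rectangular $L_\infty$ ball of radius $\alpha$ is the same as subtracting the constant penalty $\alpha$, so robust value iteration reduces to regularized value iteration. A hedged sketch on an assumed toy MDP (the paper's full result covers richer, policy- and value-dependent regularizers):

```python
# Hedged sketch of robustness as regularization, simplest case:
# worst case over {R' : ||R' - R||_inf <= alpha} is exactly R - alpha,
# so robust value iteration becomes reward-regularized value iteration.
# The toy MDP and radius alpha are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma, alpha = 4, 2, 0.9, 0.1
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # nominal transitions
R = rng.uniform(size=(nS, nA))                 # nominal rewards

def regularized_vi(num_iters=500):
    v = np.zeros(nS)
    for _ in range(num_iters):
        q = (R - alpha) + gamma * P @ v        # penalty replaces the inner min
        v = q.max(axis=1)
    return v

v_robust = regularized_vi()
```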
Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures
We introduce a new maximum entropy reinforcement learning framework based on the
distribution of states and actions visited by a policy. More precisely, an intrinsic reward …
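One simple stand-in for a visitation-based intrinsic reward (the paper's construction with future state-action visitation measures is more involved) is a count-based estimate of the state-visitation distribution with bonus $-\log \hat{p}(s)$, whose expectation under the policy is the entropy of its visitation measure. All names below are hypothetical:

```python
# Count-based visitation bonus: r_int(s) = -log p_hat(s), where p_hat is the
# empirical state-visitation distribution. A hedged illustration only; the
# paper's intrinsic reward is built from future visitation measures instead.
from collections import Counter
import math

class VisitationBonus:
    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def intrinsic_reward(self, state):
        self.counts[state] += 1
        self.total += 1
        p_hat = self.counts[state] / self.total  # empirical visitation prob.
        return -math.log(p_hat)                  # rare states get large bonus
```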
Behind the Myth of Exploration in Policy Gradients
Policy-gradient algorithms are effective reinforcement learning methods for solving control
problems with continuous state and action spaces. To compute near-optimal policies, it is …