Policy gradient for rectangular robust Markov decision processes
Policy gradient methods have become a standard for training reinforcement learning agents
in a scalable and efficient manner. However, they do not account for transition uncertainty …
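For reference, the "standard" policy gradient that these robust variants build on is the likelihood-ratio (REINFORCE) estimator, which samples trajectories under the nominal dynamics only. A minimal sketch on a toy tabular MDP; all sizes, the step size, and the horizon are illustrative assumptions, not taken from the paper:

```python
# Minimal REINFORCE sketch: the standard (non-robust) policy gradient,
# which rolls out trajectories under the nominal dynamics P only.
# The toy MDP, learning rate, and horizon are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, horizon = 4, 2, 0.95, 20
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # nominal dynamics
R = rng.uniform(size=(n_states, n_actions))                       # rewards

theta = np.zeros((n_states, n_actions))  # softmax policy parameters

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

for episode in range(2000):
    s, traj = 0, []
    for _ in range(horizon):                  # roll out one trajectory
        probs = policy(s)
        a = rng.choice(n_actions, p=probs)
        traj.append((s, a, R[s, a]))
        s = rng.choice(n_states, p=P[s, a])
    G, lr = 0.0, 0.01
    for s, a, r in reversed(traj):            # returns-to-go
        G = r + gamma * G
        grad = -policy(s)                     # d/dtheta log pi(a|s) = onehot(a) - pi
        grad[a] += 1.0
        theta[s] += lr * G * grad             # REINFORCE ascent step
```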
Solving non-rectangular reward-robust MDPs via frequency regularization
In robust Markov decision processes (RMDPs), it is assumed that the reward and the
transition dynamics lie in a given uncertainty set. By targeting maximal return under the most …
Roping in Uncertainty: Robustness and Regularization in Markov Games
We study robust Markov games (RMG) with $s$-rectangular uncertainty. We show a
general equivalence between computing a robust Nash equilibrium (RNE) of an $s$ …
Entropy regularization for population estimation
Entropy regularization is known to improve exploration in sequential decision-making
problems. We show that this same mechanism can also lead to nearly unbiased and lower …
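As a concrete illustration of the mechanism this abstract refers to, one common form of entropy regularization adds a bonus $\beta H(\pi(\cdot \mid s))$ to the per-step policy-gradient objective. A minimal sketch, assuming a softmax policy; `logits`, `advantage`, and `beta` are illustrative names, not the paper's estimator:

```python
# Entropy-regularized policy-gradient surrogate: the entropy bonus pushes
# the policy toward higher-entropy (more exploratory) action distributions.
# Sketch only; names and the value of beta are assumptions.
import numpy as np

def entropy_regularized_loss(logits, action, advantage, beta=0.01):
    """Negative (to minimize) policy-gradient surrogate plus entropy bonus."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    probs = np.exp(log_probs)
    entropy = -(probs * log_probs).sum()      # H(pi(.|s))
    pg_term = log_probs[action] * advantage   # REINFORCE surrogate
    return -(pg_term + beta * entropy)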
Robust reinforcement learning in continuous control tasks with uncertainty set regularization
Reinforcement learning (RL) is recognized as lacking generalization and robustness under
environmental perturbations, which excessively restricts its application for real-world …
Policy gradient algorithms implicitly optimize by continuation
Direct policy optimization in reinforcement learning is usually solved with policy-gradient
algorithms, which optimize policy parameters via stochastic gradient ascent. This paper …
Robust reinforcement learning with distributional risk-averse formulation
Robust Reinforcement Learning tries to make predictions more robust to changes in the
dynamics or rewards of the system. This problem is particularly important when the …
Twice regularized Markov decision processes: The equivalence between robustness and regularization
Robust Markov decision processes (MDPs) aim to handle changing or partially known
system dynamics. To solve them, one typically resorts to robust optimization methods …
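The flavor of this robustness-regularization equivalence is easiest to see for reward uncertainty alone: minimizing the reward over an $s$-rectangular $L_\infty$ ball of radius $\alpha$ is the same as subtracting the constant penalty $\alpha$, so robust value iteration reduces to regularized value iteration. A hedged sketch on an assumed toy MDP (the paper's full result covers richer, policy- and value-dependent regularizers):

```python
# Hedged sketch of robustness as regularization, simplest case:
# worst case over {R' : ||R' - R||_inf <= alpha} is exactly R - alpha,
# so robust value iteration becomes reward-regularized value iteration.
# The toy MDP and radius alpha are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma, alpha = 4, 2, 0.9, 0.1
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # nominal transitions
R = rng.uniform(size=(nS, nA))                 # nominal rewards

def regularized_vi(num_iters=500):
    v = np.zeros(nS)
    for _ in range(num_iters):
        q = (R - alpha) + gamma * P @ v        # penalty replaces the inner min
        v = q.max(axis=1)
    return v

v_robust = regularized_vi()
```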
Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures
We introduce a new maximum entropy reinforcement learning framework based on the
distribution of states and actions visited by a policy. More precisely, an intrinsic reward …
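One simple stand-in for a visitation-based intrinsic reward (the paper's construction with future state-action visitation measures is more involved) is a count-based estimate of the state-visitation distribution with bonus $-\log \hat{p}(s)$, whose expectation under the policy is the entropy of its visitation measure. All names below are hypothetical:

```python
# Count-based visitation bonus: r_int(s) = -log p_hat(s), where p_hat is the
# empirical state-visitation distribution. A hedged illustration only; the
# paper's intrinsic reward is built from future visitation measures instead.
from collections import Counter
import math

class VisitationBonus:
    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def intrinsic_reward(self, state):
        self.counts[state] += 1
        self.total += 1
        p_hat = self.counts[state] / self.total  # empirical visitation prob.
        return -math.log(p_hat)                  # rare states get large bonus
```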
Behind the Myth of Exploration in Policy Gradients
Policy-gradient algorithms are effective reinforcement learning methods for solving control
problems with continuous state and action spaces. To compute near-optimal policies, it is …