Maximum entropy RL (provably) solves some robust RL problems

B Eysenbach, S Levine - arXiv preprint arXiv:2103.06257, 2021 - arxiv.org
Many potential applications of reinforcement learning (RL) require guarantees that the agent
will perform well in the face of disturbances to the dynamics or reward function. In this paper …

Policy gradient bayesian robust optimization for imitation learning

Z Javed, DS Brown, S Sharma, J Zhu… - International …, 2021 - proceedings.mlr.press
The difficulty in specifying rewards for many real-world problems has led to an increased
focus on learning rewards from human feedback, such as demonstrations. However, there …

Incorporating convex risk measures into multistage stochastic programming algorithms

O Dowson, DP Morton, BK Pagnoncelli - Annals of Operations Research, 2022 - Springer
Over the last two decades, coherent risk measures have been well studied as a principled,
axiomatic way to characterize the risk of a random variable. Because of this axiomatic …

Where2Start: Leveraging initial States for Robust and Sample-Efficient Reinforcement Learning

P Parsa, RZ Moayedi, M Bornosi, MM Bejani - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning algorithms that focus on how to compute the gradient and
choose the next actions have effectively improved the performance of agents. However, these …

[CITATION][C] Robust Imitation Learning for Risk-Aware Behavior and Sim2Real Transfer

Z Javed - 2022