Model-free policy learning with reward gradients

Q Lan, S Tosatto, H Farrahi, AR Mahmood - arXiv preprint arXiv …, 2021 - arxiv.org
Despite the increasing popularity of policy gradient methods, they are yet to be widely
utilized in sample-scarce applications, such as robotics. The sample efficiency could be …

Diminishing return of value expansion methods in model-based reinforcement learning

D Palenicek, M Lutter, J Carvalho, J Peters - arXiv preprint arXiv …, 2023 - arxiv.org
Model-based reinforcement learning is one approach to increase sample efficiency.
However, the accuracy of the dynamics model and the resulting compounding error over …
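
As a point of reference for this and the following entry, the H-step value-expansion critic target that such methods typically build on can be written in standard form (generic notation, not necessarily the paper's own): with a learned dynamics model $\hat{f}$ and reward model $\hat{r}$,

$$
\hat{Q}^{H}(s_0, a_0) = \sum_{t=0}^{H-1} \gamma^{t}\, \hat{r}(\hat{s}_t, \hat{a}_t) + \gamma^{H} Q_\phi(\hat{s}_H, \hat{a}_H),
\qquad \hat{s}_{t+1} = \hat{f}(\hat{s}_t, \hat{a}_t), \quad \hat{a}_t \sim \pi(\cdot \mid \hat{s}_t), \quad \hat{s}_0 = s_0,
$$

so errors in $\hat{f}$ and $\hat{r}$ compound as the rollout horizon $H$ grows.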

Diminishing Return of Value Expansion Methods

D Palenicek, M Lutter, J Carvalho, D Dennert… - arXiv preprint arXiv …, 2024 - arxiv.org
Model-based reinforcement learning aims to increase sample efficiency, but the accuracy of
dynamics models and the resulting compounding errors are often seen as key limitations …

A gradient critic for policy gradient estimation

S Tosatto, A Patterson, M White… - … European Workshop on …, 2023 - openreview.net
The policy gradient theorem (Sutton et al., 2000) prescribes the usage of the on-policy state
distribution to approximate the gradient. Most algorithms based on this theorem, in practice …
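
For context, the theorem cited in this entry is standard and can be stated as follows (generic notation, not the paper's own):

$$
\nabla_\theta J(\theta) = \mathbb{E}_{s \sim d^{\pi_\theta},\, a \sim \pi_\theta(\cdot \mid s)}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a) \right],
$$

where $d^{\pi_\theta}$ is the (discounted) on-policy state distribution and $Q^{\pi_\theta}$ the action-value function.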

Analysis of Measure-Valued Derivatives in a Reinforcement Learning Actor-Critic Framework

K Van Den Houten, E Van Krieken… - 2022 Winter …, 2022 - ieeexplore.ieee.org
Policy gradient methods are successful for a wide range of reinforcement learning tasks.
Traditionally, such methods utilize the score function as stochastic gradient estimator. We …
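
For orientation, the score-function estimator mentioned in the snippet and the measure-valued derivative named in the title are both standard constructions (notation below is generic, not taken from the paper). The score-function (likelihood-ratio) estimator is

$$
\nabla_\theta\, \mathbb{E}_{x \sim p_\theta}[f(x)] = \mathbb{E}_{x \sim p_\theta}\!\left[ f(x)\, \nabla_\theta \log p_\theta(x) \right],
$$

while the measure-valued derivative with respect to a single parameter $\theta_i$ takes the form

$$
\nabla_{\theta_i}\, \mathbb{E}_{x \sim p_\theta}[f(x)] = c_{\theta_i} \left( \mathbb{E}_{x \sim p^{+}_{\theta_i}}[f(x)] - \mathbb{E}_{x \sim p^{-}_{\theta_i}}[f(x)] \right),
$$

where $(c_{\theta_i}, p^{+}_{\theta_i}, p^{-}_{\theta_i})$ is a decomposition of $\partial p_\theta / \partial \theta_i$ into weighted positive and negative distributions.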

[PDF] Randomizing physics simulations for robot learning

F Muratore - 2021 - d-nb.info
The ability to mentally evaluate variations of the future may well be the key to intelligence.
Combined with the ability to reason, it makes humans excellent at handling new and …

[PDF] Trust region optimization of optimistic actor critic

N Kappes, P Herrmann - 2022 - ias.informatik.tu-darmstadt.de
The exploration-exploitation trade-off is a fundamental challenge in reinforcement learning.
While off-policy algorithms like Soft Actor-Critic (SAC) yield good performance, they can …