Efficient exploration in continuous-time model-based reinforcement learning

L Treven, J Hübotter, F Dörfler… - Advances in Neural …, 2024 - proceedings.neurips.cc
Reinforcement learning algorithms typically consider discrete-time dynamics, even though
the underlying systems are often continuous in time. In this paper, we introduce a model …

Do Transformer World Models Give Better Policy Gradients?

M Ma, T Ni, C Gehring, P D'Oro, PL Bacon - arXiv preprint arXiv …, 2024 - arxiv.org
A natural approach for reinforcement learning is to predict future rewards by unrolling a
neural network world model, and to backpropagate through the resulting computational …

A Pontryagin Perspective on Reinforcement Learning

O Eberhard, C Vernade, M Muehlebach - arXiv preprint arXiv:2405.18100, 2024 - arxiv.org
Reinforcement learning has traditionally focused on learning state-dependent policies to
solve optimal control problems in a closed-loop fashion. In this work, we introduce the …

A Differentiable Sequence Model Perspective on Policy Gradients

M Ma, P D'Oro, T Ni, C Gehring, PL Bacon - openreview.net
Progress in sequence modeling with deep learning has been driven by the advances in
temporal credit assignment coming from better gradient propagation in neural network …