Efficient exploration in continuous-time model-based reinforcement learning
Reinforcement learning algorithms typically consider discrete-time dynamics, even though
the underlying systems are often continuous in time. In this paper, we introduce a model …
Do Transformer World Models Give Better Policy Gradients?
A natural approach for reinforcement learning is to predict future rewards by unrolling a
neural network world model, and to backpropagate through the resulting computational …
A Pontryagin Perspective on Reinforcement Learning
Reinforcement learning has traditionally focused on learning state-dependent policies to
solve optimal control problems in a closed-loop fashion. In this work, we introduce the …
A Differentiable Sequence Model Perspective on Policy Gradients
Progress in sequence modeling with deep learning has been driven by the advances in
temporal credit assignment coming from better gradient propagation in neural network …