Loss of plasticity in deep continual learning

S Dohare, JF Hernandez-Garcia, Q Lan, P Rahman… - Nature, 2024 - nature.com
Artificial neural networks, deep-learning methods and the backpropagation algorithm form
the foundation of modern machine learning and artificial intelligence. These methods are …

Simplifying deep temporal difference learning

M Gallici, M Fellows, B Ellis, B Pou, I Masmitja… - ar…

Weight clipping for deep continual and reinforcement learning

M Elsayed, Q Lan, C Lyle, AR Mahmood - arxiv preprint arxiv:2407.01704, 2024 - arxiv.org
Many failures in deep continual and reinforcement learning are associated with increasing
magnitudes of the weights, making them hard to change and potentially causing overfitting …

Normalization and effective learning rates in reinforcement learning

C Lyle, Z Zheng, K Khetarpal, J Martens… - arxiv preprint arxiv …, 2024 - arxiv.org
Normalization layers have recently experienced a renaissance in the deep reinforcement
learning and continual learning literature, with several works highlighting diverse benefits …

Learning continually by spectral regularization

A Lewandowski, M Bortkiewicz, S Kumar… - arxiv preprint arxiv …, 2024 - arxiv.org
Loss of plasticity is a phenomenon where neural networks can become more difficult to train
over the course of learning. Continual learning algorithms seek to mitigate this effect by …

In value-based deep reinforcement learning, a pruned network is a good network

J Obando-Ceron, A Courville, PS Castro - Architecture, 2024 - raw.githubusercontent.com
Recent work has shown that deep reinforcement learning agents have difficulty in effectively
using their network parameters. We leverage prior insights into the advantages of sparse …

No representation, no trust: connecting representation, collapse, and trust issues in PPO

S Moalla, A Miele, D Pyatko, R Pascanu… - arxiv preprint arxiv …, 2024 - arxiv.org
Reinforcement learning (RL) is inherently rife with non-stationarity since the states and
rewards the agent observes during training depend on its changing policy. Therefore …