In value-based deep reinforcement learning, a pruned network is a good network

J Obando-Ceron, A Courville, PS Castro - arXiv preprint arXiv:2402.12479, 2024 - arxiv.org
Recent work has shown that deep reinforcement learning agents have difficulty in effectively
using their network parameters. We leverage prior insights into the advantages of sparse …
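
The snippet points at sparse value networks for deep RL agents. As a hedged illustration only, the sketch below applies one-shot L1 magnitude pruning to a toy DQN-style value network with PyTorch's pruning utilities; the layer sizes, the 90% sparsity level, and the one-shot schedule are assumptions for the example, not the authors' training recipe.

```python
# Minimal, illustrative sketch (not the authors' code): one-shot L1 magnitude
# pruning of a toy DQN-style value network. Layer sizes and the 90% sparsity
# level are assumptions made for this example.
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy Q-network: observation vector in, one Q-value per discrete action out.
q_net = nn.Sequential(
    nn.Linear(8, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 4),
)

# Zero out 90% of each linear layer's weights by L1 magnitude.
for module in q_net.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)

# Check the resulting overall weight sparsity.
zeros = sum(int((m.weight == 0).sum()) for m in q_net.modules()
            if isinstance(m, nn.Linear))
total = sum(m.weight.numel() for m in q_net.modules()
            if isinstance(m, nn.Linear))
print(f"weight sparsity: {zeros / total:.2%}")
```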

Scaling laws for sparsely-connected foundation models

E Frantar, C Riquelme, N Houlsby, D Alistarh… - arXiv preprint arXiv …, 2023 - arxiv.org
We explore the impact of parameter sparsity on the scaling behavior of Transformers trained
on massive datasets (i.e., "foundation models"), in both vision and language domains. In this …
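
The topic here is modeling loss as a joint function of model size, data, and sparsity. As a hedged sketch only, the code below fits an illustrative sparsity-aware power law to synthetic points with SciPy; the functional form, variable scalings, and constants are assumptions for the example, not the scaling law proposed in the paper.

```python
# Hedged sketch: fit an *illustrative* sparsity-aware scaling law to synthetic
# (size, data, sparsity, loss) points. The functional form is an assumption
# for this example, not the law from the paper.
import numpy as np
from scipy.optimize import curve_fit

def loss_model(X, a, alpha, gamma, b, beta, c):
    # n: non-zero params (units of 1e8), d: training tokens (units of 1e10),
    # s: weight sparsity in [0, 1). Capacity term shrinks with size and is
    # modulated by sparsity; data term shrinks with tokens; c is the floor.
    n, d, s = X
    return a * (1.0 - s) ** gamma / n ** alpha + b / d ** beta + c

rng = np.random.default_rng(0)
n = rng.uniform(0.1, 10.0, 200)              # model sizes
d = rng.uniform(0.1, 10.0, 200)              # dataset sizes
s = rng.choice([0.0, 0.5, 0.75], 200)        # sparsity levels
true = (3.0, 0.34, 0.6, 0.8, 0.28, 1.7)      # ground-truth coefficients
L = loss_model((n, d, s), *true) + rng.normal(0.0, 0.01, 200)

p0 = (1.0, 0.3, 0.5, 1.0, 0.3, 1.0)
popt, _ = curve_fit(loss_model, (n, d, s), L, p0=p0, maxfev=20000)
print("recovered (alpha, gamma, beta):", popt[1], popt[2], popt[4])
```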

Navigating Extremes: Dynamic Sparsity in Large Output Spaces

N Ullah, E Schultheis, M Lasby… - Advances in …, 2025 - proceedings.neurips.cc
In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to
post-training pruning for generating efficient models. In principle, DST allows for a much …
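
For context on the DST framing: the sketch below shows the generic magnitude-drop / random-regrow update at the heart of SET-style dynamic sparse training. It is a simplified NumPy illustration, not the large-output-space method the paper develops.

```python
# Minimal sketch of the prune-and-regrow step used in Dynamic Sparse Training
# (SET-style magnitude drop + random regrowth). Generic illustration only.
import numpy as np

rng = np.random.default_rng(0)

def prune_and_regrow(weights, mask, drop_fraction=0.3):
    """Drop the smallest-magnitude active weights, then regrow the same
    number of currently inactive connections at random (sparsity is kept)."""
    active = np.flatnonzero(mask)
    n_drop = int(drop_fraction * active.size)

    # Drop: deactivate the active weights with the smallest magnitude.
    drop_idx = active[np.argsort(np.abs(weights.ravel()[active]))[:n_drop]]
    mask.ravel()[drop_idx] = 0
    weights.ravel()[drop_idx] = 0.0

    # Regrow: activate an equal number of random inactive connections,
    # initialised to zero so training decides their values.
    inactive = np.flatnonzero(mask.ravel() == 0)
    grow_idx = rng.choice(inactive, size=n_drop, replace=False)
    mask.ravel()[grow_idx] = 1
    return weights, mask

# 90%-sparse random layer: only 10% of the 512x512 connections are active.
w = rng.normal(size=(512, 512))
m = (rng.random((512, 512)) < 0.1).astype(np.int8)
w *= m
w, m = prune_and_regrow(w, m)
print("active connections:", int(m.sum()))
```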

Compiler Support for Sparse Tensor Convolutions

P Liu, AJ Root, A Xu, Y Li, F Kjolstad… - Proceedings of the ACM on …, 2024 - dl.acm.org
This paper extends prior work on sparse tensor algebra compilers to generate
asymptotically efficient code for tensor expressions with affine subscript expressions. Our …
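
To make the "asymptotically efficient" point concrete: the hand-written sketch below shows the kind of loop structure such a compiler aims to emit for a convolution with a sparse operand, doing work proportional to the number of non-zeros rather than to the dense size. It was written for this note and is not output of the paper's compiler.

```python
# Hand-written illustration (not compiler-generated code) of a 1-D convolution
# over a sparse signal in COO form: only non-zero inputs are visited.
import numpy as np

def sparse_conv1d(nnz_idx, nnz_val, length, kernel):
    """Convolve a sparse 1-D signal (indices + values) with a dense kernel,
    touching only the non-zero input entries ('same' padding)."""
    out = np.zeros(length)
    offset = len(kernel) // 2
    for i, v in zip(nnz_idx, nnz_val):
        for k, kv in enumerate(kernel):
            j = i + k - offset          # output position hit by this non-zero
            if 0 <= j < length:
                out[j] += v * kv
    return out

# Sparse signal: 4 non-zeros in a length-1000 vector.
idx = np.array([3, 250, 251, 998])
val = np.array([1.0, -2.0, 0.5, 3.0])
kernel = np.array([0.25, 0.5, 0.25])

dense = np.zeros(1000)
dense[idx] = val
assert np.allclose(sparse_conv1d(idx, val, 1000, kernel),
                   np.convolve(dense, kernel, mode="same"))
```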

ELSA: Partial Weight Freezing for Overhead-Free Sparse Network Deployment

P Halvachi, A Peste, D Alistarh, CH Lampert - arXiv preprint arXiv …, 2023 - arxiv.org
We present ELSA, a practical solution for creating deep networks that can easily be
deployed at different levels of sparsity. The core idea is to embed one or more sparse …
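
Based only on this abstract snippet, one plausible reading of "partial weight freezing" is sketched below: a sparse core is embedded in the dense weights and excluded from updates, so either the dense model or the masked sparse model can be served from the same tensors without extra storage or conversion. The masking scheme, the 20% core, and the dummy update loop are assumptions for illustration, not ELSA's actual procedure.

```python
# Hedged sketch of partial weight freezing as read from the abstract: a sparse
# "core" inside the dense weights is frozen during training, so the sparse
# model can be extracted at deployment with no extra overhead.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(64, 64))   # dense weight matrix
core_mask = rng.random(W.shape) < 0.2      # 20% of weights form the sparse core

def sgd_step(W, grad, lr=0.1):
    # Freeze the sparse core: only weights outside the core receive updates.
    return W - lr * grad * (~core_mask)

frozen_before = W[core_mask].copy()
for _ in range(100):                        # dummy training loop
    grad = rng.normal(size=W.shape)         # stand-in for a real gradient
    W = sgd_step(W, grad)

assert np.allclose(W[core_mask], frozen_before)   # core weights untouched
sparse_model = W * core_mask                # overhead-free sparse extraction
dense_model = W                             # full dense model, same storage
print("sparsity of extracted model:", 1 - core_mask.mean())
```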