Deja Vu: Contextual sparsity for efficient LLMs at inference time

Z Liu, J Wang, T Dao, T Zhou, B Yuan… - International …, 2023 - proceedings.mlr.press
Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …
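
To make the title's "contextual sparsity" concrete, here is a minimal sketch of the idea: a lightweight predictor guesses, per input, which MLP neurons will matter and only those rows and columns are touched. All names, shapes, and the top-k predictor below are illustrative assumptions, not the paper's actual implementation.

import numpy as np

def contextual_sparse_mlp(x, W1, b1, W2, predictor, k=32):
    # Hypothetical sketch of contextual sparsity, not the Deja Vu code:
    # a predictor scores each hidden neuron's relevance for this input x,
    # and only the top-k neurons are computed.
    scores = predictor(x)                                 # (d_ff,) relevance per neuron
    active = np.argsort(scores)[-k:]                      # indices of the k predicted-active neurons
    h = np.maximum(x @ W1[:, active] + b1[active], 0.0)   # compute only the active neurons
    return h @ W2[active, :]                              # project back with matching rows

# toy usage with made-up dimensions
rng = np.random.default_rng(0)
d_model, d_ff = 64, 256
W1, b1, W2 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff), rng.normal(size=(d_ff, d_model))
x = rng.normal(size=d_model)
y = contextual_sparse_mlp(x, W1, b1, W2, predictor=lambda v: np.abs(v @ W1), k=32)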

Scatterbrain: Unifying sparse and low-rank attention

B Chen, T Dao, E Winsor, Z Song… - Advances in Neural …, 2021 - proceedings.neurips.cc
Recent advances in efficient Transformers have exploited either the sparsity or low-rank
properties of attention matrices to reduce the computational and memory bottlenecks of …
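
As a hedged illustration of combining the two structures the snippet mentions, the sketch below approximates an attention matrix as a low-rank term plus a sparse correction on the largest residual entries. The rank r and sparsity budget s are made-up parameters, and Scatterbrain itself builds the two parts from kernel features and locality-sensitive hashing rather than an explicit SVD.

import numpy as np

def sparse_plus_lowrank(A, r=8, s=256):
    # Illustrative sparse + low-rank approximation of a dense attention matrix A.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    low_rank = (U[:, :r] * S[:r]) @ Vt[:r]           # rank-r component
    residual = A - low_rank
    idx = np.argpartition(np.abs(residual).ravel(), -s)[-s:]
    sparse = np.zeros_like(A)
    sparse.ravel()[idx] = residual.ravel()[idx]      # keep the s largest residual entries
    return low_rank + sparse

A = np.random.rand(64, 64)
A /= A.sum(axis=1, keepdims=True)                    # row-stochastic "attention" matrix
rel_err = np.linalg.norm(A - sparse_plus_lowrank(A)) / np.linalg.norm(A)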

The lazy neuron phenomenon: On emergence of activation sparsity in transformers

Z Li, C You, S Bhojanapalli, D Li, AS Rawat… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper studies the curious phenomenon that machine learning models with Transformer
architectures have sparse activation maps. By activation map we refer to the …
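
A quick way to see the kind of sparsity being measured is to count the fraction of post-activation MLP outputs that are exactly zero over a batch of inputs. The ReLU toy setting below is an assumption for illustration; the paper studies trained Transformer MLP blocks.

import numpy as np

def activation_sparsity(X, W, b):
    # Fraction of post-ReLU activations that are exactly zero
    # (toy illustration of the "lazy neuron" measurement, not the paper's code).
    H = np.maximum(X @ W + b, 0.0)       # (batch, d_ff) activation map
    return float((H == 0).mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 128))           # batch of token representations
W = rng.normal(size=(128, 512))
b = rng.normal(size=512)
print(f"zero fraction: {activation_sparsity(X, W, b):.2%}")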

Pixelated butterfly: Simple and efficient sparse training for neural network models

T Dao, B Chen, K Liang, J Yang, Z Song… - arXiv preprint arXiv …, 2021 - arxiv.org
Overparameterized neural networks generalize well but are expensive to train. Ideally, one
would like to reduce their computational cost while retaining their generalization benefits …

Sparse spiking gradient descent

N Perez-Nieves, D Goodman - Advances in Neural …, 2021 - proceedings.neurips.cc
There is increasing interest in emulating Spiking Neural Networks (SNNs) on
neuromorphic computing devices due to their low energy consumption. Recent advances …

Bypass exponential time preprocessing: Fast neural network training via weight-data correlation preprocessing

J Alman, Z Song, R Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Over the last decade, deep neural networks have transformed our society, and they are
already widely applied in various machine learning applications. State-of-the-art deep …

Does preprocessing help training over-parameterized neural networks?

Z Song, S Yang, R Zhang - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Deep neural networks have achieved impressive performance in many areas. Designing a
fast and provable method for training neural networks is a fundamental question in machine …

A survey on large-scale machine learning

M Wang, W Fu, X He, S Hao… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Machine learning can provide deep insights into data, allowing machines to make high-
quality predictions, and has been widely used in real-world applications, such as text …

Training multi-layer over-parametrized neural network in subquadratic time

Z Song, L Zhang, R Zhang - arXiv preprint arXiv:2112.07628, 2021 - arxiv.org
We consider the problem of training a multi-layer over-parametrized neural network to
minimize the empirical risk induced by a loss function. In the typical setting of over …
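
For concreteness, the empirical risk referred to here is the standard average loss over the n training pairs, which the network weights W are trained to minimize; the notation below is assumed rather than taken from the paper:

\min_{W} \; L(W) \;=\; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_{W}(x_i),\, y_i\big)

In the over-parametrized regime the paper targets, the network width is much larger than n, which is what makes per-iteration cost, rather than convergence, the bottleneck.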

An efficient statistical-based gradient compression technique for distributed training systems

AM Abdelmoniem, A Elzanaty… - Proceedings of …, 2021 - proceedings.mlsys.org
The recent many-fold increase in the size of deep neural networks makes efficient
distributed training challenging. Many proposals exploit the compressibility of the gradients …
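
As a generic illustration of exploiting gradient compressibility, the sketch below performs top-k gradient sparsification with local error feedback: only the largest-magnitude entries are communicated, and the dropped mass is folded into the next step. The specific statistical threshold estimation this paper proposes is not reproduced here, and the function name and k_frac parameter are made up.

import numpy as np

def topk_compress(grad, residual, k_frac=0.01):
    # Generic top-k sparsification sketch, not the paper's estimator:
    # keep only the k largest-magnitude entries, carry the rest as residual.
    g = grad + residual                          # fold in previously dropped gradient mass
    k = max(1, int(k_frac * g.size))
    idx = np.argpartition(np.abs(g), -k)[-k:]    # indices of the k largest entries
    sparse = np.zeros_like(g)
    sparse[idx] = g[idx]                         # values actually communicated
    return sparse, g - sparse                    # (compressed gradient, new residual)

grad = np.random.randn(10_000)
residual = np.zeros_like(grad)
sparse, residual = topk_compress(grad, residual, k_frac=0.01)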