Hyena Hierarchy: Towards larger convolutional language models

M Poli, S Massaroli, E Nguyen, DY Fu… - International …, 2023 - proceedings.mlr.press
Recent advances in deep learning have relied heavily on the use of large Transformers due
to their ability to learn at scale. However, the core building block of Transformers, the …

FlashAttention: Fast and memory-efficient exact attention with IO-awareness

T Dao, D Fu, S Ermon, A Rudra… - Advances in Neural …, 2022 - proceedings.neurips.cc
Transformers are slow and memory-hungry on long sequences, since the time and memory
complexity of self-attention are quadratic in sequence length. Approximate attention …
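The snippet's point is that exact self-attention scales quadratically with sequence length. The minimal NumPy sketch below makes that cost concrete: it materializes the full N×N score matrix, which is precisely the memory traffic an IO-aware, tiled kernel avoids. This is a baseline illustration, not the FlashAttention algorithm itself.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Standard exact attention: materializes the full (N, N) score matrix,
    which is the quadratic memory cost a tiled, IO-aware kernel avoids."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (N, N): quadratic in sequence length
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                              # (N, d)

# Toy usage: N = 1024 tokens, head dim 64 -> the score matrix alone has ~1M entries.
N, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, N, d))
print(naive_attention(Q, K, V).shape)  # (1024, 64)
```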

HiPPO: Recurrent memory with optimal polynomial projections

A Gu, T Dao, S Ermon, A Rudra… - Advances in Neural …, 2020 - proceedings.neurips.cc
A central problem in learning from sequential data is representing cumulative history in an
incremental fashion as more data is processed. We introduce a general framework (HiPPO) …
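The framework's core move is to keep a fixed-size state of polynomial-projection coefficients and update it recurrently as each new input arrives. The sketch below uses the commonly quoted HiPPO-LegS-style operators and a simple implicit-Euler step purely for illustration; the paper derives the exact operators and discretization, so treat the details here as assumptions.

```python
import numpy as np

def hippo_legs_matrices(N):
    # Commonly quoted HiPPO-LegS operators (lower-triangular A, vector B);
    # see the paper for the derivation -- treated as an assumption here.
    n = np.arange(N)
    A = np.tril(np.sqrt((2 * n[:, None] + 1) * (2 * n[None, :] + 1)), -1)
    A = A + np.diag(n + 1.0)
    B = np.sqrt(2 * n + 1.0)
    return A, B

def compress_sequence(u, N=32):
    """Online memory: keep N coefficients c summarizing the whole prefix and
    update them once per input. Uses an implicit-Euler step of the scaled ODE
    dc/dt = (-A c + B u) / t for numerical stability in this toy; the paper
    derives the exact discretization."""
    A, B = hippo_legs_matrices(N)
    I = np.eye(N)
    c = np.zeros(N)
    for k, u_k in enumerate(u, start=1):
        c = np.linalg.solve(I + A / k, c + (B * u_k) / k)
    return c  # fixed-size summary of everything seen so far

coeffs = compress_sequence(np.sin(np.linspace(0, 8 * np.pi, 500)))
print(coeffs.shape)  # (32,)
```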

Randomized numerical linear algebra: Foundations and algorithms

PG Martinsson, JA Tropp - Acta Numerica, 2020 - cambridge.org
This survey describes probabilistic algorithms for linear algebraic computations, such as
factorizing matrices and solving linear systems. It focuses on techniques that have a proven …
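A representative algorithm from this literature is the randomized range finder / randomized SVD: sketch the range of the matrix with a random test matrix, orthonormalize, then solve a small deterministic problem. The NumPy sketch below follows that textbook recipe; the parameter choices (oversampling, power iterations) are illustrative defaults, not prescriptions from the survey.

```python
import numpy as np

def randomized_svd(A, rank, n_oversample=10, n_iter=2, seed=0):
    """Randomized SVD: sample the range of A with a Gaussian test matrix,
    then compute an exact SVD of the small projected problem."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + n_oversample, min(m, n))
    Omega = rng.standard_normal((n, k))   # random test matrix
    Y = A @ Omega                         # sample the range of A
    for _ in range(n_iter):               # power iterations sharpen the spectrum
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                # orthonormal basis for the sampled range
    B = Q.T @ A                           # small (k, n) projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ Ub[:, :rank], s[:rank], Vt[:rank]

# Usage: an exactly low-rank matrix is recovered almost exactly from the sketch.
rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 50)) @ rng.standard_normal((50, 1500))
U, s, Vt = randomized_svd(A, rank=50)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))  # tiny on this toy
```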

Simple hardware-efficient long convolutions for sequence modeling

DY Fu, EL Epstein, E Nguyen… - International …, 2023 - proceedings.mlr.press
State space models (SSMs) have high performance on long sequence modeling but require
sophisticated initialization techniques and specialized implementations for high quality and …
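The workhorse behind these models is a convolution whose kernel is as long as the input, evaluated in O(N log N) via the FFT rather than O(N²) directly. Below is a minimal NumPy sketch of that causal FFT convolution, checked against the direct definition; the paper's contribution is making such convolutions simple to train and hardware-efficient, which this toy does not attempt.

```python
import numpy as np

def causal_long_conv(u, k):
    """Causal convolution of a length-N signal u with a length-N kernel k via FFT.
    Zero-padding to 2N avoids circular wrap-around, so output[t] depends only on
    u[:t+1]. Cost is O(N log N) instead of O(N^2) for the direct sum."""
    N = u.shape[-1]
    L = 2 * N
    U = np.fft.rfft(u, n=L)
    K = np.fft.rfft(k, n=L)
    return np.fft.irfft(U * K, n=L)[..., :N]

# Check against the O(N^2) direct definition on a toy example.
rng = np.random.default_rng(0)
u = rng.standard_normal(256)
k = rng.standard_normal(256)
direct = np.array([np.dot(u[: t + 1][::-1], k[: t + 1]) for t in range(256)])
print(np.allclose(causal_long_conv(u, k), direct))  # True
```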

Monarch: Expressive structured matrices for efficient and accurate training

T Dao, B Chen, NS Sohoni, A Desai… - International …, 2022 - proceedings.mlr.press
Large neural networks excel in many domains, but they are expensive to train and fine-tune.
A popular approach to reduce their compute or memory requirements is to replace dense …
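Monarch replaces dense weight matrices with products of block-diagonal factors interleaved with fixed permutations, cutting parameters and FLOPs from n² to roughly n·sqrt(n). The sketch below shows that block-diagonal / permute / block-diagonal pattern in spirit only; the exact Monarch parametrization (the permutations, block shapes, and how dense layers are projected onto it) is in the paper, so the details here are assumptions.

```python
import numpy as np

def block_diag_matvec(blocks, x):
    # Apply a block-diagonal matrix, stored as (num_blocks, b, b), to x of length num_blocks*b.
    m, b, _ = blocks.shape
    return np.einsum("nij,nj->ni", blocks, x.reshape(m, b)).reshape(-1)

def monarch_like_matvec(B1, B2, x):
    """y = B2 @ P @ B1 @ x, with P a fixed reshape-transpose permutation.
    A hedged sketch of the block-diagonal / permute / block-diagonal pattern
    behind Monarch-style structured matrices, not the paper's exact
    parametrization. Assumes n = b*b, with b blocks of size (b, b) per factor."""
    b = B1.shape[0]
    h = block_diag_matvec(B1, x)
    h = h.reshape(b, b).T.reshape(-1)     # permutation P: transpose the b-by-b grid
    return block_diag_matvec(B2, h)

# A dense matvec on n = 64 costs n^2 = 4096 multiplies and n^2 parameters;
# this factorization costs 2 * n * b = 1024 of each.
rng = np.random.default_rng(0)
b = 8
B1, B2 = rng.standard_normal((2, b, b, b))
x = rng.standard_normal(b * b)
print(monarch_like_matvec(B1, B2, x).shape)  # (64,)
```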

Scatterbrain: Unifying sparse and low-rank attention

B Chen, T Dao, E Winsor, Z Song… - Advances in Neural …, 2021 - proceedings.neurips.cc
Recent advances in efficient Transformers have exploited either the sparsity or low-rank
properties of attention matrices to reduce the computational and memory bottlenecks of …
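The snippet names the two standard routes to sub-quadratic attention, sparsity and low rank, which Scatterbrain unifies. The sketch below illustrates one simple way to combine them, assumed for illustration: a Performer-style random-feature (low-rank) estimate of softmax attention plus an exact correction on a local sparse band. The paper's actual estimator (how the sparse support is chosen and how the correction is defined) differs in its details.

```python
import numpy as np

def softmax_features(X, W):
    """Positive random features so that E[phi(q) . phi(k)] ~= exp(q . k);
    one of several possible low-rank estimators."""
    m = W.shape[0]
    return np.exp(X @ W.T - 0.5 * np.sum(X**2, axis=-1, keepdims=True)) / np.sqrt(m)

def sparse_plus_lowrank_attention(Q, K, V, n_features=128, window=16, seed=0):
    """Hedged sketch of sparse + low-rank attention: a random-feature (low-rank)
    approximation of softmax attention, plus an exact sparse correction on a
    local band where the low-rank estimate is subtracted out and replaced by
    the true scores. Not the paper's exact estimator."""
    N, d = Q.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_features, d))
    Qs, Ks = Q / d**0.25, K / d**0.25            # fold in the 1/sqrt(d) temperature
    phi_q, phi_k = softmax_features(Qs, W), softmax_features(Ks, W)

    num = phi_q @ (phi_k.T @ V)                  # low-rank numerator, O(N * m * d)
    den = phi_q @ phi_k.sum(axis=0)              # low-rank softmax normalizer
    for i in range(N):                           # sparse band: exact minus low-rank
        lo, hi = max(0, i - window), min(N, i + window + 1)
        exact = np.exp(Qs[i] @ Ks[lo:hi].T)
        approx = phi_q[i] @ phi_k[lo:hi].T
        num[i] += (exact - approx) @ V[lo:hi]
        den[i] += (exact - approx).sum()
    return num / den[:, None]

rng = np.random.default_rng(1)
Q, K, V = rng.standard_normal((3, 256, 32))
print(sparse_plus_lowrank_attention(Q, K, V).shape)  # (256, 32)
```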

Fast sparse convnets

E Elsen, M Dukhan, T Gale… - Proceedings of the …, 2020 - openaccess.thecvf.com
Historically, the pursuit of efficient inference has been one of the driving forces behind the
research into new deep learning architectures and building blocks. Some of the recent …