Efficient deep learning: A survey on making deep learning models smaller, faster, and better

G Menghani - ACM Computing Surveys, 2023 - dl.acm.org
Deep learning has revolutionized the fields of computer vision, natural language
understanding, speech recognition, information retrieval, and more. However, with the …

Model compression and hardware acceleration for neural networks: A comprehensive survey

L Deng, G Li, S Han, L Shi, Y Xie - Proceedings of the IEEE, 2020 - ieeexplore.ieee.org
Domain-specific hardware is becoming a promising topic against the backdrop of the improvement slowdown of general-purpose processors due to the foreseeable end of Moore's Law …

Mamba: Linear-time sequence modeling with selective state spaces

A Gu, T Dao - arXiv preprint arXiv:2312.00752, 2023 - minjiazhang.github.io
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …
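The mechanism the abstract goes on to describe is a selective SSM whose parameters are functions of the input. Below is a minimal NumPy sketch of a sequential selective scan, assuming illustrative shapes and projection names (W_B, W_C, W_dt are hypothetical; the paper's exact parameterization differs and it uses a hardware-aware parallel scan rather than a Python loop):

    import numpy as np

    def selective_ssm(x, A, W_B, W_C, W_dt):
        """Sequential reference scan of a selective SSM (simplified sketch).

        Assumed shapes: x (T, D) input; A (N,) negative diagonal state matrix
        shared across channels; W_B, W_C (D, N) make B_t, C_t input-dependent;
        W_dt (D, D) makes the per-channel step size Delta_t input-dependent.
        """
        T, D = x.shape
        h = np.zeros((D, A.shape[0]))              # one length-N state per channel
        y = np.empty_like(x)
        for t in range(T):
            dt = np.log1p(np.exp(x[t] @ W_dt))     # softplus keeps step sizes positive
            B_t, C_t = x[t] @ W_B, x[t] @ W_C      # input-dependent (selective) B, C
            A_bar = np.exp(dt[:, None] * A[None])  # discretized per-channel decay
            h = A_bar * h + (dt[:, None] * B_t[None]) * x[t][:, None]  # state update
            y[t] = h @ C_t                         # per-channel readout
        return y

Because A_bar and B_t change with the input, the layer can gate what enters and persists in the state, which is the "selection" mechanism the title refers to.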

Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality

T Dao, A Gu - arXiv preprint arXiv:2405.21060, 2024 - arxiv.org
While Transformers have been the main architecture behind deep learning's success in
language modeling, state-space models (SSMs) such as Mamba have recently been shown …
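A condensed sketch of the duality the abstract alludes to (my notation, not the paper's): unrolling a scalar-gated SSM recurrence shows it computes a masked, attention-like mixing of the inputs,

    h_t = a_t h_{t-1} + B_t x_t, \qquad y_t = C_t^{\top} h_t
    \;\Longrightarrow\;
    y_t = \sum_{s \le t} \Big( \prod_{r=s+1}^{t} a_r \Big) C_t^{\top} B_s \, x_s ,

so Y = M X with M_{ts} = (\prod_{r=s+1}^{t} a_r)\, C_t^{\top} B_s, a lower-triangular semiseparable matrix playing the role of an attention score matrix. This correspondence is what lets attention-style algorithms transfer to SSMs and vice versa.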

RWKV: Reinventing RNNs for the Transformer era

B Peng, E Alcaide, Q Anthony, A Albalak… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …
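A simplified NumPy sketch of the recurrent, linear-attention-style update RWKV builds on; this omits the paper's bonus term for the current token and its numerical-stability handling, and r, k, v, w stand for the receptance, key, value, and learned decay streams:

    import numpy as np

    def rwkv_style_recurrence(r, k, v, w):
        """Decayed weighted average of values, gated by receptance (sketch).

        r, k, v: (T, D) receptance/key/value streams; w: (D,) learned decay.
        """
        T, D = k.shape
        num, den = np.zeros(D), np.zeros(D)   # running weighted sum and weight mass
        y = np.empty((T, D))
        decay = np.exp(-np.exp(w))            # maps w into a (0, 1) decay factor
        for t in range(T):
            num = decay * num + np.exp(k[t]) * v[t]
            den = decay * den + np.exp(k[t])
            y[t] = (1.0 / (1.0 + np.exp(-r[t]))) * num / den  # sigmoid(r) gates output
        return y

The state carried between steps is two (D,)-vectors regardless of sequence length, which is the constant-memory property the paper contrasts with attention's quadratic scaling.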

Simplified state space layers for sequence modeling

JTH Smith, A Warrington, SW Linderman - arXiv preprint arXiv:2208.04933, 2022 - arxiv.org
Models using structured state space sequence (S4) layers have achieved state-of-the-art
performance on long-range sequence modeling tasks. An S4 layer combines linear state …
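For context where the snippet cuts off: the linear state space core that S4-style layers discretize is the standard continuous-time system (textbook form; the papers add specific parameterizations such as HiPPO initialization),

    \dot{x}(t) = A x(t) + B u(t), \qquad y(t) = C x(t) + D u(t) ,

with, under one common choice (zero-order hold) for step size \Delta,

    \bar{A} = e^{\Delta A}, \qquad \bar{B} = (\Delta A)^{-1}\big(e^{\Delta A} - I\big)\, \Delta B, \qquad x_k = \bar{A} x_{k-1} + \bar{B} u_k .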

Combining recurrent, convolutional, and continuous-time models with linear state space layers

A Gu, I Johnson, K Goel, K Saab… - Advances in neural …, 2021 - proceedings.neurips.cc
Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations
(NDEs) are popular families of deep learning models for time-series data, each with unique …

Deep equilibrium models

S Bai, JZ Kolter, V Koltun - Advances in neural information …, 2019 - proceedings.neurips.cc
We present a new approach to modeling sequential data: the deep equilibrium model
(DEQ). Motivated by an observation that the hidden layers of many existing deep sequence …
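The core idea is to define the layer's output directly as the fixed point z* = f(z*, x) of a single transformation. A minimal sketch using naive fixed-point iteration (the paper itself uses quasi-Newton root finding and backpropagates implicitly through the equilibrium rather than through the iterations):

    import numpy as np

    def deq_fixed_point(f, x, z0, tol=1e-6, max_iter=100):
        """Finds z* with z* = f(z*, x) by fixed-point iteration."""
        z = z0
        for _ in range(max_iter):
            z_next = f(z, x)
            if np.linalg.norm(z_next - z) < tol:
                return z_next
            z = z_next
        return z

    # Illustrative contractive layer, so plain iteration converges.
    W = 0.5 * np.eye(4)
    layer = lambda z, x: np.tanh(W @ z + x)
    z_star = deq_fixed_point(layer, x=np.ones(4), z0=np.zeros(4))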

Repeat after me: Transformers are better than state space models at copying

S Jelassi, D Brandfonbrener, SM Kakade… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers are the dominant architecture for sequence modeling, but there is growing
interest in models that use a fixed-size latent state that does not depend on the sequence …

The neural architecture of language: Integrative modeling converges on predictive processing

M Schrimpf, IA Blank, G Tuckute, C Kauf… - Proceedings of the …, 2021 - pnas.org
The neuroscience of perception has recently been revolutionized with an integrative
modeling approach in which computation, brain function, and behavior are linked across …