Mamba: Linear-time sequence modeling with selective state spaces

A Gu, T Dao - arXiv preprint arXiv:2312.00752, 2023 - arxiv.org
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …
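
The snippet above is truncated, but the title's "selective state spaces" refers to an SSM whose dynamics depend on the input. As a rough sketch only (the function name, shapes, and the way B and C are scaled here are assumptions for illustration, not the paper's implementation, which fuses this recurrence into a hardware-aware parallel scan), a sequential version of an input-dependent diagonal SSM can be written as:

import numpy as np

def selective_ssm_scan(x, A, B_proj, C_proj, dt_proj):
    # Sequential sketch of a selective (input-dependent) diagonal SSM.
    #   x       : (T, D) input sequence
    #   A       : (D, N) negative-real diagonal state matrix, one row per channel
    #   B_proj, C_proj : (D, N) parameters; here B is simply scaled by the input,
    #                    a simplification of the projections used in practice
    #   dt_proj : (D,) per-channel step-size parameters
    T, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                               # hidden state per channel
    y = np.zeros((T, D))
    for t in range(T):
        dt = np.log1p(np.exp(dt_proj + x[t]))          # softplus: step size depends on x_t ("selection")
        A_bar = np.exp(dt[:, None] * A)                # zero-order-hold style discretization
        B_bar = dt[:, None] * B_proj * x[t][:, None]   # input enters through B
        h = A_bar * h + B_bar                          # linear recurrence: O(T) in sequence length
        y[t] = (h * C_proj).sum(axis=-1)               # readout through C
    return y

# Example with random parameters (A kept negative for stability):
rng = np.random.default_rng(0)
T, D, N = 16, 4, 8
y = selective_ssm_scan(rng.standard_normal((T, D)),
                       -np.exp(rng.standard_normal((D, N))),
                       rng.standard_normal((D, N)),
                       rng.standard_normal((D, N)),
                       rng.standard_normal(D))

The point of the sketch is the complexity argument: each step touches only a fixed-size state, so a full pass is linear in T, unlike attention.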

Vision Mamba: Efficient visual representation learning with bidirectional state space model

L Zhu, B Liao, Q Zhang, X Wang, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, state space models (SSMs) with efficient hardware-aware designs, i.e., the
Mamba deep learning model, have shown great potential for long sequence modeling …

RWKV: Reinventing RNNs for the Transformer era

B Peng, E Alcaide, Q Anthony, A Albalak… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …

Hyena Hierarchy: Towards larger convolutional language models

M Poli, S Massaroli, E Nguyen, DY Fu… - International …, 2023 - proceedings.mlr.press
Recent advances in deep learning have relied heavily on the use of large Transformers due
to their ability to learn at scale. However, the core building block of Transformers, the …

Resurrecting recurrent neural networks for long sequences

A Orvieto, SL Smith, A Gu, A Fernando… - International …, 2023 - proceedings.mlr.press
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are
hard to optimize and slow to train. Deep state-space models (SSMs) have recently been …

Sequence modeling and design from molecular to genome scale with Evo

E Nguyen, M Poli, MG Durrant, B Kang, D Katrekar… - Science, 2024 - science.org
The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an
organism's function. We present Evo, a long-context genomic foundation model with a …

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …

Hungry Hungry Hippos: Towards language modeling with state space models

DY Fu, T Dao, KK Saab, AW Thomas, A Rudra… - arXiv preprint arXiv …, 2022 - arxiv.org
State space models (SSMs) have demonstrated state-of-the-art sequence modeling
performance in some modalities, but underperform attention in language modeling …

On the parameterization and initialization of diagonal state space models

A Gu, K Goel, A Gupta, C Ré - Advances in Neural …, 2022 - proceedings.neurips.cc
State space models (SSMs) have recently been shown to be very effective as a deep learning
layer, offering a promising alternative to sequence models such as RNNs, CNNs, or Transformers …
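
This entry concerns how diagonal SSM layers are parameterized and discretized. As a minimal, generic sketch (the zero-order-hold rule below is a standard construction, not necessarily this paper's exact choice of parameterization or initialization):

import numpy as np

def discretize_diagonal_ssm(A_diag, B, dt):
    # Zero-order-hold discretization of the diagonal continuous-time SSM
    #   h'(t) = A h(t) + B x(t),  with A = diag(A_diag) and step size dt.
    # Returns (A_bar, B_bar) for the discrete recurrence
    #   h_k = A_bar * h_{k-1} + B_bar * x_k.
    # Assumes every entry of A_diag is nonzero (true for the usual stable initializations).
    A_bar = np.exp(dt * A_diag)
    B_bar = (A_bar - 1.0) / A_diag * B   # exact ZOH input term for diagonal A
    return A_bar, B_bar

Because A is diagonal, both the discretization and the resulting recurrence are elementwise, which is what makes such layers cheap compared with a dense state matrix.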

S4ND: Modeling images and videos as multidimensional signals with state spaces

E Nguyen, K Goel, A Gu, G Downs… - Advances in Neural …, 2022 - proceedings.neurips.cc
Visual data such as images and videos are typically modeled as discretizations of inherently
continuous, multidimensional signals. Existing continuous-signal models attempt to exploit …