Mamba: Linear-time sequence modeling with selective state spaces

A Gu, T Dao - arXiv preprint arXiv:2312.00752, 2023 - arxiv.org
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …
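The papers in this list all build on the same primitive: a discretized linear state space model (SSM) recurrence, x_t = A x_{t-1} + B u_t, y_t = C x_t, which can be computed in time linear in sequence length. A minimal sketch of that recurrence is shown below; the matrices A, B, C and their sizes are illustrative placeholders, not parameters from any of the cited papers (which additionally use structured initializations, and in Mamba's case input-dependent, "selective" parameters).

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Run the discretized linear SSM recurrence over a 1-D input sequence u.

    x_t = A x_{t-1} + B u_t   (state update, O(1) per step -> linear in length)
    y_t = C x_t               (readout)
    """
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        x = A @ x + B * u_t
        ys.append(C @ x)
    return np.array(ys)

# Toy 2-dimensional state; values chosen only for illustration.
A = np.array([[0.9, 0.0],
              [0.1, 0.8]])   # state transition
B = np.array([1.0, 0.0])     # input projection
C = np.array([0.0, 1.0])     # output projection

y = ssm_scan(A, B, C, np.ones(8))
```

Because the recurrence is linear and time-invariant here, the same computation can also be expressed as a long convolution over the input, which is the parallel-training view exploited by S4 and its successors.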

Efficiently modeling long sequences with structured state spaces

A Gu, K Goel, C Ré - arXiv preprint arXiv:2111.00396, 2021 - arxiv.org
A central goal of sequence modeling is designing a single principled model that can
address sequence data across a range of modalities and tasks, particularly on long-range …

Resurrecting recurrent neural networks for long sequences

A Orvieto, SL Smith, A Gu, A Fernando… - International …, 2023 - proceedings.mlr.press
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are
hard to optimize and slow to train. Deep state-space models (SSMs) have recently been …

On the parameterization and initialization of diagonal state space models

A Gu, K Goel, A Gupta, C Ré - Advances in Neural …, 2022 - proceedings.neurips.cc
State space models (SSMs) have recently been shown to be very effective as a deep learning
layer, offering a promising alternative to sequence models such as RNNs, CNNs, or Transformers …

Hungry hungry hippos: Towards language modeling with state space models

DY Fu, T Dao, KK Saab, AW Thomas, A Rudra… - arXiv preprint arXiv …, 2022 - arxiv.org
State space models (SSMs) have demonstrated state-of-the-art sequence modeling
performance in some modalities, but underperform attention in language modeling …

Simplified state space layers for sequence modeling

JTH Smith, A Warrington, SW Linderman - arXiv preprint arXiv:2208.04933, 2022 - arxiv.org
Models using structured state space sequence (S4) layers have achieved state-of-the-art
performance on long-range sequence modeling tasks. An S4 layer combines linear state …

S4ND: Modeling images and videos as multidimensional signals with state spaces

E Nguyen, K Goel, A Gu, G Downs… - Advances in neural …, 2022 - proceedings.neurips.cc
Visual data such as images and videos are typically modeled as discretizations of inherently
continuous, multidimensional signals. Existing continuous-signal models attempt to exploit …

PointMamba: A simple state space model for point cloud analysis

D Liang, X Zhou, W Xu, X Zhu, Z Zou, X Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have become one of the foundational architectures in point cloud analysis
tasks due to their excellent global modeling ability. However, the attention mechanism has …

A survey on vision Mamba: Models, applications and challenges

R Xu, S Yang, Y Wang, B Du, H Chen - arXiv preprint arXiv:2404.18861, 2024 - arxiv.org
Mamba, a recent selective structured state space model, performs excellently on long
sequence modeling tasks. Mamba mitigates the modeling constraints of convolutional …

Simple hardware-efficient long convolutions for sequence modeling

DY Fu, EL Epstein, E Nguyen… - International …, 2023 - proceedings.mlr.press
State space models (SSMs) have high performance on long sequence modeling but require
sophisticated initialization techniques and specialized implementations for high quality and …