Mamba: Linear-time sequence modeling with selective state spaces
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …
Efficiently modeling long sequences with structured state spaces
A central goal of sequence modeling is designing a single principled model that can
address sequence data across a range of modalities and tasks, particularly on long-range …
Resurrecting recurrent neural networks for long sequences
Abstract Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are
hard to optimize and slow to train. Deep state-space models (SSMs) have recently been …
On the parameterization and initialization of diagonal state space models
State space models (SSM) have recently been shown to be very effective as a deep learning
layer, offering a promising alternative to sequence models such as RNNs, CNNs, or Transformers …
Hungry hungry hippos: Towards language modeling with state space models
State space models (SSMs) have demonstrated state-of-the-art sequence modeling
performance in some modalities, but underperform attention in language modeling …
Simplified state space layers for sequence modeling
Models using structured state space sequence (S4) layers have achieved state-of-the-art
performance on long-range sequence modeling tasks. An S4 layer combines linear state …
S4ND: Modeling images and videos as multidimensional signals with state spaces
Visual data such as images and videos are typically modeled as discretizations of inherently
continuous, multidimensional signals. Existing continuous-signal models attempt to exploit …
PointMamba: A simple state space model for point cloud analysis
Transformers have become one of the foundational architectures in point cloud analysis
tasks due to their excellent global modeling ability. However, the attention mechanism has …
A survey on vision mamba: Models, applications and challenges
Mamba, a recent selective structured state space model, performs excellently on long
sequence modeling tasks. Mamba mitigates the modeling constraints of convolutional …
Simple hardware-efficient long convolutions for sequence modeling
State space models (SSMs) have high performance on long sequence modeling but require
sophisticated initialization techniques and specialized implementations for high quality and …