Mamba: Linear-time sequence modeling with selective state spaces
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …
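The selection mechanism the title refers to makes the SSM's (Δ, B, C) parameters functions of the input, so the state update can keep or forget tokens content-dependently. Below is a minimal NumPy sketch of that sequential recurrence; all shapes, weight names, and the simplified Euler discretization of B are illustrative assumptions, not Mamba's reference implementation (which uses a hardware-aware parallel scan):

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def selective_ssm(x, A, W_B, W_C, W_dt):
    """Minimal sequential selective-SSM recurrence (illustrative sketch,
    not the paper's hardware-aware parallel scan).

    x   : (L, D) input sequence
    A   : (D, N) fixed, negative state-decay parameters
    W_B : (D, N) projection making B_t depend on the input
    W_C : (D, N) projection making C_t depend on the input
    W_dt: (D, D) projection for the input-dependent step size Delta_t
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                      # one length-N state per channel
    y = np.zeros((L, D))
    for t in range(L):
        dt = softplus(x[t] @ W_dt)            # (D,) input-dependent step size
        B_t = x[t] @ W_B                      # (N,) input-dependent input matrix
        C_t = x[t] @ W_C                      # (N,) input-dependent output matrix
        A_bar = np.exp(dt[:, None] * A)       # (D, N) zero-order-hold discretization
        B_bar = dt[:, None] * B_t[None, :]    # (D, N) simplified (Euler) B
        h = A_bar * h + B_bar * x[t][:, None] # selective state update
        y[t] = h @ C_t                        # read out each channel's state
    return y
```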
Vision Mamba: Efficient visual representation learning with bidirectional state space model
Recently, state space models (SSMs) with efficient hardware-aware designs, i.e., the
Mamba deep learning model, have shown great potential for long sequence modeling …
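Since image patches have no canonical causal order, bidirectional modeling here amounts to running the SSM over the flattened patch sequence in both directions and merging the two views. A hedged sketch of that idea, with scan_fn a hypothetical stand-in for any causal 1-D SSM layer (not Vim's exact block):

```python
import numpy as np

def bidirectional_scan(patches, scan_fn):
    """Run a causal sequence model over flattened image patches in both
    directions and sum the results; `scan_fn` is a stand-in for any 1-D
    SSM layer. Mirrors the bidirectional idea only, not Vim's full block.

    patches: (L, D) flattened patch embeddings.
    """
    fwd = scan_fn(patches)                # left-to-right pass
    bwd = scan_fn(patches[::-1])[::-1]    # right-to-left pass, re-reversed
    return fwd + bwd                      # combine both directions
```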
RWKV: Reinventing RNNs for the Transformer era
Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …
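RWKV's answer to that quadratic cost is a time-mixing rule that can be evaluated as a linear recurrence with constant-size state. The sketch below follows the published WKV formula in simplified form; omitting RWKV's numerical-stability rescaling, and treating w and u as plain per-channel parameters, are simplifications for illustration:

```python
import numpy as np

def wkv_recurrence(k, v, w, u):
    """Simplified RWKV-style time mixing as a linear recurrence: O(L) time
    and O(1) state instead of quadratic attention. A sketch of the idea;
    RWKV's stability tricks and layer details are omitted.

    k, v: (L, D) keys and values
    w   : (D,) positive per-channel decay rate
    u   : (D,) bonus weight for the current token
    """
    L, D = k.shape
    num = np.zeros(D)   # decayed running sum of exp(k_i) * v_i
    den = np.zeros(D)   # decayed running sum of exp(k_i)
    out = np.zeros((L, D))
    for t in range(L):
        bonus = np.exp(u + k[t])                 # extra weight on token t itself
        out[t] = (num + bonus * v[t]) / (den + bonus)
        num = np.exp(-w) * num + np.exp(k[t]) * v[t]
        den = np.exp(-w) * den + np.exp(k[t])
    return out
```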
Hyena hierarchy: Towards larger convolutional language models
Recent advances in deep learning have relied heavily on the use of large Transformers due
to their ability to learn at scale. However, the core building block of Transformers, the …
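Hyena replaces that block with interleaved gating and long, implicitly parameterized convolutions. The convolution primitive itself is an FFT-based causal convolution in O(L log L), sketched below; the implicit filter parameterization and the gating branches are omitted:

```python
import numpy as np

def fft_long_conv(x, k):
    """Causal long convolution via FFT in O(L log L): the kind of operator
    Hyena uses in place of attention. Just the convolution primitive; the
    filters themselves would come from a small implicit network.

    x: (L,) one input channel; k: (L,) filter of the same length.
    """
    L = len(x)
    n = 2 * L                         # zero-pad so circular conv == linear conv
    y = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(k, n), n)
    return y[:L]                      # keep the causal part of the output
```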
Resurrecting recurrent neural networks for long sequences
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are
hard to optimize and slow to train. Deep state-space models (SSMs) have recently been …
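The linear recurrent units studied in this line of work keep the RNN's O(1)-state inference while making training tractable by linearizing and diagonalizing the recurrence. A sketch of the core diagonal complex recurrence with a stable exponential eigenvalue parameterization (projections, normalization, and gating omitted; names illustrative):

```python
import numpy as np

def linear_recurrence(x_proj, nu_log, theta):
    """Diagonal complex linear recurrence in the spirit of the LRU:
    h_t = lam * h_{t-1} + x_t, with lam parameterized so |lam| < 1 and the
    recurrence stays stable. Sketch of the recurrence only, not the layer.

    x_proj : (L, N) complex-projected inputs
    nu_log, theta : (N,) magnitude and phase parameters
    """
    lam = np.exp(-np.exp(nu_log) + 1j * theta)   # |lam| = exp(-exp(nu_log)) < 1
    L, N = x_proj.shape
    h = np.zeros(N, dtype=complex)
    hs = np.zeros((L, N), dtype=complex)
    for t in range(L):
        h = lam * h + x_proj[t]                  # elementwise (diagonal) update
        hs[t] = h
    return hs
```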
Sequence modeling and design from molecular to genome scale with Evo
The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an
organism's function. We present Evo, a long-context genomic foundation model with a …
Efficient large language models: A survey
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …
Hungry Hungry Hippos: Towards language modeling with state space models
State space models (SSMs) have demonstrated state-of-the-art sequence modeling
performance in some modalities, but underperform attention in language modeling …
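The paper attributes that language-modeling gap to recall and token-comparison abilities that plain SSMs lack, and closes it with multiplicative interactions wrapped around two SSMs. A rough sketch of that layer shape, where diag_ssm is a hypothetical causal 1-D SSM and the shift SSM is reduced to a one-token delay for brevity:

```python
import numpy as np

def h3_style_mixing(q, k, v, diag_ssm):
    """H3-flavoured token mixing: y = q * diag_ssm(shift(k) * v).
    The elementwise gates let an SSM compare and recall tokens the way
    attention does. `diag_ssm` is a stand-in for any causal SSM; this is
    a sketch of the layer shape, not the paper's exact implementation.

    q, k, v: (L, D) projections of the input.
    """
    k_shift = np.roll(k, 1, axis=0)
    k_shift[0] = 0.0                  # causal: nothing before the first token
    return q * diag_ssm(k_shift * v)  # multiplicative gates around the SSM
```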
On the parameterization and initialization of diagonal state space models
State space models (SSMs) have recently been shown to be very effective as deep learning
layers and a promising alternative to sequence models such as RNNs, CNNs, and Transformers …
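What these parameterization and initialization papers study is, concretely, the convolution kernel induced by a diagonal SSM, which has a closed form. A minimal sketch under zero-order-hold discretization (shapes and names illustrative):

```python
import numpy as np

def diag_ssm_kernel(A, B, C, dt, L):
    """Convolution kernel of a diagonal SSM: with diagonal A, the
    discretized kernel is K[l] = sum_n C_n * Abar_n**l * Bbar_n,
    computable in closed form without materializing the recurrence.

    A   : (N,) complex diagonal state matrix (Re(A) < 0 for stability)
    B, C: (N,) input and output vectors
    dt  : scalar step size; L: kernel length.
    """
    A_bar = np.exp(dt * A)                    # zero-order-hold discretization
    B_bar = (A_bar - 1.0) / A * B             # exact ZOH input matrix
    powers = A_bar[:, None] ** np.arange(L)[None, :]   # (N, L) powers of Abar
    K = (C * B_bar) @ powers                  # (L,) kernel
    return K.real
```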
S4ND: Modeling images and videos as multidimensional signals with state spaces
Visual data such as images and videos are typically modeled as discretizations of inherently
continuous, multidimensional signals. Existing continuous-signal models attempt to exploit …
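S4ND's move is to turn 1-D SSM kernels into a single global multidimensional convolution, with one independent SSM per axis. A two-dimensional sketch, where k_h and k_w would come from something like the diagonal-SSM kernel sketched above (names illustrative, not the paper's code):

```python
import numpy as np
from scipy.signal import fftconvolve

def s4nd_style_conv2d(img, k_h, k_w):
    """Multidimensional SSM convolution in two dimensions: the global
    kernel is the outer product of independent 1-D SSM kernels along each
    axis, then applied as an ordinary 2-D convolution.

    img: (H, W) single-channel image; k_h: (H,), k_w: (W,) 1-D kernels.
    """
    K = np.outer(k_h, k_w)                   # separable (H, W) global kernel
    return fftconvolve(img, K, mode="same")  # 2-D convolution over the image
```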