VMamba: Visual state space model
Designing computationally efficient network architectures remains an ongoing necessity in
computer vision. In this paper, we adapt Mamba, a state-space language model, into …
Mamba: Linear-time sequence modeling with selective state spaces
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …
Resurrecting recurrent neural networks for long sequences
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are
hard to optimize and slow to train. Deep state-space models (SSMs) have recently been …
Simplified state space layers for sequence modeling
Models using structured state space sequence (S4) layers have achieved state-of-the-art
performance on long-range sequence modeling tasks. An S4 layer combines linear state …
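As a rough illustration of the linear state-space recurrence that S4/S5-style layers are built on, here is a minimal numpy sketch with made-up, already-discretized parameters; it is not the papers' actual parameterization or scan implementation.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Single-input/single-output linear state-space recurrence.

    x_k = A @ x_{k-1} + B * u_k
    y_k = C @ x_k
    A: (N, N), B: (N,), C: (N,), u: (L,).  Returns y: (L,).
    """
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A @ x + B * u_k   # linear state update
        ys.append(C @ x)      # linear readout
    return np.array(ys)

# Toy usage with random, already-discretized parameters (hypothetical values).
rng = np.random.default_rng(0)
N, L = 4, 16
A = 0.9 * np.eye(N)           # stable diagonal state matrix
B = rng.normal(size=N)
C = rng.normal(size=N)
y = ssm_scan(A, B, C, rng.normal(size=L))
print(y.shape)                # (16,)
```

S4 and S5 differ mainly in how A, B, C are parameterized and in how this recurrence is evaluated efficiently (a convolutional kernel versus a parallel scan), which the sequential loop above deliberately ignores.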
Hierarchically gated recurrent neural network for sequence modeling
Transformers have surpassed RNNs in popularity due to their superior abilities in parallel
training and long-term dependency modeling. Recently, there has been a renewed interest …
A survey on vision mamba: Models, applications and challenges
Mamba, a recent selective structured state space model, performs excellently on long
sequence modeling tasks. Mamba mitigates the modeling constraints of convolutional …
Monarch mixer: A simple sub-quadratic gemm-based architecture
Machine learning models are increasingly being scaled in both sequence length
and model dimension to reach longer contexts and better performance. However, existing …
Mamba-360: Survey of state space models as transformer alternative for long sequence modelling: Methods, applications, and challenges
Sequence modeling is a crucial area across various domains, including Natural Language
Processing (NLP), speech recognition, time series forecasting, music generation, and …
Simple hardware-efficient long convolutions for sequence modeling
State space models (SSMs) have high performance on long sequence modeling but require
sophisticated initialization techniques and specialized implementations for high quality and …
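Assuming, as the title suggests, that the SSM kernel is replaced by a directly parameterized long convolution evaluated with FFTs, a minimal sketch of that operation looks as follows; the kernel regularization and hardware-aware details of the paper are omitted.

```python
import numpy as np

def causal_long_conv(u, k):
    """Causal convolution of input u (L,) with a length-L kernel k (L,), via FFT.

    Zero-padding to 2L avoids circular wrap-around, so the first L outputs
    match a direct causal convolution, in O(L log L) rather than O(L^2).
    """
    L = u.shape[0]
    n = 2 * L
    y = np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(k, n), n)
    return y[:L]

# Sanity check against the explicit O(L^2) causal convolution.
rng = np.random.default_rng(0)
L = 8
u, k = rng.normal(size=L), rng.normal(size=L)
ref = np.array([sum(k[j] * u[t - j] for j in range(t + 1)) for t in range(L)])
assert np.allclose(causal_long_conv(u, k), ref)
```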
Gated linear attention transformers with hardware-efficient training
Transformers with linear attention allow for efficient parallel training but can simultaneously
be formulated as an RNN with 2D (matrix-valued) hidden states, thus enjoying linear (with …
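A minimal sketch of the recurrent view mentioned in the abstract: unnormalized causal linear attention can be maintained as a matrix-valued state S_t = S_{t-1} + k_t v_t^T. This illustrates only that equivalence, not GLA's gated formulation or its hardware-efficient training algorithm.

```python
import numpy as np

def linear_attention_recurrent(Q, K, V):
    """Unnormalized causal linear attention as an RNN with a matrix-valued state.

    S_t = S_{t-1} + outer(k_t, v_t)   # (d_k, d_v) hidden state
    o_t = q_t @ S_t
    Q, K: (L, d_k), V: (L, d_v).  Returns (L, d_v).
    """
    L, d_k = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d_k, d_v))
    out = np.zeros((L, d_v))
    for t in range(L):
        S = S + np.outer(K[t], V[t])   # accumulate key-value outer products
        out[t] = Q[t] @ S              # read out with the query
    return out

# The recurrent form matches the parallel masked form (Q K^T * causal mask) V.
rng = np.random.default_rng(0)
L, d = 6, 4
Q, K, V = (rng.normal(size=(L, d)) for _ in range(3))
mask = np.tril(np.ones((L, L)))
assert np.allclose(linear_attention_recurrent(Q, K, V), (Q @ K.T * mask) @ V)
```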