Mamba-360: Survey of state space models as transformer alternative for long sequence modelling: Methods, applications, and challenges

BN Patro, VS Agneeswaran - arXiv preprint arXiv:2404.16112, 2024 - arxiv.org
Sequence modeling is a crucial area across various domains, including Natural Language
Processing (NLP), speech recognition, time series forecasting, music generation, and …

A survey on transformer compression

Y Tang, Y Wang, J Guo, Z Tu, K Han, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer plays a vital role in the realms of natural language processing (NLP) and
computer vision (CV), especially for constructing large language models (LLM) and large …

Mamba: Linear-time sequence modeling with selective state spaces

A Gu, T Dao - arXiv preprint arXiv:2312.00752, 2023 - minjiazhang.github.io
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …
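
For quick reference, a minimal sketch (my notation, not the paper's) of the discretized linear state space recurrence that Mamba builds on; the selection mechanism additionally makes $\bar{B}$, $C$, and the step size $\Delta$ functions of the input $x_t$:

$$ h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t $$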

VMamba: Visual state space model

Y Liu, Y Tian, Y Zhao, H Yu, L Xie… - Advances in neural …, 2025 - proceedings.neurips.cc
Designing computationally efficient network architectures remains an ongoing necessity in
computer vision. In this paper, we adapt Mamba, a state-space language model, into …

Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality

T Dao, A Gu - arXiv preprint arXiv:2405.21060, 2024 - arxiv.org
While Transformers have been the main architecture behind deep learning's success in
language modeling, state-space models (SSMs) such as Mamba have recently been shown …
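
A hedged sketch of the duality in the title (generic SSM notation, not the paper's): unrolling the recurrence $h_t = A_t h_{t-1} + B_t x_t$, $y_t = C_t^{\top} h_t$ writes the output as multiplication by a lower-triangular, attention-like matrix, so the same model admits both a linear-time recurrent computation and a parallel, attention-style matrix computation:

$$ y_t = \sum_{s \le t} C_t^{\top} \Big( \prod_{r=s+1}^{t} A_r \Big) B_s\, x_s $$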

RWKV: Reinventing RNNs for the transformer era

B Peng, E Alcaide, Q Anthony, A Albalak… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …
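
To make the complexity contrast in this snippet concrete, a small illustrative sketch in Python (a generic decayed running sum, not RWKV's actual WKV operator): causal attention materializes a T x T score matrix, while a recurrent update of the kind RWKV relies on carries only a fixed-size state per step.

import numpy as np

# Illustrative only: O(T^2) causal self-attention vs. an O(T) recurrent update.
T, d = 512, 64
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))

# Full attention: a T x T score matrix, quadratic in sequence length.
scores = (q @ k.T) / np.sqrt(d)
scores = np.where(np.tril(np.ones((T, T), dtype=bool)), scores, -np.inf)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn_out = (weights / weights.sum(axis=-1, keepdims=True)) @ v

# Recurrent alternative: one fixed-size state updated per step, linear in T.
decay = 0.9                      # assumed scalar decay; RWKV learns per-channel decays
state = np.zeros(d)
rec_out = np.empty((T, d))
for t in range(T):
    state = decay * state + v[t]
    rec_out[t] = state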

Resurrecting recurrent neural networks for long sequences

A Orvieto, SL Smith, A Gu, A Fernando… - International …, 2023 - proceedings.mlr.press
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are
hard to optimize and slow to train. Deep state-space models (SSMs) have recently been …

Hyena hierarchy: Towards larger convolutional language models

M Poli, S Massaroli, E Nguyen, DY Fu… - International …, 2023 - proceedings.mlr.press
Recent advances in deep learning have relied heavily on the use of large Transformers due
to their ability to learn at scale. However, the core building block of Transformers, the …

Hungry hungry hippos: Towards language modeling with state space models

DY Fu, T Dao, KK Saab, AW Thomas, A Rudra… - arXiv preprint arXiv …, 2022 - arxiv.org
State space models (SSMs) have demonstrated state-of-the-art sequence modeling
performance in some modalities, but underperform attention in language modeling …

Long-term forecasting with TiDE: Time-series dense encoder

A Das, W Kong, A Leach, S Mathur, R Sen… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent work has shown that simple linear models can outperform several Transformer-based
approaches in long-term time-series forecasting. Motivated by this, we propose a …
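
To illustrate the kind of "simple linear model" baseline the snippet refers to, a hedged sketch (not TiDE itself, which is an MLP-based encoder-decoder; the lookback L, horizon H, and toy series are assumptions): a single least-squares linear map from the lookback window directly to the forecast horizon.

import numpy as np

# Sketch of a direct linear forecaster: one linear map from an L-step lookback
# window to an H-step horizon, fit by least squares on a toy series.
L, H = 96, 24
rng = np.random.default_rng(0)
series = np.sin(0.05 * np.arange(2000)) + 0.1 * rng.standard_normal(2000)

# Sliding-window (lookback, horizon) training pairs.
idx = range(len(series) - L - H)
X = np.stack([series[i:i + L] for i in idx])
Y = np.stack([series[i + L:i + L + H] for i in idx])

W, *_ = np.linalg.lstsq(X, Y, rcond=None)   # W has shape (L, H)
forecast = series[-L:] @ W                  # direct multi-step forecast of the next H points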