Mamba-360: Survey of state space models as transformer alternative for long sequence modelling: Methods, applications, and challenges

BN Patro, VS Agneeswaran - arXiv preprint arXiv:2404.16112, 2024 - arxiv.org
Sequence modeling is a crucial area across various domains, including Natural Language
Processing (NLP), speech recognition, time series forecasting, music generation, and …

From large language models to large multimodal models: A literature review

D Huang, C Yan, Q Li, X Peng - Applied Sciences, 2024 - mdpi.com
With the deepening of research on Large Language Models (LLMs), significant progress has
been made in recent years on the development of Large Multimodal Models (LMMs), which …

Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality

T Dao, A Gu - arXiv preprint arXiv:2405.21060, 2024 - arxiv.org
While Transformers have been the main architecture behind deep learning's success in
language modeling, state-space models (SSMs) such as Mamba have recently been shown …
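
As background for this entry, a minimal sketch of a discrete linear state-space recurrence, the basic form behind models like Mamba (Python with NumPy; an illustration only, not the paper's structured state space duality algorithm):

import numpy as np

def ssm_scan(A, B, C, x):
    """Minimal diagonal linear SSM: h_t = A*h_{t-1} + B*x_t, y_t = C . h_t.

    A, B, C: (d_state,) per-channel parameters (illustrative shapes).
    x: (T,) scalar input sequence. Returns y: (T,) scalar outputs.
    """
    h = np.zeros(A.shape[0])
    y = np.empty(x.shape[0])
    for t, x_t in enumerate(x):
        h = A * h + B * x_t   # linear recurrent state update
        y[t] = C @ h          # read a scalar out of the hidden state
    return y

# Toy usage: 4-dimensional state, length-8 input.
rng = np.random.default_rng(0)
A = np.full(4, 0.9)           # stable decay per state dimension
B, C = rng.normal(size=4), rng.normal(size=4)
print(ssm_scan(A, B, C, rng.normal(size=8)))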

xLSTM: Extended long short-term memory

M Beck, K Pöppel, M Spanring, A Auer… - arXiv preprint arXiv …, 2024 - arxiv.org
In the 1990s, the constant error carousel and gating were introduced as the central ideas of
the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and …
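
For context, a minimal sketch of the standard LSTM cell step that this snippet alludes to; the additive cell-state path is the constant error carousel and the sigmoids are the gates (illustration only; xLSTM's own extensions are not shown):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One step of a classic LSTM. W: (4d, d_in), U: (4d, d), b: (4d,),
    stacked for the input (i), forget (f), output (o) gates and candidate (g)."""
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g     # additive cell-state update: the constant error carousel
    h = o * np.tanh(c)         # gated output / new hidden state
    return h, c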

Learning to (learn at test time): RNNs with expressive hidden states

Y Sun, X Li, K Dalal, J Xu, A Vikram, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-attention performs well in long context but has quadratic complexity. Existing RNN
layers have linear complexity, but their performance in long context is limited by the …
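
The quadratic-versus-linear contrast in this snippet can be made concrete with a rough per-layer cost count (a hypothetical back-of-the-envelope helper, not code from the paper):

def attention_cost(seq_len, d_model):
    """Rough FLOP count for one self-attention layer: every token attends to every token."""
    return 2 * seq_len * seq_len * d_model   # QK^T scores plus attention-weighted values

def recurrent_cost(seq_len, d_state):
    """Rough FLOP count for one linear recurrent layer: a fixed-size state update per token."""
    return seq_len * d_state

for T in (1_000, 10_000, 100_000):
    print(T, attention_cost(T, 1024), recurrent_cost(T, 1024))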

An empirical study of Mamba-based language models

R Waleffe, W Byeon, D Riach, B Norick… - arXiv preprint arXiv …, 2024 - arxiv.org
Selective state-space models (SSMs) like Mamba overcome some of the shortcomings of
Transformers, such as quadratic computational complexity with sequence length and large …

The Mamba in the Llama: Distilling and accelerating hybrid models

J Wang, D Paliotta, A May, A Rush… - Advances in Neural …, 2025 - proceedings.neurips.cc
Linear RNN architectures, like Mamba, can be competitive with Transformer models in
language modeling while having advantageous deployment characteristics. Given the focus …
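
"Distilling" here refers broadly to training a student model to match a teacher's output distribution; a generic sketch of such a loss (standard knowledge distillation in NumPy, not the paper's specific Transformer-to-Mamba procedure):

import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) over the vocabulary axis."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return float(np.mean(np.sum(p_t * (np.log(p_t + 1e-9) - np.log(p_s + 1e-9)), axis=-1)))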

Recurrent neural networks: vanishing and exploding gradients are not the end of the story

N Zucchet, A Orvieto - Advances in Neural Information …, 2025 - proceedings.neurips.cc
Recurrent neural networks (RNNs) notoriously struggle to learn long-term memories,
primarily due to vanishing and exploding gradients. The recent success of state-space …
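
The mechanism named here fits in one line: backpropagation through time multiplies per-step Jacobians, so for hidden states $h_t = f(h_{t-1}, x_t)$,

\[
\frac{\partial \mathcal{L}}{\partial h_1}
  = \frac{\partial \mathcal{L}}{\partial h_T}\,
    \prod_{t=2}^{T} \frac{\partial h_t}{\partial h_{t-1}},
\]

and when the Jacobian norms sit consistently below (above) one, the product vanishes (explodes) as T grows.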

MambaMixer: Efficient selective state space models with dual token and channel selection

A Behrouz, M Santacatterina, R Zabih - arXiv preprint arXiv:2403.19888, 2024 - arxiv.org
Recent advances in deep learning have mainly relied on Transformers due to their data
dependency and ability to learn at scale. The attention module in these architectures …

Zamba: A compact 7B SSM hybrid model

P Glorioso, Q Anthony, Y Tokpanov… - arXiv preprint arXiv …, 2024 - arxiv.org
In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which
achieves competitive performance against leading open-weight models at a comparable …