A survey on vision Mamba: Models, applications and challenges

R Xu, S Yang, Y Wang, B Du, H Chen - arXiv preprint arXiv:2404.18861, 2024 - arxiv.org
Mamba, a recent selective structured state space model, performs excellently on long
sequence modeling tasks. Mamba mitigates the modeling constraints of convolutional …
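
The snippet names Mamba's core mechanism, the selective structured state space model (SSM). As a rough orientation, the NumPy sketch below shows the sequential form of a selective scan under the standard zero-order-hold discretization; the parameter names (W_B, W_C, W_dt, D_skip) are placeholders for this illustration, and the reference implementation instead uses a hardware-aware parallel scan.

    import numpy as np

    def selective_scan(x, A, W_B, W_C, W_dt, D_skip):
        # x: (L, D) input sequence; A: (D, N) per-channel state matrix
        # W_B, W_C: (D, N) make B_t and C_t depend on the input ("selective")
        # W_dt: (D, D) produces the per-channel step size Delta_t; D_skip: (D,) skip term
        L, D = x.shape
        h = np.zeros((D, A.shape[1]))            # one N-dimensional state per channel
        y = np.zeros((L, D))
        for t in range(L):
            xt = x[t]
            dt = np.log1p(np.exp(xt @ W_dt))     # softplus -> positive step size, shape (D,)
            Bt, Ct = xt @ W_B, xt @ W_C          # input-dependent projections, shape (N,)
            A_bar = np.exp(dt[:, None] * A)      # ZOH discretization of A -> (D, N)
            B_bar = dt[:, None] * Bt[None, :]    # simplified discretization of B -> (D, N)
            h = A_bar * h + B_bar * xt[:, None]  # h_t = A_bar h_{t-1} + B_bar x_t
            y[t] = h @ Ct + D_skip * xt          # y_t = C_t h_t + D x_t
        return y

The "selective" part is that B_t, C_t, and the step size are functions of the current token, unlike a time-invariant SSM; this is what lets the model decide what to retain or discard along the sequence.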

Mamba-360: Survey of state space models as Transformer alternative for long sequence modelling: Methods, applications, and challenges

BN Patro, VS Agneeswaran - arXiv preprint arXiv:2404.16112, 2024 - arxiv.org
Sequence modeling is a crucial area across various domains, including Natural Language
Processing (NLP), speech recognition, time series forecasting, music generation, and …

Jamba: A hybrid Transformer-Mamba language model

O Lieber, B Lenz, H Bata, G Cohen, J Osin… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Jamba, a new base large language model based on a novel hybrid Transformer-
Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of …
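
The snippet cuts off at the key architectural detail; the interleaving ratio and MoE placement are specified in the paper and not reproduced here. Purely as a schematic of the hybrid pattern, the PyTorch sketch below alternates attention and Mamba token mixers according to a user-supplied layer schedule, with the block internals left as placeholder factory functions.

    from torch import nn

    class HybridStack(nn.Module):
        # Schematic only: the schedule and block factories are placeholders,
        # not Jamba's actual configuration.
        def __init__(self, d_model, schedule, make_attn, make_mamba, make_ffn, make_moe_ffn):
            super().__init__()
            self.layers = nn.ModuleList()
            for kind, use_moe in schedule:       # e.g. [("mamba", False), ("attn", True), ...]
                self.layers.append(nn.ModuleDict({
                    "norm1": nn.LayerNorm(d_model),
                    "mixer": make_attn(d_model) if kind == "attn" else make_mamba(d_model),
                    "norm2": nn.LayerNorm(d_model),
                    "ffn": make_moe_ffn(d_model) if use_moe else make_ffn(d_model),
                }))

        def forward(self, x):
            for layer in self.layers:
                x = x + layer["mixer"](layer["norm1"](x))  # token mixing: attention or Mamba
                x = x + layer["ffn"](layer["norm2"](x))    # channel mixing: dense or MoE FFN
            return x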

A survey of mamba

H Qu, L Ning, R An, W Fan, T Derr, H Liu, X Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
As one of the most representative deep learning (DL) techniques, the Transformer architecture has
empowered numerous advanced models, especially the large language models (LLMs) that comprise …

Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality

T Dao, A Gu - arXiv preprint arXiv:2405.21060, 2024 - arxiv.org
While Transformers have been the main architecture behind deep learning's success in
language modeling, state-space models (SSMs) such as Mamba have recently been shown …
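
The "structured state space duality" in the title refers to the fact that one sequence transformation admits both a linear-time recurrent form and a quadratic, attention-like matrix form. A rough statement in generic notation (not the paper's exact formulation): a time-varying SSM

    h_t = A_t h_{t-1} + B_t x_t,        y_t = C_t^\top h_t

unrolls to

    y_t = \sum_{s \le t} C_t^\top \Big( \prod_{k=s+1}^{t} A_k \Big) B_s \, x_s = \sum_{s \le t} M_{ts} x_s,

i.e. y = M x with a lower-triangular mixing matrix M, which is structurally the same computation as causally masked attention with scores M_{ts}.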

Repeat after me: Transformers are better than state space models at copying

S Jelassi, D Brandfonbrener, SM Kakade… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers are the dominant architecture for sequence modeling, but there is growing
interest in models that use a fixed-size latent state that does not depend on the sequence …
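
One intuition behind the title's claim, stated here as a standard counting argument rather than the paper's exact theorem: a model that summarizes the prefix in a fixed state of b bits can distinguish at most 2^b prefixes, yet copying requires distinguishing all |\Sigma|^n inputs of length n, so verbatim copying forces

    b \ge n \log_2 |\Sigma|,

whereas an attention model can re-read the input through a cache that grows with n.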

Integration of Mamba and Transformer-MAT for Long-Short Range Time Series Forecasting with Application to Weather Dynamics

W Zhang, J Huang, R Wang, C Wei… - 2024 International …, 2024 - ieeexplore.ieee.org
Long-short range time series forecasting is essential for predicting future trends and patterns
over extended periods. While deep learning models such as Transformers have made …

Is Mamba capable of in-context learning?

R Grazzi, J Siems, S Schrodi, T Brox… - arXiv preprint arXiv …, 2024 - arxiv.org
This work provides empirical evidence that Mamba, a newly proposed selective structured
state space model, has in-context learning (ICL) capabilities similar to those of Transformers. We …

PointRWKV: Efficient RWKV-like model for hierarchical point cloud learning

Q He, J Zhang, J Peng, H He, X Li, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have revolutionized point cloud learning, but their quadratic complexity
hinders extension to long sequences and places a burden on limited computational …
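
For context on the complexity claim, the standard per-layer costs (n = number of tokens/points, d = model width, N = fixed per-channel state size) are roughly:

    self-attention:     \Theta(n^2 d) time, with a cache that grows with n
    RWKV / SSM mixers:  \Theta(n d N) time, with a fixed-size state of O(d N)

which is why linear recurrent mixers are attractive for long point sequences on constrained hardware.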

A survey on efficient inference for large language models

Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …