A survey on vision mamba: Models, applications and challenges
Mamba, a recent selective structured state space model, performs excellently on long
sequence modeling tasks. Mamba mitigates the modeling constraints of convolutional …
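To make the phrase "selective structured state space model" concrete, the following is a minimal numpy sketch of the kind of input-dependent SSM recurrence Mamba builds on; the shapes, projections, and initialization are illustrative assumptions, not the model's actual implementation.

```python
import numpy as np

def selective_ssm(x, d_state=16, seed=0):
    """Toy selective state space scan for a single scalar channel.
    The parameters B_t, C_t and the step size delta_t are functions of the
    input x_t, which is what makes the SSM 'selective' (input-dependent)
    rather than time-invariant. x: (seq_len,). Returns y: (seq_len,)."""
    rng = np.random.default_rng(seed)
    A = -np.exp(rng.standard_normal(d_state))      # stable diagonal state matrix
    W_b = rng.standard_normal(d_state) * 0.1       # input -> B_t projection (illustrative)
    W_c = rng.standard_normal(d_state) * 0.1       # input -> C_t projection (illustrative)
    w_d = rng.standard_normal() * 0.1              # input -> delta_t projection (illustrative)

    h = np.zeros(d_state)
    y = np.zeros_like(x)
    for t, x_t in enumerate(x):
        delta = np.log1p(np.exp(w_d * x_t + 1.0))  # softplus keeps the step size positive
        A_bar = np.exp(delta * A)                  # zero-order-hold discretization of A
        B_bar = delta * (W_b * x_t)                # input-dependent B_t
        C_t = W_c * x_t                            # input-dependent C_t
        h = A_bar * h + B_bar * x_t                # recurrent state update
        y[t] = C_t @ h                             # readout
    return y

print(selective_ssm(np.sin(np.linspace(0.0, 6.0, 32)))[:5])
```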
Mamba-360: Survey of state space models as transformer alternative for long sequence modelling: Methods, applications, and challenges
Sequence modeling is a crucial area across various domains, including Natural Language
Processing (NLP), speech recognition, time series forecasting, music generation, and …
Jamba: A hybrid transformer-mamba language model
We present Jamba, a new base large language model based on a novel hybrid Transformer-
Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of …
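The interleaving idea in the abstract can be sketched as a layer schedule; the attention-to-Mamba ratio, MoE frequency, and layer count below are placeholders rather than Jamba's published configuration.

```python
# Hypothetical sketch of a hybrid layer stack in the spirit of Jamba:
# most layers use an SSM (Mamba-style) token mixer, with an attention layer
# inserted every few blocks, and some MLPs replaced by mixture-of-experts
# (MoE) MLPs. The 1-in-8 attention ratio and 1-in-2 MoE ratio are
# illustrative assumptions only.

def build_hybrid_schedule(n_layers=32, attn_every=8, moe_every=2):
    schedule = []
    for i in range(n_layers):
        mixer = "attention" if i % attn_every == attn_every - 1 else "mamba"
        mlp = "moe" if i % moe_every == moe_every - 1 else "dense"
        schedule.append((mixer, mlp))
    return schedule

for i, layer in enumerate(build_hybrid_schedule(n_layers=8)):
    print(i, layer)
```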
A survey of mamba
As one of the most representative DL techniques, Transformer architecture has empowered
numerous advanced models, especially the large language models (LLMs) that comprise …
Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality
While Transformers have been the main architecture behind deep learning's success in
language modeling, state-space models (SSMs) such as Mamba have recently been shown …
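A toy numpy check of the duality the title alludes to: a scalar-channel SSM scan computes the same output as multiplying the input by a lower-triangular, attention-like mixing matrix. This is an illustrative identity under simplified assumptions, not the paper's structured state space duality algorithm.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Recurrent view: h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t."""
    h, y = 0.0, np.zeros_like(x)
    for t in range(len(x)):
        h = a[t] * h + b[t] * x[t]
        y[t] = c[t] * h
    return y

def ssm_matrix(x, a, b, c):
    """Dual view: y = M x with M[t, s] = c_t * (prod_{k=s+1..t} a_k) * b_s,
    a lower-triangular ('attention-like') mixing matrix."""
    T = len(x)
    M = np.zeros((T, T))
    for t in range(T):
        for s in range(t + 1):
            M[t, s] = c[t] * np.prod(a[s + 1:t + 1]) * b[s]
    return M @ x

rng = np.random.default_rng(0)
x, a, b, c = (rng.standard_normal(6) for _ in range(4))
print(np.allclose(ssm_scan(x, a, b, c), ssm_matrix(x, a, b, c)))  # True
```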
Repeat after me: Transformers are better than state space models at copying
Transformers are the dominant architecture for sequence modeling, but there is growing
interest in models that use a fixed-size latent state that does not depend on the sequence …
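A hypothetical toy version of the copying probe such comparisons rely on: the model must reproduce a random prefix after a delimiter, which attention can do by direct lookup while a fixed-size recurrent state must compress the entire prefix. The token names and task format here are assumptions, not the paper's benchmark.

```python
import random

def make_copy_example(length, vocab_size=26, seed=None):
    """Generate one string-copying example: a random token sequence, a copy
    instruction, and the expected verbatim reproduction."""
    rng = random.Random(seed)
    src = [rng.randrange(vocab_size) for _ in range(length)]
    inputs = src + ["<copy>"]   # prompt: sequence followed by a copy marker
    targets = list(src)         # expected output: the sequence verbatim
    return inputs, targets

print(make_copy_example(8, seed=0))
```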
Integration of Mamba and Transformer-MAT for Long-Short Range Time Series Forecasting with Application to Weather Dynamics
W Zhang, J Huang, R Wang, C Wei… - 2024 International …, 2024 - ieeexplore.ieee.org
Long-short range time series forecasting is essential for predicting future trends and patterns
over extended periods. While deep learning models such as Transformers have made …
Is mamba capable of in-context learning?
This work provides empirical evidence that Mamba, a newly proposed selective structured
state space model, has in-context learning (ICL) capabilities similar to those of transformers. We …
Pointrwkv: Efficient rwkv-like model for hierarchical point cloud learning
Transformers have revolutionized the point cloud learning task, but the quadratic complexity
hinders its extension to long sequences and burdens limited computational …
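A back-of-the-envelope sketch of the quadratic-versus-linear cost argument behind that claim; the FLOP formulas and dimensions are rough illustrative estimates, not measurements from the paper.

```python
# Approximate cost of mixing L tokens of width d: attention forms an L x L
# score matrix (O(L^2 * d)), while a recurrent/RWKV-style mixer scans the
# sequence once (O(L * d * state)). Constants and sizes are illustrative.
def attention_flops(L, d):
    return 2 * L * L * d            # QK^T scores plus the weighted sum over values

def recurrent_flops(L, d, state=16):
    return 2 * L * d * state        # one state update and one readout per token

for L in (1_000, 10_000, 100_000):
    print(L, attention_flops(L, d=64), recurrent_flops(L, d=64))
```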
A survey on efficient inference for large language models
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …