A survey on visual Mamba
State space models (SSM) with selection mechanisms and hardware-aware architectures,
namely Mamba, have recently shown significant potential in long-sequence modeling. Since …
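For context, the selective state space layer at the core of Mamba is usually written as a discretized linear recurrence whose parameters depend on the input. A minimal sketch of that standard formulation (generic symbols, not notation taken from this survey; the discretization shown is the common zero-order-hold form, with \(\bar{B}_t\) often approximated as \(\Delta_t B_t\)):

\[
\bar{A}_t = \exp(\Delta_t A), \qquad \bar{B}_t \approx \Delta_t B_t,
\]
\[
h_t = \bar{A}_t h_{t-1} + \bar{B}_t x_t, \qquad y_t = C_t h_t,
\]

where the selection mechanism makes \(\Delta_t\), \(B_t\), and \(C_t\) functions of the current input \(x_t\), so the recurrence can retain or discard information per token while still being evaluated with a hardware-aware parallel scan.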
Mamba-360: Survey of state space models as Transformer alternative for long sequence modelling: Methods, applications, and challenges
Sequence modeling is a crucial area across various domains, including Natural Language
Processing (NLP), speech recognition, time series forecasting, music generation, and …
Is Mamba effective for time series forecasting?
In the realm of time series forecasting (TSF), it is imperative for models to adeptly discern
and distill hidden patterns within historical time series data to forecast future states …
A survey of Mamba
As one of the most representative DL techniques, the Transformer architecture has empowered
numerous advanced models, especially the large language models (LLMs) that comprise …
The hidden attention of Mamba models
The Mamba layer offers an efficient selective state space model (SSM) that is highly effective
in modeling multiple domains including NLP, long-range sequence processing, and …
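Unrolling that linear recurrence makes the "hidden attention" view concrete: each output is a data-dependent weighted sum over earlier inputs. A brief sketch, reusing the generic SSM symbols above rather than the cited paper's notation:

\[
y_i = \sum_{j \le i} C_i \Big( \prod_{k=j+1}^{i} \bar{A}_k \Big) \bar{B}_j \, x_j \;=\; \sum_{j \le i} \alpha_{i,j}\, x_j,
\]

so the implicit coefficients \(\alpha_{i,j}\) act like causal attention weights, except that they emerge from the selective recurrence rather than from a softmax over query-key scores.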
PointRWKV: Efficient RWKV-like model for hierarchical point cloud learning
Transformers have revolutionized the point cloud learning task, but the quadratic complexity
hinders their extension to long sequences and places a burden on limited computational …
A survey on efficient inference for large language models
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …
State space model for new-generation network alternative to Transformers: A survey
In the post-deep learning era, the Transformer architecture has demonstrated its powerful
performance across large pre-trained models and various downstream tasks. However, the …
Zamba: A Compact 7B SSM Hybrid Model
In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which
achieves competitive performance against leading open-weight models at a comparable …
Inference optimization of foundation models on AI accelerators
Powerful foundation models, including large language models (LLMs), with Transformer
architectures have ushered in a new era of Generative AI across various industries. Industry …