Visual mamba: A survey and new outlooks

R Xu, S Yang, Y Wang, Y Cai, B Du, H Chen - arxiv preprint arxiv …, 2024 - arxiv.org
Mamba, a recent selective structured state space model, excels in long sequence modeling,
which is vital in the large model era. Long sequence modeling poses significant challenges …

Efficient diffusion models: A comprehensive survey from principles to practices

Z Ma, Y Zhang, G Jia, L Zhao, Y Ma, M Ma… - arxiv preprint arxiv …, 2024 - arxiv.org
As one of the most popular and sought-after generative models in the recent years, diffusion
models have sparked the interests of many researchers and steadily shown excellent …

Dynamic diffusion transformer

W Zhao, Y Han, J Tang, K Wang, Y Song… - arxiv preprint arxiv …, 2024 - arxiv.org
Diffusion Transformer (DiT), an emerging diffusion model for image generation, has
demonstrated superior performance but suffers from substantial computational costs. Our …

Linfusion: 1 gpu, 1 minute, 16k image

S Liu, W Yu, Z Tan, X Wang - arxiv preprint arxiv:2409.02097, 2024 - arxiv.org
Modern diffusion models, particularly those utilizing a Transformer-based UNet for
denoising, rely heavily on self-attention operations to manage complex spatial relationships …

Fitv2: Scalable and improved flexible vision transformer for diffusion model

ZD Wang, Z Lu, D Huang, C Zhou, W Ouyang - arxiv preprint arxiv …, 2024 - arxiv.org
Nature is infinitely resolution-free. In the context of this reality, existing diffusion
models, such as Diffusion Transformers, often face challenges when processing image …

Mamba as decision maker: Exploring multi-scale sequence modeling in offline reinforcement learning

J Cao, Q Zhang, Z Wang, J Sun, J Wang… - arxiv preprint arxiv …, 2024 - arxiv.org
Sequential modeling has demonstrated remarkable capabilities in offline reinforcement
learning (RL), with Decision Transformer (DT) being one of the most notable …

Mamba in vision: A comprehensive survey of techniques and applications

MM Rahman, AA Tutul, A Nath, L Laishram… - arxiv preprint arxiv …, 2024 - arxiv.org
Mamba is emerging as a novel approach to overcome the challenges faced by
Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in computer vision …

Venturing into uncharted waters: The navigation compass from transformer to mamba

Y Zou, Y Chen, Z Li, L Zhang, H Zhao - arxiv preprint arxiv:2406.16722, 2024 - arxiv.org
Transformer, a deep neural network architecture, has long dominated the field of natural
language processing and beyond. Nevertheless, the recent introduction of Mamba …

Maskmamba: A hybrid mamba-transformer model for masked image generation

W Chen, L Niu, Z Lu, F Meng, J Zhou - arxiv preprint arxiv:2409.19937, 2024 - arxiv.org
Image generation models have encountered challenges related to scalability and quadratic
complexity, primarily due to the reliance on Transformer-based backbones. In this study, we …

Autoregressive Models in Vision: A Survey

J Xiong, G Liu, L Huang, C Wu, T Wu, Y Mu… - arxiv preprint arxiv …, 2024 - arxiv.org
Autoregressive modeling has been a huge success in the field of natural language
processing (NLP). Recently, autoregressive models have emerged as a significant area of …