Deep learning attention mechanism in medical image analysis: Basics and beyonds

X Li, M Li, P Yan, G Li, Y Jiang, H Luo… - International Journal of …, 2023 - sciltp.com
With the improvement of hardware computing power and the development of deep learning
algorithms, a revolution of "artificial intelligence (AI) + medical image" is taking place …

Advances in medical image analysis with vision transformers: a comprehensive review

R Azad, A Kazerouni, M Heidari, EK Aghdam… - Medical Image …, 2024 - Elsevier
The remarkable performance of the Transformer architecture in natural language processing
has recently also triggered broad interest in Computer Vision. Among other merits …

Biformer: Vision transformer with bi-level routing attention

L Zhu, X Wang, Z Ke, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
As the core building block of vision transformers, attention is a powerful tool to capture long-
range dependency. However, such power comes at a cost: it incurs a huge computation …
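
To make the cost the abstract refers to concrete, here is a minimal NumPy sketch of plain softmax attention (the dense baseline that BiFormer's bi-level routing prunes, not BiFormer itself); the function name, shapes, and unbatched single-head setup are illustrative assumptions.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Vanilla softmax attention; a sketch of the dense baseline, not
    BiFormer's bi-level routing. Q, K, V: (N, d) arrays for N tokens.
    The explicit (N, N) score matrix is what makes the cost quadratic in N."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # (N, N): O(N^2 d) time, O(N^2) memory
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V              # (N, d)

# Example: 196 tokens (a 14x14 ViT patch grid), 64-dim head
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((196, 64)) for _ in range(3))
print(softmax_attention(Q, K, V).shape)  # (196, 64)
```

BiFormer's contribution is to avoid materializing this full (N, N) map by first routing at a coarse region level and then attending only within the selected regions.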

Flatten transformer: Vision transformer using focused linear attention

D Han, X Pan, Y Han, S Song… - Proceedings of the …, 2023 - openaccess.thecvf.com
The quadratic computation complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …
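
The O(N) reordering that linear attention relies on can be shown in a few lines; the sketch below uses a generic ELU+1 feature map as a stand-in kernel, not the focused mapping this paper proposes, and all names and shapes are illustrative assumptions.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Generic kernelized linear attention; a sketch of the O(N) reordering,
    not the focused mapping of the Flatten Transformer. Computing
    phi(Q) (phi(K)^T V) right-to-left avoids any (N, N) matrix."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 > 0
    Qp, Kp = phi(Q), phi(K)                  # (N, d)
    KV = Kp.T @ V                            # (d, d): cost O(N d^2)
    Z = Qp @ Kp.sum(axis=0)                  # (N,): per-query normalizer
    return (Qp @ KV) / (Z[:, None] + eps)    # (N, d)

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((196, 64)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (196, 64)
```

The trade-off the abstract alludes to is that this factorized form is cheap but typically less expressive than softmax attention, which is the gap the paper's focused mapping targets.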

Internimage: Exploring large-scale vision foundation models with deformable convolutions

W Wang, J Dai, Z Chen, Z Huang, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

J Chen, C Ge, E Xie, Y Wu, L Yao, X Ren… - … on Computer Vision, 2024 - Springer
In this paper, we introduce PixArt-Σ, a Diffusion Transformer model (DiT) capable of directly
generating images at 4K resolution. PixArt-Σ represents a significant advancement over its …

Rethinking vision transformers for mobilenet size and speed

Y Li, J Hu, Y Wen, G Evangelidis… - Proceedings of the …, 2023 - openaccess.thecvf.com
With the success of Vision Transformers (ViTs) in computer vision tasks, recent arts try to
optimize the performance and complexity of ViTs to enable efficient deployment on mobile …

Demystify mamba in vision: A linear attention perspective

D Han, Z Wang, Z Xia, Y Han, Y Pu… - Advances in neural …, 2025 - proceedings.neurips.cc
Mamba is an effective state space model with linear computation complexity. It has recently
shown impressive efficiency in dealing with high-resolution inputs across various vision …
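
The connection the paper draws can be illustrated by writing causal linear attention in recurrent form, where a fixed-size state is updated per token much like a state space model; this is a simplified sketch under that linear-attention view (no Mamba gating or selectivity), with all names assumed for illustration.

```python
import numpy as np

def recurrent_linear_attention(Q, K, V, eps=1e-6):
    """Causal linear attention as a recurrence; a simplified sketch of the
    linear-attention perspective on state space models, omitting Mamba's
    gating/selectivity. A (d, d) state S accumulates k_t v_t^T, so each
    step costs O(d^2) and the whole sequence is linear in N."""
    N, d = Q.shape
    S = np.zeros((d, d))   # running sum of outer products k_t v_t^T
    z = np.zeros(d)        # running sum of k_t, for normalization
    out = np.empty_like(V)
    for t in range(N):
        S += np.outer(K[t], V[t])
        z += K[t]
        out[t] = (Q[t] @ S) / (Q[t] @ z + eps)
    return out

rng = np.random.default_rng(0)
Q = np.abs(rng.standard_normal((32, 16)))  # positive features stand in for a kernel map
K = np.abs(rng.standard_normal((32, 16)))
V = rng.standard_normal((32, 16))
print(recurrent_linear_attention(Q, K, V).shape)  # (32, 16)
```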

Agent attention: On the integration of softmax and linear attention

D Han, T Ye, Y Han, Z Xia, S Pan, P Wan… - … on Computer Vision, 2024 - Springer
The attention module is the key component in Transformers. While the global attention
mechanism offers high expressiveness, its excessive computational cost restricts its …
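
A minimal sketch of the integration the title describes: a small set of agent tokens mediates between queries and keys, so both softmax maps are thin (N, n) and (n, N) matrices instead of one (N, N) map. Taking agents as a strided subset of the queries is an assumption of this sketch; the paper pools them and includes additional terms omitted here.

```python
import numpy as np

def softmax_rows(S):
    e = np.exp(S - S.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def agent_attention(Q, K, V, n_agents=16):
    """Sketch of agent attention: agents first aggregate from keys/values,
    then broadcast back to the queries, so the overall map factorizes like
    linear attention while both stages keep a softmax."""
    d = Q.shape[-1]
    A = Q[:: max(1, Q.shape[0] // n_agents)][:n_agents]    # (n, d) agent tokens
    agent_out = softmax_rows(A @ K.T / np.sqrt(d)) @ V     # (n, d): agents aggregate
    return softmax_rows(Q @ A.T / np.sqrt(d)) @ agent_out  # (N, d): agents broadcast

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((196, 64)) for _ in range(3))
print(agent_attention(Q, K, V).shape)  # (196, 64)
```

With n agents the cost is O(Nnd) rather than O(N²d), which is how the method keeps softmax expressiveness at near-linear cost.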

Hiformer: Hierarchical multi-scale representations using transformers for medical image segmentation

M Heidari, A Kazerouni, M Soltany… - Proceedings of the …, 2023 - openaccess.thecvf.com
Convolutional neural networks (CNNs) have been the consensus for medical image
segmentation tasks. However, they inevitably suffer from the limitation in modeling long …