Advances in medical image analysis with vision transformers: a comprehensive review

R Azad, A Kazerouni, M Heidari, EK Aghdam… - Medical Image …, 2024 - Elsevier
The remarkable performance of the Transformer architecture in natural language processing
has recently also triggered broad interest in Computer Vision. Among other merits …

Transformers in medical imaging: A survey

F Shamshad, S Khan, SW Zamir, MH Khan… - Medical Image …, 2023 - Elsevier
Following unprecedented success on the natural language tasks, Transformers have been
successfully applied to several computer vision problems, achieving state-of-the-art results …

VideoMAE V2: Scaling video masked autoencoders with dual masking

L Wang, B Huang, Z Zhao, Z Tong… - Proceedings of the …, 2023 - openaccess.thecvf.com
Scale is the primary factor for building a powerful foundation model that could well
generalize to a variety of downstream tasks. However, it is still challenging to train video …

SegNeXt: Rethinking convolutional attention design for semantic segmentation

MH Guo, CZ Lu, Q Hou, Z Liu… - Advances in Neural …, 2022 - proceedings.neurips.cc
We present SegNeXt, a simple convolutional network architecture for semantic
segmentation. Recent transformer-based models have dominated the field of semantic …

Efficient multi-scale attention module with cross-spatial learning

D Ouyang, S He, G Zhang, M Luo… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Remarkable effectiveness of the channel or spatial attention mechanisms for producing
more discernible feature representation are illustrated in various computer vision tasks …

Efficient and explicit modelling of image hierarchies for image restoration

Y Li, Y Fan, X Xiang, D Demandolx… - Proceedings of the …, 2023 - openaccess.thecvf.com
The aim of this paper is to propose a mechanism to efficiently and explicitly model image
hierarchies in the global, regional, and local range for image restoration. To achieve that, we …

TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers

J Chen, J Mei, X Li, Y Lu, Q Yu, Q Wei, X Luo, Y Xie… - Medical Image …, 2024 - Elsevier
Medical image segmentation is crucial for healthcare, yet convolution-based methods like U-Net
face limitations in modeling long-range dependencies. To address this, Transformers …

Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

Masked autoencoders as spatiotemporal learners

C Feichtenhofer, Y Li, K He - Advances in neural …, 2022 - proceedings.neurips.cc
This paper studies a conceptually simple extension of Masked Autoencoders (MAE) to
spatiotemporal representation learning from videos. We randomly mask out spacetime …

GhostNetV2: Enhance cheap operation with long-range attention

Y Tang, K Han, J Guo, C Xu, C Xu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Light-weight convolutional neural networks (CNNs) are specially designed for applications
on mobile devices with faster inference speed. The convolutional operation can only capture …