Advances in medical image analysis with vision transformers: a comprehensive review

R Azad, A Kazerouni, M Heidari, EK Aghdam… - Medical Image …, 2024 - Elsevier
The remarkable performance of the Transformer architecture in natural language processing
has recently also triggered broad interest in Computer Vision. Among other merits …

Efficient acceleration of deep learning inference on resource-constrained edge devices: A review

MMH Shuvo, SK Islam, J Cheng… - Proceedings of the …, 2022 - ieeexplore.ieee.org
Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted
in breakthroughs in many areas. However, deploying these highly accurate models for data …

EfficientViT: Memory efficient vision transformer with cascaded group attention

X Liu, H Peng, N Zheng, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision transformers have shown great success due to their high model capabilities.
However, their remarkable performance is accompanied by heavy computation costs, which …

Run, don't walk: chasing higher FLOPS for faster neural networks

J Chen, S Kao, H He, W Zhuo, S Wen… - Proceedings of the …, 2023 - openaccess.thecvf.com
To design fast neural networks, many works have been focusing on reducing the number of
floating-point operations (FLOPs). We observe that such reduction in FLOPs, however, does …
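The distinction this entry draws is between FLOPs (total floating-point operations a network performs) and FLOPS (operations the hardware actually sustains per second): latency is roughly work divided by achieved throughput, so cutting FLOPs helps little if memory-bound layers also cut achieved FLOPS. A minimal illustrative sketch, with made-up numbers rather than measurements:

```python
def latency_seconds(flops: float, achieved_flops_per_sec: float) -> float:
    """Roofline-style estimate: time = total work / achieved throughput."""
    return flops / achieved_flops_per_sec

# Two hypothetical networks: net B has half the FLOPs of net A, but its
# fragmented memory access drops achieved throughput far more, so it is
# slower end to end despite the lower operation count.
net_a = latency_seconds(4e9, 8e12)   # 4 GFLOPs at 8 TFLOPS -> 0.5 ms
net_b = latency_seconds(2e9, 2e12)   # 2 GFLOPs at 2 TFLOPS -> 1.0 ms
print(net_b > net_a)
```

The throughput figures here are assumptions for illustration; in practice achieved FLOPS must be profiled on the target device.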

Designing network design strategies through gradient path analysis

CY Wang, HYM Liao, IH Yeh - arXiv preprint arXiv:2211.04800, 2022 - arxiv.org
Designing a high-efficiency and high-quality expressive network architecture has always
been the most important research topic in the field of deep learning. Most of today's network …

Rethinking vision transformers for mobilenet size and speed

Y Li, J Hu, Y Wen, G Evangelidis… - Proceedings of the …, 2023 - openaccess.thecvf.com
With the success of Vision Transformers (ViTs) in computer vision tasks, recent arts try to
optimize the performance and complexity of ViTs to enable efficient deployment on mobile …

Lite-Mono: A lightweight CNN and transformer architecture for self-supervised monocular depth estimation

N Zhang, F Nex, G Vosselman… - Proceedings of the …, 2023 - openaccess.thecvf.com
Self-supervised monocular depth estimation that does not require ground truth for training
has attracted attention in recent years. It is of high interest to design lightweight but effective …

Scale-aware modulation meet transformer

W Lin, Z Wu, J Chen, J Huang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper presents a new vision Transformer, Scale Aware Modulation Transformer (SMT),
that can handle various downstream tasks efficiently by combining the convolutional network …

Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - IEEE transactions on …, 2024 - ieeexplore.ieee.org
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …

SwiftFormer: Efficient additive attention for transformer-based real-time mobile vision applications

A Shaker, M Maaz, H Rasheed… - Proceedings of the …, 2023 - openaccess.thecvf.com
Self-attention has become a de facto choice for capturing global context in various vision
applications. However, its quadratic computational complexity with respect to image …