A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

EfficientViT: Memory-efficient vision transformer with cascaded group attention

X Liu, H Peng, N Zheng, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision transformers have shown great success due to their high model capacity.
However, their remarkable performance is accompanied by heavy computation costs, which …

SeaFormer: Squeeze-enhanced axial transformer for mobile semantic segmentation

Q Wan, Z Huang, J Lu, G Yu… - The eleventh international …, 2023 - openreview.net
Since the introduction of Vision Transformers, the landscape of many computer vision tasks
(e.g., semantic segmentation), which has been overwhelmingly dominated by CNNs, recently …

Transformer meets remote sensing video detection and tracking: A comprehensive survey

L Jiao, X Zhang, X Liu, F Liu, S Yang… - IEEE Journal of …, 2023 - ieeexplore.ieee.org
Transformers have shown excellent performance in the remote sensing field thanks to their
long-range modeling capabilities. Remote sensing video (RSV) moving object detection and tracking …

TransFlow: Transformer as flow learner

Y Lu, Q Wang, S Ma, T Geng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Optical flow is an indispensable building block for various important computer vision tasks,
including motion estimation, object tracking, and disparity measurement. In this work, we …

Omni aggregation networks for lightweight image super-resolution

H Wang, X Chen, B Ni, Y Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
While lightweight ViT frameworks have made tremendous progress in image super-resolution,
their uni-dimensional self-attention modeling, as well as homogeneous aggregation scheme …

RMT: Retentive networks meet vision transformers

Q Fan, H Huang, M Chen, H Liu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Vision Transformer (ViT) has gained increasing attention in the computer vision
community in recent years. However, the core component of ViT, Self-Attention, lacks explicit …

MobileViTv3: Mobile-friendly vision transformer with simple and effective fusion of local, global and input features

SN Wadekar, A Chaurasia - arXiv preprint arXiv:2209.15159, 2022 - arxiv.org
MobileViT (MobileViTv1) combines convolutional neural networks (CNNs) and vision
transformers (ViTs) to create light-weight models for mobile vision tasks. Though the main …

Hydra attention: Efficient attention with many heads

D Bolya, CY Fu, X Dai, P Zhang, J Hoffman - European Conference on …, 2022 - Springer
While transformers have begun to dominate many tasks in vision, applying them to large
images is still computationally difficult. A large reason for this is that self-attention scales …

SeaFormer++: Squeeze-enhanced axial transformer for mobile visual recognition

Q Wan, Z Huang, J Lu, G Yu, L Zhang - International Journal of Computer …, 2025 - Springer
Since the introduction of Vision Transformers, the landscape of many computer vision tasks
(e.g., semantic segmentation), which has been overwhelmingly dominated by CNNs, recently …