A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

Efficientvit: Memory efficient vision transformer with cascaded group attention

X Liu, H Peng, N Zheng, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision transformers have shown great success due to their high model capabilities.
However, their remarkable performance is accompanied by heavy computation costs, which …
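
The cascaded group attention named in the title splits the channel dimension across attention heads and feeds each head's output into the next head's input, so later heads refine what earlier heads computed. Below is a minimal PyTorch sketch of that idea; the class name, the cascade-by-addition detail, and all dimensions are illustrative assumptions rather than the paper's implementation:

    import torch
    import torch.nn as nn

    class CascadedGroupAttention(nn.Module):
        """Each head attends over its own channel split of the input; head i's
        output is added to head i+1's input split (the cascade). Sketch only."""
        def __init__(self, dim: int, num_heads: int = 4):
            super().__init__()
            assert dim % num_heads == 0
            self.head_dim = dim // num_heads
            # One small q/k/v projection per head, acting on that head's split.
            self.qkvs = nn.ModuleList(
                [nn.Linear(self.head_dim, 3 * self.head_dim) for _ in range(num_heads)]
            )
            self.proj = nn.Linear(dim, dim)
            self.scale = self.head_dim ** -0.5

        def forward(self, x):  # x: (batch, tokens, dim)
            outs, carry = [], 0.0
            for qkv, split in zip(self.qkvs, x.chunk(len(self.qkvs), dim=-1)):
                q, k, v = qkv(split + carry).chunk(3, dim=-1)
                attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
                carry = attn @ v               # (batch, tokens, head_dim)
                outs.append(carry)
            return self.proj(torch.cat(outs, dim=-1))

    x = torch.randn(2, 196, 256)
    print(CascadedGroupAttention(256)(x).shape)  # torch.Size([2, 196, 256])

Because each head works on dim/num_heads channels instead of the full width, the per-head projections are correspondingly cheaper, which is the kind of memory and compute saving the abstract alludes to.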

Maxvit: Multi-axis vision transformer

Z Tu, H Talebi, H Zhang, F Yang, P Milanfar… - European conference on …, 2022 - Springer
Transformers have recently gained significant attention in the computer vision community.
However, the lack of scalability of self-attention mechanisms with respect to image size has …
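
The scalability issue the abstract points to is mechanical: a (H, W) image cut into p x p patches yields n = (H/p)(W/p) tokens, and full self-attention builds an n x n matrix, so cost grows quadratically with image area. A back-of-envelope Python check, with the patch size and width chosen only for illustration:

    # Attention FLOPs ~ 2 * n^2 * dim (QK^T plus attn @ V; projections ignored).
    def attn_flops(side: int, patch: int = 16, dim: int = 768) -> int:
        n = (side // patch) ** 2        # token count grows with image area
        return 2 * n * n * dim

    for side in (224, 448, 896):
        print(side, f"{attn_flops(side):.3e}")
    # 224 -> 5.901e+07, 448 -> 9.441e+08, 896 -> 1.511e+10:
    # 4x the pixels costs ~16x the attention FLOPs.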

Tinyvit: Fast pretraining distillation for small vision transformers

K Wu, J Zhang, H Peng, M Liu, B Xiao, J Fu… - European conference on …, 2022 - Springer
Vision transformer (ViT) has recently drawn great attention in computer vision due to its
remarkable model capability. However, most prevailing ViT models suffer from a huge number …
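
The "pretraining distillation" in the title means training the small student against a large teacher's soft predictions during pretraining. The generic soft-label distillation loss below (after Hinton et al.) conveys the core idea; TinyViT's actual pipeline, e.g. its use of precomputed teacher logits, differs in detail, and the function name is an assumption:

    import torch
    import torch.nn.functional as F

    def soft_distill_loss(student_logits, teacher_logits, T: float = 2.0):
        """KL divergence between temperature-softened teacher and student
        distributions, scaled by T^2 to keep gradient magnitudes comparable."""
        log_p_student = F.log_softmax(student_logits / T, dim=-1)
        p_teacher = F.softmax(teacher_logits / T, dim=-1)
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

    s = torch.randn(8, 1000)   # student logits for a batch
    t = torch.randn(8, 1000)   # teacher logits (teacher kept frozen)
    print(soft_distill_loss(s, t))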

Patch n'pack: Navit, a vision transformer for any aspect ratio and resolution

M Dehghani, B Mustafa, J Djolonga… - Advances in …, 2024 - proceedings.neurips.cc
The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution
before processing them with computer vision models has not yet been successfully …
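
The "Patch n' Pack" of the title avoids fixed-resolution resizing by packing patch sequences of different lengths, from differently shaped images, into one fixed-capacity training sequence. The toy greedy packer below illustrates only the scheduling idea; the attention masking and per-example pooling the real method needs are omitted, and the function name is hypothetical:

    def pack(seq_lens, capacity):
        """Greedily place variable-length sequences into buffers of fixed capacity."""
        buffers = []  # each buffer: list of (example_id, num_tokens)
        for i, n in enumerate(seq_lens):
            for buf in buffers:
                if sum(length for _, length in buf) + n <= capacity:
                    buf.append((i, n))
                    break
            else:
                buffers.append([(i, n)])
        return buffers

    # Five images with different resolutions/aspect ratios -> token counts.
    print(pack([196, 49, 784, 256, 100], capacity=1024))
    # [[(0, 196), (1, 49), (3, 256), (4, 100)], [(2, 784)]]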

Fast model editing at scale

E Mitchell, C Lin, A Bosselut, C Finn… - arXiv preprint arXiv …, 2021 - arxiv.org
While large pre-trained models have enabled impressive results on a variety of downstream
tasks, the largest existing models still make errors, and even accurate predictions may …

Flexivit: One model for all patch sizes

L Beyer, P Izmailov, A Kolesnikov… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision Transformers convert images to sequences by slicing them into patches. The size of
these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher …
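
The tradeoff described here is direct: slicing a (C, H, W) image into p x p patches gives a sequence of (H/p)(W/p) tokens, so halving the patch size quadruples the sequence length the transformer must attend over. A self-contained patchify sketch (the helper name is illustrative):

    import torch

    def patchify(img: torch.Tensor, p: int) -> torch.Tensor:
        """Slice a (C, H, W) image into a sequence of flattened p x p patches."""
        c, h, w = img.shape
        patches = img.unfold(1, p, p).unfold(2, p, p)   # (C, H/p, W/p, p, p)
        return patches.permute(1, 2, 0, 3, 4).reshape(-1, c * p * p)

    img = torch.randn(3, 224, 224)
    for p in (8, 16, 32):
        print(p, tuple(patchify(img, p).shape))
    # 8 -> (784, 192)  16 -> (196, 768)  32 -> (49, 3072)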

Neural architecture search for transformers: A survey

KT Chitty-Venkata, M Emani, V Vishwanath… - IEEE …, 2022 - ieeexplore.ieee.org
Transformer-based Deep Neural Network architectures have gained tremendous interest
due to their effectiveness in various applications across Natural Language Processing (NLP) …

Unveil benign overfitting for transformer in vision: Training dynamics, convergence, and generalization

J Jiang, W Huang, M Zhang… - Advances in Neural …, 2025 - proceedings.neurips.cc
Transformers have demonstrated great power in the recent development of large
foundation models. In particular, the Vision Transformer (ViT) has brought revolutionary …

Expediting large-scale vision transformer for dense prediction without fine-tuning

W Liang, Y Yuan, H Ding, X Luo… - Advances in …, 2022 - proceedings.neurips.cc
Vision transformers have recently achieved competitive results across various vision tasks
but still suffer from heavy computation costs when processing a large number of tokens …