A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

Are we ready for a new paradigm shift? A survey on visual deep MLP

R Liu, Y Li, L Tao, D Liang, HT Zheng - Patterns, 2022 - cell.com
Recently, the proposed deep multilayer perceptron (MLP) models have stirred up a lot of
interest in the vision community. Historically, the availability of larger datasets combined with …

Rethinking vision transformers for MobileNet size and speed

Y Li, J Hu, Y Wen, G Evangelidis… - Proceedings of the …, 2023 - openaccess.thecvf.com
With the success of Vision Transformers (ViTs) in computer vision tasks, recent works try to
optimize the performance and complexity of ViTs to enable efficient deployment on mobile …

Efficient multimodal large language models: A survey

Y Jin, J Li, Y Liu, T Gu, K Wu, Z Jiang, M He… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, Multimodal Large Language Models (MLLMs) have demonstrated
remarkable performance in tasks such as visual question answering, visual understanding …

Neural architecture search for transformers: A survey

KT Chitty-Venkata, M Emani, V Vishwanath… - IEEE …, 2022 - ieeexplore.ieee.org
Transformer-based Deep Neural Network architectures have gained tremendous interest
due to their effectiveness in various applications across Natural Language Processing (NLP) …

MixMAE: Mixed and masked autoencoder for efficient pretraining of hierarchical vision transformers

J Liu, X Huang, J Zheng, Y Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In this paper, we propose Mixed and Masked AutoEncoder (MixMAE), a simple but efficient
pretraining method that is applicable to various hierarchical Vision Transformers. Existing …

ElasticViT: Conflict-aware supernet training for deploying fast vision transformer on diverse mobile devices

C Tang, LL Zhang, H Jiang, J Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Neural Architecture Search (NAS) has shown promising performance in the
automatic design of vision transformers (ViT) exceeding 1G FLOPs. However, designing …

Peripheral vision transformer

J Min, Y Zhao, C Luo, M Cho - Advances in Neural …, 2022 - proceedings.neurips.cc
Human vision possesses a special type of visual processing systems called peripheral
vision. Partitioning the entire visual field into multiple contour regions based on the distance …

Once for both: Single stage of importance and sparsity search for vision transformer compression

H Ye, C Yu, P Ye, R Xia, Y Tang, J Lu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent Vision Transformer Compression (VTC) works mainly follow a two-stage
scheme where the importance score of each model unit is first evaluated or preset in each …

Development of skip connection in deep neural networks for computer vision and medical image analysis: A survey

G Xu, X Wang, X Wu, X Leng, Y Xu - arXiv preprint arXiv:2405.01725, 2024 - arxiv.org
Deep learning has made significant progress in computer vision, specifically in image
classification, object detection, and semantic segmentation. The skip connection has played …