ShiftAddViT: Mixture of multiplication primitives towards efficient vision transformer

H You, H Shi, Y Guo, Y Lin - Advances in Neural …, 2023 - proceedings.neurips.cc
Vision Transformers (ViTs) have shown impressive performance and have become
a unified backbone for multiple vision tasks. However, both the attention mechanism and …

Model quantization and hardware acceleration for vision transformers: A comprehensive survey

D Du, G Gong, X Chu - arXiv preprint arXiv:2405.00314, 2024 - arxiv.org
Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a
promising alternative to convolutional neural networks (CNNs) in several vision-related …

A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends

A Younesi, M Ansari, M Fazli, A Ejlali, M Shafique… - IEEE …, 2024 - ieeexplore.ieee.org
In today's digital age, Convolutional Neural Networks (CNNs), a subset of Deep Learning
(DL), are widely used for various computer vision tasks such as image classification, object …

EfficientDM: Efficient quantization-aware fine-tuning of low-bit diffusion models

Y He, J Liu, W Wu, H Zhou, B Zhuang - arXiv preprint arXiv:2310.03270, 2023 - arxiv.org
Diffusion models have demonstrated remarkable capabilities in image synthesis and related
generative tasks. Nevertheless, their practicality for low-latency real-world applications is …

Efficient multimodal large language models: A survey

Y **, J Li, Y Liu, T Gu, K Wu, Z Jiang, M He… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, Multimodal Large Language Models (MLLMs) have demonstrated
remarkable performance in tasks such as visual question answering, visual understanding …

ViT-1.58b: Mobile vision transformers in the 1-bit era

Z Yuan, R Zhou, H Wang, L He, Y Ye, L Sun - arXiv preprint arXiv …, 2024 - arxiv.org
Vision Transformers (ViTs) have achieved remarkable performance in various image
classification tasks by leveraging the attention mechanism to process image patches as …

A General and Efficient Training for Transformer via Token Expansion

W Huang, Y Shen, J **e, B Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
The remarkable performance of Vision Transformers (ViTs) typically requires an extremely
large training cost. Existing methods have attempted to accelerate the training of ViTs yet …

BiDM: Pushing the Limit of Quantization for Diffusion Models

X Zheng, X Liu, Y Bian, X Ma, Y Zhang, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models (DMs) have been significantly developed and widely used in various
applications due to their excellent generative qualities. However, the expensive computation …

Understanding neural network binarization with forward and backward proximal quantizers

Y Lu, Y Yu, X Li, V Partovi Nia - Advances in Neural …, 2023 - proceedings.neurips.cc
In neural network binarization, BinaryConnect (BC) and its variants are considered the
standard. These methods apply the sign function in their forward pass and their respective …

GSB: Group superposition binarization for vision transformer with limited training samples

T Gao, CZ Xu, L Zhang, H Kong - Neural Networks, 2024 - Elsevier
Vision Transformer (ViT) has performed remarkably in various computer vision
tasks. Nonetheless, affected by the massive amount of parameters, ViT usually suffers from …