ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
Vision Transformers (ViTs) have shown impressive performance and have become
a unified backbone for multiple vision tasks. However, both the attention mechanism and …
Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey
Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a
promising alternative to convolutional neural networks (CNNs) in several vision-related …
A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends
In today's digital age, Convolutional Neural Networks (CNNs), a subset of Deep Learning
(DL), are widely used for various computer vision tasks such as image classification, object …
EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models
Diffusion models have demonstrated remarkable capabilities in image synthesis and related
generative tasks. Nevertheless, their practicality for low-latency real-world applications is …
Efficient Multimodal Large Language Models: A Survey
In the past year, Multimodal Large Language Models (MLLMs) have demonstrated
remarkable performance in tasks such as visual question answering, visual understanding …
ViT-1.58b: Mobile Vision Transformers in the 1-bit Era
Vision Transformers (ViTs) have achieved remarkable performance in various image
classification tasks by leveraging the attention mechanism to process image patches as …
A General and Efficient Training for Transformer via Token Expansion
The remarkable performance of Vision Transformers (ViTs) typically comes at an extremely
high training cost. Existing methods have attempted to accelerate the training of ViTs yet …
BiDM: Pushing the Limit of Quantization for Diffusion Models
Diffusion models (DMs) have been significantly developed and widely used in various
applications due to their excellent generative qualities. However, the expensive computation …
Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers
In neural network binarization, BinaryConnect (BC) and its variants are considered the
standard. These methods apply the sign function in their forward pass and their respective …
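For context, here is a minimal sketch of the sign-forward binarization scheme this abstract refers to, paired with a clipped straight-through estimator (STE) in the backward pass. This is the common BinaryConnect-style surrogate, not a reproduction of the paper's proximal quantizers, and all names below are illustrative.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Binarize weights with sign() forward; clipped straight-through backward."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)  # forward pass: hard binarization to {-1, 0, +1}

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Clipped STE: pass the gradient through unchanged where |w| <= 1,
        # zero it elsewhere (the standard BinaryConnect-style surrogate).
        return grad_out * (w.abs() <= 1).to(grad_out.dtype)

w = torch.randn(4, requires_grad=True)
BinarizeSTE.apply(w).sum().backward()  # w.grad is 1 where |w| <= 1, else 0
```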
GSB: Group Superposition Binarization for Vision Transformer with Limited Training Samples
Vision Transformer (ViT) has performed remarkably in various computer vision
tasks. Nonetheless, burdened by its massive number of parameters, ViT usually suffers from …