Model quantization and hardware acceleration for vision transformers: A comprehensive survey

D Du, G Gong, X Chu - arXiv preprint arXiv:2405.00314, 2024 - arxiv.org
Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a
promising alternative to convolutional neural networks (CNNs) in several vision-related …

Efficient multimodal large language models: A survey

Y Jin, J Li, Y Liu, T Gu, K Wu, Z Jiang, M He… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, Multimodal Large Language Models (MLLMs) have demonstrated
remarkable performance in tasks such as visual question answering, visual understanding …

Outlier-aware slicing for post-training quantization in vision transformer

Y Ma, H Li, X Zheng, F Ling, X Xiao… - … on Machine Learning, 2024 - openreview.net
Post-Training Quantization (PTQ) is a vital technique for network compression and
acceleration, gaining prominence as model sizes increase. This paper addresses a critical …

ERQ: Error reduction for post-training quantization of vision transformers

Y Zhong, J Hu, Y Huang, Y Zhang, R Ji - arXiv preprint arXiv:2407.06794, 2024 - arxiv.org
Post-training quantization (PTQ) for vision transformers (ViTs) has garnered significant
attention due to its efficiency in compressing models. However, existing methods typically …

I&S-ViT: An inclusive & stable method for pushing the limit of post-training ViTs quantization

Y Zhong, J Hu, M Lin, M Chen, R Ji - arXiv preprint arXiv:2311.10126, 2023 - arxiv.org
Despite the scalable performance of vision transformers (ViTs), the dense computational costs
(training & inference) undermine their position in industrial applications. Post-training …

Data quality-aware mixed-precision quantization via hybrid reinforcement learning

Y Wang, S Guo, J Guo, Y Zhang… - … on Neural Networks …, 2024 - ieeexplore.ieee.org
Mixed-precision quantization mostly predetermines the model bit-width settings before
actual training due to the non-differentiable bit-width sampling process, obtaining suboptimal …
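
The bit-width sampling mentioned in this snippet is a discrete choice, so it cannot be tuned by ordinary gradient descent; the usual workaround is to fix a per-layer bit-width map before quantizing. Below is a minimal sketch of that workflow (the layer names and bit-width assignments are illustrative, not from the paper).

```python
# Minimal sketch: uniform weight quantization with a fixed per-layer bit-width map,
# illustrating why bit-widths are typically chosen ahead of training -- picking an
# integer bit-width is a discrete, non-differentiable step.
import torch
import torch.nn as nn

def quantize_weight(w: torch.Tensor, n_bits: int) -> torch.Tensor:
    """Symmetric uniform quantization of a weight tensor to n_bits."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

# Hypothetical per-layer bit-width assignment (the "mixed precision" part).
bit_widths = {"0": 8, "2": 4}  # layer name -> bits, fixed before any training

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
with torch.no_grad():
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and name in bit_widths:
            module.weight.copy_(quantize_weight(module.weight, bit_widths[name]))
```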

MagR: Weight magnitude reduction for enhancing post-training quantization

A Zhang, N Wang, Y Deng, X Li, Z Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we present a simple optimization-based preprocessing technique called
Weight Magnitude Reduction (MagR) to improve the performance of post-training …
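
As a rough illustration of the general idea behind magnitude-reducing preprocessing (not the paper's optimization algorithm), the sketch below drops the component of a layer's weight that is invisible to a small calibration feature matrix: the layer's outputs on that data are unchanged, while the remaining weights typically span a smaller range and are therefore friendlier to quantize. All tensors here are synthetic placeholders.

```python
# Naive weight-magnitude reduction sketch (hypothetical, not MagR's method):
# split W into a component seen by the calibration features X and a component in
# the null space of X, then keep only the visible part.
import torch

torch.manual_seed(0)
X = torch.randn(32, 64)   # calibration features (N x d_in), N < d_in so a null space exists
W = torch.randn(64, 16)   # full-precision layer weight (d_in x d_out)

# Component of W lying in the row space of X; X @ (W - W_vis) == 0 by construction.
W_vis = torch.linalg.pinv(X) @ (X @ W)

print(torch.allclose(X @ W, X @ W_vis, atol=1e-3))      # outputs on calibration data preserved
print(W.abs().max().item(), W_vis.abs().max().item())   # max weight magnitude typically shrinks
```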

COMQ: A backpropagation-free algorithm for post-training quantization

A Zhang, Z Yang, N Wang, Y Qi, J Xin, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Post-training quantization (PTQ) has emerged as a practical approach to compress large
neural networks, making them highly efficient for deployment. However, effectively reducing …

Hierarchical Mixed-Precision Post-Training Quantization for SAR Ship Detection Networks

H Wei, Z Wang, Y Ni - Remote Sensing, 2024 - mdpi.com
Convolutional neural network (CNN)-based synthetic aperture radar (SAR) ship detection
models operating directly on satellites can reduce transmission latency and improve real …

MetaAug: Meta-data Augmentation for Post-training Quantization

C Pham, AD Hoang, CC Nguyen, T Le, D Phung… - … on Computer Vision, 2024 - Springer
Post-Training Quantization (PTQ) has received significant attention because it
requires only a small set of calibration data to quantize a full-precision model, which is more …
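
As a reference point for what calibration-based PTQ involves in general (a generic fake-quantization sketch, not MetaAug itself): a small batch of calibration inputs is passed through the model to estimate an activation range, from which an 8-bit scale and zero-point are derived. Model, data, and thresholds below are placeholders.

```python
# Minimal calibration-based PTQ sketch: estimate an activation range from a few
# calibration samples, derive uint8 scale/zero-point, and apply fake quantization.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10)).eval()
calib = torch.randn(64, 16)  # small calibration set standing in for real data

# Record the output range of the first block over the calibration data.
with torch.no_grad():
    acts = model[1](model[0](calib))
lo, hi = acts.min().item(), acts.max().item()

# Asymmetric uint8 quantization parameters from the observed range.
scale = (hi - lo) / 255.0
zero_point = round(-lo / scale)

def fake_quant(x: torch.Tensor) -> torch.Tensor:
    """Quantize-dequantize using the calibrated scale/zero-point."""
    q = torch.clamp(torch.round(x / scale) + zero_point, 0, 255)
    return (q - zero_point) * scale

with torch.no_grad():
    print(fake_quant(acts).sub(acts).abs().max())  # quantization error on the calibration set
```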