A comprehensive survey on model quantization for deep neural networks in image classification

B Rokh, A Azarpeyvand, A Khanteymoori - ACM Transactions on …, 2023 - dl.acm.org
Recent advancements in machine learning achieved by Deep Neural Networks (DNNs)
have been significant. While demonstrating high accuracy, DNNs are associated with a …

LLM-QAT: Data-free quantization aware training for large language models

Z Liu, B Oguz, C Zhao, E Chang, P Stock… - arXiv preprint arXiv …, 2023 - arxiv.org
Several post-training quantization methods have been applied to large language models
(LLMs), and have been shown to perform well down to 8-bits. We find that these methods …

OmniQuant: Omnidirectionally calibrated quantization for large language models

W Shao, M Chen, Z Zhang, P Xu, L Zhao, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have revolutionized natural language processing tasks.
However, their practical deployment is hindered by their immense memory and computation …

A comprehensive survey of compression algorithms for language models

S Park, J Choi, S Lee, U Kang - arXiv preprint arXiv:2401.15347, 2024 - arxiv.org
How can we compress language models without sacrificing accuracy? The number of
compression algorithms for language models is rapidly growing to benefit from remarkable …

BiT: Robustly binarized multi-distilled transformer

Z Liu, B Oguz, A Pappu, L Xiao, S Yih… - Advances in neural …, 2022 - proceedings.neurips.cc
Modern pre-trained transformers have rapidly advanced the state-of-the-art in machine
learning, but have also grown in parameters and computational complexity, making them …

A survey on deep learning hardware accelerators for heterogeneous hpc platforms

C Silvano, D Ielmini, F Ferrandi, L Fiorin… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent trends in deep learning (DL) imposed hardware accelerators as the most viable
solution for several classes of high-performance computing (HPC) applications such as …

Q-DETR: An efficient low-bit quantized detection transformer

S Xu, Y Li, M Lin, P Gao, G Guo… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent detection transformer (DETR) has advanced object detection, but its application
on resource-constrained devices requires massive computation and memory resources …

Oscillation-free quantization for low-bit vision transformers

SY Liu, Z Liu, KT Cheng - International conference on …, 2023 - proceedings.mlr.press
Weight oscillation is a by-product of quantization-aware training, in which quantized weights
frequently jump between two quantized levels, resulting in training instability and a sub …

IRGen: Generative modeling for image retrieval

Y Zhang, T Zhang, D Chen, Y Wang, Q Chen… - … on Computer Vision, 2024 - Springer
While generative modeling has become prevalent across numerous research fields, its
integration into the realm of image retrieval remains largely unexplored and underjustified …

Adapting magnetoresistive memory devices for accurate and on-chip-training-free in-memory computing

Z Xiao, VB Naik, JH Lim, Y Hou, Z Wang, Q Shao - Science Advances, 2024 - science.org
Memristors have emerged as promising devices for enabling efficient multiply-accumulate
(MAC) operations in crossbar arrays, crucial for analog in-memory computing (AiMC) …