A review of the optimal design of neural networks based on FPGA

C Wang, Z Luo - Applied Sciences, 2022 - mdpi.com
Deep learning based on neural networks has been widely used in image recognition,
speech recognition, natural language processing, autonomous driving, and other fields and …

Memory-efficient fine-tuning of compressed large language models via sub-4-bit integer quantization

J Kim, JH Lee, S Kim, J Park, KM Yoo… - Advances in Neural …, 2023 - proceedings.neurips.cc
Large language models (LLMs) face challenges in fine-tuning and deployment due to
their high memory demands and computational costs. While parameter-efficient fine-tuning …
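
As a rough illustration of the kind of sub-4-bit integer quantization this entry refers to, the sketch below implements plain symmetric per-row 4-bit quantization in NumPy. The function names and the per-row scaling choice are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-row 4-bit quantization: floats -> integers in [-8, 7].
    (Illustrative; the paper's scheme may group and scale differently.)"""
    scale = np.max(np.abs(w), axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0.0, 1.0, scale)   # guard all-zero rows
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float matrix from integers and scales."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 16).astype(np.float32)
q, s = quantize_int4(w)
print("max abs error:", np.max(np.abs(w - dequantize(q, s))))
```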

Nonuniform-to-uniform quantization: Towards accurate quantization via generalized straight-through estimation

Z Liu, KT Cheng, D Huang… - Proceedings of the …, 2022 - openaccess.thecvf.com
The nonuniform quantization strategy for compressing neural networks usually achieves
better performance than its counterpart, i.e., the uniform strategy, due to its superior …
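
To make the straight-through idea behind such methods concrete: in the forward pass, values snap to a (possibly nonuniform) codebook; in the backward pass, the gradient is passed through as if the rounding were the identity. A minimal NumPy sketch, with the forward/backward pair written out by hand since NumPy has no autodiff; the generalized estimator in the paper is more involved.

```python
import numpy as np

def quantize_forward(x, levels):
    """Snap each value to its nearest entry in a (nonuniform) codebook."""
    idx = np.argmin(np.abs(x[..., None] - levels), axis=-1)
    return levels[idx]

def quantize_backward(grad_out):
    """Straight-through estimator: treat the snap as the identity in the
    backward pass, so gradients flow through unchanged."""
    return grad_out

# Codebook denser near zero, as nonuniform schemes typically are.
levels = np.array([-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0])
x = np.linspace(-1.2, 1.2, 7)
print(quantize_forward(x, levels))
```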

RAELLA: Reforming the arithmetic for efficient, low-resolution, and low-loss analog PIM: No retraining required!

T Andrulis, JS Emer, V Sze - … of the 50th Annual International Symposium …, 2023 - dl.acm.org
Processing-In-Memory (PIM) accelerators have the potential to efficiently run Deep Neural
Network (DNN) inference by reducing costly data movement and by using resistive RAM …

Scalable and programmable neural network inference accelerator based on in-memory computing

H Jia, M Ozatay, Y Tang, H Valavi… - IEEE Journal of Solid …, 2021 - ieeexplore.ieee.org
This work demonstrates a programmable in-memory-computing (IMC) inference accelerator
for scalable execution of neural network (NN) models, leveraging a high-signal-to-noise …

FlexRound: Learnable rounding based on element-wise division for post-training quantization

JH Lee, J Kim, SJ Kwon, D Lee - … Conference on Machine …, 2023 - proceedings.mlr.press
Post-training quantization (PTQ) has been gaining popularity for the deployment of deep
neural networks on resource-limited devices since, unlike quantization-aware training, …
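
A minimal sketch of rounding after an element-wise division, which is the mechanism this entry's title names: the divisor tensor lets each weight shift which grid point it rounds to. Here `div` is fixed to ones, whereas the paper learns it during post-training quantization; the function name is made up for illustration.

```python
import numpy as np

def flexround_like(w, step, div):
    """Quantize by rounding w after an element-wise division; a learnable
    `div` can nudge individual weights across rounding boundaries."""
    return step * np.round(w / (step * div))

w = np.random.randn(3, 4).astype(np.float32)
div = np.ones_like(w)     # learned per element in the paper; identity here
print(flexround_like(w, step=0.05, div=div))
```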

Learnable companding quantization for accurate low-bit neural networks

K Yamamoto - Proceedings of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com
Quantizing deep neural networks is an effective method for reducing memory consumption
and improving inference speed, and is thus useful for implementation in resource …
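
Companding quantization composes a compressing nonlinearity, a uniform quantizer, and the inverse expansion, yielding nonuniform levels overall. Below is a NumPy sketch using the classic mu-law compander as a stand-in; the paper learns the companding function rather than fixing it.

```python
import numpy as np

def mu_compress(x, mu=255.0):
    """Mu-law compression: expands resolution near zero."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_expand(y, mu=255.0):
    """Exact inverse of mu_compress."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

def companding_quantize(x, bits=3, mu=255.0):
    """Compress, quantize uniformly, expand: levels end up denser near zero."""
    n = 2 ** (bits - 1) - 1                  # symmetric integer range
    yq = np.round(mu_compress(x, mu) * n) / n
    return mu_expand(yq, mu)

x = np.linspace(-1.0, 1.0, 9)
print(companding_quantize(x))
```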

Q-ViT: Fully differentiable quantization for vision transformer

Z Li, T Yang, P Wang, J Cheng - arXiv preprint arXiv:2201.07703, 2022 - arxiv.org
In this paper, we propose a fully differentiable quantization method for vision transformers
(ViT), named Q-ViT, in which both the quantization scales and bit-widths are learnable …
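
For intuition, the sketch below shows a uniform quantizer parameterized by a scale and a bit-width, the two quantities such methods learn with gradients. Here both are plain floats and the integer rounding of `bits` stands in for the differentiable relaxation used in training.

```python
import numpy as np

def quantize(x, scale, bits):
    """Uniform quantizer controlled by the two parameters that
    Q-ViT-style methods learn: the step `scale` and the bit-width."""
    b = int(round(bits))          # a differentiable relaxation in practice
    lo, hi = -2 ** (b - 1), 2 ** (b - 1) - 1
    return scale * np.clip(np.round(x / scale), lo, hi)

x = np.random.randn(8).astype(np.float32)
print(quantize(x, scale=0.1, bits=4.0))
```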

DKM: Differentiable k-means clustering layer for neural network compression

M Cho, KA Vahid, S Adya, M Rastegari - arXiv preprint arXiv:2108.12659, 2021 - arxiv.org
Deep neural network (DNN) model compression for efficient on-device inference is
becoming increasingly important to reduce memory requirements and keep user data on …
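
The core trick of a differentiable k-means layer is replacing hard nearest-centroid assignment with a softmax over negative distances, which is differentiable with respect to both weights and centroids. A NumPy sketch under that reading; the attention-based formulation in the paper may differ in detail.

```python
import numpy as np

def soft_kmeans_reconstruct(w, centers, temperature=0.05):
    """Soft-assign each weight to centroids via a softmax over negative
    squared distances, then reconstruct it as the weighted centroid mix.
    As temperature -> 0 this approaches hard k-means assignment."""
    d = (w[:, None] - centers[None, :]) ** 2
    d -= d.min(axis=1, keepdims=True)          # stabilize the softmax
    a = np.exp(-d / temperature)
    a /= a.sum(axis=1, keepdims=True)
    return a @ centers

w = np.random.randn(6).astype(np.float32)
centers = np.array([-0.5, 0.0, 0.5], dtype=np.float32)
print(soft_kmeans_reconstruct(w, centers))
```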

Improving low-precision network quantization via bin regularization

T Han, D Li, J Liu, L Tian… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Model quantization is an important mechanism for energy-efficient deployment of
deep neural networks on resource-constrained devices by reducing the bit precision of …
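
One simple way to read "bin regularization" is as an auxiliary loss that pulls each full-precision weight toward the center of the quantization bin it falls in. The sketch below implements that reading in NumPy; the paper's exact loss may be shaped differently (e.g., using per-bin statistics rather than a global mean).

```python
import numpy as np

def bin_regularizer(w, step):
    """Mean squared distance from each weight to its nearest quantization
    grid point; minimizing it sharpens the per-bin weight distribution."""
    centers = step * np.round(w / step)
    return np.mean((w - centers) ** 2)

w = np.random.randn(1000).astype(np.float32)
print("bin regularization loss:", bin_regularizer(w, step=0.1))
```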