Pruning and quantization for deep neural network acceleration: A survey

T Liang, J Glossner, L Wang, S Shi, X Zhang - Neurocomputing, 2021 - Elsevier
Deep neural networks have been applied in many applications exhibiting extraordinary
abilities in the field of computer vision. However, complex network architectures challenge …

Improving neural network quantization without retraining using outlier channel splitting

R Zhao, Y Hu, J Dotzel, C De Sa… - … conference on machine …, 2019 - proceedings.mlr.press
Quantization can improve the execution latency and energy efficiency of neural networks on
both commodity GPUs and specialized accelerators. The majority of existing literature …

On-device learning systems for edge intelligence: A software and hardware synergy perspective

Q Zhou, Z Qu, S Guo, B Luo, J Guo… - IEEE Internet of …, 2021 - ieeexplore.ieee.org
Modern machine learning (ML) applications are often deployed in the cloud environment to
exploit the computational power of clusters. However, this in-cloud computing scheme …

Algorithm-hardware co-design of adaptive floating-point encodings for resilient deep learning inference

T Tambe, EY Yang, Z Wan, Y Deng… - 2020 57th ACM/IEEE …, 2020 - ieeexplore.ieee.org
Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to
perform poorly at very low precision as their shrunken dynamic ranges cannot adequately …

Training and inference of large language models using 8-bit floating point

SP Perez, Y Zhang, J Briggs, C Blake… - arXiv preprint arXiv …, 2023 - arxiv.org
FP8 formats are gaining popularity to boost the computational efficiency for training and
inference of large deep learning models. Their main challenge is that a careful choice of …

Low-precision floating-point arithmetic for high-performance fpga-based cnn acceleration

C Wu, M Wang, X Chu, K Wang, L He - ACM Transactions on …, 2021 - dl.acm.org
Low-precision data representation is important to reduce storage size and memory access
for convolutional neural networks (CNNs). Yet, existing methods have two major …

Fighting quantization bias with bias

A Finkelstein, U Almog, M Grobman - arXiv preprint arXiv:1906.03193, 2019 - arxiv.org
Low-precision representation of deep neural networks (DNNs) is critical for efficient
deployment of deep learning applications on embedded platforms; however, converting the …

Optimizing FPGA-Based DNN accelerator with shared exponential floating-point format

W Zhao, Q Dang, T Xia, J Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In recent years, low-precision fixed-point computation has become a widely used technique
for neural network inference on FPGAs. However, this approach has some limitations, as …

3D-ReG: A 3D ReRAM-based heterogeneous architecture for training deep neural networks

B Li, JR Doppa, PP Pande, K Chakrabarty… - ACM Journal on …, 2020 - dl.acm.org
Deep neural network (DNN) models are being expanded to a broader range of applications.
The computational capability of traditional hardware platforms cannot accommodate the …

VRU Pose-SSD: Multiperson pose estimation for automated driving

C Kumar, J Ramesh, B Chakraborty, R Raman… - Proceedings of the …, 2021 - ojs.aaai.org
We present a fast and efficient approach for joint person detection and pose estimation
optimized for automated driving (AD) in urban scenarios. We use a multitask weight sharing …