Bringing AI to edge: From deep learning's perspective

D Liu, H Kong, X Luo, W Liu, R Subramaniam - Neurocomputing, 2022 - Elsevier
Edge computing and artificial intelligence (AI), especially deep learning algorithms, are
gradually intersecting to build the novel system, namely edge intelligence. However, the …

A white paper on neural network quantization

M Nagel, M Fournarakis, RA Amjad… - arXiv preprint arXiv …, 2021 - arxiv.org
While neural networks have advanced the frontiers in many applications, they often come at
a high computational cost. Reducing the power and latency of neural network inference is …
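
For orientation, below is a minimal NumPy sketch of the uniform affine (asymmetric) quantization that white papers of this kind cover; the weight tensor shape and the 8-bit setting are illustrative choices, not taken from the paper.

    import numpy as np

    def fake_quantize_affine(x, num_bits=8):
        # Uniform affine quantization: map floats onto an integer grid [0, 2^b - 1]
        # with a scale and zero-point, then de-quantize to expose the rounding error.
        qmin, qmax = 0, 2 ** num_bits - 1
        scale = (x.max() - x.min()) / (qmax - qmin)
        zero_point = np.round(-x.min() / scale)
        q = np.clip(np.round(x / scale + zero_point), qmin, qmax)
        return (q - zero_point) * scale

    w = np.random.randn(64, 64).astype(np.float32)   # illustrative weight tensor
    w_q = fake_quantize_affine(w, num_bits=8)
    print("max absolute rounding error:", np.abs(w - w_q).max())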

Pruning vs quantization: Which is better?

A Kuzmin, M Nagel, M Van Baalen… - Advances in neural …, 2023 - proceedings.neurips.cc
Neural network pruning and quantization techniques are almost as old as neural networks
themselves. However, to date, only ad-hoc comparisons between the two have been …
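
As a rough illustration of the kind of question the paper formalizes, the sketch below measures the reconstruction error of magnitude pruning versus uniform quantization on a random weight tensor; the 25% keep ratio and 4-bit grid are arbitrary stand-ins for a matched compression budget, not the paper's protocol.

    import numpy as np

    def prune_magnitude(w, keep_ratio):
        # Zero all but the largest-magnitude fraction `keep_ratio` of weights.
        k = int(keep_ratio * w.size)
        thresh = np.sort(np.abs(w).ravel())[-k]
        return np.where(np.abs(w) >= thresh, w, 0.0)

    def quantize_symmetric(w, num_bits):
        # Symmetric uniform quantization with round-to-nearest.
        qmax = 2 ** (num_bits - 1) - 1
        scale = np.abs(w).max() / qmax
        return np.clip(np.round(w / scale), -qmax, qmax) * scale

    w = np.random.randn(512, 512).astype(np.float32)   # illustrative weight tensor
    mse_prune = np.mean((w - prune_magnitude(w, keep_ratio=0.25)) ** 2)
    mse_quant = np.mean((w - quantize_symmetric(w, num_bits=4)) ** 2)
    print(f"pruning MSE: {mse_prune:.5f}   quantization MSE: {mse_quant:.5f}")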

Up or down? Adaptive rounding for post-training quantization

M Nagel, RA Amjad, M Van Baalen… - International …, 2020 - proceedings.mlr.press
When quantizing neural networks, assigning each floating-point weight to its nearest fixed-
point value is the predominant approach. We find that, perhaps surprisingly, this is not the …
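
A toy brute-force illustration of the point the snippet alludes to: rounding each weight up or down defines a small search space in which round-to-nearest is only one candidate, and it need not minimize the error of the layer's output. The tiny layer, calibration data and 4-bit grid below are made up, and AdaRound itself optimizes a continuous relaxation rather than searching exhaustively.

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(0)
    w = rng.normal(size=4)            # illustrative float weights of a 4-input neuron
    x = rng.normal(size=(256, 4))     # illustrative calibration inputs
    scale = np.abs(w).max() / 7       # 4-bit symmetric grid, for illustration

    def output_mse(w_int):
        # Error of the layer's *output*, not of the weights themselves.
        return np.mean((x @ w - x @ (np.asarray(w_int) * scale)) ** 2)

    nearest = np.round(w / scale)
    candidates = product(*[(np.floor(v), np.ceil(v)) for v in w / scale])
    best = min(candidates, key=output_mse)
    print("round-to-nearest output MSE:", output_mse(nearest))
    print("best up/down rounding MSE:  ", output_mse(best))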

Overcoming oscillations in quantization-aware training

M Nagel, M Fournarakis… - International …, 2022 - proceedings.mlr.press
When training neural networks with simulated quantization, we observe that quantized
weights can, rather unexpectedly, oscillate between two grid-points. The importance of this …
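
A one-dimensional sketch of the phenomenon the snippet describes: with simulated (fake) quantization and a straight-through gradient, a latent weight whose optimum falls between two grid points keeps flipping between them. The step size, target and learning rate are made up for illustration.

    import numpy as np

    scale = 0.1          # quantization step size (illustrative)
    target = 0.25        # optimum of a toy 1-D loss, deliberately between two grid points
    w, lr = 0.23, 0.2    # latent float weight and learning rate

    for step in range(8):
        q = scale * np.round(w / scale)    # forward pass: fake-quantized weight
        grad = 2.0 * (q - target)          # gradient of (q - target)^2; STE treats dq/dw as 1
        w -= lr * grad
        print(f"step {step}: quantized weight = {q:+.2f}, latent weight -> {w:+.3f}")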

Understanding and overcoming the challenges of efficient transformer quantization

Y Bondarenko, M Nagel, T Blankevoort - arXiv preprint arXiv:2109.12948, 2021 - arxiv.org
Transformer-based architectures have become the de-facto standard models for a wide
range of Natural Language Processing tasks. However, their memory footprint and high …

Ultra-low precision 4-bit training of deep neural networks

X Sun, N Wang, CY Chen, J Ni… - Advances in …, 2020 - proceedings.neurips.cc
In this paper, we propose a number of novel techniques and numerical representation
formats that enable, for the very first time, the precision of training systems to be aggressively …

A review of state-of-the-art mixed-precision neural network frameworks

M Rakka, ME Fouda, P Khargonekar… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Mixed-precision Deep Neural Networks (DNNs) provide an efficient solution for hardware
deployment, especially under resource constraints, while maintaining model accuracy …
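
A toy sketch of the per-layer bit-width assignment problem such frameworks automate: meet a model-size budget while keeping the layers most sensitive to quantization at higher precision. Layer names, parameter counts, sensitivities and the greedy policy are all illustrative, not drawn from the survey.

    # name: (parameter count, sensitivity to quantization) -- made-up numbers
    layers = {
        "conv1":  (9_408,     0.90),
        "block3": (2_097_152, 0.20),
        "fc":     (512_000,   0.65),
    }
    total_params = sum(n for n, _ in layers.values())
    budget_bits = int(0.6 * 8 * total_params)     # allow 60% of the all-8-bit size

    def model_bits(assignment):
        return sum(bits * layers[name][0] for name, bits in assignment.items())

    # Greedy policy: start at 8 bits everywhere, demote the least sensitive
    # layers to 4 bits until the budget is met.
    assignment = {name: 8 for name in layers}
    for name, _ in sorted(layers.items(), key=lambda kv: kv[1][1]):
        if model_bits(assignment) <= budget_bits:
            break
        assignment[name] = 4
    print(assignment, model_bits(assignment), "<=", budget_bits)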

FP8 quantization: The power of the exponent

A Kuzmin, M Van Baalen, Y Ren… - Advances in …, 2022 - proceedings.neurips.cc
When quantizing neural networks for efficient inference, low-bit integers are the go-to format
for efficiency. However, low-bit floating point numbers have an extra degree of freedom …
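
To make that extra degree of freedom concrete, the sketch below enumerates the non-negative values of a toy 8-bit floating-point format for two exponent/mantissa splits; it ignores inf/NaN encodings, and the biases are illustrative rather than those of any particular standard.

    import numpy as np

    def fp_grid(exp_bits, man_bits, bias):
        # Non-negative values of a sign + exponent + mantissa format,
        # including subnormals, excluding inf/NaN special encodings.
        values = [0.0]
        for e in range(2 ** exp_bits):
            for m in range(2 ** man_bits):
                if e == 0:   # subnormal range
                    values.append(m / 2 ** man_bits * 2.0 ** (1 - bias))
                else:        # normal range
                    values.append((1 + m / 2 ** man_bits) * 2.0 ** (e - bias))
        return np.unique(values)

    # Trading mantissa bits (precision near zero) for exponent bits (dynamic range).
    for exp_bits, man_bits, bias in [(4, 3, 7), (5, 2, 15)]:
        g = fp_grid(exp_bits, man_bits, bias)
        print(f"E{exp_bits}M{man_bits}: max = {g.max():g}, "
              f"smallest positive = {g[g > 0].min():g}, levels = {len(g)}")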

Hybrid 8-bit floating point (HFP8) training and inference for deep neural networks

X Sun, J Choi, CY Chen, N Wang… - Advances in neural …, 2019 - proceedings.neurips.cc
Reducing the numerical precision of data and computation is extremely effective in
accelerating deep learning training workloads. Towards this end, 8-bit floating point …