Model compression and hardware acceleration for neural networks: A comprehensive survey

L Deng, G Li, S Han, L Shi, Y Xie - Proceedings of the IEEE, 2020 - ieeexplore.ieee.org
Domain-specific hardware is becoming a promising topic in the backdrop of improvement
slow down for general-purpose processors due to the foreseeable end of Moore's Law …

Normalization techniques in training DNNs: Methodology, analysis and application

L Huang, J Qin, Y Zhou, F Zhu, L Liu… - IEEE transactions on …, 2023 - ieeexplore.ieee.org
Normalization techniques are essential for accelerating the training and improving the
generalization of deep neural networks (DNNs), and have successfully been used in various …

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-Power Computer …, 2022 - taylorfrancis.com
This chapter provides approaches to the problem of quantizing the numerical values in deep
Neural Network computations, covering the advantages/disadvantages of current methods …

Pruning and quantization for deep neural network acceleration: A survey

T Liang, J Glossner, L Wang, S Shi, X Zhang - Neurocomputing, 2021 - Elsevier
Deep neural networks have been applied in many applications exhibiting extraordinary
abilities in the field of computer vision. However, complex network architectures challenge …

NeRV: Neural representations for videos

H Chen, B He, H Wang, Y Ren… - Advances in Neural …, 2021 - proceedings.neurips.cc
We propose a novel neural representation for videos (NeRV) which encodes videos in
neural networks. Unlike conventional representations that treat videos as frame sequences …

BatchCrypt: Efficient homomorphic encryption for Cross-Silo federated learning

C Zhang, S Li, J Xia, W Wang, F Yan, Y Liu - 2020 USENIX annual …, 2020 - usenix.org
Cross-silo federated learning (FL) enables organizations (e.g., financial or medical) to
collaboratively train a machine learning model by aggregating local gradient updates from …

MLPerf training benchmark

P Mattson, C Cheng, G Diamos… - Proceedings of …, 2020 - proceedings.mlsys.org
Machine learning is experiencing an explosion of software and hardware solutions,
and needs industry-standard performance benchmarks to drive design and enable …

Ultra-low precision 4-bit training of deep neural networks

X Sun, N Wang, CY Chen, J Ni… - Advances in …, 2020 - proceedings.neurips.cc
In this paper, we propose a number of novel techniques and numerical representation
formats that enable, for the very first time, the precision of training systems to be aggressively …

Benchmarking TPU, GPU, and CPU platforms for deep learning

YE Wang, GY Wei, D Brooks - arXiv preprint arXiv:1907.10701, 2019 - arxiv.org
Training deep learning models is compute-intensive and there is an industry-wide trend
towards hardware specialization to improve performance. To systematically benchmark …

Photonic multiply-accumulate operations for neural networks

MA Nahmias, TF De Lima, AN Tait… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
It has long been known that photonic communication can alleviate the data movement
bottlenecks that plague conventional microelectronic processors. More recently, there has …