Pruning and quantization for deep neural network acceleration: A survey
Deep neural networks have been applied in many applications, exhibiting extraordinary
abilities in the field of computer vision. However, complex network architectures challenge …
Improving neural network quantization without retraining using outlier channel splitting
Quantization can improve the execution latency and energy efficiency of neural networks on
both commodity GPUs and specialized accelerators. The majority of existing literature …
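As an aside on the technique named in this title: below is a minimal sketch of outlier channel splitting for a linear layer, assuming numpy and a weight matrix W with the layer computing y = W @ x; the function name split_outlier_channel and its interface are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def split_outlier_channel(W):
        # Hypothetical sketch for a linear layer y = W @ x, W: (out, in).
        # The input channel holding the largest-magnitude weight is halved
        # and a copy appended, so (w/2)*x + (w/2)*x == w*x: the widened
        # layer reproduces the original output exactly while the outlier
        # (and hence the quantization range) shrinks.
        col = int(np.argmax(np.abs(W).max(axis=0)))   # channel with the outlier
        W = W.copy()
        W[:, col] *= 0.5
        W = np.concatenate([W, W[:, col:col + 1]], axis=1)
        return W, col   # caller also appends x[col] to the layer input

Applying the function repeatedly, and appending the duplicated input entry each time, splits further outliers; the transformation is exact in floating point, at the cost of a slightly wider layer.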
On-device learning systems for edge intelligence: A software and hardware synergy perspective
Modern machine learning (ML) applications are often deployed in the cloud environment to
exploit the computational power of clusters. However, this in-cloud computing scheme …
Algorithm-hardware co-design of adaptive floating-point encodings for resilient deep learning inference
Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to
perform poorly at very low precision as their shrunken dynamic ranges cannot adequately …
Training and inference of large language models using 8-bit floating point
FP8 formats are gaining popularity to boost the computational efficiency for training and
inference of large deep learning models. Their main challenge is that a careful choice of …
Low-precision floating-point arithmetic for high-performance fpga-based cnn acceleration
Low-precision data representation is important to reduce storage size and memory access
for convolutional neural networks (CNNs). Yet, existing methods have two major …
Fighting quantization bias with bias
Low-precision representation of deep neural networks (DNNs) is critical for efficient
deployment of deep learning applications on embedded platforms; however, converting the …
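The title points at compensating quantization-induced bias through the layer's bias term; a minimal sketch of that general idea is given below, assuming numpy, a linear layer y = W @ x + b, and a small batch of calibration inputs. The function name fold_quantization_error_into_bias and its interface are illustrative assumptions, not the paper's method.

    import numpy as np

    def fold_quantization_error_into_bias(W, W_q, b, calib_x):
        # Hypothetical sketch of post-training bias correction: the expected
        # output shift caused by quantizing the weights, E[(W - W_q) @ x],
        # is estimated from calibration inputs and absorbed into the bias,
        # so the quantized layer matches the float layer on average.
        mean_x = calib_x.mean(axis=0)      # E[x] over calibration inputs (N, in)
        shift = (W - W_q) @ mean_x         # expected error per output unit
        return b + shift                   # corrected bias for the quantized layer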
Optimizing FPGA-Based DNN accelerator with shared exponential floating-point format
In recent years, low-precision fixed-point computation has become a widely used technique
for neural network inference on FPGAs. However, this approach has some limitations, as …
3D-ReG: A 3D ReRAM-based heterogeneous architecture for training deep neural networks
Deep neural network (DNN) models are being expanded to a broader range of applications.
The computational capability of traditional hardware platforms cannot accommodate the …
VRU Pose-SSD: Multiperson pose estimation for automated driving
We present a fast and efficient approach for joint person detection and pose estimation
optimized for automated driving (AD) in urban scenarios. We use a multitask weight sharing …