BitWave: Exploiting column-based bit-level sparsity for deep learning acceleration

M Shi, V Jain, A Joseph, M Meijer… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Bit-serial computation facilitates bit-wise sequential data processing, offering numerous
benefits, such as a reduced area footprint and dynamically adaptive computational …
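The snippet is cut off, but the column-based bit-serial idea named in the title can be illustrated on its own. Below is a minimal Python sketch, not BitWave's actual dataflow, of a dot product that consumes weights one bit-plane ("column") at a time and skips bit-planes that are entirely zero; all names are illustrative.

```python
# Minimal sketch of bit-serial computation with column-wise (bit-plane)
# sparsity skipping. Illustrative only; not BitWave's actual dataflow.
import numpy as np

def bit_serial_dot(activations: np.ndarray, weights: np.ndarray, bits: int = 8) -> int:
    """sum(a_i * w_i) for unsigned integer weights, one bit-plane at a time."""
    acc = 0
    for b in range(bits):
        column = (weights >> b) & 1              # one bit-plane of all weights
        if not column.any():                     # bit-level sparsity: skip zero columns
            continue
        acc += int(activations @ column) << b    # partial sum scaled by bit significance
    return acc

a = np.array([3, 1, 4, 1], dtype=np.int64)
w = np.array([5, 0, 2, 8], dtype=np.int64)       # bit-planes 4..7 are all zero
assert bit_serial_dot(a, w) == int(a @ w)
```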

Optimus: An operator fusion framework for deep neural networks

X Cai, Y Wang, L Zhang - ACM Transactions on Embedded Computing …, 2022 - dl.acm.org
The reduction of neural parameters and operations in current deep neural network (DNN) architectures for applications on embedded and IoT platforms has received increasing …
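The snippet does not reach the fusion mechanics, but the general benefit of operator fusion, avoiding writes of intermediate tensors between adjacent operators, can be sketched as follows. This is a generic Python illustration, not Optimus's framework or API.

```python
# Generic sketch of operator fusion: bias-add and ReLU are applied per tile
# right after the matmul, so no full intermediate tensor is materialized.
# Illustrative only; not Optimus's fusion framework.
import numpy as np

def unfused(x, w, b):
    y = x @ w                      # writes a full intermediate
    y = y + b                      # second pass over memory
    return np.maximum(y, 0.0)      # third pass

def fused(x, w, b, tile=64):
    out = np.empty((x.shape[0], w.shape[1]), dtype=x.dtype)
    for i in range(0, x.shape[0], tile):
        t = x[i:i + tile] @ w                      # compute one tile...
        out[i:i + tile] = np.maximum(t + b, 0.0)   # ...and finish it while cached
    return out

x = np.random.rand(256, 128).astype(np.float32)
w = np.random.rand(128, 64).astype(np.float32)
b = np.random.rand(64).astype(np.float32)
assert np.allclose(unfused(x, w, b), fused(x, w, b), atol=1e-4)
```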

Cascading structured pruning: enabling high data reuse for sparse DNN accelerators

E Hanson, S Li, HH Li, Y Chen - Proceedings of the 49th Annual …, 2022 - dl.acm.org
Performance and efficiency of running modern Deep Neural Networks (DNNs) are heavily
bounded by data movement. To mitigate the data movement bottlenecks, recent DNN …
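As a rough sketch of why structured pruning can preserve data reuse: removing whole output channels of one layer lets the next layer drop the matching input channels, so both weight tensors stay dense. The toy below illustrates that cascading effect; it is not the paper's actual scheme.

```python
# Toy sketch of cascading structured pruning: pruning output channels of
# layer 1 cascades into dropping the matching input channels of layer 2,
# leaving both layers dense and reuse-friendly. Not the paper's exact method.
import numpy as np

def prune_pair(w1, w2, keep_ratio=0.5):
    """w1: (out1, in1) weights of layer 1; w2: (out2, out1) weights of layer 2."""
    scores = np.linalg.norm(w1, axis=1)          # importance of each output channel
    k = max(1, int(keep_ratio * w1.shape[0]))
    keep = np.sort(np.argsort(scores)[-k:])      # channels to keep, in original order
    return w1[keep, :], w2[:, keep]              # both results stay dense

w1, w2 = np.random.rand(8, 4), np.random.rand(6, 8)
p1, p2 = prune_pair(w1, w2)
print(p1.shape, p2.shape)                        # (4, 4) (6, 4)
```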

[Book] Low-power computer vision: improve the efficiency of artificial intelligence

GK Thiruvathukal, YH Lu, J Kim, Y Chen, B Chen - 2022 - books.google.com
Energy efficiency is critical for running computer vision on battery-powered systems, such as
mobile phones or UAVs (unmanned aerial vehicles, or drones). This book collects the …

DSLR-CNN: Efficient CNN Acceleration using Digit-Serial Left-to-Right Arithmetic

MZ Nisar, MS Ibrahim, S Gorgin, M Usman… - IEEE Access, 2024 - ieeexplore.ieee.org
Digit-serial arithmetic has emerged as a viable approach for designing hardware
accelerators, reducing interconnections, area utilization, and power consumption. However …
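Left-to-right (most-significant-digit-first) digit-serial evaluation can be sketched in a few lines: the result is refined digit by digit from coarse to exact, which is what makes early termination possible. Below is a hedged toy in Python using radix-4 weight digits; it is not the DSLR-CNN datapath.

```python
# Sketch of most-significant-digit-first (left-to-right) digit-serial
# evaluation: weights are consumed radix-4 digit by digit, so the running
# result converges from coarse to exact and could be cut off early.
# Illustrative only; not the DSLR-CNN architecture.
import numpy as np

def msd_first_dot(activations, weights, digits=4, radix=4):
    acc = 0
    for d in range(digits - 1, -1, -1):               # most significant digit first
        digit = (weights // radix**d) % radix         # current radix-4 digit of each weight
        acc = acc * radix + int(activations @ digit)  # refine the running estimate
    return acc

a = np.array([2, 7, 1], dtype=np.int64)
w = np.array([9, 30, 63], dtype=np.int64)             # each fits in 4 radix-4 digits
assert msd_first_dot(a, w) == int(a @ w)
```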

Optimus: towards optimal layer-fusion on deep learning processors

X Cai, Y Wang, L Zhang - Proceedings of the 22nd ACM SIGPLAN …, 2021 - dl.acm.org
Neural network layer fusion has been proposed to parallelize the inference of neural layers
and thus significantly reduce the feature-induced memory accesses. However, how to fuse …
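A minimal sketch of the layer-fusion idea, under the assumption of two stacked 1-D convolutions: each output tile of the second layer is computed from the matching input slice (plus halo) of the first, so the full intermediate feature map never reaches memory. This illustrates the general technique, not Optimus's fusion strategy.

```python
# Minimal sketch of layer fusion on two stacked 1-D convolutions: each output
# tile of layer 2 is produced from the matching input slice of layer 1, so the
# full intermediate feature map is never written out. Toy illustration only.
import numpy as np

def conv1d(x, k):
    return np.array([x[i:i + len(k)] @ k for i in range(len(x) - len(k) + 1)])

def fused_two_layers(x, k1, k2, tile=8):
    out_len = len(x) - len(k1) - len(k2) + 2
    halo = len(k1) + len(k2) - 2               # extra input needed per tile
    out = np.empty(out_len)
    for i in range(0, out_len, tile):
        n = min(tile, out_len - i)
        mid = conv1d(x[i:i + n + halo], k1)    # small per-tile intermediate
        out[i:i + n] = conv1d(mid, k2)
    return out

x, k1, k2 = np.random.rand(64), np.random.rand(3), np.random.rand(3)
assert np.allclose(fused_two_layers(x, k1, k2), conv1d(conv1d(x, k1), k2))
```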

ASBP: Automatic structured bit-pruning for RRAM-based NN accelerator

S Qu, B Li, Y Wang, L Zhang - 2021 58th ACM/IEEE Design …, 2021 - ieeexplore.ieee.org
Network sparsity or pruning is an extensively studied method to optimize the computation
efficiency of deep neural networks (DNNs) for CMOS-based accelerators, such as FPGAs …
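A toy version of bit-pruning, keeping at most a fixed number of nonzero bits per weight, shows the basic operation the paper automates and structures; the greedy rounding below is our simplification, not ASBP's algorithm.

```python
# Toy illustration of bit-pruning: each weight is re-approximated with at most
# `max_bits` nonzero bits, shrinking the bit-level work an accelerator must do.
# Greedy keep-highest-bits rounding; ASBP's automatic, structured scheme is
# more involved.
def bit_prune(w: int, max_bits: int = 2) -> int:
    sign = -1 if w < 0 else 1
    residual, approx = abs(w), 0
    for _ in range(max_bits):
        if residual == 0:
            break
        msb = 1 << (residual.bit_length() - 1)   # largest remaining power of two
        approx += msb
        residual -= msb
    return sign * approx

print(bit_prune(117, 2))  # 96: keeps bits 64 and 32 of 0b1110101
```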

MSD: Mixing signed digit representations for hardware-efficient DNN acceleration on FPGA with heterogeneous resources

J Wu, J Zhou, Y Gao, Y Ding, N Wong… - 2023 IEEE 31st …, 2023 - ieeexplore.ieee.org
By quantizing weights with different precision for different parts of a network, mixed-precision
quantization promises to reduce the hardware cost and improve the speed of deep neural …
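The classic building block behind signed-digit weight encodings is canonical signed-digit (CSD) recoding: digits in {-1, 0, +1} with no two adjacent nonzeros, so a multiplication needs fewer shift-add/subtract terms. A minimal sketch follows; MSD's actual contribution is mixing such representations across heterogeneous FPGA resources, which this does not capture.

```python
# Minimal sketch of canonical signed-digit (CSD) recoding: each weight becomes
# digits in {-1, 0, +1} with no two adjacent nonzeros, minimizing the number
# of shift-add/sub terms per multiply. Not the paper's exact mixing scheme.
def to_csd(n: int) -> list[int]:
    """Return CSD digits of n, least significant first."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n % 4)      # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

csd = to_csd(119)                 # 119 = 128 - 8 - 1: three terms, not six set bits
assert sum(d << i for i, d in enumerate(csd)) == 119
print(csd)                        # [-1, 0, 0, -1, 0, 0, 0, 1]
```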

Special session: Fault-tolerant deep learning: A hierarchical perspective

C Liu, Z Gao, S Liu, X Ning, H Li… - 2022 IEEE 40th VLSI Test …, 2022 - ieeexplore.ieee.org
With the rapid advancements of deep learning in the past decade, it can be foreseen that
deep learning will be continuously deployed in more and more safety-critical applications …

Bit-balance: Model-hardware codesign for accelerating NNs by exploiting bit-level sparsity

W Sun, Z Zou, D Liu, W Sun, S Chen… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Bit-serial architectures can handle Neural Networks (NNs) with different weight precision,
achieving higher resource efficiency compared with bit-parallel architectures. Besides, the …
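A hedged sketch of why balance matters for bit-serial hardware: lockstep lanes each process one nonzero weight bit per cycle, so a group's latency is set by its densest weight. The toy model below (our construction, not the paper's) counts cycles as the per-group maximum popcount.

```python
# Sketch of why balanced bit-level sparsity matters for bit-serial PEs:
# lanes process one nonzero weight bit per cycle in lockstep, so a group's
# latency is its worst-case bit count. Balancing nonzero bits across lanes
# (the codesign idea) tightens that maximum. Illustrative numbers only.
import numpy as np

def group_cycles(weights, lanes=4):
    """Cycles for lockstep bit-serial lanes: max popcount within each group."""
    popcounts = np.array([bin(w).count("1") for w in weights])
    groups = popcounts.reshape(-1, lanes)
    return int(groups.max(axis=1).sum())

w = [0b11111111, 0b1, 0b1, 0b1,       # unbalanced group: one bit-dense weight
     0b1010, 0b0101, 0b1010, 0b0101]  # balanced group: 2 nonzero bits each
print(group_cycles(w))  # 8 + 2 = 10 cycles; balancing the first group would help
```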