BitSystolic: A 26.7 TOPS/W 2b~8b NPU with configurable data flows for edge devices

Q Yang, H Li - IEEE Transactions on Circuits and Systems I …, 2020 - ieeexplore.ieee.org
Efficient deployment of deep neural networks (DNNs) emerges with the exploding demand
for artificial intelligence on edge devices. Mixed-precision inference with both compressed …

FPGA Optimized Accelerator of DCNN with Fast Data Readout and Multiplier Sharing Strategy

Z Li, Q Li, H Liu, Z Zhao - Computers, Materials & Continua, 2023 - search.ebscohost.com
With the continuous development of deep learning, Deep Convolutional Neural Network
(DCNN) has attracted wide attention in the industry due to its high accuracy in image …

ACiS: smart switches with application-level acceleration

P Haghi - 2023 - search.proquest.com
Network performance has contributed fundamentally to the growth of supercomputing over
the past decades. In parallel, High Performance Computing (HPC) peak performance has …

Efficient On-Chip Learning of Multi-Layer Perceptron Based on Neuron Multiplexing Method

Z Zhang, G Wang, K Wang, B Gan, G Chen - Electronics, 2023 - mdpi.com
An efficient on-chip learning method based on neuron multiplexing is proposed in this paper
to address the limitations of traditional on-chip learning methods, including low resource …

Hybrid Multi-tile Vector Systolic Architecture for Accelerating Convolution on FPGAs

J Shah, N Rao - … IEEE International Symposium on Circuits and …, 2024 - ieeexplore.ieee.org
To enhance the efficiency of image-kernel convolution operations in convolutional neural
networks, we introduce a Vector Systolic Array Accelerator with adaptable lane-width. This …

Hybrid Processing Unit for Efficient Realization of DNN on FPGA Devices

CD Paladi, ER Thuraka - 2023 Second IEEE International …, 2024 - ieeexplore.ieee.org
Deep learning methods, applied to solve complex tasks, are increasingly in demand across
various industries. However, to reach this, the computational operations depend on …