Demystifying parallel and distributed deep learning: An in-depth concurrency analysis

T Ben-Nun, T Hoefler - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …

A survey of techniques for optimizing deep learning on GPUs

S Mittal, S Vaishay - Journal of Systems Architecture, 2019 - Elsevier
The rise of deep-learning (DL) has been fuelled by the improvements in accelerators. Due to
its unique features, the GPU continues to remain the most widely used accelerator for DL …

HarDNet: A low memory traffic network

P Chao, CY Kao, YS Ruan… - Proceedings of the …, 2019 - openaccess.thecvf.com
State-of-the-art neural network architectures such as ResNet, MobileNet, and DenseNet
have achieved outstanding accuracy over low MACs and small model size counterparts …

CMSIS-NN: Efficient neural network kernels for Arm Cortex-M CPUs

L Lai, N Suda, V Chandra - ar…

fpgaConvNet: Mapping regular and irregular convolutional neural networks on FPGAs

SI Venieris, CS Bouganis - IEEE transactions on neural …, 2018 - ieeexplore.ieee.org
Since neural networks renaissance, convolutional neural networks (ConvNets) have
demonstrated a state-of-the-art performance in several emerging artificial intelligence tasks …

Transfer learning for sEMG hand gestures recognition using convolutional neural networks

U Côté-Allard, CL Fall… - … on Systems, Man …, 2017 - ieeexplore.ieee.org
In the realm of surface electromyography (sEMG) gesture recognition, deep learning
algorithms are seldom employed. This is due in part to the large quantity of data required for …

PET: Optimizing tensor programs with partially equivalent transformations and automated corrections

H Wang, J Zhai, M Gao, Z Ma, S Tang, L Zheng… - … USENIX Symposium on …, 2021 - usenix.org
High-performance tensor programs are critical for efficiently deploying deep neural network
(DNN) models in real-world tasks. Existing frameworks optimize tensor programs by …