DNNFusion: accelerating deep neural networks execution with advanced operator fusion

W Niu, J Guan, Y Wang, G Agrawal, B Ren - Proceedings of the 42nd …, 2021 - dl.acm.org
Deep Neural Networks (DNNs) have emerged as the core enabler of many major
applications on mobile devices. To achieve high accuracy, DNN models have become …

Guided equality saturation

T Koehler, A Goens, S Bhat, T Grosser… - Proceedings of the …, 2024 - dl.acm.org
Rewriting is a principled term transformation technique with uses across theorem proving
and compilation. In theorem proving, each rewrite is a proof step; in compilation, rewrites …

Optimizing Direct Convolutions on ARM Multi-Cores

P Wang, W Yang, J Fang, D Dong, C Huang… - Proceedings of the …, 2023 - dl.acm.org
Convolution kernels are widely seen in deep learning workloads and are often responsible
for performance bottlenecks. Recent research has demonstrated that a direct convolution …

Neural architecture search as program transformation exploration

J Turner, EJ Crowley, MFP O'Boyle - Proceedings of the 26th ACM …, 2021 - dl.acm.org
Improving the performance of deep neural networks (DNNs) is important to both the compiler
and neural architecture search (NAS) communities. Compilers apply program …

mGEMM: Low-latency convolution with minimal memory overhead optimized for mobile devices

J Park, K Bin, K Lee - Proceedings of the 20th Annual International …, 2022 - dl.acm.org
The convolution layer is the key building block in many neural network designs. Most high-
performance implementations of the convolution operation rely on GEMM (General Matrix …

cuConv: CUDA implementation of convolution for CNN inference

M Jordà, P Valero-Lara, AJ Peña - Cluster Computing, 2022 - Springer
Convolutions are the core operation of deep learning applications based on Convolutional
Neural Networks (CNNs). Current GPU architectures are highly efficient for training and …

High performance dilated convolutions on multi-core DSPs

Y Wang, Q Wang, X Pei, S Mei, R Li, J Liu - CCF Transactions on High …, 2024 - Springer
Dilated convolutions are widely used to accomplish wide receptive fields while keeping the
resolution of feature maps in deep learning applications, such as semantic segmentation …

Mapping parallelism in a functional IR through constraint satisfaction: a case study on convolution for mobile GPUs

N Mogers, L Li, V Radu, C Dubach - Proceedings of the 31st ACM …, 2022 - dl.acm.org
Graphics Processing Units (GPUs) are notoriously hard to optimize for manually. What is
needed are good automatic code generators and optimizers. Accelerate, Futhark and Lift …

Sketch-Guided Equality Saturation

T Koehler, P Trinder, M Steuwer - 2022 - thok.eu
Equality saturation is a technique for implementing rewrite-driven compiler optimizations by
efficiently representing many equivalent programs in so-called e-graphs. To improve …