Efficient deep learning: A survey on making deep learning models smaller, faster, and better

G Menghani - ACM Computing Surveys, 2023 - dl.acm.org
Deep learning has revolutionized the fields of computer vision, natural language
understanding, speech recognition, information retrieval, and more. However, with the …

Demystifying parallel and distributed deep learning: An in-depth concurrency analysis

T Ben-Nun, T Hoefler - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …
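The survey's techniques span data, model, and pipeline parallelism. As a minimal illustration of the most common of these, the sketch below (my own example, not code from the paper) mimics synchronous data-parallel SGD in plain NumPy: each worker computes a gradient on its shard, an allreduce averages the gradients, and every replica applies the same update.

```python
# Minimal sketch of synchronous data-parallel SGD (illustrative example,
# not from the paper): gradients are computed per shard, averaged as an
# "allreduce" would do, and the identical update keeps replicas in sync.
import numpy as np

def loss_grad(w, x, y):
    # Gradient of 0.5 * ||x @ w - y||^2 with respect to w.
    return x.T @ (x @ w - y)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))
w_true = rng.normal(size=(8,))
y = x @ w_true

num_workers, lr = 4, 0.01
shards = np.array_split(np.arange(64), num_workers)
w = np.zeros(8)
for step in range(100):
    # Each worker's local gradient on its own shard of the batch.
    grads = [loss_grad(w, x[s], y[s]) / len(s) for s in shards]
    # Allreduce: average the gradients so all replicas stay identical.
    g = np.mean(grads, axis=0)
    w -= lr * g
print("final loss:", 0.5 * np.sum((x @ w - y) ** 2))
```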

TVM: An automated end-to-end optimizing compiler for deep learning

T Chen, T Moreau, Z Jiang, L Zheng, E Yan… - … USENIX Symposium on …, 2018 - usenix.org
There is an increasing need to bring machine learning to a wide diversity of hardware
devices. Current frameworks rely on vendor-specific operator libraries and optimize for a …
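Rather than relying on hand-tuned vendor kernels, TVM searches over schedule transformations such as tiling, vectorization, and fusion. As a rough illustration of what one such transformation does (a generic sketch, not TVM's actual API), the snippet below rewrites a naive matrix multiply into a tiled one so each block's working set stays cache-resident.

```python
# Generic sketch of loop tiling, one of the schedule transformations a
# compiler like TVM applies automatically (illustrative Python, not
# TVM's API). Tiling keeps small blocks of A, B, and C hot in cache.
import numpy as np

def matmul_tiled(a, b, tile=32):
    n, k = a.shape
    _, m = b.shape
    c = np.zeros((n, m), dtype=a.dtype)
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # Each (tile x tile) block multiply reuses its operands
                # while they are still resident in cache.
                c[i0:i0+tile, j0:j0+tile] += (
                    a[i0:i0+tile, k0:k0+tile] @ b[k0:k0+tile, j0:j0+tile]
                )
    return c

a = np.random.rand(128, 128)
b = np.random.rand(128, 128)
assert np.allclose(matmul_tiled(a, b), a @ b)
```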

HAWQ-V3: Dyadic neural network quantization

Z Yao, Z Dong, Z Zheng, A Gholami… - International …, 2021 - proceedings.mlr.press
Current low-precision quantization algorithms often have the hidden cost of conversion back
and forth from floating point to quantized integer values. This hidden cost limits the latency …
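HAWQ-V3's remedy is to keep inference entirely in integer arithmetic by restricting requantization scales to dyadic rationals, i.e. numbers of the form b/2^c, which can be applied with one integer multiply and a bit shift. A minimal sketch of that idea (my own illustration, not the paper's code):

```python
# Minimal sketch of dyadic requantization (illustrative, not HAWQ-V3's
# code): a floating-point rescale factor is approximated by b / 2**c so
# it can be applied with only an integer multiply and a right shift.
def dyadic_approx(scale, bits=16):
    # Find integers b, c with scale ~= b / 2**c.
    c = bits
    b = round(scale * (1 << c))
    return b, c

def requantize(acc, scale):
    # acc: int32 accumulator from an int8 matmul; scale: float rescale
    # factor (e.g. input_scale * weight_scale / output_scale).
    b, c = dyadic_approx(scale)
    return (acc * b) >> c        # pure integer arithmetic, no floats

acc = 12345                       # example int32 accumulator value
scale = 0.00712
print(requantize(acc, scale), int(acc * scale))  # both print 87
```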

Learning to optimize tensor programs

T Chen, L Zheng, E Yan, Z Jiang… - Advances in …, 2018 - proceedings.neurips.cc
We introduce a learning-based framework to optimize tensor programs for deep learning
workloads. Efficient implementations of tensor operators, such as matrix multiplication and …
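The framework trains a statistical cost model over candidate implementations and uses it to rank the search space, measuring only the most promising programs on real hardware. A toy version of that loop (the features and measure() function below are made up for illustration) looks roughly like:

```python
# Toy cost-model-guided schedule search (illustrative only; the
# candidate features and measure() are hypothetical). A regressor
# predicts run time from schedule parameters, the top predictions are
# measured for real, and the model is retrained on the results.
import random
from sklearn.linear_model import Ridge

def measure(cand):
    # Stand-in for timing the candidate on hardware: pretend tile=64
    # and unroll=4 is optimal, plus measurement noise.
    tile, unroll = cand
    return abs(tile - 64) * 0.01 + abs(unroll - 4) * 0.1 + random.random() * 0.05

space = [(t, u) for t in (8, 16, 32, 64, 128) for u in (1, 2, 4, 8)]
model, history = Ridge(), []
for rnd in range(5):
    if history:
        model.fit([c for c, _ in history], [t for _, t in history])
        ranked = sorted(space, key=lambda c: model.predict([c])[0])
    else:
        ranked = random.sample(space, len(space))
    for cand in ranked[:4]:           # measure only the top candidates
        history.append((cand, measure(cand)))
best = min(history, key=lambda h: h[1])
print("best schedule found:", best)
```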

PatDNN: Achieving real-time DNN execution on mobile devices with pattern-based weight pruning

W Niu, X Ma, S Lin, S Wang, X Qian, X Lin… - Proceedings of the …, 2020 - dl.acm.org
With the emergence of a spectrum of high-end mobile devices, many applications that
formerly required desktop-level computation capability are being transferred to these …
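Pattern-based pruning constrains each 3x3 convolution kernel to keep a fixed number of positions drawn from a small library of predefined masks, preserving the regularity that mobile code generators need to emit efficient kernels. A simplified sketch of applying such patterns (the pattern set below is hypothetical, not PatDNN's):

```python
# Simplified sketch of pattern-based weight pruning (hypothetical
# pattern set, not PatDNN's): every 3x3 kernel keeps exactly 4 entries,
# chosen as whichever predefined mask preserves the most weight
# magnitude. Fixed patterns keep the sparsity regular and compilable.
import numpy as np

PATTERNS = [  # each mask keeps 4 of 9 positions (centre always kept)
    np.array([[1, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=bool),
    np.array([[0, 1, 1], [0, 1, 0], [0, 1, 0]], dtype=bool),
    np.array([[0, 1, 0], [1, 1, 1], [0, 0, 0]], dtype=bool),
    np.array([[0, 0, 0], [1, 1, 1], [0, 1, 0]], dtype=bool),
]

def prune_kernel(k):
    # Pick the pattern retaining the largest L1 mass of the kernel.
    scores = [np.abs(k[m]).sum() for m in PATTERNS]
    return k * PATTERNS[int(np.argmax(scores))]

weights = np.random.randn(8, 3, 3)          # 8 kernels of a conv layer
pruned = np.stack([prune_kernel(k) for k in weights])
print("nonzeros per kernel:", np.count_nonzero(pruned, axis=(1, 2)))
```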

Ansor: Generating high-performance tensor programs for deep learning

L Zheng, C Jia, M Sun, Z Wu, CH Yu, A Haj-Ali… - … USENIX Symposium on …, 2020 - usenix.org
High-performance tensor programs are crucial to guarantee efficient execution of deep
neural networks. However, obtaining performant tensor programs for different operators on …

Taichi: a language for high-performance computation on spatially sparse data structures

Y Hu, TM Li, L Anderson, J Ragan-Kelley… - ACM Transactions on …, 2019 - dl.acm.org
3D visual computing data are often spatially sparse. To exploit such sparsity, people have
developed hierarchical sparse data structures, such as multi-level sparse voxel grids …
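Such hierarchical structures store data only where it exists: a coarse pointer level indexes dense leaf blocks, so empty space costs almost nothing. A compact Python analogue of a two-level sparse voxel grid (my illustration; Taichi declares such structures and compiles fast kernels over them):

```python
# Python analogue of a two-level spatially sparse grid (illustration
# only, not Taichi's API): a dict acts as the coarse pointer level, and
# dense 8x8x8 leaf blocks are allocated only where data is written.
import numpy as np

BLOCK = 8

class SparseGrid:
    def __init__(self):
        self.blocks = {}          # coarse level: block coord -> leaf

    def write(self, x, y, z, value):
        key = (x // BLOCK, y // BLOCK, z // BLOCK)
        if key not in self.blocks:           # activate block on demand
            self.blocks[key] = np.zeros((BLOCK, BLOCK, BLOCK))
        self.blocks[key][x % BLOCK, y % BLOCK, z % BLOCK] = value

    def read(self, x, y, z):
        key = (x // BLOCK, y // BLOCK, z // BLOCK)
        block = self.blocks.get(key)
        return 0.0 if block is None else block[x % BLOCK, y % BLOCK, z % BLOCK]

g = SparseGrid()
g.write(5, 1000, 7, 3.0)          # far-apart writes allocate 2 leaf
g.write(900, 2, 64, 4.0)          # blocks, not a dense 1000^3 volume
print(g.read(5, 1000, 7), "-", len(g.blocks), "active blocks")
```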

Tiramisu: A polyhedral compiler for expressing fast and portable code

R Baghdadi, J Ray, MB Romdhane… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org
This paper introduces Tiramisu, a polyhedral framework designed to generate high
performance code for multiple platforms including multicores, GPUs, and distributed …
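In the polyhedral model a loop nest is a set of integer points in a polyhedron, and a schedule is an affine map over those points, so transformations like interchange, skewing, and tiling become algebra on the map rather than ad-hoc rewrites. A tiny illustration of that view (a generic sketch, not Tiramisu's C++ API):

```python
# Tiny illustration of the polyhedral view (generic sketch, not
# Tiramisu's API): the iteration domain is a set of integer points and
# the schedule is an affine map over them. Skewing by (i, j) -> (i+j, j)
# groups iterations into independent wavefronts for a stencil where
# A[i][j] depends on A[i-1][j] and A[i][j-1].
N = 4
domain = [(i, j) for i in range(N) for j in range(N)]   # iteration domain

def skew(point):
    i, j = point
    return (i + j, j)                                    # affine schedule

# Executing in the new lexicographic order: every point with the same
# first coordinate t = i + j can run in parallel.
for t in sorted(set(i + j for i, j in domain)):
    wavefront = [p for p in domain if skew(p)[0] == t]
    print(f"t={t}: parallel iterations {wavefront}")
```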

Glow: Graph lowering compiler techniques for neural networks

N Rotem, J Fix, S Abdulrasool, G Catron… - arXiv preprint arXiv …, 2018 - arxiv.org
This paper presents the design of Glow, a machine learning compiler for heterogeneous
hardware. It is a pragmatic approach to compilation that enables the generation of highly …
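Graph lowering decomposes high-level neural network operators into a small set of simpler linear algebra primitives, so a new backend only needs to implement the primitive set. A miniature version of such a pass (hypothetical node names, not Glow's actual IR):

```python
# Miniature graph-lowering pass (hypothetical node names, not Glow's
# actual IR): a high-level FullyConnected node is rewritten into the
# simpler primitives MatMul and Add, which is all a backend must support.
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str
    inputs: list = field(default_factory=list)

def lower(graph):
    lowered = []
    for node in graph:
        if node.op == "FullyConnected":   # x @ W + b
            x, w, b = node.inputs
            mm = Node("MatMul", [x, w])
            lowered.append(mm)
            lowered.append(Node("Add", [mm, b]))
        else:
            lowered.append(node)          # already a primitive
    return lowered

graph = [Node("FullyConnected", ["x", "W", "b"]), Node("Relu", ["fc"])]
print([n.op for n in lower(graph)])       # ['MatMul', 'Add', 'Relu']
```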