Efficient deep learning: A survey on making deep learning models smaller, faster, and better
G Menghani - ACM Computing Surveys, 2023 - dl.acm.org
Deep learning has revolutionized the fields of computer vision, natural language
understanding, speech recognition, information retrieval, and more. However, with the …
Demystifying parallel and distributed deep learning: An in-depth concurrency analysis
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …
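One recurring scheme this survey analyzes is synchronous data parallelism: each worker computes a gradient on its own data shard, gradients are averaged (an all-reduce in practice), and every replica applies the same update. A minimal NumPy sketch, with the in-process averaging standing in for real communication and all names illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(4)                                   # replicated model parameters
X = rng.normal(size=(64, 4))                      # toy regression dataset
y = X @ np.array([1.0, -2.0, 0.5, 3.0])

def worker_grad(w, Xs, ys):
    # least-squares gradient on one worker's data shard
    return 2.0 * Xs.T @ (Xs @ w - ys) / len(ys)

num_workers, lr = 4, 0.05
X_shards = np.array_split(X, num_workers)
y_shards = np.array_split(y, num_workers)
for step in range(200):
    grads = [worker_grad(w, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]
    w -= lr * np.mean(grads, axis=0)              # stands in for an all-reduce
print(w)                                          # converges toward [1, -2, 0.5, 3]
```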
TVM: An automated end-to-end optimizing compiler for deep learning
There is an increasing need to bring machine learning to a wide diversity of hardware
devices. Current frameworks rely on vendor-specific operator libraries and optimize for a …
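The workflow the paper argues for: declare a computation once as a tensor expression, then compile it for a chosen target instead of linking a vendor kernel. A sketch of that flow using TVM's legacy tensor-expression (te) API, roughly as of TVM 0.8; treat it as illustrative rather than current best practice:

```python
import numpy as np
import tvm
from tvm import te

# Declare the computation once, independent of any hardware target.
n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")

# Compile the same definition for a concrete target (here, CPU via LLVM).
s = te.create_schedule(B.op)
f = tvm.build(s, [A, B], target="llvm")

dev = tvm.cpu()
a = tvm.nd.array(np.arange(8, dtype="float32"), dev)
b = tvm.nd.empty((8,), "float32", dev)
f(a, b)
print(b.numpy())   # each element of A, plus one
```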
HAWQ-V3: Dyadic neural network quantization
Current low-precision quantization algorithms often have the hidden cost of conversion back
and forth from floating point to quantized integer values. This hidden cost limits the latency …
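The "dyadic" in the title means requantization scales are constrained to rationals of the form b/2^c, so rescaling an int32 accumulator needs only an integer multiply and a bit shift, never a trip back to floating point. A small sketch of that idea (the helper names are mine, not the paper's code):

```python
import numpy as np

def to_dyadic(scale, bits=16):
    """Approximate a positive float scale as b / 2**bits with integer b."""
    b = int(round(scale * (1 << bits)))
    return b, bits

def requantize(acc_int32, scale):
    """Rescale int32 accumulators to int8 using only integer arithmetic."""
    b, c = to_dyadic(scale)
    out = (acc_int32.astype(np.int64) * b) >> c   # dyadic multiply-then-shift
    return np.clip(out, -128, 127).astype(np.int8)

acc = np.array([5000, -12000, 300], dtype=np.int32)
print(requantize(acc, scale=0.0041))              # ~ acc * 0.0041, no floats used
```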
Learning to optimize tensor programs
We introduce a learning-based framework to optimize tensor programs for deep learning
workloads. Efficient implementations of tensor operators, such as matrix multiplication and …
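The framework's core loop: a statistical cost model ranks candidate schedules (e.g. matmul tilings) so that only the most promising few are actually timed on hardware. A toy rendition, with stand-ins for both the learned model (gradient-boosted trees in the paper) and the measurement:

```python
import itertools, random

tilings = list(itertools.product([4, 8, 16, 32, 64], repeat=2))  # candidate schedules

def cost_model(t):
    # stand-in for the learned model's predicted cost of a schedule
    return abs(t[0] * t[1] - 256) + random.random()

def measure_on_hardware(t):
    # stand-in for compiling and timing the generated tensor program
    return abs(t[0] - 16) + abs(t[1] - 32)

top_k = sorted(tilings, key=cost_model)[:4]       # the model prunes 25 candidates to 4
best = min(top_k, key=measure_on_hardware)        # real measurements only for survivors
print("chosen tiling:", best)
```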
PatDNN: Achieving real-time DNN execution on mobile devices with pattern-based weight pruning
With the emergence of a spectrum of high-end mobile devices, many applications that
formerly required desktop-level computation capability are being transferred to these …
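Pattern-based pruning sits between unstructured and structured sparsity: each 3x3 kernel keeps only the positions allowed by one of a few fixed masks, and that regularity is what the paper's compiler exploits for real-time mobile execution. A sketch with two illustrative patterns (not the paper's exact set):

```python
import numpy as np

PATTERNS = [
    np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=float),  # cross
    np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]], dtype=float),  # corners + center
]

def prune_kernel(k):
    """Keep the pattern that retains the largest absolute weight mass."""
    best = max(PATTERNS, key=lambda p: np.abs(k * p).sum())
    return k * best

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3))
print(prune_kernel(kernel))   # regular sparsity, amenable to code generation
```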
Ansor: Generating high-performance tensor programs for deep learning
High-performance tensor programs are crucial to guarantee efficient execution of deep
neural networks. However, obtaining performant tensor programs for different operators on …
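Ansor's departure from template-based tuners: candidate programs are sampled from a hierarchical space, first a coarse structural "sketch", then concrete "annotations" such as tile sizes, so no per-operator template has to be written. A toy rendition in which both the space and the score are stand-ins:

```python
import random

SKETCHES = ["tile2d", "tile2d+vectorize", "tile2d+parallel"]

def sample_program():
    sketch = random.choice(SKETCHES)                # structure first...
    tiles = (random.choice([4, 8, 16, 32]),         # ...then the concrete knobs
             random.choice([4, 8, 16, 32]))
    return sketch, tiles

def score(prog):
    # stand-in for measured throughput of the generated program
    sketch, (ti, tj) = prog
    bonus = 2.0 if "parallel" in sketch else 0.0
    return bonus - abs(ti * tj - 256) / 256.0

best = max((sample_program() for _ in range(200)), key=score)
print("best sampled program:", best)
```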
Taichi: a language for high-performance computation on spatially sparse data structures
3D visual computing data are often spatially sparse. To exploit such sparsity, people have
developed hierarchical sparse data structures, such as multi-level sparse voxel grids …
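The hierarchical structures in question compose levels, such as a sparse pointer grid over small dense blocks; Taichi lets you declare such a layout and compiles kernels against it. A plain-Python sketch of just the layout idea (a hash map of active blocks over dense 8x8 tiles), not Taichi's actual API:

```python
import numpy as np

BLOCK = 8                                  # each dense block is BLOCK x BLOCK

class SparseGrid:
    def __init__(self):
        self.blocks = {}                   # (block_i, block_j) -> dense ndarray

    def write(self, i, j, v):
        key = (i // BLOCK, j // BLOCK)
        blk = self.blocks.setdefault(key, np.zeros((BLOCK, BLOCK)))
        blk[i % BLOCK, j % BLOCK] = v      # activates the block on demand

    def read(self, i, j):
        blk = self.blocks.get((i // BLOCK, j // BLOCK))
        return 0.0 if blk is None else blk[i % BLOCK, j % BLOCK]

g = SparseGrid()
g.write(3, 1000, 2.5)                      # far-apart writes allocate just two blocks
g.write(5, 6, 1.0)
print(g.read(3, 1000), g.read(0, 0), len(g.blocks))
```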
Tiramisu: A polyhedral compiler for expressing fast and portable code
R Baghdadi, J Ray, MB Romdhane… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org
This paper introduces Tiramisu, a polyhedral framework designed to generate high-performance code for multiple platforms including multicores, GPUs, and distributed …
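To make "polyhedral" concrete: such a framework models loop nests as integer sets and applies provably legal reorderings, of which tiling is the canonical example. Below, the same transpose-copy written naively and in the tiled form a scheduler might emit for cache locality; the tile size 32 is an arbitrary illustrative choice:

```python
import numpy as np

def copy_naive(A, B):
    n, m = A.shape
    for i in range(n):
        for j in range(m):
            B[j, i] = A[i, j]

def copy_tiled(A, B, T=32):
    n, m = A.shape
    for ii in range(0, n, T):                      # iterate over tiles...
        for jj in range(0, m, T):
            for i in range(ii, min(ii + T, n)):    # ...then within each tile
                for j in range(jj, min(jj + T, m)):
                    B[j, i] = A[i, j]

A = np.random.rand(100, 64)
B1, B2 = np.empty((64, 100)), np.empty((64, 100))
copy_naive(A, B1); copy_tiled(A, B2)
assert np.array_equal(B1, B2)                      # same semantics, different schedule
```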
Glow: Graph lowering compiler techniques for neural networks
N Rotem, J Fix, S Abdulrasool, G Catron… - arXiv preprint arXiv …, 2018 - arxiv.org
This paper presents the design of Glow, a machine learning compiler for heterogeneous
hardware. It is a pragmatic approach to compilation that enables the generation of highly …
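"Graph lowering" here means progressively rewriting high-level graph nodes into a small vocabulary of simpler primitives, so each new hardware backend only has to implement the primitives. A toy pass in that spirit (the node encoding and names are mine, not Glow's IR):

```python
def lower(node):
    """Rewrite one high-level node into primitive ops (a single pass)."""
    if node["op"] == "FullyConnected":
        # FC(x, w, b) becomes a matmul followed by a broadcasted bias add;
        # "%fc.matmul" is an invented name for the intermediate value.
        return [
            {"op": "MatMul", "out": "%fc.matmul", "in": [node["in"][0], node["in"][1]]},
            {"op": "BroadcastAdd", "out": node["out"], "in": ["%fc.matmul", node["in"][2]]},
        ]
    return [node]                                  # already primitive

graph = [{"op": "FullyConnected", "out": "%y", "in": ["%x", "%w", "%b"]}]
lowered = [prim for node in graph for prim in lower(node)]
for n in lowered:
    print(n)
```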