Efficient deep learning: A survey on making deep learning models smaller, faster, and better

G Menghani - ACM Computing Surveys, 2023 - dl.acm.org
Deep learning has revolutionized the fields of computer vision, natural language
understanding, speech recognition, information retrieval, and more. However, with the …

Demystifying parallel and distributed deep learning: An in-depth concurrency analysis

T Ben-Nun, T Hoefler - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …
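The survey's techniques span data, model, and pipeline parallelism. As a minimal illustration of the most common of these, the sketch below (my own example, not code from the paper) mimics synchronous data-parallel SGD in plain NumPy: each worker computes a gradient on its shard, an allreduce averages the gradients, and every replica applies the same update.

```python
# Minimal sketch of synchronous data-parallel SGD (illustrative example,
# not from the paper): gradients are computed per shard, averaged as an
# "allreduce" would do, and the identical update keeps replicas in sync.
import numpy as np

def loss_grad(w, x, y):
    # Gradient of 0.5 * ||x @ w - y||^2 with respect to w.
    return x.T @ (x @ w - y)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))
w_true = rng.normal(size=(8,))
y = x @ w_true

num_workers, lr = 4, 0.01
shards = np.array_split(np.arange(64), num_workers)
w = np.zeros(8)
for step in range(100):
    # Each worker's local gradient on its own shard of the batch.
    grads = [loss_grad(w, x[s], y[s]) / len(s) for s in shards]
    # Allreduce: average the gradients so all replicas stay identical.
    g = np.mean(grads, axis=0)
    w -= lr * g
print("final loss:", 0.5 * np.sum((x @ w - y) ** 2))
```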

TVM: An automated end-to-end optimizing compiler for deep learning

T Chen, T Moreau, Z Jiang, L Zheng, E Yan… - … USENIX Symposium on …, 2018 - usenix.org
There is an increasing need to bring machine learning to a wide diversity of hardware
devices. Current frameworks rely on vendor-specific operator libraries and optimize for a …
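Rather than relying on hand-tuned vendor kernels, TVM searches over schedule transformations such as tiling, vectorization, and fusion. As a rough illustration of what one such transformation does (a generic sketch, not TVM's actual API), the snippet below rewrites a naive matrix multiply into a tiled one so each block's working set stays cache-resident.

```python
# Generic sketch of loop tiling, one of the schedule transformations a
# compiler like TVM applies automatically (illustrative Python, not
# TVM's API). Tiling keeps small blocks of A, B, and C hot in cache.
import numpy as np

def matmul_tiled(a, b, tile=32):
    n, k = a.shape
    _, m = b.shape
    c = np.zeros((n, m), dtype=a.dtype)
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # Each (tile x tile) block multiply reuses its operands
                # while they are still resident in cache.
                c[i0:i0+tile, j0:j0+tile] += (
                    a[i0:i0+tile, k0:k0+tile] @ b[k0:k0+tile, j0:j0+tile]
                )
    return c

a = np.random.rand(128, 128)
b = np.random.rand(128, 128)
assert np.allclose(matmul_tiled(a, b), a @ b)
```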

HAWQ-V3: Dyadic neural network quantization

Z Yao, Z Dong, Z Zheng, A Gholami… - International …, 2021 - proceedings.mlr.press
Current low-precision quantization algorithms often have the hidden cost of conversion back
and forth from floating point to quantized integer values. This hidden cost limits the latency …
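HAWQ-V3's remedy is to keep inference entirely in integer arithmetic by restricting requantization scales to dyadic rationals, i.e. numbers of the form b/2^c, which can be applied with one integer multiply and a bit shift. A minimal sketch of that idea (my own illustration, not the paper's code):

```python
# Minimal sketch of dyadic requantization (illustrative, not HAWQ-V3's
# code): a floating-point rescale factor is approximated by b / 2**c so
# it can be applied with only an integer multiply and a right shift.
def dyadic_approx(scale, bits=16):
    # Find integers b, c with scale ~= b / 2**c.
    c = bits
    b = round(scale * (1 << c))
    return b, c

def requantize(acc, scale):
    # acc: int32 accumulator from an int8 matmul; scale: float rescale
    # factor (e.g. input_scale * weight_scale / output_scale).
    b, c = dyadic_approx(scale)
    return (acc * b) >> c        # pure integer arithmetic, no floats

acc = 12345                       # example int32 accumulator value
scale = 0.00712
print(requantize(acc, scale), int(acc * scale))  # both print 87
```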

Learning to optimize tensor programs

T Chen, L Zheng, E Yan, Z Jiang… - Advances in …, 2018 - proceedings.neurips.cc
We introduce a learning-based framework to optimize tensor programs for deep learning
workloads. Efficient implementations of tensor operators, such as matrix multiplication and …
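The framework trains a statistical cost model over candidate implementations and uses it to rank the search space, measuring only the most promising programs on real hardware. A toy version of that loop (the features and measure() function below are made up for illustration) looks roughly like:

```python
# Toy cost-model-guided schedule search (illustrative only; the
# candidate features and measure() are hypothetical). A regressor
# predicts run time from schedule parameters, the top predictions are
# measured for real, and the model is retrained on the results.
import random
from sklearn.linear_model import Ridge

def measure(cand):
    # Stand-in for timing the candidate on hardware: pretend tile=64
    # and unroll=4 is optimal, plus measurement noise.
    tile, unroll = cand
    return abs(tile - 64) * 0.01 + abs(unroll - 4) * 0.1 + random.random() * 0.05

space = [(t, u) for t in (8, 16, 32, 64, 128) for u in (1, 2, 4, 8)]
model, history = Ridge(), []
for rnd in range(5):
    if history:
        model.fit([c for c, _ in history], [t for _, t in history])
        ranked = sorted(space, key=lambda c: model.predict([c])[0])
    else:
        ranked = random.sample(space, len(space))
    for cand in ranked[:4]:           # measure only the top candidates
        history.append((cand, measure(cand)))
best = min(history, key=lambda h: h[1])
print("best schedule found:", best)
```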

PatDNN: Achieving real-time DNN execution on mobile devices with pattern-based weight pruning

W Niu, X Ma, S Lin, S Wang, X Qian, X Lin… - Proceedings of the …, 2020 - dl.acm.org
With the emergence of a spectrum of high-end mobile devices, many applications that
formerly required desktop-level computation capability are being transferred to these …
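Pattern-based pruning constrains each 3x3 convolution kernel to keep a fixed number of positions drawn from a small library of predefined masks, preserving the regularity that mobile code generators need to emit efficient kernels. A simplified sketch of applying such patterns (the pattern set below is hypothetical, not PatDNN's):

```python
# Simplified sketch of pattern-based weight pruning (hypothetical
# pattern set, not PatDNN's): every 3x3 kernel keeps exactly 4 entries,
# chosen as whichever predefined mask preserves the most weight
# magnitude. Fixed patterns keep the sparsity regular and compilable.
import numpy as np

PATTERNS = [  # each mask keeps 4 of 9 positions (centre always kept)
    np.array([[1, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=bool),
    np.array([[0, 1, 1], [0, 1, 0], [0, 1, 0]], dtype=bool),
    np.array([[0, 1, 0], [1, 1, 1], [0, 0, 0]], dtype=bool),
    np.array([[0, 0, 0], [1, 1, 1], [0, 1, 0]], dtype=bool),
]

def prune_kernel(k):
    # Pick the pattern retaining the largest L1 mass of the kernel.
    scores = [np.abs(k[m]).sum() for m in PATTERNS]
    return k * PATTERNS[int(np.argmax(scores))]

weights = np.random.randn(8, 3, 3)          # 8 kernels of a conv layer
pruned = np.stack([prune_kernel(k) for k in weights])
print("nonzeros per kernel:", np.count_nonzero(pruned, axis=(1, 2)))
```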

Ansor: Generating high-performance tensor programs for deep learning

L Zheng, C Jia, M Sun, Z Wu, CH Yu, A Haj-Ali… - … USENIX Symposium on …, 2020 - usenix.org
High-performance tensor programs are crucial to guarantee efficient execution of deep
neural networks. However, obtaining performant tensor programs for different operators on …

Taichi: a language for high-performance computation on spatially sparse data structures

Y Hu, TM Li, L Anderson, J Ragan-Kelley… - ACM Transactions on …, 2019 - dl.acm.org
3D visual computing data are often spatially sparse. To exploit such sparsity, people have
developed hierarchical sparse data structures, such as multi-level sparse voxel grids …
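Such hierarchical structures store data only where it exists: a coarse pointer level indexes dense leaf blocks, so empty space costs almost nothing. A compact Python analogue of a two-level sparse voxel grid (my illustration; Taichi declares such structures and compiles fast kernels over them):

```python
# Python analogue of a two-level spatially sparse grid (illustration
# only, not Taichi's API): a dict acts as the coarse pointer level, and
# dense 8x8x8 leaf blocks are allocated only where data is written.
import numpy as np

BLOCK = 8

class SparseGrid:
    def __init__(self):
        self.blocks = {}          # coarse level: block coord -> leaf

    def write(self, x, y, z, value):
        key = (x // BLOCK, y // BLOCK, z // BLOCK)
        if key not in self.blocks:           # activate block on demand
            self.blocks[key] = np.zeros((BLOCK, BLOCK, BLOCK))
        self.blocks[key][x % BLOCK, y % BLOCK, z % BLOCK] = value

    def read(self, x, y, z):
        key = (x // BLOCK, y // BLOCK, z // BLOCK)
        block = self.blocks.get(key)
        return 0.0 if block is None else block[x % BLOCK, y % BLOCK, z % BLOCK]

g = SparseGrid()
g.write(5, 1000, 7, 3.0)          # far-apart writes allocate 2 leaf
g.write(900, 2, 64, 4.0)          # blocks, not a dense 1000^3 volume
print(g.read(5, 1000, 7), "-", len(g.blocks), "active blocks")
```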

Tiramisu: A polyhedral compiler for expressing fast and portable code

R Baghdadi, J Ray, MB Romdhane… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org
This paper introduces Tiramisu, a polyhedral framework designed to generate high
performance code for multiple platforms including multicores, GPUs, and distributed …
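In the polyhedral model a loop nest is a set of integer points in a polyhedron, and a schedule is an affine map over those points, so transformations like interchange, skewing, and tiling become algebra on the map rather than ad-hoc rewrites. A tiny illustration of that view (a generic sketch, not Tiramisu's C++ API):

```python
# Tiny illustration of the polyhedral view (generic sketch, not
# Tiramisu's API): the iteration domain is a set of integer points and
# the schedule is an affine map over them. Skewing by (i, j) -> (i+j, j)
# groups iterations into independent wavefronts for a stencil where
# A[i][j] depends on A[i-1][j] and A[i][j-1].
N = 4
domain = [(i, j) for i in range(N) for j in range(N)]   # iteration domain

def skew(point):
    i, j = point
    return (i + j, j)                                    # affine schedule

# Executing in the new lexicographic order: every point with the same
# first coordinate t = i + j can run in parallel.
for t in sorted(set(i + j for i, j in domain)):
    wavefront = [p for p in domain if skew(p)[0] == t]
    print(f"t={t}: parallel iterations {wavefront}")
```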

Glow: Graph lowering compiler techniques for neural networks

N Rotem, J Fix, S Abdulrasool, G Catron… - arXiv preprint arXiv …, 2018 - arxiv.org
This paper presents the design of Glow, a machine learning compiler for heterogeneous
hardware. It is a pragmatic approach to compilation that enables the generation of highly …
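Graph lowering decomposes high-level neural network operators into a small set of simpler linear algebra primitives, so a new backend only needs to implement the primitive set. A miniature version of such a pass (hypothetical node names, not Glow's actual IR):

```python
# Miniature graph-lowering pass (hypothetical node names, not Glow's
# actual IR): a high-level FullyConnected node is rewritten into the
# simpler primitives MatMul and Add, which is all a backend must support.
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str
    inputs: list = field(default_factory=list)

def lower(graph):
    lowered = []
    for node in graph:
        if node.op == "FullyConnected":   # x @ W + b
            x, w, b = node.inputs
            mm = Node("MatMul", [x, w])
            lowered.append(mm)
            lowered.append(Node("Add", [mm, b]))
        else:
            lowered.append(node)          # already a primitive
    return lowered

graph = [Node("FullyConnected", ["x", "W", "b"]), Node("Relu", ["fc"])]
print([n.op for n in lower(graph)])       # ['MatMul', 'Add', 'Relu']
```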