- Academic Search

J **ng, L Wang, S Zhang, J Chen… - … of Machine Learning …, 2022 - proceedings.mlsys.org

Today's auto-tuners (e.g., AutoTVM, Ansor) generate efficient tensor programs by navigating
a large search space to identify effective implementations, but they do so with opaque …

Cited by 42

ALT: Breaking the wall between data layout and loop optimizations for deep learning compilation

Z Xu, J Xu, H Peng, W Wang, X Wang, H Wan… - Proceedings of the …, 2023 - dl.acm.org

Deep learning models rely on highly optimized tensor libraries for efficient inference on
heterogeneous hardware. Current deep compilers typically predetermine layouts of tensors …

Cited by 4

Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs

M Li, H Yang, S Zhang, F Yu, R Gong, Y Liu… - Proceedings of the …, 2023 - dl.acm.org

The requirement for deploying deep learning (DL) models efficiently has boosted the
research of DL compilers. Especially, the difficulty of generating optimized tensor programs …

Cited by 2

Arm meets Cloud: A Case Study of MPI Library Performance on AWS Arm-based HPC Cloud with Elastic Fabric Adapter

S Xu, A Shafi, H Subramoni… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org

Recent advances in the HPC Cloud field have made multi-core high performance VM services
more accessible. Emerging Arm based HPC systems are also receiving more attention …

Cited by 7

Fasor: A Fast Tensor Program Optimization Framework for Efficient DNN Deployment

H Huang, X Chen, J Zhao - Proceedings of the 38th ACM International …, 2024 - dl.acm.org

With the growing importance of deploying deep neural networks (DNNs), there are
increasing demands to improve both the efficiency and quality of tensor program …

Trimmer: Cost-Efficient Deep Learning Auto-tuning for Cloud Datacenters

D Borowiec, G Yeung, A Friday… - 2022 IEEE 15th …, 2022 - ieeexplore.ieee.org

Cloud datacenters capable of provisioning high performance Machine Learning-as-a-
Service (MLaaS) at reduced resource cost is achieved via auto-tuning: automated tensor …

Cited by 7

To Share or Hide: Confidential Model Compilation as a Service with Privacy-Preserving Transparency

K Qin, D Gu - 2024 43rd International Symposium on Reliable …, 2024 - ieeexplore.ieee.org

Model Compilation as a Service (MCaaS) has emerged as critical Machine Learning (ML)
supply chain infrastructure. It provides large-scale model optimization for heterogeneous …

ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations

Z Xu, J Xu, H Peng, W Wang, X Wang, H Wan… - arXiv preprint arXiv …, 2022 - arxiv.org

Deep learning models rely on highly optimized tensor libraries for efficient inference on
heterogeneous hardware. Current deep compilers typically predetermine layouts of tensors …

DLOOPT: An Optimization Assistant on AutoTVM for Deep Learning Operators

YS Hsieh, YP You - Journal of Signal Processing Systems, 2023 - Springer

With the rapid growth of deep learning models and deep learning-based applications, how
to accelerate the inference of deep neural networks, especially neural network operators …

Cited by 1

Towards AI-Assisted Programming: Automation, Comprehensiveness, and Optimization

H Huang - 2024 - search.proquest.com

The rapid advancement of computing technologies presents increasingly complex
challenges in software and hardware development. Traditional programming approaches …

Lorien: Efficient deep learning workloads delivery

Bolt: Bridging the gap between auto-tuners and hardware-native performance
