Bolt: Bridging the gap between auto-tuners and hardware-native performance

J Xing, L Wang, S Zhang, J Chen… - … of Machine Learning …, 2022 - proceedings.mlsys.org
Today's auto-tuners (e.g., AutoTVM, Ansor) generate efficient tensor programs by navigating
a large search space to identify effective implementations, but they do so with opaque …
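
The search loop this snippet describes can be sketched as follows. The sketch assumes a pure-Python tiled matrix multiply stands in for the tensor program and a single tile-size knob stands in for the schedule space. AutoTVM and Ansor explore far larger spaces and guide sampling with learned cost models. Here, plain random sampling with wall-clock measurement replaces their search strategies, and `tiled_matmul`, `measure`, and `auto_tune` are hypothetical names.

```python
import random
import time


def tiled_matmul(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) using square tiles of edge `tile`."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Inner loops stay inside one tile; min() handles ragged edges.
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c


def measure(fn, repeats=3):
    """Return the best wall-clock time over `repeats` runs to damp timing noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def auto_tune(n, candidates, trials, seed=0):
    """Randomly sample `trials` tile sizes, time each one, and return the fastest."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    sampled = rng.sample(candidates, min(trials, len(candidates)))
    # Bind the tile size as a default argument so each lambda times its own config.
    timings = {tile: measure(lambda t=tile: tiled_matmul(a, b, n, t)) for tile in sampled}
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    best, timings = auto_tune(n=32, candidates=[2, 4, 8, 16, 32], trials=4)
    for tile, secs in sorted(timings.items()):
        print(f"tile={tile:2d}  {secs * 1e3:.2f} ms")
    print("best tile:", best)
```

The winning tile size depends on the machine, which is exactly why auto-tuners measure on the target hardware instead of fixing the choice ahead of time.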

ALT: Breaking the wall between data layout and loop optimizations for deep learning compilation

Z Xu, J Xu, H Peng, W Wang, X Wang, H Wan… - Proceedings of the …, 2023 - dl.acm.org
Deep learning models rely on highly optimized tensor libraries for efficient inference on
heterogeneous hardware. Current deep compilers typically predetermine layouts of tensors …
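
The cost of predetermining tensor layouts can be illustrated with a toy model. It assumes the operator fixes the loop order to column-by-column traversal ("ji") and uses a cost that simply counts non-contiguous memory accesses. This is not ALT's algorithm or cost model, and `addresses`, `strided_cost`, and `search` are hypothetical names. The sketch contrasts searching under a predetermined row-major layout with searching layout and loop order jointly.

```python
import itertools


def addresses(layout, loop_order, rows, cols):
    """Flat addresses touched when visiting every element of a rows x cols matrix."""
    outer_extent, inner_extent = (rows, cols) if loop_order == "ij" else (cols, rows)
    addrs = []
    for o in range(outer_extent):
        for n in range(inner_extent):
            i, j = (o, n) if loop_order == "ij" else (n, o)
            addrs.append(i * cols + j if layout == "row_major" else j * rows + i)
    return addrs


def strided_cost(addrs):
    """Toy cost model: count consecutive accesses that are not to adjacent addresses."""
    return sum(1 for a, b in zip(addrs, addrs[1:]) if b - a != 1)


def search(layouts, loop_orders, rows, cols):
    """Exhaustively search the (layout, loop order) product space for the lowest cost."""
    space = itertools.product(layouts, loop_orders)
    return min(space, key=lambda cfg: strided_cost(addresses(cfg[0], cfg[1], rows, cols)))


if __name__ == "__main__":
    rows, cols = 4, 8
    fixed = search(["row_major"], ["ji"], rows, cols)                # layout predetermined
    joint = search(["row_major", "col_major"], ["ji"], rows, cols)   # layout searched too
    for layout, order in (fixed, joint):
        print(layout, order, strided_cost(addresses(layout, order, rows, cols)))
```

With a 4 x 8 matrix, the fixed row-major layout leaves all 31 transitions strided (cost 31). The joint search picks column-major, where the same "ji" traversal is contiguous (cost 0).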

Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs

M Li, H Yang, S Zhang, F Yu, R Gong, Y Liu… - Proceedings of the …, 2023 - dl.acm.org
The requirement for deploying deep learning (DL) models efficiently has boosted the
research of DL compilers. Especially, the difficulty of generating optimized tensor programs …

Arm meets Cloud: A Case Study of MPI Library Performance on AWS Arm-based HPC Cloud with Elastic Fabric Adapter

S Xu, A Shafi, H Subramoni… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Recent advances in the HPC Cloud field have made multi-core high-performance VM services
more accessible. Emerging Arm-based HPC systems are also receiving more attention …

Fasor: A Fast Tensor Program Optimization Framework for Efficient DNN Deployment

H Huang, X Chen, J Zhao - Proceedings of the 38th ACM International …, 2024 - dl.acm.org
With the growing importance of deploying deep neural networks (DNNs), there are
increasing demands to improve both the efficiency and quality of tensor program …

Trimmer: Cost-Efficient Deep Learning Auto-tuning for Cloud Datacenters

D Borowiec, G Yeung, A Friday… - 2022 IEEE 15th …, 2022 - ieeexplore.ieee.org
Cloud datacenters can provision high-performance Machine Learning-as-a-Service (MLaaS)
at reduced resource cost via auto-tuning: automated tensor …

To Share or Hide: Confidential Model Compilation as a Service with Privacy-Preserving Transparency

K Qin, D Gu - 2024 43rd International Symposium on Reliable …, 2024 - ieeexplore.ieee.org
Model Compilation as a Service (MCaaS) has emerged as critical Machine Learning (ML)
supply chain infrastructure. It provides large-scale model optimization for heterogeneous …

ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations

Z Xu, J Xu, H Peng, W Wang, X Wang, H Wan… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning models rely on highly optimized tensor libraries for efficient inference on
heterogeneous hardware. Current deep compilers typically predetermine layouts of tensors …

DLOOPT: An Optimization Assistant on AutoTVM for Deep Learning Operators

YS Hsieh, YP You - Journal of Signal Processing Systems, 2023 - Springer
With the rapid growth of deep learning models and deep learning-based applications, how
to accelerate the inference of deep neural networks, especially neural network operators …

Towards AI-Assisted Programming: Automation, Comprehensiveness, and Optimization

H Huang - 2024 - search.proquest.com
The rapid advancement of computing technologies presents increasingly complex
challenges in software and hardware development. Traditional programming approaches …