Bolt: Bridging the gap between auto-tuners and hardware-native performance
Today's auto-tuners (e.g., AutoTVM, Ansor) generate efficient tensor programs by navigating
a large search space to identify effective implementations, but they do so with opaque …
ALT: Breaking the wall between data layout and loop optimizations for deep learning compilation
Deep learning models rely on highly optimized tensor libraries for efficient inference on
heterogeneous hardware. Current deep compilers typically predetermine layouts of tensors …
Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs
The need to deploy deep learning (DL) models efficiently has boosted research on
DL compilers. In particular, the difficulty of generating optimized tensor programs …
Arm meets Cloud: A Case Study of MPI Library Performance on AWS Arm-based HPC Cloud with Elastic Fabric Adapter
Recent advances in the HPC cloud field have made multi-core high-performance VM services
more accessible. Emerging Arm-based HPC systems are also receiving more attention …
Fasor: A Fast Tensor Program Optimization Framework for Efficient DNN Deployment
With the growing importance of deploying deep neural networks (DNNs), there are
increasing demands to improve both the efficiency and quality of tensor program …
Trimmer: Cost-Efficient Deep Learning Auto-tuning for Cloud Datacenters
Provisioning high-performance Machine Learning-as-a-Service (MLaaS) in cloud datacenters
at reduced resource cost is achieved via auto-tuning: automated tensor …
To Share or Hide: Confidential Model Compilation as a Service with Privacy-Preserving Transparency
K Qin, D Gu - 2024 43rd International Symposium on Reliable …, 2024 - ieeexplore.ieee.org
Model Compilation as a Service (MCaaS) has emerged as a critical Machine Learning (ML)
supply chain infrastructure. It provides large-scale model optimization for heterogeneous …
ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations
Z Xu, J Xu, H Peng, W Wang, X Wang, H Wan… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning models rely on highly optimized tensor libraries for efficient inference on
heterogeneous hardware. Current deep compilers typically predetermine layouts of tensors …
DLOOPT: An Optimization Assistant on AutoTVM for Deep Learning Operators
YS Hsieh, YP You - Journal of Signal Processing Systems, 2023 - Springer
With the rapid growth of deep learning models and deep learning-based applications, how
to accelerate the inference of deep neural networks, especially neural network operators …
Towards AI-Assisted Programming: Automation, Comprehensiveness, and Optimization
H Huang - 2024 - search.proquest.com
The rapid advancement of computing technologies presents increasingly complex
challenges in software and hardware development. Traditional programming approaches …