PyTorch 2: Faster machine learning through dynamic Python bytecode transformation and graph compilation

J Ansel, E Yang, H He, N Gimelshein, A Jain… - Proceedings of the 29th …, 2024 - dl.acm.org
This paper introduces two extensions to the popular PyTorch machine learning framework,
TorchDynamo and TorchInductor, which implement the torch.compile feature released in …

SparseTIR: Composable abstractions for sparse compilation in deep learning

Z Ye, R Lai, J Shao, T Chen, L Ceze - Proceedings of the 28th ACM …, 2023 - dl.acm.org
Sparse tensors are rapidly becoming critical components of modern deep learning
workloads. However, developing high-performance sparse operators can be difficult and …

TorchSparse++: Efficient training and inference framework for sparse convolution on GPUs

H Tang, S Yang, Z Liu, K Hong, Z Yu, X Li… - Proceedings of the 56th …, 2023 - dl.acm.org
Sparse convolution plays a pivotal role in emerging workloads, including point cloud
processing in AR/VR, autonomous driving, and graph understanding in recommendation …

TorchSparse++: Efficient point cloud engine

H Tang, S Yang, Z Liu, K Hong, Z Yu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Point cloud computation has become an increasingly important workload thanks to its
applications in autonomous driving. Unlike dense 2D computation, point cloud convolution …

BladeDISC: Optimizing dynamic shape machine learning workloads via compiler approach

Z Zheng, Z Pan, D Wang, K Zhu, W Zhao… - Proceedings of the …, 2023 - dl.acm.org
Compiler optimization plays an increasingly important role in boosting the performance of
machine learning models for data processing and management. With increasingly complex …

DLAS: A Conceptual Model for Across-Stack Deep Learning Acceleration

P Gibson, J Cano, E Crowley, A Storkey… - ACM Transactions on …, 2024 - dl.acm.org
Deep Neural Networks (DNNs) are very computationally demanding, which presents a
significant barrier to their deployment, especially on resource-constrained devices …

Relax: Composable Abstractions for End-to-End Dynamic Machine Learning

R Lai, J Shao, S Feng, SS Lyubomirsky, B Hou… - arXiv preprint, 2023 - arxiv.org

Hidet: Task-mapping programming paradigm for deep learning tensor programs
Y Ding, CH Yu, B Zheng, Y Liu, Y Wang… - Proceedings of the 28th …, 2023 - dl.acm.org
As deep learning models nowadays are widely adopted by both cloud services and edge
devices, reducing the latency of deep learning model inferences becomes crucial to provide …

Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs

B Zheng, CH Yu, J Wang, Y Ding, Y Liu… - Proceedings of the 56th …, 2023 - dl.acm.org
Achieving high performance in machine learning workloads is a crucial yet difficult task. To
achieve high runtime performance on hardware platforms such as GPUs, graph-based …