PyTorch 2: Faster machine learning through dynamic Python bytecode transformation and graph compilation
This paper introduces two extensions to the popular PyTorch machine learning framework,
TorchDynamo and TorchInductor, which implement the torch.compile feature released in …
SparseTIR: Composable abstractions for sparse compilation in deep learning
Sparse tensors are rapidly becoming critical components of modern deep learning
workloads. However, developing high-performance sparse operators can be difficult and …
TorchSparse++: Efficient training and inference framework for sparse convolution on GPUs
Sparse convolution plays a pivotal role in emerging workloads, including point cloud
processing in AR/VR, autonomous driving, and graph understanding in recommendation …
TorchSparse++: Efficient point cloud engine
Point cloud computation has become an increasingly important workload thanks to its
applications in autonomous driving. Unlike dense 2D computation, point cloud convolution …
BladeDISC: Optimizing dynamic shape machine learning workloads via compiler approach
Compiler optimization plays an increasingly important role in boosting the performance of
machine learning models for data processing and management. With increasingly complex …
DLAS: A Conceptual Model for Across-Stack Deep Learning Acceleration
Deep Neural Networks (DNNs) are very computationally demanding, which presents a
significant barrier to their deployment, especially on resource-constrained devices …
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs
Achieving high performance in machine learning workloads is a crucial yet difficult task. To
achieve high runtime performance on hardware platforms such as GPUs, graph-based …