PyTorch 2: Faster machine learning through dynamic Python bytecode transformation and graph compilation

J Ansel, E Yang, H He, N Gimelshein, A Jain… - Proceedings of the 29th …, 2024 - dl.acm.org
This paper introduces two extensions to the popular PyTorch machine learning framework,
TorchDynamo and TorchInductor, which implement the torch.compile feature released in …

SparseTIR: Composable abstractions for sparse compilation in deep learning

Z Ye, R Lai, J Shao, T Chen, L Ceze - Proceedings of the 28th ACM …, 2023 - dl.acm.org
Sparse tensors are rapidly becoming critical components of modern deep learning
workloads. However, developing high-performance sparse operators can be difficult and …

TorchSparse++: Efficient training and inference framework for sparse convolution on GPUs

H Tang, S Yang, Z Liu, K Hong, Z Yu, X Li… - Proceedings of the 56th …, 2023 - dl.acm.org
Sparse convolution plays a pivotal role in emerging workloads, including point cloud
processing in AR/VR, autonomous driving, and graph understanding in recommendation …

TorchSparse++: Efficient point cloud engine

H Tang, S Yang, Z Liu, K Hong, Z Yu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Point cloud computation has become an increasingly important workload thanks to its
applications in autonomous driving. Unlike dense 2D computation, point cloud convolution …

BladeDISC: Optimizing dynamic shape machine learning workloads via compiler approach

Z Zheng, Z Pan, D Wang, K Zhu, W Zhao… - Proceedings of the …, 2023 - dl.acm.org
Compiler optimization plays an increasingly important role in boosting the performance of
machine learning models for data processing and management. With increasingly complex …

DLAS: A Conceptual Model for Across-Stack Deep Learning Acceleration

P Gibson, J Cano, E Crowley, A Storkey… - ACM Transactions on …, 2024 - dl.acm.org
Deep Neural Networks (DNNs) are very computationally demanding, which presents a
significant barrier to their deployment, especially on resource-constrained devices …

Relax: Composable Abstractions for End-to-End Dynamic Machine Learning

R Lai, J Shao, S Feng, SS Lyubomirsky, B Hou… - arXiv preprint, 2023 - arxiv.org

Hidet: Task-mapping programming paradigm for deep learning tensor programs
Y Ding, CH Yu, B Zheng, Y Liu, Y Wang… - Proceedings of the 28th …, 2023 - dl.acm.org
As deep learning models nowadays are widely adopted by both cloud services and edge
devices, reducing the latency of deep learning model inferences becomes crucial to provide …

Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs

B Zheng, CH Yu, J Wang, Y Ding, Y Liu… - Proceedings of the 56th …, 2023 - dl.acm.org
Achieving high performance in machine learning workloads is a crucial yet difficult task. To
achieve high runtime performance on hardware platforms such as GPUs, graph-based …