PyTorch 2: Faster machine learning through dynamic Python bytecode transformation and graph compilation

J Ansel, E Yang, H He, N Gimelshein, A Jain… - Proceedings of the 29th …, 2024 - dl.acm.org
This paper introduces two extensions to the popular PyTorch machine learning framework,
TorchDynamo and TorchInductor, which implement the torch.compile feature released in …

SpecInfer: Accelerating large language model serving with tree-based speculative inference and verification

X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang… - Proceedings of the 29th …, 2024 - dl.acm.org
This paper introduces SpecInfer, a system that accelerates generative large language model
(LLM) serving with tree-based speculative inference and verification. The key idea behind …

OliVe: Accelerating large language models via hardware-friendly outlier-victim pair quantization

C Guo, J Tang, W Hu, J Leng, C Zhang… - Proceedings of the 50th …, 2023 - dl.acm.org
Transformer-based large language models (LLMs) have achieved great success with the
growing model size. LLMs' size grows by 240× every two years, which outpaces the …

Welder: Scheduling deep learning memory access via tile-graph

Y Shi, Z Yang, J Xue, L Ma, Y Xia, Z Miao… - … USENIX Symposium on …, 2023 - usenix.org
With the growing demand for processing higher fidelity data and the use of faster computing
cores in newer hardware accelerators, modern deep neural networks (DNNs) are becoming …