- Academic Search

PyTorch 2: Faster machine learning through dynamic Python bytecode transformation and graph compilation

J Ansel, E Yang, H He, N Gimelshein, A Jain… - Proceedings of the 29th …, 2024 - dl.acm.org

This paper introduces two extensions to the popular PyTorch machine learning framework,
TorchDynamo and TorchInductor, which implement the torch.compile feature released in …

Cited by 443 - All 6 versions

[PDF] acm.org

SpecInfer: Accelerating large language model serving with tree-based speculative inference and verification

X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang… - Proceedings of the 29th …, 2024 - dl.acm.org

This paper introduces SpecInfer, a system that accelerates generative large language model
(LLM) serving with tree-based speculative inference and verification. The key idea behind …

Cited by 88 - All 4 versions

[PDF] arxiv.org

OliVe: Accelerating large language models via hardware-friendly outlier-victim pair quantization

C Guo, J Tang, W Hu, J Leng, C Zhang… - Proceedings of the 50th …, 2023 - dl.acm.org

Transformer-based large language models (LLMs) have achieved great success with the
growing model size. LLMs' size grows by 240× every two years, which outpaces the …

Cited by 93 - All 7 versions

[PDF] cmu.edu

[PDF] SpecInfer: Accelerating generative LLM serving with speculative inference and token tree verification

X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang… - ar…

… high-performance sparse operators can be difficult and …

Cited by 84 - All 4 versions

[PDF] usenix.org

Welder: Scheduling deep learning memory access via tile-graph

Y Shi, Z Yang, J Xue, L Ma, Y Xia, Z Miao… - … USENIX Symposium on …, 2023 - usenix.org

With the growing demand for processing higher fidelity data and the use of faster computing
cores in newer hardware accelerators, modern deep neural networks (DNNs) are becoming …

Cited by 32 - All 2 versions

Ansor: Generating high-performance tensor programs for deep learning
