- Academic Search

Efficient memory management for large language model serving with pagedattention

W Kwon, Z Li, S Zhuang, Y Sheng, L Zheng… - Proceedings of the 29th …, 2023 - dl.acm.org

High throughput serving of large language models (LLMs) requires batching sufficiently
many requests at a time. However, existing systems struggle because the key-value cache …

Cited by: 1271 · All 4 versions

Flexgen: High-throughput generative inference of large language models with a single gpu

Y Sheng, L Zheng, B Yuan, Z Li… - International …, 2023 - proceedings.mlr.press

The high computational and memory requirements of large language model (LLM) inference
make it feasible only with multiple high-end accelerators. Motivated by the emerging …

Cited by: 339 · All 10 versions

Enabling resource-efficient aiot system with cross-level optimization: A survey

S Liu, B Guo, C Fang, Z Wang, S Luo… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org

The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …

Cited by: 30 · All 6 versions

InfiniGen: Efficient generative inference of large language models with dynamic KV cache management

W Lee, J Lee, J Seo, J Sim - 18th USENIX Symposium on Operating …, 2024 - usenix.org

Transformer-based large language models (LLMs) demonstrate impressive performance
across various natural language processing tasks. Serving LLM inference for generating …

Cited by: 44

A survey on scheduling techniques in computing and network convergence

S Tang, Y Yu, H Wang, G Wang, W Chen… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org

The computing demand for massive applications has led to the ubiquitous deployment of
computing power. This trend results in the urgent need for higher-level computing resource …

Cited by: 11 · All 2 versions

Pre-trained models: Past, present and future

X Han, Z Zhang, N Ding, Y Gu, X Liu, Y Huo, J Qiu… - AI Open, 2021 - Elsevier

Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …

Cited by: 941 · All 9 versions

Zero-infinity: Breaking the gpu memory wall for extreme scale deep learning

S Rajbhandari, O Ruwase, J Rasley, S Smith… - Proceedings of the …, 2021 - dl.acm.org

In the last three years, the largest dense deep learning models have grown over 1000x to
reach hundreds of billions of parameters, while the GPU memory has only grown by 5x (16 …

Cited by: 348 · All 5 versions

Zero-offload: Democratizing billion-scale model training

J Ren, S Rajbhandari, RY Aminabadi… - 2021 USENIX Annual …, 2021 - usenix.org

Large-scale model training has been a playing ground for a limited few requiring complex
model refactoring and access to prohibitively expensive GPU clusters. ZeRO-Offload …

Cited by: 406 · All 9 versions

Pollux: Co-adaptive cluster scheduling for goodput-optimized deep learning

A Qiao, SK Choe, SJ Subramanya… - … on Operating Systems …, 2021 - usenix.org

Pollux improves scheduling performance in deep learning (DL) clusters by adaptively
co-optimizing inter-dependent factors both at the per-job level and at the cluster-wide level …

Cited by: 209 · All 15 versions

nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training

Z Lin, Y Miao, Q Zhang, F Yang, Y Zhu, C Li… - … USENIX Symposium on …, 2024 - usenix.org

With the growing model size of deep neural networks (DNN), deep learning training is
increasingly relying on handcrafted search spaces to find efficient parallelization execution …

Cited by: 8 · All 3 versions

Swapadvisor: Pushing deep learning beyond the gpu memory limit via smart swapping

Efficient memory management for large language model serving with pagedattention
