Deep learning workload scheduling in GPU datacenters: A survey
Deep learning (DL) has demonstrated remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …
Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision
Deep learning (DL) has prospered in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …
Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling
S Jayaram Subramanya, D Arfeen, S Lin… - Proceedings of the 29th …, 2023 - dl.acm.org
The Sia scheduler efficiently assigns heterogeneous deep learning (DL) cluster resources to
elastic resource-adaptive jobs. Although some recent schedulers address one aspect or …
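To illustrate the core idea of goodput-driven allocation, here is a minimal sketch, assuming a profiled per-job goodput table; the greedy loop, job names, and all numbers are hypothetical, and this is not Sia's actual algorithm (which co-optimizes GPU type, count, and job configuration):

```python
# A toy goodput-driven allocator (NOT Sia's algorithm): greedily give each
# job the free GPU type with the highest profiled goodput. The goodput
# table (throughput x statistical efficiency) is a hypothetical profile.
from itertools import product

GOODPUT = {  # goodput of one GPU of each type, per job -- made-up numbers
    ("jobA", "A100"): 100.0, ("jobA", "V100"): 55.0,
    ("jobB", "A100"): 80.0,  ("jobB", "V100"): 60.0,
}
free = {"A100": 1, "V100": 8}            # free GPUs per type
alloc = {"jobA": None, "jobB": None}

# visit (job, GPU type) pairs in decreasing goodput order
for job, gtype in sorted(product(alloc, free), key=lambda p: -GOODPUT[p]):
    if alloc[job] is None and free[gtype] > 0:
        alloc[job] = gtype
        free[gtype] -= 1

print(alloc)  # {'jobA': 'A100', 'jobB': 'V100'}
```

With only one A100 free, jobB falls back to a V100, where its goodput penalty is smaller than jobA's; a heterogeneity-aware scheduler exploits exactly this kind of asymmetry.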
Power-aware Deep Learning Model Serving with μ-Serve
With the increasing popularity of large deep learning model-serving workloads, there is a
pressing need to reduce the energy consumption of a model-serving cluster while …
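The underlying power/latency trade-off can be sketched as picking the lowest-power GPU frequency that still meets a latency SLO; the profile table and all numbers below are hypothetical placeholders, not μ-Serve's actual mechanism:

```python
# Hedged sketch of power-aware serving: choose the lowest-power GPU SM
# frequency whose profiled p99 latency still meets the SLO. Latency/power
# pairs would come from offline profiling; these values are invented.
SLO_MS = 50.0
PROFILE = {  # frequency (MHz) -> (p99 latency ms, power W) -- made up
    1410: (28.0, 300), 1200: (34.0, 240), 990: (45.0, 190), 780: (62.0, 150),
}

def pick_frequency(profile, slo_ms):
    """Lowest-power frequency whose profiled latency meets the SLO."""
    feasible = [(watts, f) for f, (lat, watts) in profile.items() if lat <= slo_ms]
    if not feasible:
        raise ValueError("no frequency meets the SLO; scale out instead")
    return min(feasible)[1]

print(pick_frequency(PROFILE, SLO_MS))  # 990: meets 50 ms at 190 W, not 300 W
```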
USHER: Holistic Interference Avoidance for Resource Optimized ML Inference
Minimizing monetary cost and maximizing the goodput of inference serving systems are
increasingly important with the ever-increasing popularity of deep learning models. While it …
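As a rough illustration of interference-avoiding placement, the first-fit-decreasing sketch below colocates model instances only while profiled compute and memory demands stay under headroom caps; all demand figures are invented, and USHER's real interference estimator is considerably more sophisticated:

```python
# Toy interference-aware packing: colocate models on a GPU only if their
# combined SM utilization and memory fit under caps that leave headroom.
models = {  # (SM utilization fraction, GPU memory GB) -- made-up profiles
    "bert": (0.45, 8), "resnet": (0.30, 4), "whisper": (0.50, 10),
}
GPU_MEM_GB, SM_CAP = 24, 0.9   # headroom cap to limit interference

gpus = []  # each GPU: [used_sm, used_mem, [model names]]
for name, (sm, mem) in sorted(models.items(), key=lambda kv: -kv[1][0]):
    for gpu in gpus:  # first fit, largest models placed first
        if gpu[0] + sm <= SM_CAP and gpu[1] + mem <= GPU_MEM_GB:
            gpu[0] += sm; gpu[1] += mem; gpu[2].append(name)
            break
    else:
        gpus.append([sm, mem, [name]])

for i, (sm, mem, names) in enumerate(gpus):
    print(f"GPU{i}: {names} (SM {sm:.2f}, mem {mem} GB)")
```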
Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent
Large tech companies are piling up a massive number of GPUs in their server fleets to run
diverse machine learning (ML) workloads. However, these expensive devices often suffer …
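The idea can be sketched as scoring each candidate node by how much additional "unusable" GPU capacity a placement would create, then descending that fragmentation gradient; the measure and numbers below are simplified stand-ins for the paper's formal definition:

```python
# Simplified fragmentation-aware placement in the spirit of FGD: place a
# fractional-GPU request on the node whose fragmentation grows the least.
def fragmentation(free_gpus, typical_request=0.5):
    """Sum of per-GPU leftovers too small to serve a typical request."""
    return sum(f for f in free_gpus if 0 < f < typical_request)

def place(nodes, demand):
    """Pick (delta, node, gpu_index) minimizing the fragmentation increase
    after placing `demand` (a fractional GPU share) on a best-fitting GPU."""
    best = None
    for name, free_gpus in nodes.items():
        fits = [i for i, f in enumerate(free_gpus) if f >= demand]
        if not fits:
            continue
        i = min(fits, key=lambda i: free_gpus[i])           # tightest fit
        after = free_gpus[:]; after[i] -= demand
        delta = fragmentation(after) - fragmentation(free_gpus)
        if best is None or delta < best[0]:
            best = (delta, name, i)
    return best

nodes = {"n1": [1.0, 0.6], "n2": [0.7, 0.7]}   # free fraction per GPU
print(place(nodes, 0.3))  # picks n1's 0.6 GPU: fewer unusable crumbs left
```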
MISO: exploiting multi-instance GPU capability on multi-tenant GPU clusters
GPU technology has been improving at an expedited pace in terms of size and performance,
empowering HPC and AI/ML researchers to advance the scientific discovery process …
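A toy version of MIG partition selection: enumerate a few valid slice layouts and pick the one maximizing predicted aggregate throughput. The partition list and speedup table are illustrative only; MISO itself predicts such numbers cheaply via MPS probing rather than exhaustive profiling:

```python
# Brute-force MIG partition choice over a tiny, illustrative lookup table.
from itertools import permutations

PARTITIONS = [("7g",), ("4g", "3g"), ("3g", "3g", "1g"),
              ("2g", "2g", "2g", "1g")]       # not the full A100 MIG list
SPEED = {  # normalized throughput of each job on each slice -- made up
    ("jobA", "7g"): 1.0, ("jobA", "4g"): 0.8, ("jobA", "3g"): 0.65,
    ("jobA", "2g"): 0.5, ("jobA", "1g"): 0.3,
    ("jobB", "7g"): 1.0, ("jobB", "4g"): 0.95, ("jobB", "3g"): 0.9,
    ("jobB", "2g"): 0.7, ("jobB", "1g"): 0.4,
}
jobs = ["jobA", "jobB"]

best = max(
    ((sum(SPEED[(j, s)] for j, s in zip(perm, part)), part, perm)
     for part in PARTITIONS if len(part) >= len(jobs)
     for perm in permutations(jobs)),
    key=lambda t: t[0],
)
print(best)  # jobA on the 4g slice, jobB on the 3g slice (score 1.70)
```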
Toward sustainable HPC: Carbon footprint estimation and environmental implications of HPC systems
The rapid growth in demand for HPC systems has led to a rise in their carbon footprint, which
requires urgent intervention. In this work, we present a comprehensive analysis of the …
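The standard accounting such analyses build on splits a job's footprint into operational carbon (energy times grid carbon intensity) plus an amortized share of embodied carbon; the function below is a back-of-the-envelope sketch with placeholder constants, not the paper's methodology:

```python
# Back-of-the-envelope carbon model: operational + amortized embodied carbon.
def job_carbon_kg(energy_kwh, grid_kgco2_per_kwh,
                  embodied_kgco2, lifetime_hours, job_hours):
    operational = energy_kwh * grid_kgco2_per_kwh          # energy x intensity
    embodied = embodied_kgco2 * (job_hours / lifetime_hours)  # amortized share
    return operational + embodied

# e.g. a 24 h job drawing 300 W on average consumes 7.2 kWh; all other
# constants below (grid intensity, embodied carbon, lifetime) are invented
print(job_carbon_kg(energy_kwh=7.2, grid_kgco2_per_kwh=0.4,
                    embodied_kgco2=150.0, lifetime_hours=5 * 365 * 24,
                    job_hours=24))   # ~2.96 kg CO2e
```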
Efficient training of large language models on distributed infrastructures: a survey
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …
Chronus: A novel deadline-aware scheduler for deep learning training jobs
Modern GPU clusters support Deep Learning training (DLT) jobs in a distributed manner.
Job scheduling is the key to improving training performance, resource utilization, and …
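Deadline awareness ultimately rests on a feasibility test; here is a minimal sketch, assuming a single-queue pool and earliest-deadline-first ordering (Chronus itself formulates a richer lease-based optimization, so the times and the test are illustrative):

```python
# EDF-style admission check: a job set is feasible iff, processed in
# earliest-deadline-first order, every job finishes by its deadline.
def edf_feasible(jobs):
    """jobs: (remaining_gpu_hours, deadline_hours_from_now) pairs, under the
    simplifying assumption that jobs run one at a time on a fixed pool."""
    t = 0.0
    for work, deadline in sorted(jobs, key=lambda j: j[1]):  # EDF order
        t += work
        if t > deadline:
            return False
    return True

admitted = [(4.0, 6.0), (2.0, 10.0)]
candidate = (3.0, 6.5)
print(edf_feasible(admitted + [candidate]))  # False -> reject the candidate
```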