Fairness in serving large language models
High-demand LLM inference services (e.g., ChatGPT and Bard) support a wide range of
requests from short chat conversations to long document reading. To ensure that all client …
Learning scheduling algorithms for data processing clusters
Efficiently scheduling data processing jobs on distributed compute clusters requires complex
algorithms. Current systems use simple, generalized heuristics and ignore workload …
Tiresias: A GPU cluster manager for distributed deep learning
Deep learning (DL) training jobs bring some unique challenges to existing cluster
managers, such as unpredictable training times, an all-or-nothing execution model, and …
Characterization and prediction of deep learning workloads in large-scale GPU datacenters
Modern GPU datacenters are critical for delivering Deep Learning (DL) models and services
in both the research community and industry. When operating a datacenter, optimization of …
ByteGNN: efficient graph neural network training at large scale
Graph neural networks (GNNs) have shown excellent performance in a wide range of
applications such as recommendation, risk control, and drug discovery. With the increase in …
Cluster frameworks for efficient scheduling and resource allocation in data center networks: A survey
Data centers are widely used for big data analytics, which often involve data-parallel jobs,
including queries and web services. Meanwhile, cluster frameworks are rapidly developed for …
Themis: Fair and efficient GPU cluster scheduling
Modern distributed machine learning (ML) training workloads benefit significantly from
leveraging GPUs. However, significant contention ensues when multiple such workloads are …
Looking beyond GPUs for DNN scheduling on multi-tenant clusters
Training Deep Neural Networks (DNNs) is a popular workload in both enterprises and cloud
data centers. Existing schedulers for DNN training consider GPU as the dominant resource …
Multi-resource interleaving for deep learning training
Training a Deep Learning (DL) model requires multiple resource types, including CPUs,
GPUs, storage IO, and network IO. Advancements in DL have produced a wide spectrum of …
Elastic resource sharing for distributed deep learning
Resource allocation and scheduling strategies for deep learning training (DLT) jobs have a
critical impact on their average job completion time (JCT). Unfortunately, traditional …