Fairness in serving large language models

Y Sheng, S Cao, D Li, B Zhu, Z Li, D Zhuo… - … USENIX Symposium on …, 2024 - usenix.org
High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of
requests from short chat conversations to long document reading. To ensure that all client …

Learning scheduling algorithms for data processing clusters

H Mao, M Schwarzkopf, SB Venkatakrishnan… - Proceedings of the …, 2019 - dl.acm.org
Efficiently scheduling data processing jobs on distributed compute clusters requires complex
algorithms. Current systems use simple, generalized heuristics and ignore workload …

Tiresias: A GPU cluster manager for distributed deep learning

J Gu, M Chowdhury, KG Shin, Y Zhu, M Jeon… - … USENIX Symposium on …, 2019 - usenix.org
Deep learning (DL) training jobs bring some unique challenges to existing cluster
managers, such as unpredictable training times, an all-or-nothing execution model, and …

Characterization and prediction of deep learning workloads in large-scale GPU datacenters

Q Hu, P Sun, S Yan, Y Wen, T Zhang - Proceedings of the International …, 2021 - dl.acm.org
Modern GPU datacenters are critical for delivering Deep Learning (DL) models and services
in both the research community and industry. When operating a datacenter, optimization of …

ByteGNN: efficient graph neural network training at large scale

C Zheng, H Chen, Y Cheng, Z Song, Y Wu… - Proceedings of the …, 2022 - dl.acm.org
Graph neural networks (GNNs) have shown excellent performance in a wide range of
applications such as recommendation, risk control, and drug discovery. With the increase in …

Cluster frameworks for efficient scheduling and resource allocation in data center networks: A survey

K Wang, Q Zhou, S Guo, J Luo - IEEE Communications Surveys …, 2018 - ieeexplore.ieee.org
Data centers are widely used for big data analytics, which often involve data-parallel jobs,
including queries and web services. Meanwhile, cluster frameworks are rapidly developed for …

Themis: Fair and efficient GPU cluster scheduling

K Mahajan, A Balasubramanian, A Singhvi… - … USENIX Symposium on …, 2020 - usenix.org
Modern distributed machine learning (ML) training workloads benefit significantly from
leveraging GPUs. However, significant contention ensues when multiple such workloads are …

Looking beyond GPUs for DNN scheduling on multi-tenant clusters

J Mohan, A Phanishayee, J Kulkarni… - … USENIX Symposium on …, 2022 - usenix.org
Training Deep Neural Networks (DNNs) is a popular workload in both enterprises and cloud
data centers. Existing schedulers for DNN training consider GPU as the dominant resource …

Multi-resource interleaving for deep learning training

Y Zhao, Y Liu, Y Peng, Y Zhu, X Liu, X Jin - Proceedings of the ACM …, 2022 - dl.acm.org
Training Deep Learning (DL) models requires multiple resource types, including CPUs,
GPUs, storage IO, and network IO. Advancements in DL have produced a wide spectrum of …

Elastic resource sharing for distributed deep learning

C Hwang, T Kim, S Kim, J Shin, KS Park - 18th USENIX Symposium on …, 2021 - usenix.org
Resource allocation and scheduling strategies for deep learning training (DLT) jobs have a
critical impact on their average job completion time (JCT). Unfortunately, traditional …