Fairness in serving large language models
High-demand LLM inference services (e.g., ChatGPT and Bard) support a wide range of
requests from short chat conversations to long document reading. To ensure that all client …
Learning scheduling algorithms for data processing clusters
Efficiently scheduling data processing jobs on distributed compute clusters requires complex
algorithms. Current systems use simple, generalized heuristics and ignore workload …
Tiresias: A GPU cluster manager for distributed deep learning
Deep learning (DL) training jobs bring some unique challenges to existing cluster
managers, such as unpredictable training times, an all-or-nothing execution model, and …
Characterization and prediction of deep learning workloads in large-scale GPU datacenters
Modern GPU datacenters are critical for delivering Deep Learning (DL) models and services
in both the research community and industry. When operating a datacenter, optimization of …
ByteGNN: efficient graph neural network training at large scale
Graph neural networks (GNNs) have shown excellent performance in a wide range of
applications such as recommendation, risk control, and drug discovery. With the increase in …
Cluster frameworks for efficient scheduling and resource allocation in data center networks: A survey
Data centers are widely used for big data analytics, which often involve data-parallel jobs,
including queries and web services. Meanwhile, cluster frameworks are rapidly developed for …
Themis: Fair and efficient GPU cluster scheduling
Modern distributed machine learning (ML) training workloads benefit significantly from
leveraging GPUs. However, significant contention ensues when multiple such workloads are …
Looking beyond GPUs for DNN scheduling on multi-tenant clusters
Training Deep Neural Networks (DNNs) is a popular workload in both enterprises and cloud
data centers. Existing schedulers for DNN training consider GPU as the dominant resource …
Multi-resource interleaving for deep learning training
Training a Deep Learning (DL) model requires multiple resource types, including CPUs,
GPUs, storage IO, and network IO. Advancements in DL have produced a wide spectrum of …
Elastic resource sharing for distributed deep learning
Resource allocation and scheduling strategies for deep learning training (DLT) jobs have a
critical impact on their average job completion time (JCT). Unfortunately, traditional …