Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling

S Jayaram Subramanya, D Arfeen, S Lin… - Proceedings of the 29th …, 2023 - dl.acm.org
The Sia scheduler efficiently assigns heterogeneous deep learning (DL) cluster resources to
elastic resource-adaptive jobs. Although some recent schedulers address one aspect or …

Cascade speculative drafting for even faster LLM inference

Z Chen, X Yang, J Lin, C Sun, KCC Chang… - arXiv preprint arXiv …, 2023 - arxiv.org
Introduced to enhance the efficiency of large language model (LLM) inference, speculative
decoding operates by having a smaller model generate a draft. A larger target model then …
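
For context on the mechanism the snippet describes, below is a minimal sketch of the basic greedy speculative decoding loop that this line of work builds on; it is not the cascaded drafting scheme proposed in the paper itself. The `draft_model` and `target_model` callables, the draft length `k`, and the greedy accept/reject rule are illustrative assumptions, not the authors' API.

```python
import torch


@torch.no_grad()
def speculative_decode(draft_model, target_model, prompt_ids, k=4, max_new_tokens=64):
    """Toy greedy speculative decoding loop (illustrative sketch only).

    `draft_model` and `target_model` are assumed to be callables mapping a
    token-id tensor of shape (1, seq_len) to logits of shape
    (1, seq_len, vocab_size).
    """
    ids = prompt_ids
    while ids.shape[1] - prompt_ids.shape[1] < max_new_tokens:
        # 1. The small draft model proposes k tokens autoregressively (cheap).
        draft_ids = ids
        for _ in range(k):
            next_tok = draft_model(draft_ids)[:, -1, :].argmax(dim=-1, keepdim=True)
            draft_ids = torch.cat([draft_ids, next_tok], dim=1)

        # 2. The large target model scores the whole draft in ONE forward pass.
        target_preds = target_model(draft_ids).argmax(dim=-1)  # shape (1, seq_len)

        # 3. Accept drafted tokens while they match the target's greedy choice.
        n_accepted = 0
        for i in range(k):
            pos = ids.shape[1] + i  # index of the i-th drafted token
            if draft_ids[0, pos] == target_preds[0, pos - 1]:
                n_accepted += 1
            else:
                break

        # 4. Keep the accepted prefix, then append one target token: either the
        #    correction at the first mismatch, or a bonus token if all k matched.
        ids = draft_ids[:, : ids.shape[1] + n_accepted]
        ids = torch.cat([ids, target_preds[:, ids.shape[1] - 1 : ids.shape[1]]], dim=1)
    return ids
```

The speedup comes from step 2: the expensive target model verifies all k drafted tokens in a single forward pass, so every accepted draft token replaces one full target-model decode step.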

Kernel-as-a-Service: A serverless programming model for heterogeneous hardware accelerators

T Pfandzelter, A Dhakal, E Frachtenberg… - Proceedings of the 24th …, 2023 - dl.acm.org
With the slowing of Moore's law and decline of Dennard scaling, computing systems
increasingly rely on specialized hardware accelerators in addition to general-purpose …

POLCA: Power oversubscription in LLM cloud providers

P Patel, E Choukse, C Zhang, Í Goiri, B Warrier… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent innovation in large language models (LLMs), and their myriad use-cases have
rapidly driven up the compute capacity demand for datacenter GPUs. Several cloud …

Blox: A Modular Toolkit for Deep Learning Schedulers

S Agarwal, A Phanishayee… - Proceedings of the …, 2024 - dl.acm.org
Deep Learning (DL) workloads have rapidly increased in popularity in enterprise clusters
and several new cluster schedulers have been proposed in recent years to support these …

Characterizing Power Management Opportunities for LLMs in the Cloud

P Patel, E Choukse, C Zhang, Í Goiri, B Warrier… - Proceedings of the 29th …, 2024 - dl.acm.org
Recent innovation in large language models (LLMs), and their myriad use cases have
rapidly driven up the compute demand for datacenter GPUs. Several cloud providers and …

SuperBench: Improving Cloud AI Infrastructure Reliability with Proactive Validation

Y Xiong, Y Jiang, Z Yang, L Qu, G Zhao, S Liu… - 2024 USENIX Annual …, 2024 - usenix.org
Reliability in cloud AI infrastructure is crucial for cloud service providers, prompting the
widespread use of hardware redundancies. However, these redundancies can inadvertently …

PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters

R Jain, B Tran, K Chen, MD Sinclair… - … Conference for High …, 2024 - ieeexplore.ieee.org
Large-scale computing systems are increasingly using accelerators such as GPUs to enable
peta- and exa-scale levels of compute to meet the needs of Machine Learning (ML) and …

PerfTop: Towards Performance Prediction of Distributed Learning over General Topology

C Yan, Z Zhu, Y Niu, C Wang, C Zhuo, J Xu - Journal of Parallel and …, 2024 - Elsevier
Distributed learning with multiple GPUs has been widely adopted to accelerate the training
process of large-scale deep neural networks. However, misconfiguration of the GPU clusters …