Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling
The Sia scheduler efficiently assigns heterogeneous deep learning (DL) cluster resources to
elastic resource-adaptive jobs. Although some recent schedulers address one aspect or …
Cascade speculative drafting for even faster LLM inference
Introduced to enhance the efficiency of large language model (LLM) inference, speculative
decoding operates by having a smaller model generate a draft. A larger target model then …
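The abstract above describes the core speculative decoding loop: a cheap draft model proposes several tokens, and the expensive target model verifies them, accepting the matching prefix. A minimal sketch of that loop, using toy deterministic "models" (the model rules and the greedy prefix-acceptance criterion here are illustrative assumptions, not the cascade method from the paper):

```python
def draft_model(prefix, k):
    """Cheap draft model: propose the next k tokens (toy rule: count up by 1)."""
    out, last = [], prefix[-1]
    for _ in range(k):
        last += 1
        out.append(last)
    return out

def target_model(prefix):
    """Expensive target model: return one correct next token.
    Toy rule: count up by 1, but skip multiples of 4, so it
    occasionally disagrees with the draft."""
    nxt = prefix[-1] + 1
    if nxt % 4 == 0:
        nxt += 1
    return nxt

def speculative_decode(prompt, n_tokens, k=4):
    """Generate n_tokens after prompt, verifying drafts against the target."""
    seq = list(prompt)
    while len(seq) < len(prompt) + n_tokens:
        proposal = draft_model(seq, k)
        ctx = list(seq)
        for tok in proposal:
            t = target_model(ctx)
            if t == tok:          # draft token verified: accept it
                ctx.append(tok)
            else:                 # mismatch: take the target's token, stop
                ctx.append(t)
                break
        seq = ctx
    return seq[: len(prompt) + n_tokens]
```

The speedup comes from verification being parallelizable in a real transformer: one target forward pass scores all k draft tokens at once, so every accepted draft token saves a sequential target step.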
Kernel-as-a-Service: A serverless programming model for heterogeneous hardware accelerators
With the slowing of Moore's law and decline of Dennard scaling, computing systems
increasingly rely on specialized hardware accelerators in addition to general-purpose …
POLCA: Power oversubscription in LLM cloud providers
Recent innovation in large language models (LLMs) and their myriad use cases has
rapidly driven up the compute capacity demand for datacenter GPUs. Several cloud …
Blox: A Modular Toolkit for Deep Learning Schedulers
Deep Learning (DL) workloads have rapidly increased in popularity in enterprise clusters
and several new cluster schedulers have been proposed in recent years to support these …
Characterizing Power Management Opportunities for LLMs in the Cloud
Recent innovation in large language models (LLMs) and their myriad use cases has
rapidly driven up the compute demand for datacenter GPUs. Several cloud providers and …
SuperBench: improving cloud AI infrastructure reliability with proactive validation
Reliability in cloud AI infrastructure is crucial for cloud service providers, prompting the
widespread use of hardware redundancies. However, these redundancies can inadvertently …
PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters
Large-scale computing systems are increasingly using accelerators such as GPUs to enable
peta- and exa-scale levels of compute to meet the needs of Machine Learning (ML) and …
PerfTop: Towards Performance Prediction of Distributed Learning over General Topology
Distributed learning with multiple GPUs has been widely adopted to accelerate the training
process of large-scale deep neural networks. However, misconfiguration of the GPU clusters …