Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling
The Sia scheduler efficiently assigns heterogeneous deep learning (DL) cluster resources to
elastic resource-adaptive jobs. Although some recent schedulers address one aspect or …
Cascade speculative drafting for even faster LLM inference
Introduced to enhance the efficiency of large language model (LLM) inference, speculative
decoding operates by having a smaller model generate a draft. A larger target model then …
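The abstract above describes the core speculative decoding loop: a cheap draft model proposes several tokens, and the expensive target model verifies them, accepting the matching prefix. A minimal sketch of that loop, using toy deterministic "models" (the model rules and the greedy prefix-acceptance criterion here are illustrative assumptions, not the cascade method from the paper):

```python
def draft_model(prefix, k):
    """Cheap draft model: propose the next k tokens (toy rule: count up by 1)."""
    out, last = [], prefix[-1]
    for _ in range(k):
        last += 1
        out.append(last)
    return out

def target_model(prefix):
    """Expensive target model: return one correct next token.
    Toy rule: count up by 1, but skip multiples of 4, so it
    occasionally disagrees with the draft."""
    nxt = prefix[-1] + 1
    if nxt % 4 == 0:
        nxt += 1
    return nxt

def speculative_decode(prompt, n_tokens, k=4):
    """Generate n_tokens after prompt, verifying drafts against the target."""
    seq = list(prompt)
    while len(seq) < len(prompt) + n_tokens:
        proposal = draft_model(seq, k)
        ctx = list(seq)
        for tok in proposal:
            t = target_model(ctx)
            if t == tok:          # draft token verified: accept it
                ctx.append(tok)
            else:                 # mismatch: take the target's token, stop
                ctx.append(t)
                break
        seq = ctx
    return seq[: len(prompt) + n_tokens]
```

The speedup comes from verification being parallelizable in a real transformer: one target forward pass scores all k draft tokens at once, so every accepted draft token saves a sequential target step.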
Kernel-as-a-Service: A serverless programming model for heterogeneous hardware accelerators
With the slowing of Moore's law and decline of Dennard scaling, computing systems
increasingly rely on specialized hardware accelerators in addition to general-purpose …
POLCA: Power oversubscription in LLM cloud providers
Recent innovation in large language models (LLMs) and their myriad use cases has
rapidly driven up the compute capacity demand for datacenter GPUs. Several cloud …
Blox: A Modular Toolkit for Deep Learning Schedulers
Deep Learning (DL) workloads have rapidly increased in popularity in enterprise clusters
and several new cluster schedulers have been proposed in recent years to support these …
Characterizing Power Management Opportunities for LLMs in the Cloud
Recent innovation in large language models (LLMs) and their myriad use cases has
rapidly driven up the compute demand for datacenter GPUs. Several cloud providers and …
SuperBench: improving cloud AI infrastructure reliability with proactive validation
Reliability in cloud AI infrastructure is crucial for cloud service providers, prompting the
widespread use of hardware redundancies. However, these redundancies can inadvertently …
PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters
Large-scale computing systems are increasingly using accelerators such as GPUs to enable
peta- and exa-scale levels of compute to meet the needs of Machine Learning (ML) and …
PerfTop: Towards Performance Prediction of Distributed Learning over General Topology
Distributed learning with multiple GPUs has been widely adopted to accelerate the training
process of large-scale deep neural networks. However, misconfiguration of the GPU clusters …