Oort: Efficient federated learning via guided participant selection

F Lai, X Zhu, HV Madhyastha… - 15th USENIX Symposium …, 2021 - usenix.org
Federated Learning (FL) is an emerging direction in distributed machine learning (ML) that
enables in-situ model training and testing on edge data. Despite having the same end goals …

MLaaS in the wild: Workload analysis and scheduling in large-scale heterogeneous GPU clusters

Q Weng, W Xiao, Y Yu, W Wang, C Wang, J He… - … USENIX Symposium on …, 2022 - usenix.org
With sustained technological advances in machine learning (ML) and the recent availability of
massive datasets, tech companies are deploying large ML-as-a-Service (MLaaS) …

Deep learning workload scheduling in GPU datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

A unified architecture for accelerating distributed DNN training in heterogeneous GPU/CPU clusters

Y Jiang, Y Zhu, C Lan, B Yi, Y Cui, C Guo - 14th USENIX Symposium on …, 2020 - usenix.org
Data center clusters that run DNN training jobs are inherently heterogeneous. They have
GPUs and CPUs for computation and network bandwidth for distributed training. However …

A survey on scheduling techniques in computing and network convergence

S Tang, Y Yu, H Wang, G Wang, W Chen… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The computing demand of massive applications has led to the ubiquitous deployment of
computing power. This trend creates an urgent need for higher-level computing resource …

INFaaS: Automated model-less inference serving

F Romero, Q Li, NJ Yadwadkar… - 2021 USENIX Annual …, 2021 - usenix.org
Despite existing work in machine learning inference serving, ease-of-use and cost efficiency
remain challenges at large scales. Developers must manually search through thousands of …

Heterogeneity-aware cluster scheduling policies for deep learning workloads

D Narayanan, K Santhanam, F Kazhamiaka… - … USENIX Symposium on …, 2020 - usenix.org
Specialized accelerators such as GPUs, TPUs, FPGAs, and custom ASICs have been
increasingly deployed to train deep learning models. These accelerators exhibit …

Fast distributed inference serving for large language models

B Wu, Y Zhong, Z Zhang, S Liu, F Liu, Y Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) power a new generation of interactive AI applications
exemplified by ChatGPT. The interactive nature of these applications demands low latency …

Analysis of large-scale multi-tenant GPU clusters for DNN training workloads

M Jeon, S Venkataraman, A Phanishayee… - 2019 USENIX Annual …, 2019 - usenix.org
With widespread advances in machine learning, many large enterprises are
beginning to incorporate machine learning models across a range of products. These …

A generic communication scheduler for distributed DNN training acceleration

Y Peng, Y Zhu, Y Chen, Y Bao, B Yi, C Lan… - Proceedings of the 27th …, 2019 - dl.acm.org
We present ByteScheduler, a generic communication scheduler for distributed DNN training
acceleration. ByteScheduler is based on our principled analysis that partitioning and …