Deep learning workload scheduling in GPU datacenters: A survey
Deep learning (DL) has demonstrated remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …
MLaaS in the wild: Workload analysis and scheduling in Large-Scale heterogeneous GPU clusters
With sustained technological advances in machine learning (ML) and the recent availability of
massive datasets, tech companies are deploying large ML-as-a-Service (MLaaS) …
Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision
Deep learning (DL) has flourished in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …
Pollux: Co-adaptive cluster scheduling for goodput-optimized deep learning
Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-
optimizing inter-dependent factors both at the per-job level and at the cluster-wide level …
Characterization and prediction of deep learning workloads in large-scale GPU datacenters
Modern GPU datacenters are critical for delivering Deep Learning (DL) models and services
in both the research community and industry. When operating a datacenter, optimization of …
Looking beyond GPUs for DNN scheduling on Multi-Tenant clusters
Training Deep Neural Networks (DNNs) is a popular workload in both enterprises and cloud
data centers. Existing schedulers for DNN training consider GPU as the dominant resource …
Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling
The Sia scheduler efficiently assigns heterogeneous deep learning (DL) cluster resources to
elastic resource-adaptive jobs. Although some recent schedulers address one aspect or …
MAST: Global scheduling of ML training across Geo-Distributed datacenters at hyperscale
A Choudhury, Y Wang, T Pelkonen… - 18th USENIX …, 2024 - yangwang83.github.io
In public clouds, users must manually select a datacenter region to upload their ML training
data and launch ML training workloads in the same region to ensure data and computation …
Liquid: Intelligent resource estimation and network-efficient scheduling for deep learning jobs on distributed GPU clusters
Deep learning (DL) is becoming increasingly popular in many domains, including computer
vision, speech recognition, self-driving automobiles, etc. GPUs can train DL models efficiently …
Multi-resource interleaving for deep learning training
Training a Deep Learning (DL) model requires multiple resource types, including CPUs,
GPUs, storage IO, and network IO. Advancements in DL have produced a wide spectrum of …