Deep learning workload scheduling in gpu datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Communication-efficient large-scale distributed deep learning: A comprehensive survey

F Liang, Z Zhang, H Lu, V Leung, Y Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid growth in the volume of data sets, models, and devices in the domain of deep
learning, there is increasing attention on large-scale distributed deep learning. In contrast to …

Clover: Toward sustainable ai with carbon-aware machine learning inference service

B Li, S Samsi, V Gadepally, D Tiwari - Proceedings of the International …, 2023 - dl.acm.org
This paper presents a solution to the challenge of mitigating carbon emissions from hosting
large-scale machine learning (ML) inference services. ML inference is critical to modern …

Graft: Efficient inference serving for hybrid deep learning with SLO guarantees via DNN re-alignment

J Wu, L Wang, Q **, F Liu - IEEE Transactions on Parallel and …, 2023 - ieeexplore.ieee.org
Deep neural networks (DNNs) have been widely adopted for various mobile inference tasks,
yet their ever-increasing computational demands are hindering their deployment on …

Resource allocation and workload scheduling for large-scale distributed deep learning: A survey

F Liang, Z Zhang, H Lu, C Li, V Leung, Y Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
With rapidly increasing distributed deep learning workloads in large-scale data centers,
efficient distributed deep learning framework strategies for resource allocation and workload …

INSS: An intelligent scheduling orchestrator for multi-GPU inference with spatio-temporal sharing

Z Han, R Zhou, C Xu, Y Zeng… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
As the applications of AI proliferate, it is critical to increase the throughput of online DNN
inference services. Multi-process service (MPS) improves the utilization rate of GPU …

HarmonyBatch: Batching multi-SLO DNN inference with heterogeneous serverless functions

J Chen, F Xu, Y Gu, L Chen, F Liu… - 2024 IEEE/ACM 32nd …, 2024 - ieeexplore.ieee.org
Deep Neural Network (DNN) inference on serverless functions is gaining prominence due to
its potential for substantial budget savings. Existing works on serverless DNN inference …

A stochastic approach for scheduling AI training jobs in GPU-based systems

F Filippini, J Anselmi, D Ardagna… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In this work, we optimize the scheduling of Deep Learning (DL) training jobs from the
perspective of a Cloud Service Provider running a data center, which efficiently selects …

Reducing datacenter compute carbon footprint by harnessing the power of specialization: Principles, metrics, challenges and opportunities

T Eilam, P Bose, LP Carloni, A Cidon… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Computing is an indispensable tool in addressing climate change, but it also contributes to a
significant and steadily increasing carbon footprint, partly due to the exponential growth in …

Opara: Exploiting Operator Parallelism for Expediting DNN Inference on GPUs

A Chen, F Xu, L Han, Y Dong, L Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
GPUs have become the de facto hardware devices for accelerating Deep Neural Network
(DNN) inference workloads. However, the conventional sequential execution mode of DNN …