Datacenter traffic control: Understanding techniques and tradeoffs

M Noormohammadpour… - … Surveys & Tutorials, 2017 - ieeexplore.ieee.org
Datacenters provide cost-effective and flexible access to scalable compute and storage
resources necessary for today's cloud computing needs. A typical datacenter is made up of …

Optimus: an efficient dynamic resource scheduler for deep learning clusters

Y Peng, Y Bao, Y Chen, C Wu, C Guo - Proceedings of the Thirteenth …, 2018 - dl.acm.org
Deep learning workloads are common in today's production clusters due to the proliferation
of deep learning driven AI services (e.g., speech recognition, machine translation). A deep …

Characterization and prediction of deep learning workloads in large-scale GPU datacenters

Q Hu, P Sun, S Yan, Y Wen, T Zhang - Proceedings of the International …, 2021 - dl.acm.org
Modern GPU datacenters are critical for delivering Deep Learning (DL) models and services
in both the research community and industry. When operating a datacenter, optimization of …

Cluster frameworks for efficient scheduling and resource allocation in data center networks: A survey

K Wang, Q Zhou, S Guo, J Luo - IEEE Communications Surveys …, 2018 - ieeexplore.ieee.org
Data centers are widely used for big data analytics, which often involve data-parallel jobs,
including query and web service. Meanwhile, cluster frameworks are rapidly developed for …

Morpheus: Towards automated SLOs for enterprise clusters

SA Jyothi, C Curino, I Menache… - … USENIX symposium on …, 2016 - usenix.org
Modern resource management frameworks for large-scale analytics leave unresolved the
problematic tension between high cluster utilization and job's performance predictability …

Online deadline-aware task dispatching and scheduling in edge computing

J Meng, H Tan, XY Li, Z Han, B Li - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
In this article, we study online deadline-aware task dispatching and scheduling in edge
computing. We jointly consider the management of the networking and computing resources …

CODA: Toward automatically identifying and scheduling coflows in the dark

H Zhang, L Chen, B Yi, K Chen, M Chowdhury… - Proceedings of the …, 2016 - dl.acm.org
Leveraging application-level requirements using coflows has recently been shown to
improve application-level communication performance in data-parallel clusters. However …

CarbonScaler: Leveraging cloud workload elasticity for optimizing carbon-efficiency

WA Hanafy, Q Liang, N Bashir, D Irwin… - Proceedings of the ACM …, 2023 - dl.acm.org
Cloud platforms are increasing their emphasis on sustainability and reducing their
operational carbon footprint. A common approach for reducing carbon emissions is to exploit …

Network-aware locality scheduling for distributed data operators in data centers

L Cheng, Y Wang, Q Liu, DHJ Epema… - … on Parallel and …, 2021 - ieeexplore.ieee.org
Large data centers are currently the mainstream infrastructures for big data processing. As
one of the most fundamental tasks in these environments, the efficient execution of …

Repair pipelining for erasure-coded storage: Algorithms and evaluation

X Li, Z Yang, J Li, R Li, PPC Lee, Q Huang… - ACM Transactions on …, 2021 - dl.acm.org
We propose repair pipelining, a technique that speeds up the repair performance in general
erasure-coded storage. By carefully scheduling the repair of failed data in small-size units …