Datacenter traffic control: Understanding techniques and tradeoffs
M Noormohammadpour… - … Surveys & Tutorials, 2017 - ieeexplore.ieee.org
Datacenters provide cost-effective and flexible access to scalable compute and storage
resources necessary for today's cloud computing needs. A typical datacenter is made up of …
Optimus: an efficient dynamic resource scheduler for deep learning clusters
Deep learning workloads are common in today's production clusters due to the proliferation
of deep learning driven AI services (e.g., speech recognition, machine translation). A deep …
Characterization and prediction of deep learning workloads in large-scale GPU datacenters
Modern GPU datacenters are critical for delivering Deep Learning (DL) models and services
in both the research community and industry. When operating a datacenter, optimization of …
Cluster frameworks for efficient scheduling and resource allocation in data center networks: A survey
Data centers are widely used for big data analytics, which often involve data-parallel jobs,
including query and web service. Meanwhile, cluster frameworks are rapidly developed for …
Morpheus: Towards automated SLOs for enterprise clusters
Modern resource management frameworks for large-scale analytics leave unresolved the
problematic tension between high cluster utilization and job's performance predictability …
Online deadline-aware task dispatching and scheduling in edge computing
In this article, we study online deadline-aware task dispatching and scheduling in edge
computing. We jointly consider the management of the networking and computing resources …
CODA: Toward automatically identifying and scheduling coflows in the dark
Leveraging application-level requirements using coflows has recently been shown to
improve application-level communication performance in data-parallel clusters. However …
Carbonscaler: Leveraging cloud workload elasticity for optimizing carbon-efficiency
Cloud platforms are increasing their emphasis on sustainability and reducing their
operational carbon footprint. A common approach for reducing carbon emissions is to exploit …
Network-aware locality scheduling for distributed data operators in data centers
Large data centers are currently the mainstream infrastructures for big data processing. As
one of the most fundamental tasks in these environments, the efficient execution of …
Repair pipelining for erasure-coded storage: Algorithms and evaluation
We propose repair pipelining, a technique that speeds up the repair performance in general
erasure-coded storage. By carefully scheduling the repair of failed data in small-size units …