A survey on scheduling techniques in computing and network convergence
The computing demand for massive applications has led to the ubiquitous deployment of
computing power. This trend results in the urgent need for higher-level computing resource …
FIRM: An intelligent fine-grained resource management framework for SLO-Oriented microservices
User-facing latency-sensitive web services include numerous distributed,
intercommunicating microservices that promise to simplify software development and …
Tiresias: A GPU cluster manager for distributed deep learning
Deep learning (DL) training jobs bring some unique challenges to existing cluster
managers, such as unpredictable training times, an all-or-nothing execution model, and …
Optimus: an efficient dynamic resource scheduler for deep learning clusters
Deep learning workloads are common in today's production clusters due to the proliferation
of deep learning driven AI services (e.g., speech recognition, machine translation). A deep …
Live video analytics at scale with approximation and Delay-Tolerance
Video cameras are pervasively deployed for security and smart city scenarios, with millions
of them in large cities worldwide. Achieving the potential of these cameras requires …
Faster and cheaper serverless computing on harvested resources
Serverless computing is becoming increasingly popular due to its ease of programming, fast
elasticity, and fine-grained billing. However, the serverless provider still needs to provision …
Protean: VM allocation service at scale
We describe the design and implementation of Protean--the Microsoft Azure service
responsible for allocating Virtual Machines (VMs) to millions of servers around the globe. A …
Characterization and prediction of deep learning workloads in large-scale GPU datacenters
Modern GPU datacenters are critical for delivering Deep Learning (DL) models and services
in both the research community and industry. When operating a datacenter, optimization of …
Learning to rotate: Quaternion transformer for complicated periodical time series forecasting
Time series forecasting is a critical and challenging problem in many real applications.
Recently, Transformer-based models prevail in time series forecasting due to their …
InferLine: latency-aware provisioning and scaling for prediction serving pipelines
Serving ML prediction pipelines spanning multiple models and hardware accelerators is a
key challenge in production machine learning. Optimally configuring these pipelines to meet …