Cluster resource scheduling in cloud computing: literature review and research challenges

W Khallouli, J Huang - The Journal of supercomputing, 2022 - Springer
Scheduling plays a pivotal role in cloud computing systems. Designing an efficient
scheduler is a challenging task. The challenge comes from several aspects, including the …

Optimus: an efficient dynamic resource scheduler for deep learning clusters

Y Peng, Y Bao, Y Chen, C Wu, C Guo - Proceedings of the Thirteenth …, 2018 - dl.acm.org
Deep learning workloads are common in today's production clusters due to the proliferation
of deep learning driven AI services (e.g., speech recognition, machine translation). A deep …

Protean: VM allocation service at scale

O Hadary, L Marshall, I Menache, A Pan… - … USENIX Symposium on …, 2020 - usenix.org
We describe the design and implementation of Protean, the Microsoft Azure service
responsible for allocating Virtual Machines (VMs) to millions of servers around the globe. A …

Firmament: Fast, centralized cluster scheduling at scale

I Gog, M Schwarzkopf, A Gleave, RNM Watson… - … USENIX Symposium on …, 2016 - usenix.org
Centralized datacenter schedulers can make high-quality placement decisions when
scheduling tasks in a cluster. Today, however, high-quality placements come at the cost of …

Hermod: principled and practical scheduling for serverless functions

K Kaffes, NJ Yadwadkar, C Kozyrakis - … of the 13th Symposium on Cloud …, 2022 - dl.acm.org
Serverless computing has seen rapid growth due to the ease-of-use and cost-efficiency it
provides. However, function scheduling, a critical component of serverless systems, has …

Machine learning for computer systems and networking: A survey

ME Kanakis, R Khalili, L Wang - ACM Computing Surveys, 2022 - dl.acm.org
Machine learning (ML) has become the de-facto approach for various scientific domains
such as computer vision and natural language processing. Despite recent breakthroughs …

Fifer: Tackling resource underutilization in the serverless era

JR Gunasekaran, P Thinakaran… - Proceedings of the 21st …, 2020 - dl.acm.org
Datacenters are witnessing a rapid surge in the adoption of serverless functions for
microservices-based applications. A vast majority of these microservices typically span less …

On the diversity of cluster workloads and its impact on research results

G Amvrosiadis, JW Park, GR Ganger… - 2018 USENIX Annual …, 2018 - usenix.org
Six years ago, Google released an invaluable set of scheduler logs which has already been
used in more than 450 publications. We find that the scarcity of other data sources, however …

Resource scheduling methods for cloud computing environment: The role of meta-heuristics and artificial intelligence

R Aron, A Abraham - Engineering Applications of Artificial Intelligence, 2022 - Elsevier
The growth and development of scientific applications have demanded the creation of
efficient resource management systems. Resource provisioning and scheduling are two …

RobinHood: Tail Latency Aware Caching -- Dynamic Reallocation from Cache-Rich to Cache-Poor

DS Berger, B Berg, T Zhu, S Sen… - 13th USENIX Symposium …, 2018 - usenix.org
Tail latency is of great importance in user-facing web services. However, maintaining low tail
latency is challenging, because a single request to a web application server results in …