- Academic Search

Large-scale cluster management at Google with Borg

A Verma, L Pedrosa, M Korupolu… - Proceedings of the …, 2015 - dl.acm.org

Google's Borg system is a cluster manager that runs hundreds of thousands of jobs, from
many thousands of different applications, across a number of clusters each with up to tens of …

Cited by 1917

Heracles: Improving resource efficiency at scale

D Lo, L Cheng, R Govindaraju… - Proceedings of the …, 2015 - dl.acm.org

User-facing, latency-sensitive services, such as websearch, underutilize their computing
resources during daily periods of low traffic. Reusing those resources for other tasks is rarely …

Cited by 696

Characterization and prediction of deep learning workloads in large-scale GPU datacenters

Q Hu, P Sun, S Yan, Y Wen, T Zhang - Proceedings of the International …, 2021 - dl.acm.org

Modern GPU datacenters are critical for delivering Deep Learning (DL) models and services
in both the research community and industry. When operating a datacenter, optimization of …

Cited by 137

Morpheus: Towards automated SLOs for enterprise clusters

SA Jyothi, C Curino, I Menache… - … USENIX symposium on …, 2016 - usenix.org

Modern resource management frameworks for large-scale analytics leave unresolved the
problematic tension between high cluster utilization and job's performance predictability …

Cited by 342

Multi-tenant cloud data services: state-of-the-art, challenges and opportunities

V Narasayya, S Chaudhuri - … of the 2022 International Conference on …, 2022 - dl.acm.org

Enterprises are moving their business-critical workloads to public clouds at an accelerating
pace. Multi-tenancy is a crucial tenet for cloud data service providers allowing them to …

Cited by 25

GRAPHENE: Packing and Dependency-Aware scheduling for Data-Parallel clusters

R Grandl, S Kandula, S Rao, A Akella… - 12th USENIX Symposium …, 2016 - usenix.org

We present a new cluster scheduler, GRAPHENE, aimed at jobs that have a complex
dependency structure and heterogeneous resource demands. Relaxing either of these …

Cited by 281

TetriSched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters

A Tumanov, T Zhu, JW Park, MA Kozuch… - Proceedings of the …, 2016 - dl.acm.org

TetriSched is a scheduler that works in tandem with a calendaring reservation system to
continuously re-evaluate the immediate-term scheduling plan for all pending jobs (including …

Cited by 252

SLAQ: quality-driven scheduling for distributed machine learning

H Zhang, L Stafman, A Or, MJ Freedman - Proceedings of the 2017 …, 2017 - dl.acm.org

Training machine learning (ML) models with large datasets can incur significant resource
contention on shared clusters. This training typically involves many iterations that continually …

Cited by 194

Mercury: Hybrid centralized and distributed scheduling in large shared clusters

K Karanasos, S Rao, C Curino, C Douglas… - 2015 USENIX Annual …, 2015 - usenix.org

Datacenter-scale computing for analytics workloads is increasingly common. High
operational costs force heterogeneous applications to share cluster resources for achieving …

Cited by 249

The elasticity and plasticity in semi-containerized co-locating cloud workload: A view from Alibaba trace

Q Liu, Z Yu - Proceedings of the ACM Symposium on Cloud …, 2018 - dl.acm.org

Cloud computing with large-scale datacenters provides great convenience and cost-
efficiency for end users. However, the resource utilization of cloud datacenters is very low …

Cited by 140

Reservation-based scheduling: If you're late don't blame us!

Large-scale cluster management at Google with Borg
