Resource management in clouds: Survey and research challenges

B Jennings, R Stadler - Journal of Network and Systems Management, 2015 - Springer
Resource management in a cloud environment is a hard problem, due to: the scale of
modern data centers; the heterogeneity of resource types and their interdependencies; the …

A survey and classification of software-defined storage systems

R Macedo, J Paulo, J Pereira, A Bessani - ACM Computing Surveys …, 2020 - dl.acm.org
The exponential growth of digital information is imposing increasing scale and efficiency
demands on modern storage infrastructures. As infrastructure complexity increases, so does …

ReFlex: Remote flash ≈ local flash

A Klimovic, H Litz, C Kozyrakis - ACM SIGARCH Computer Architecture …, 2017 - dl.acm.org
Remote access to NVMe Flash enables flexible scaling and high utilization of Flash capacity
and IOPS within a datacenter. However, existing systems for remote Flash access either …

IOFlow: A software-defined storage architecture

E Thereska, H Ballani, G O'Shea… - Proceedings of the …, 2013 - dl.acm.org
In data centers, the IO path to storage is long and complex. It comprises many layers or
"stages" with opaque interfaces between them. This makes it hard to enforce end-to-end …

Flash storage disaggregation

A Klimovic, C Kozyrakis, E Thereska, B John… - Proceedings of the …, 2016 - dl.acm.org
PCIe-based Flash is commonly deployed to provide datacenter applications with high IO
rates. However, its capacity and bandwidth are often underutilized as it is difficult to design …

What bugs live in the cloud? A study of 3000+ issues in cloud systems

HS Gunawi, M Hao, T Leesatapornwongsa… - Proceedings of the …, 2014 - dl.acm.org
We conduct a comprehensive study of development and deployment issues of six popular
and important cloud systems (Hadoop MapReduce, HDFS, HBase, Cassandra, ZooKeeper …

Retro: Targeted resource management in multi-tenant distributed systems

J Mace, P Bodik, R Fonseca, M Musuvathi - 12th USENIX Symposium …, 2015 - usenix.org
In distributed systems shared by multiple tenants, effective resource management is an
important pre-requisite to providing quality of service guarantees. Many systems deployed …

Aequitas: Admission control for performance-critical RPCs in datacenters

Y Zhang, G Kumar, N Dukkipati, X Wu, P Jha… - Proceedings of the …, 2022 - dl.acm.org
With the increasing popularity of disaggregated storage and microservice architectures, high
fan-out and fan-in Remote Procedure Calls (RPCs) now generate most of the traffic in …

PriorityMeister: Tail latency QoS for shared networked storage

T Zhu, A Tumanov, MA Kozuch… - Proceedings of the …, 2014 - dl.acm.org
Meeting service level objectives (SLOs) for tail latency is an important and challenging open
problem in cloud computing infrastructures. The challenges are exacerbated by burstiness …

Nova-LSM: A distributed, component-based LSM-tree key-value store

H Huang, S Ghandeharizadeh - … of the 2021 International Conference on …, 2021 - dl.acm.org
The cloud infrastructure motivates disaggregation of monolithic data stores into components
that are assembled together based on an application's workload. This study investigates …