A survey and classification of software-defined storage systems

R Macedo, J Paulo, J Pereira, A Bessani - ACM Computing Surveys …, 2020 - dl.acm.org
The exponential growth of digital information is imposing increasing scale and efficiency
demands on modern storage infrastructures. As infrastructure complexity increases, so does …

ReFlex: Remote flash ≈ local flash

A Klimovic, H Litz, C Kozyrakis - ACM SIGARCH Computer Architecture …, 2017 - dl.acm.org
Remote access to NVMe Flash enables flexible scaling and high utilization of Flash capacity
and IOPS within a datacenter. However, existing systems for remote Flash access either …

HiveD: Sharing a GPU cluster for deep learning with guarantees

H Zhao, Z Han, Z Yang, Q Zhang, F Yang… - … USENIX symposium on …, 2020 - usenix.org
Deep learning training on a shared GPU cluster is becoming a common practice. However,
we observe severe sharing anomaly in production multi-tenant clusters where jobs in some …

Trumpet: Timely and precise triggers in data centers

M Moshref, M Yu, R Govindan, A Vahdat - Proceedings of the 2016 ACM …, 2016 - dl.acm.org
As data centers grow larger and strive to provide tight performance and availability SLAs,
their monitoring infrastructure must move from passive systems that provide aggregated …

ResQ: Enabling SLOs in network function virtualization

A Tootoonchian, A Panda, C Lan, M Walls… - … USENIX Symposium on …, 2018 - usenix.org
Network Function Virtualization is allowing carriers to replace dedicated middleboxes with
Network Functions (NFs) consolidated on shared servers, but the question of how (and even …

Flash storage disaggregation

A Klimovic, C Kozyrakis, E Thereska, B John… - Proceedings of the …, 2016 - dl.acm.org
PCIe-based Flash is commonly deployed to provide datacenter applications with high IO
rates. However, its capacity and bandwidth are often underutilized as it is difficult to design …

HUG: Multi-Resource fairness for correlated and elastic demands

M Chowdhury, Z Liu, A Ghodsi, I Stoica - 13th USENIX symposium on …, 2016 - usenix.org
In this paper, we study how to optimally provide isolation guarantees in multi-resource
environments, such as public clouds, where a tenant's demands on different resources …

Understanding RDMA microarchitecture resources for performance isolation

X Kong, J Chen, W Bai, Y Xu, M Elhaddad… - … USENIX Symposium on …, 2023 - usenix.org
Recent years have witnessed the wide adoption of RDMA in the cloud to accelerate
first-party workloads and achieve cost savings by freeing up CPU cycles. Now cloud providers …

Karma: Resource allocation for dynamic demands

M Vuppalapati, G Fikioris, R Agarwal, A Cidon… - … USENIX Symposium on …, 2023 - usenix.org
The classical max-min fairness algorithm for resource allocation provides many desirable
properties, e.g., Pareto efficiency, strategy-proofness and fairness. This paper builds upon the …

PicNIC: Predictable virtualized NIC

P Kumar, N Dukkipati, N Lewis, Y Cui, Y Wang… - Proceedings of the …, 2019 - dl.acm.org
Network virtualization stacks are the linchpins of public clouds. A key goal is to provide
performance isolation so that workloads on one Virtual Machine (VM) do not adversely …