Data storage management in cloud environments: Taxonomy, survey, and future directions
Storage as a Service (StaaS) is a vital component of cloud computing by offering the vision
of a virtually infinite pool of storage resources. It supports a variety of cloud-based data store …
Speeding up distributed machine learning using codes
Codes are widely used in many engineering applications to offer robustness against noise.
In large-scale systems, there are several types of noise that can affect the performance of …
Short-dot: Computing large linear transforms distributedly using coded short dot products
Faced with saturation of Moore's law and increasing size and dimension of data, system
designers have increasingly resorted to parallel and distributed computing to reduce …
EC-Cache: Load-Balanced, Low-Latency Cluster Caching with Online Erasure Coding
Data-intensive clusters and object stores are increasingly relying on in-memory object
caching to meet the I/O performance demands. These systems routinely face the challenges …
When do redundant requests reduce latency?
Many systems possess the flexibility to serve requests in more than one way, such as
distributed storage systems that store multiple copies of the data. In such systems, the …
Reducing latency via redundant requests: Exact analysis
Recent computer systems research has proposed using redundant requests to reduce
latency. The idea is to run a request on multiple servers and wait for the first completion …
On the delay-storage trade-off in content download from coded distributed storage systems
We study how coding in distributed storage reduces expected download time, in addition to
providing reliability against disk failures. The expected download time is reduced because …
Efficient redundancy techniques for latency reduction in cloud systems
In cloud computing systems, assigning a task to multiple servers and waiting for the earliest
copy to finish is an effective method to combat the variability in response time of individual …
Open problems in queueing theory inspired by datacenter computing
M Harchol-Balter - Queueing Systems, 2021 - Springer
Datacenter operations today provide a plethora of new queueing and scheduling problems.
The notion of a “job” has become more general and multi-dimensional. The ways in which …
Stochastic gradient coding for straggler mitigation in distributed learning
We consider distributed gradient descent in the presence of stragglers. Recent work on
gradient coding and approximate gradient coding have shown how to add redundancy in …