Data storage management in cloud environments: Taxonomy, survey, and future directions

Y Mansouri, AN Toosi, R Buyya - ACM Computing Surveys (CSUR), 2017 - dl.acm.org
Storage as a Service (StaaS) is a vital component of cloud computing by offering the vision
of a virtually infinite pool of storage resources. It supports a variety of cloud-based data store …

Speeding up distributed machine learning using codes

K Lee, M Lam, R Pedarsani… - IEEE Transactions …, 2017 - ieeexplore.ieee.org
Codes are widely used in many engineering applications to offer robustness against noise.
In large-scale systems, there are several types of noise that can affect the performance of …

Short-dot: Computing large linear transforms distributedly using coded short dot products

S Dutta, V Cadambe, P Grover - Advances In Neural …, 2016 - proceedings.neurips.cc
Faced with saturation of Moore's law and increasing size and dimension of data, system
designers have increasingly resorted to parallel and distributed computing to reduce …

EC-Cache: Load-Balanced, Low-Latency Cluster Caching with Online Erasure Coding

KV Rashmi, M Chowdhury, J Kosaian, I Stoica… - … USENIX Symposium on …, 2016 - usenix.org
Data-intensive clusters and object stores are increasingly relying on in-memory object
caching to meet the I/O performance demands. These systems routinely face the challenges …

When do redundant requests reduce latency?

NB Shah, K Lee, K Ramchandran - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
Many systems possess the flexibility to serve requests in more than one way, such as
distributed storage systems that store multiple copies of the data. In such systems, the …

Reducing latency via redundant requests: Exact analysis

K Gardner, S Zbarsky, S Doroudi… - ACM SIGMETRICS …, 2015 - dl.acm.org
Recent computer systems research has proposed using redundant requests to reduce
latency. The idea is to run a request on multiple servers and wait for the first completion …

On the delay-storage trade-off in content download from coded distributed storage systems

G Joshi, Y Liu, E Soljanin - IEEE Journal on Selected Areas in …, 2014 - ieeexplore.ieee.org
We study how coding in distributed storage reduces expected download time, in addition to
providing reliability against disk failures. The expected download time is reduced because …

Efficient redundancy techniques for latency reduction in cloud systems

G Joshi, E Soljanin, G Wornell - ACM Transactions on Modeling and …, 2017 - dl.acm.org
In cloud computing systems, assigning a task to multiple servers and waiting for the earliest
copy to finish is an effective method to combat the variability in response time of individual …

Open problems in queueing theory inspired by datacenter computing

M Harchol-Balter - Queueing Systems, 2021 - Springer
Datacenter operations today provide a plethora of new queueing and scheduling problems.
The notion of a “job” has become more general and multi-dimensional. The ways in which …

Stochastic gradient coding for straggler mitigation in distributed learning

R Bitar, M Wootters… - IEEE Journal on Selected …, 2020 - ieeexplore.ieee.org
We consider distributed gradient descent in the presence of stragglers. Recent work on
gradient coding and approximate gradient coding has shown how to add redundancy in …