A comprehensive survey on coded distributed computing: Fundamentals, challenges, and networking applications
Distributed computing has become a common approach for large-scale computation tasks
due to benefits such as high reliability, scalability, computation speed, and cost …
Distributed data management using MapReduce
MapReduce is a framework for processing and managing large-scale datasets in a
distributed cluster, which has been used for applications such as generating search indexes …
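To make the map, shuffle, and reduce stages concrete, here is a minimal single-process word-count sketch in Python. It only imitates the data flow of a MapReduce job. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical illustrations, not the API of Hadoop or any other framework. A real cluster would run each map task on a different input split and move intermediate pairs over the network during the shuffle.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    # Map: emit an intermediate (word, 1) pair for every word in one input split.
    return [(word, 1) for word in split.split()]


def shuffle_phase(pairs):
    # Shuffle: group intermediate values by key. In a real framework this is
    # where data moves between workers.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values that share a key into a single result.
    return key, sum(values)


def word_count(splits):
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


splits = ["the quick fox", "the lazy dog", "the fox"]
counts = word_count(splits)
print(counts)  # {'the': 3, 'quick': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Each map task depends only on its own split, and each reduce task depends only on one key's group. That independence is what allows both phases to be distributed across a cluster.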
Ray: A distributed framework for emerging AI applications
The next generation of AI applications will continuously interact with the environment and
learn from these interactions. These applications impose new and demanding systems …
Resource management with deep reinforcement learning
Resource management problems in systems and networking often manifest as difficult
online decision making tasks where appropriate solutions depend on understanding the …
Speeding up distributed machine learning using codes
Codes are widely used in many engineering applications to offer robustness against noise.
In large-scale systems, there are several types of noise that can affect the performance of …
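One common kind of "noise" in distributed computation is the straggler, a worker that finishes much later than the others. As a rough illustration of how a code can tolerate this, the sketch below encodes a matrix-vector product with a simple (3, 2) MDS-style code. The matrix is split into halves A1 and A2, and a third worker computes (A1 + A2)x, so the product can be recovered from any two of the three workers. This is an illustrative assumption about the general idea of coded computation, not the specific scheme of the paper listed above. All function names are hypothetical. The sketch assumes an even number of rows and plain Python lists in place of a numerical library.

```python
def matvec(M, x):
    # Ordinary matrix-vector product on lists of lists.
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def encode(A):
    # Split A row-wise into two halves (assumes an even number of rows) and
    # add a parity block A1 + A2. Worker i receives tasks[i].
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]


def decode(results):
    # results maps worker id -> partial product; any two of the three suffice.
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results and 2 in results:
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]  # A2 x = (A1 + A2) x - A1 x
    elif 1 in results and 2 in results:
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]  # A1 x = (A1 + A2) x - A2 x
    else:
        raise ValueError("need results from at least two workers")
    return y1 + y2


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
tasks = encode(A)
# Simulate worker 1 straggling: only workers 0 and 2 report back.
results = {i: matvec(tasks[i], x) for i in (0, 2)}
y = decode(results)
print(y)  # [3, 7, 11, 15]
```

Without coding, the master would have to wait for the slowest of the workers holding A1 and A2. With the parity task, it can stop as soon as any two workers finish, at the cost of one extra block of computation.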
Shuffling, fast and slow: Scalable analytics on serverless infrastructure
Serverless computing is poised to fulfill the long-held promise of transparent elasticity and
millisecond-level pricing. To achieve this goal, service providers impose a fine-grained …
Ernest: Efficient performance prediction for large-scale advanced analytics
Recent workload trends indicate rapid growth in the deployment of machine learning,
genomics and scientific workloads on cloud computing infrastructure. However, efficiently …
Quasar: Resource-efficient and QoS-aware cluster management
Cloud computing promises flexibility and high performance for users and high cost-efficiency
for operators. Nevertheless, most cloud facilities operate at very low utilization, hurting both …
Efficient coflow scheduling with Varys
Communication in data-parallel applications often involves a collection of parallel flows.
Traditional techniques to optimize flow-level metrics do not perform well in optimizing such …
Sparrow: distributed, low latency scheduling
Large-scale data analytics frameworks are shifting towards shorter task durations and larger
degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete …