A catalog of stream processing optimizations
Various research communities have independently arrived at stream processing as a
programming model for efficient and parallel computing. These communities include digital …
Shenango: Achieving high CPU efficiency for latency-sensitive datacenter workloads
Datacenter applications demand microsecond-scale tail latencies and high request rates
from operating systems, and most applications handle loads that have high variance over …
Nightcore: Efficient and scalable serverless computing for latency-sensitive, interactive microservices
The microservice architecture is a popular software engineering approach for building
flexible, large-scale online services. Serverless functions, or function as a service (FaaS) …
Clipper: A low-latency online prediction serving system
Published in the Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI).
Caladan: Mitigating interference at microsecond timescales
The conventional wisdom is that CPU resources such as cores, caches, and memory
bandwidth must be partitioned to achieve performance isolation between tasks. Both the …
TensorFlow-Serving: Flexible, high-performance ML serving
We describe TensorFlow-Serving, a system to serve machine learning models inside
Google, which is also available in the cloud and via open source. It is extremely flexible in …
AuTO: Scaling deep reinforcement learning for datacenter-scale automatic traffic optimization
Traffic optimizations (TO, e.g., flow scheduling, load balancing) in datacenters are difficult
online decision-making problems. Previously, they were done with heuristics relying on …
Naiad: A timely dataflow system
Naiad is a distributed system for executing data parallel, cyclic dataflow programs. It offers
the high throughput of batch processors, the low latency of stream processors, and the ability …
InferLine: Latency-aware provisioning and scaling for prediction serving pipelines
Serving ML prediction pipelines spanning multiple models and hardware accelerators is a
key challenge in production machine learning. Optimally configuring these pipelines to meet …
Cassandra: A decentralized structured storage system
A. Lakshman and P. Malik. ACM SIGOPS Operating Systems Review, 2010.
Cassandra is a distributed storage system for managing very large amounts of structured
data spread out across many commodity servers, while providing highly available service …