A catalog of stream processing optimizations

M Hirzel, R Soulé, S Schneider, B Gedik… - ACM Computing Surveys …, 2014 - dl.acm.org
Various research communities have independently arrived at stream processing as a
programming model for efficient and parallel computing. These communities include digital …

Shenango: Achieving high CPU efficiency for latency-sensitive datacenter workloads

A Ousterhout, J Fried, J Behrens, A Belay… - … USENIX Symposium on …, 2019 - usenix.org
Datacenter applications demand microsecond-scale tail latencies and high request rates
from operating systems, and most applications handle loads that have high variance over …

Nightcore: efficient and scalable serverless computing for latency-sensitive, interactive microservices

Z Jia, E Witchel - Proceedings of the 26th ACM international conference …, 2021 - dl.acm.org
The microservice architecture is a popular software engineering approach for building
flexible, large-scale online services. Serverless functions, or function as a service (FaaS) …

Clipper: A Low-Latency online prediction serving system

D Crankshaw, X Wang, G Zhou, MJ Franklin… - … USENIX Symposium on …, 2017 - usenix.org
This paper is included in the
Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation …

Caladan: Mitigating interference at microsecond timescales

J Fried, Z Ruan, A Ousterhout, A Belay - 14th USENIX Symposium on …, 2020 - usenix.org
The conventional wisdom is that CPU resources such as cores, caches, and memory
bandwidth must be partitioned to achieve performance isolation between tasks. Both the …

TensorFlow-Serving: Flexible, high-performance ML serving

C Olston, N Fiedel, K Gorovoy, J Harmsen… - arXiv preprint arXiv …, 2017 - arxiv.org
We describe TensorFlow-Serving, a system to serve machine learning models inside
Google which is also available in the cloud and via open-source. It is extremely flexible in …

AuTO: Scaling deep reinforcement learning for datacenter-scale automatic traffic optimization

L Chen, J Lingys, K Chen, F Liu - Proceedings of the 2018 conference of …, 2018 - dl.acm.org
Traffic optimizations (TO, e.g., flow scheduling, load balancing) in datacenters are difficult
online decision-making problems. Previously, they are done with heuristics relying on …

Naiad: a timely dataflow system

DG Murray, F McSherry, R Isaacs, M Isard… - Proceedings of the …, 2013 - dl.acm.org
Naiad is a distributed system for executing data parallel, cyclic dataflow programs. It offers
the high throughput of batch processors, the low latency of stream processors, and the ability …

InferLine: latency-aware provisioning and scaling for prediction serving pipelines

D Crankshaw, GE Sela, X Mo, C Zumar… - Proceedings of the 11th …, 2020 - dl.acm.org
Serving ML prediction pipelines spanning multiple models and hardware accelerators is a
key challenge in production machine learning. Optimally configuring these pipelines to meet …

Cassandra: a decentralized structured storage system

A Lakshman, P Malik - ACM SIGOPS operating systems review, 2010 - dl.acm.org
Cassandra is a distributed storage system for managing very large amounts of structured
data spread out across many commodity servers, while providing highly available service …