Pond: CXL-based memory pooling systems for cloud platforms

H Li, DS Berger, L Hsu, D Ernst, P Zardoshti… - Proceedings of the 28th …, 2023 - dl.acm.org
Public cloud providers seek to meet stringent performance requirements and low hardware
cost. A key driver of performance and cost is main memory. Memory pooling promises to …

MLaaS in the wild: Workload analysis and scheduling in large-scale heterogeneous GPU clusters

Q Weng, W Xiao, Y Yu, W Wang, C Wang, J He… - … USENIX Symposium on …, 2022 - usenix.org
With the sustained technological advances in machine learning (ML) and the availability of
massive datasets recently, tech companies are deploying large ML-as-a-Service (MLaaS) …

Llumnix: Dynamic scheduling for large language model serving

B Sun, Z Huang, H Zhao, W Xiao, X Zhang… - … USENIX Symposium on …, 2024 - usenix.org
Inference serving for large language models (LLMs) is the key to unleashing their potential
in people's daily lives. However, efficient LLM serving remains challenging today because …

An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems

Y Gan, Y Zhang, D Cheng, A Shetty, P Rathi… - Proceedings of the …, 2019 - dl.acm.org
Cloud services have recently started undergoing a major shift from monolithic applications,
to graphs of hundreds or thousands of loosely-coupled microservices. Microservices …

Learning scheduling algorithms for data processing clusters

H Mao, M Schwarzkopf, SB Venkatakrishnan… - Proceedings of the …, 2019 - dl.acm.org
Efficiently scheduling data processing jobs on distributed compute clusters requires complex
algorithms. Current systems use simple, generalized heuristics and ignore workload …

FIRM: An intelligent fine-grained resource management framework for SLO-oriented microservices

H Qiu, SS Banerjee, S Jha, ZT Kalbarczyk… - 14th USENIX Symposium …, 2020 - usenix.org
User-facing latency-sensitive web services include numerous distributed,
intercommunicating microservices that promise to simplify software development and …

A closer look at spatiotemporal convolutions for action recognition

D Tran, H Wang, L Torresani, J Ray… - Proceedings of the …, 2018 - openaccess.thecvf.com
In this paper we discuss several forms of spatiotemporal convolutions for video analysis and
study their effects on action recognition. Our motivation stems from the observation that 2D …

Cluster resource scheduling in cloud computing: literature review and research challenges

W Khallouli, J Huang - The Journal of Supercomputing, 2022 - Springer
Scheduling plays a pivotal role in cloud computing systems. Designing an efficient
scheduler is a challenging task. The challenge comes from several aspects, including the …

Who limits the resource efficiency of my datacenter: An analysis of Alibaba datacenter traces

J Guo, Z Chang, S Wang, H Ding, Y Feng… - Proceedings of the …, 2019 - dl.acm.org
Cloud platform provides great flexibility and cost-efficiency for end-users and cloud
operators. However, low resource utilization in modern datacenters brings huge wastes of …

Sinan: ML-based and QoS-aware resource management for cloud microservices

Y Zhang, W Hua, Z Zhou, GE Suh… - Proceedings of the 26th …, 2021 - dl.acm.org
Cloud applications are increasingly shifting from large monolithic services, to large numbers
of loosely-coupled, specialized microservices. Despite their advantages in terms of …