ServerlessLLM: Low-Latency serverless inference for large language models

Y Fu, L Xue, Y Huang, AO Brabete, D Ustiugov… - … USENIX Symposium on …, 2024 - usenix.org
This paper presents ServerlessLLM, a distributed system designed to support low-latency
serverless inference for Large Language Models (LLMs). By harnessing the substantial near …

Towards demystifying serverless machine learning training

J Jiang, S Gan, Y Liu, F Wang, G Alonso… - Proceedings of the …, 2021 - dl.acm.org
The appeal of serverless (FaaS) has triggered a growing interest in how to use it in data-
intensive applications such as ETL, query processing, or machine learning (ML). Several …

Pollux: Co-adaptive cluster scheduling for goodput-optimized deep learning

A Qiao, SK Choe, SJ Subramanya… - … on Operating Systems …, 2021 - usenix.org
Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-
optimizing inter-dependent factors both at the per-job level and at the cluster-wide level …

Gemini: Fast failure recovery in distributed training with in-memory checkpoints

Z Wang, Z Jia, S Zheng, Z Zhang, X Fu… - Proceedings of the 29th …, 2023 - dl.acm.org
Large deep learning models have recently garnered substantial attention from both
academia and industry. Nonetheless, frequent failures are observed during large model …

ElasticFlow: An elastic serverless training platform for distributed deep learning

D Gu, Y Zhao, Y Zhong, Y Xiong, Z Han… - Proceedings of the 28th …, 2023 - dl.acm.org
This paper proposes ElasticFlow, an elastic serverless training platform for distributed deep
learning. ElasticFlow provides a serverless interface with two distinct features: (i) users …

Ekko: A Large-Scale deep learning recommender system with Low-Latency model update

C Sima, Y Fu, MK Sit, L Guo, X Gong, F Lin… - … USENIX Symposium on …, 2022 - usenix.org
Deep Learning Recommender Systems (DLRSs) need to update models at low latency so that
new users and content are served promptly. Existing DLRSs, however, fail to do so. They …

Heet: Accelerating elastic training in heterogeneous deep learning clusters

Z Mo, H Xu, C Xu - Proceedings of the 29th ACM International …, 2024 - dl.acm.org
Modern GPU clusters inherently exhibit heterogeneity, encompassing various aspects such
as computation and communication. This heterogeneity poses a significant challenge for the …

Shockwave: Fair and efficient cluster scheduling for dynamic adaptation in machine learning

P Zheng, R Pan, T Khan, S Venkataraman… - … USENIX Symposium on …, 2023 - usenix.org
Dynamic adaptation has become an essential technique in accelerating distributed machine
learning (ML) training. Recent studies have shown that dynamically adjusting model …

Distributed analytics for big data: A survey

F Berloco, V Bevilacqua, S Colucci - Neurocomputing, 2024 - Elsevier
In recent years, constant and rapid information growth has characterized digital
applications in the majority of real-life scenarios. Thus, a new information asset, namely Big …

EasyScale: Elastic training with consistent accuracy and improved utilization on GPUs

M Li, W Xiao, H Yang, B Sun, H Zhao, S Ren… - Proceedings of the …, 2023 - dl.acm.org
Distributed synchronized GPU training is commonly used for deep learning. The resource
constraint of using a fixed number of GPUs makes large-scale training jobs suffer from long …