Power-aware Deep Learning Model Serving with μ-Serve

H Qiu, W Mao, A Patke, S Cui, S Jha, C Wang… - 2024 USENIX Annual …, 2024 - usenix.org
With the increasing popularity of large deep learning model-serving workloads, there is a
pressing need to reduce the energy consumption of a model-serving cluster while …

Designing cloud servers for lower carbon

J Wang, DS Berger, F Kazhamiaka… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
To mitigate climate change, we must reduce carbon emissions from hyperscale cloud
computing. We find that cloud compute servers cause the majority of emissions in a general …

Harmonizing efficiency and practicability: optimizing resource utilization in serverless computing with JIAGU

Q Liu, Y Yang, D Du, Y Xia, P Zhang, J Feng… - 2024 USENIX Annual …, 2024 - usenix.org
Current serverless platforms struggle to optimize resource utilization due to their dynamic
and fine-grained nature. Conventional techniques like overcommitment and autoscaling fall …

FLASH: Fast model adaptation in ML-centric cloud platforms

H Qiu, W Mao, A Patke, S Cui, C Wang… - Proceedings of …, 2024 - proceedings.mlsys.org
The emergence of ML in various cloud system management tasks (e.g., workload autoscaling
and job scheduling) has become a core driver of ML-centric cloud platforms. However, there …

FLOAT: Federated Learning Optimizations with Automated Tuning

AF Khan, AA Khan, AM Abdelmoniem… - Proceedings of the …, 2024 - dl.acm.org
Federated Learning (FL) has emerged as a powerful approach that enables collaborative
distributed model training without the need for data sharing. However, FL grapples with …

OptScaler: A Collaborative Framework for Robust Autoscaling in the Cloud

D Zou, W Lu, Z Zhu, X Lu, J Zhou, X Wang… - Proceedings of the …, 2024 - dl.acm.org
Autoscaling is a critical mechanism in cloud computing, enabling the autonomous
adjustment of computing resources in response to dynamic workloads. This is particularly …

Multi-agent meta-reinforcement learning: sharper convergence rates with task similarity

W Mao, H Qiu, C Wang, H Franke… - Advances in …, 2024 - proceedings.neurips.cc
Multi-agent reinforcement learning (MARL) has primarily focused on solving a single task in
isolation, while in practice the environment is often evolving, leaving many related tasks to …

TopFull: An Adaptive Top-Down Overload Control for SLO-Oriented Microservices

J Park, J Park, Y Jung, H Lim, H Yeo… - Proceedings of the ACM …, 2024 - dl.acm.org
Microservice has become a de facto standard for building large-scale cloud applications.
Overload control is essential in preventing microservice failures and maintaining system …

ComboFunc: Joint resource combination and container placement for serverless function scaling with heterogeneous container

Z Wen, Q Chen, Q Deng, Y Niu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Serverless computing provides developers with a maintenance-free approach to resource
usage, but it also transfers resource management responsibility to the cloud platform …

MSARS: A Meta-Learning and Reinforcement Learning Framework for SLO Resource Allocation and Adaptive Scaling for Microservices

K Hu, L Wen, M Xu, K Ye - arXiv preprint arXiv:2409.14953, 2024 - arxiv.org
Service Level Objectives (SLOs) aim to set threshold for service time in cloud services to
ensure acceptable quality of service (QoS) and user satisfaction. Currently, many studies …