Power-aware Deep Learning Model Serving with μ-Serve
With the increasing popularity of large deep learning model-serving workloads, there is a
pressing need to reduce the energy consumption of a model-serving cluster while …
Designing cloud servers for lower carbon
To mitigate climate change, we must reduce carbon emissions from hyperscale cloud
computing. We find that cloud compute servers cause the majority of emissions in a general …
Harmonizing efficiency and practicability: optimizing resource utilization in serverless computing with JIAGU
Current serverless platforms struggle to optimize resource utilization due to their dynamic
and fine-grained nature. Conventional techniques like overcommitment and autoscaling fall …
FLASH: Fast model adaptation in ML-centric cloud platforms
The emergence of ML in various cloud system management tasks (e.g., workload autoscaling
and job scheduling) has become a core driver of ML-centric cloud platforms. However, there …
FLOAT: Federated Learning Optimizations with Automated Tuning
Federated Learning (FL) has emerged as a powerful approach that enables collaborative
distributed model training without the need for data sharing. However, FL grapples with …
OptScaler: A Collaborative Framework for Robust Autoscaling in the Cloud
Autoscaling is a critical mechanism in cloud computing, enabling the autonomous
adjustment of computing resources in response to dynamic workloads. This is particularly …
Multi-agent meta-reinforcement learning: sharper convergence rates with task similarity
Multi-agent reinforcement learning (MARL) has primarily focused on solving a single task in
isolation, while in practice the environment is often evolving, leaving many related tasks to …
TopFull: An Adaptive Top-Down Overload Control for SLO-Oriented Microservices
Microservice has become a de facto standard for building large-scale cloud applications.
Overload control is essential in preventing microservice failures and maintaining system …
ComboFunc: Joint resource combination and container placement for serverless function scaling with heterogeneous container
Serverless computing provides developers with a maintenance-free approach to resource
usage, but it also transfers resource management responsibility to the cloud platform …
MSARS: A Meta-Learning and Reinforcement Learning Framework for SLO Resource Allocation and Adaptive Scaling for Microservices
Service Level Objectives (SLOs) aim to set thresholds for service time in cloud services to
ensure acceptable quality of service (QoS) and user satisfaction. Currently, many studies …