MLaaS in the wild: Workload analysis and scheduling in Large-Scale heterogeneous GPU clusters

Q Weng, W Xiao, Y Yu, W Wang, C Wang, J He… - … USENIX Symposium on …, 2022 - usenix.org
With the sustained technological advances in machine learning (ML) and the availability of
massive datasets recently, tech companies are deploying large ML-as-a-Service (MLaaS) …

A survey on deep reinforcement learning for data processing and analytics

Q Cai, C Cui, Y Xiong, W Wang, Z Xie… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Data processing and analytics are fundamental and pervasive. Algorithms play a vital role in
data processing and analytics where many algorithm designs have incorporated heuristics …

Lucid: A non-intrusive, scalable and interpretable scheduler for deep learning training jobs

Q Hu, M Zhang, P Sun, Y Wen, T Zhang - Proceedings of the 28th ACM …, 2023 - dl.acm.org
While recent deep learning workload schedulers exhibit excellent performance, it is arduous
to deploy them in practice due to some substantial defects, including inflexible intrusive …

Owl: Performance-aware scheduling for resource-efficient function-as-a-service cloud

H Tian, S Li, A Wang, W Wang, T Wu… - Proceedings of the 13th …, 2022 - dl.acm.org
This work documents our experience of improving the scheduler in Alibaba Function
Compute, a public FaaS platform. It commences with our observation that memory and CPU …

Golgi: Performance-aware, resource-efficient function scheduling for serverless computing

S Li, W Wang, J Yang, G Chen, D Lu - … of the 2023 ACM Symposium on …, 2023 - dl.acm.org
This paper introduces Golgi, a novel scheduling system designed for serverless functions,
with the goal of minimizing resource provisioning costs while meeting the function latency …

Computing and communication cost-aware service migration enabled by transfer reinforcement learning for dynamic vehicular edge computing networks

Y Peng, X Tang, Y Zhou, J Li, Y Qi… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Due to the high mobility of vehicles, service migration is inevitable in vehicular edge
computing (VEC) networks. Frequent service migrations incur prohibitive migration cost …

Graph-reinforcement-learning-based dependency-aware microservice deployment in edge computing

W Lv, P Yang, T Zheng, C Lin, Z Wang… - IEEE Internet of …, 2023 - ieeexplore.ieee.org
Microservice architecture is a design philosophy that achieves decoupling by decomposing
a monolithic application into multiple lightweight microservices. Meanwhile, edge computing …

Workload consolidation in Alibaba clusters: the good, the bad, and the ugly

Y Zhang, Y Yu, W Wang, Q Chen, J Wu… - Proceedings of the 13th …, 2022 - dl.acm.org
Web companies typically run latency-critical long-running services and resource-intensive,
throughput-hungry batch jobs in a shared cluster for improved utilization and reduced cost …

Accelerating serverless computing by harvesting idle resources

H Yu, H Wang, J Li, X Yuan, SJ Park - … of the ACM Web Conference 2022, 2022 - dl.acm.org
Serverless computing automates fine-grained resource scaling and simplifies the
development and deployment of online services with stateless functions. However, it is still …

Understanding and optimizing workloads for unified resource management in large cloud platforms

C Lu, H Xu, K Ye, G Xu, L Zhang, G Yang… - Proceedings of the …, 2023 - dl.acm.org
To fully utilize computing resources, cloud providers such as Google and Alibaba choose to
co-locate online services with batch processing applications in their data centers. By …