Splitwise: Efficient generative LLM inference using phase splitting

P Patel, E Choukse, C Zhang, A Shah… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Generative large language model (LLM) applications are growing rapidly, leading to large-scale
deployments of expensive and power-hungry GPUs. Our characterization of LLM …

An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems

Y Gan, Y Zhang, D Cheng, A Shetty, P Rathi… - Proceedings of the …, 2019 - dl.acm.org
Cloud services have recently started undergoing a major shift from monolithic applications,
to graphs of hundreds or thousands of loosely-coupled microservices. Microservices …

Sinan: ML-based and QoS-aware resource management for cloud microservices

Y Zhang, W Hua, Z Zhou, GE Suh… - Proceedings of the 26th …, 2021 - dl.acm.org
Cloud applications are increasingly shifting from large monolithic services, to large numbers
of loosely-coupled, specialized microservices. Despite their advantages in terms of …

Adaptive resource efficient microservice deployment in cloud-edge continuum

K Fu, W Zhang, Q Chen, D Zeng… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
User-facing services are now evolving towards the microservice architecture where a
service is built by connecting multiple microservice stages. Since the entire service is heavy …

Kraken: Adaptive container provisioning for deploying dynamic DAGs in serverless platforms

VM Bhasi, JR Gunasekaran, P Thinakaran… - Proceedings of the …, 2021 - dl.acm.org
The growing popularity of microservices has led to the proliferation of online cloud service-based
applications, which are typically modelled as Directed Acyclic Graphs (DAGs) …

GrandSLAm: Guaranteeing SLAs for jobs in microservices execution frameworks

RS Kannan, L Subramanian, A Raju, J Ahn… - Proceedings of the …, 2019 - dl.acm.org
The microservice architecture has dramatically reduced user effort in adopting and
maintaining servers by providing a catalog of functions as services that can be used as …

Fifer: Tackling resource underutilization in the serverless era

JR Gunasekaran, P Thinakaran… - Proceedings of the 21st …, 2020 - dl.acm.org
Datacenters are witnessing a rapid surge in the adoption of serverless functions for
microservices-based applications. A vast majority of these microservices typically span less …

Graft: Efficient inference serving for hybrid deep learning with SLO guarantees via DNN re-alignment

J Wu, L Wang, Q Jin, F Liu - IEEE Transactions on Parallel and …, 2023 - ieeexplore.ieee.org
Deep neural networks (DNNs) have been widely adopted for various mobile inference tasks,
yet their ever-increasing computational demands are hindering their deployment on …

Cypress: Input size-sensitive container provisioning and request scheduling for serverless platforms

VM Bhasi, JR Gunasekaran, A Sharma… - Proceedings of the 13th …, 2022 - dl.acm.org
The growing popularity of the serverless platform has seen an increase in the number and
variety of applications (apps) being deployed on it. The majority of these apps process user …

ProScale: Proactive autoscaling for microservice with time-varying workload at the edge

K Cheng, S Zhang, C Tu, X Shi, Z Yin… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Deploying microservice instances on the edge device close to end users can provide on-site
processing thus reducing request response time. Each microservice has multiple instances …