Adaptive resource efficient microservice deployment in cloud-edge continuum

K Fu, W Zhang, Q Chen, D Zeng… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
User-facing services are now evolving towards the microservice architecture where a
service is built by connecting multiple microservice stages. Since the entire service is heavy …

Neuromorphic computing chip with spatiotemporal elasticity for multi-intelligent-tasking robots

S Ma, J Pei, W Zhang, G Wang, D Feng, F Yu… - Science Robotics, 2022 - science.org
Recent advances in artificial intelligence have enhanced the abilities of mobile robots in
dealing with complex and dynamic scenarios. However, to enable computationally intensive …

DVABatch: Diversity-aware Multi-Entry Multi-Exit batching for efficient processing of DNN services on GPUs

W Cui, H Zhao, Q Chen, H Wei, Z Li, D Zeng… - 2022 USENIX Annual …, 2022 - usenix.org
DNN inferences are often batched to better utilize the hardware in existing DNN
serving systems. However, DNN serving exhibits diversity in many aspects, such as input …

Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction

W Cui, H Zhao, Q Chen, N Zheng, J Leng… - Proceedings of the …, 2021 - dl.acm.org
While user-facing services experience diurnal load patterns, co-locating services improves
hardware utilization. Prior work on co-locating services on GPUs runs queries sequentially …

iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud

F Xu, J Xu, J Chen, L Chen, R Shang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
GPUs are essential to accelerating the latency-sensitive deep neural network (DNN)
inference workloads in cloud datacenters. To fully utilize GPU resources, spatial sharing of …

INSS: An intelligent scheduling orchestrator for multi-GPU inference with spatio-temporal sharing

Z Han, R Zhou, C Xu, Y Zeng… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
As AI applications proliferate, it is critical to increase the throughput of online DNN
inference services. Multi-Process Service (MPS) improves the utilization rate of GPU …

Kernel-as-a-Service: A serverless programming model for heterogeneous hardware accelerators

T Pfandzelter, A Dhakal, E Frachtenberg… - Proceedings of the 24th …, 2023 - dl.acm.org
With the slowing of Moore's law and decline of Dennard scaling, computing systems
increasingly rely on specialized hardware accelerators in addition to general-purpose …

A survey of GPU multitasking methods supported by hardware architecture

C Zhao, W Gao, F Nie, H Zhou - IEEE Transactions on Parallel …, 2021 - ieeexplore.ieee.org
The ability to support multitasking is becoming increasingly important in the development of
graphics processing units (GPUs). GPU multitasking methods are classified into three types …

Enhancing High-Throughput GPU Random Walks Through Multi-Task Concurrency Orchestration

C Xu, C Li, X Hou, J Mei, J Wang, P Wang… - ACM Transactions on …, 2025 - dl.acm.org
Random walk is a powerful tool for large-scale graph learning, but its high computational
demand presents a challenge. While GPUs can accelerate random walk tasks, current …

Efficient Dynamic Resource Management for Spatial Multitasking GPUs

H Sedighi, D Gehberger… - … on Cloud Computing, 2024 - ieeexplore.ieee.org
The advent of microservice architecture enables complex cloud applications to be realized
via a set of individually isolated components, increasing their flexibility and performance. As …