Adaptive resource efficient microservice deployment in cloud-edge continuum
User-facing services are now evolving towards the microservice architecture where a
service is built by connecting multiple microservice stages. Since the entire service is heavy …
Neuromorphic computing chip with spatiotemporal elasticity for multi-intelligent-tasking robots
S Ma, J Pei, W Zhang, G Wang, D Feng, F Yu… - Science Robotics, 2022 - science.org
Recent advances in artificial intelligence have enhanced the abilities of mobile robots in
dealing with complex and dynamic scenarios. However, to enable computationally intensive …
DVABatch: Diversity-aware Multi-Entry Multi-Exit batching for efficient processing of DNN services on GPUs
DNN inferences are often batched to better utilize the hardware in existing DNN
serving systems. However, DNN serving exhibits diversity in many aspects, such as input …
Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction
While user-facing services experience diurnal load patterns, co-locating services improves
hardware utilization. Prior work on co-locating services on GPUs runs queries sequentially …
iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud
GPUs are essential to accelerating the latency-sensitive deep neural network (DNN)
inference workloads in cloud datacenters. To fully utilize GPU resources, spatial sharing of …
INSS: An intelligent scheduling orchestrator for multi-GPU inference with spatio-temporal sharing
As the applications of AI proliferate, it is critical to increase the throughput of online DNN
inference services. Multi-process service (MPS) improves the utilization rate of GPU …
Kernel-as-a-Service: A serverless programming model for heterogeneous hardware accelerators
With the slowing of Moore's law and decline of Dennard scaling, computing systems
increasingly rely on specialized hardware accelerators in addition to general-purpose …
A survey of GPU multitasking methods supported by hardware architecture
The ability to support multitasking is becoming increasingly important in the development of the
graphics processing unit (GPU). GPU multitasking methods are classified into three types …
Enhancing High-Throughput GPU Random Walks Through Multi-Task Concurrency Orchestration
Random walk is a powerful tool for large-scale graph learning, but its high computational
demand presents a challenge. While GPUs can accelerate random walk tasks, current …
Efficient Dynamic Resource Management for Spatial Multitasking GPUs
H Sedighi, D Gehberger… - … on Cloud Computing, 2024 - ieeexplore.ieee.org
The advent of microservice architecture enables complex cloud applications to be realized
via a set of individually isolated components, increasing their flexibility and performance. As …