Adaptive resource efficient microservice deployment in cloud-edge continuum

K Fu, W Zhang, Q Chen, D Zeng… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
User-facing services are now evolving towards the microservice architecture where a
service is built by connecting multiple microservice stages. Since the entire service is heavy …

Neuromorphic computing chip with spatiotemporal elasticity for multi-intelligent-tasking robots

S Ma, J Pei, W Zhang, G Wang, D Feng, F Yu… - Science Robotics, 2022 - science.org
Recent advances in artificial intelligence have enhanced the abilities of mobile robots in
dealing with complex and dynamic scenarios. However, to enable computationally intensive …

DVABatch: Diversity-aware Multi-Entry Multi-Exit batching for efficient processing of DNN services on GPUs

W Cui, H Zhao, Q Chen, H Wei, Z Li, D Zeng… - 2022 USENIX Annual …, 2022 - usenix.org
DNN inferences are often batched to better utilize the hardware in existing DNN
serving systems. However, DNN serving exhibits diversity in many aspects, such as input …

Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction

W Cui, H Zhao, Q Chen, N Zheng, J Leng… - Proceedings of the …, 2021 - dl.acm.org
While user-facing services experience diurnal load patterns, co-locating services improves
hardware utilization. Prior work on co-locating services on GPUs runs queries sequentially …

iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud

F Xu, J Xu, J Chen, L Chen, R Shang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
GPUs are essential to accelerating the latency-sensitive deep neural network (DNN)
inference workloads in cloud datacenters. To fully utilize GPU resources, spatial sharing of …

INSS: An intelligent scheduling orchestrator for multi-GPU inference with spatio-temporal sharing

Z Han, R Zhou, C Xu, Y Zeng… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
As AI applications proliferate, it is critical to increase the throughput of online DNN
inference services. Multi-Process Service (MPS) improves the utilization rate of GPU …

Kernel-as-a-Service: A serverless programming model for heterogeneous hardware accelerators

T Pfandzelter, A Dhakal, E Frachtenberg… - Proceedings of the 24th …, 2023 - dl.acm.org
With the slowing of Moore's law and decline of Dennard scaling, computing systems
increasingly rely on specialized hardware accelerators in addition to general-purpose …

A survey of GPU multitasking methods supported by hardware architecture

C Zhao, W Gao, F Nie, H Zhou - IEEE Transactions on Parallel …, 2021 - ieeexplore.ieee.org
The ability to support multitasking is becoming increasingly important in the development of
graphics processing units (GPUs). GPU multitasking methods are classified into three types …

Enhancing High-Throughput GPU Random Walks Through Multi-Task Concurrency Orchestration

C Xu, C Li, X Hou, J Mei, J Wang, P Wang… - ACM Transactions on …, 2025 - dl.acm.org
Random walk is a powerful tool for large-scale graph learning, but its high computational
demand presents a challenge. While GPUs can accelerate random walk tasks, current …

Efficient Dynamic Resource Management for Spatial Multitasking GPUs

H Sedighi, D Gehberger… - … on Cloud Computing, 2024 - ieeexplore.ieee.org
The advent of microservice architecture enables complex cloud applications to be realized
via a set of individually isolated components, increasing their flexibility and performance. As …