CPElide: Efficient Multi-Chiplet GPU Implicit Synchronization

P Dalmia, RS Kumar, MD Sinclair - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
Chiplets are transforming computer system designs, allowing system designers to combine
heterogeneous computing resources at unprecedented scales. Breaking larger, monolithic …

Resource scheduling techniques in cloud from a view of coordination: a holistic survey

Y Wang, J Yu, Z Yu - Frontiers of Information Technology & Electronic …, 2023 - Springer
The management of resource contention in shared clouds remains an open problem.
The evolution and deployment of new application paradigms (e.g., deep learning …

CD-MSA: cooperative and deadline-aware scheduling for efficient multi-tenancy on DNN accelerators

C Wang, Y Bai, D Sun - IEEE Transactions on Parallel and …, 2023 - ieeexplore.ieee.org
With DNNs becoming the backbone of AI cloud services and propelling the emergence of
INFerence-as-a-Service (INFaaS), DNN-specific accelerators have become the …

Global Optimizations & Lightweight Dynamic Logic for Concurrency

S Pati, S Aga, N Jayasena, MD Sinclair - arXiv preprint arXiv:2409.02227, 2024 - arxiv.org
Modern accelerators like GPUs are increasingly executing independent operations
concurrently to improve the device's compute utilization. However, effectively harnessing it …

Interference-aware Multiplexing for Deep Learning in GPU Clusters: A Middleware Approach

W Chen, Z Mo, H Xu, K Ye, C Xu - Proceedings of the International …, 2023 - dl.acm.org
A common strategy for improving efficiency in deep learning training is to multiplex
tasks on a single GPU. To mitigate the interference caused by multiplexing, existing …

GPUPool: A holistic approach to fine-grained gpu sharing in the cloud

XS Tan, P Golikov, N Vijaykumar… - Proceedings of the …, 2022 - dl.acm.org
As Graphics Processing Units (GPUs) have evolved into popular hardware accelerators for many
compute-hungry applications in the cloud, GPU virtualization has become a highly desirable …

GPU domain specialization via composable on-package architecture

Y Fu, E Bolotin, N Chatterjee, D Nellans… - ACM Transactions on …, 2021 - dl.acm.org
As GPUs scale their low-precision matrix math throughput to boost deep learning (DL)
performance, they upset the balance between math throughput and memory system …

RELIEF: Relieving Memory Pressure In SoCs Via Data Movement-Aware Accelerator Scheduling

S Gupta, S Dwarkadas - 2024 IEEE International Symposium …, 2024 - ieeexplore.ieee.org
Data movement latency when using on-chip accelerators in emerging heterogeneous
architectures is a serious performance bottleneck. While hardware/software mechanisms …

PREBA: A Hardware/Software Co-Design for Multi-Instance GPU based AI Inference Servers

G Yeo, J Kim, Y Choi, M Rhu - arXiv preprint arXiv:2411.19114, 2024 - arxiv.org
NVIDIA's Multi-Instance GPU (MIG) is a feature that enables system designers to reconfigure
one large GPU into multiple smaller GPU slices. This work characterizes this emerging GPU …

Optimizing Goodput of Real-time Serverless Functions using Dynamic Slicing with vGPUs

C Prakash, A Garg, U Bellur, P Kulkarni… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
As the popularity and relevance of the Function-as-a-Service (FaaS) model keep growing,
we believe newer avatars of the service will support computationally intensive SIMT …