CPElide: Efficient Multi-Chiplet GPU Implicit Synchronization
Chiplets are transforming computer system designs, allowing system designers to combine
heterogeneous computing resources at unprecedented scales. Breaking larger, monolithic …
Resource scheduling techniques in cloud from a view of coordination: a holistic survey
Y Wang, J Yu, Z Yu - Frontiers of Information Technology & Electronic …, 2023 - Springer
Nowadays, the management of resource contention in shared cloud remains a pending
problem. The evolution and deployment of new application paradigms (e.g., deep learning …
CD-MSA: cooperative and deadline-aware scheduling for efficient multi-tenancy on DNN accelerators
With DNN turning into the backbone of AI cloud services and propelling the emergence of
INFerence-as-a-Service (INFaaS), DNN-specific accelerators have become the …
Global Optimizations & Lightweight Dynamic Logic for Concurrency
Modern accelerators like GPUs are increasingly executing independent operations
concurrently to improve the device's compute utilization. However, effectively harnessing it …
Interference-aware Multiplexing for Deep Learning in GPU Clusters: A Middleware Approach
A common strategy for improving efficiency in training deep learning entails multiplexing
tasks on a single GPU. To mitigate the interference caused by multiplexing, existing …
GPUPool: A holistic approach to fine-grained GPU sharing in the cloud
XS Tan, P Golikov, N Vijaykumar… - Proceedings of the …, 2022 - dl.acm.org
As Graphics Processing Units (GPUs) evolved into popular hardware accelerators for many
compute-hungry applications in the cloud, GPU virtualization has become a highly desirable …
GPU domain specialization via composable on-package architecture
As GPUs scale their low-precision matrix math throughput to boost deep learning (DL)
performance, they upset the balance between math throughput and memory system …
RELIEF: Relieving Memory Pressure In SoCs Via Data Movement-Aware Accelerator Scheduling
Data movement latency when using on-chip accelerators in emerging heterogeneous
architectures is a serious performance bottleneck. While hardware/software mechanisms …
PREBA: A Hardware/Software Co-Design for Multi-Instance GPU based AI Inference Servers
NVIDIA's Multi-Instance GPU (MIG) is a feature that enables system designers to reconfigure
one large GPU into multiple smaller GPU slices. This work characterizes this emerging GPU …
Optimizing Goodput of Real-time Serverless Functions using Dynamic Slicing with vGPUs
As the popularity and relevance of the Function-as-a-Service (FaaS) model keeps growing,
we believe newer avatars of the service will support computationally intensive SIMT …