GrandSLAm: Guaranteeing SLAs for jobs in microservices execution frameworks
The microservice architecture has dramatically reduced user effort in adopting and
maintaining servers by providing a catalog of functions as services that can be used as …
Horus: Interference-aware and prediction-based scheduling in deep learning systems
To accelerate the training of Deep Learning (DL) models, clusters of machines equipped
with hardware accelerators such as GPUs are leveraged to reduce execution time. State-of …
Managing GPU concurrency in heterogeneous architectures
Heterogeneous architectures consisting of general-purpose CPUs and throughput-
optimized GPUs are projected to be the dominant computing platforms for many classes of …
Anatomy of GPU memory system for multi-application execution
As GPUs make headway in the computing landscape spanning mobile platforms,
supercomputers, cloud and virtual desktop platforms, supporting concurrent execution of …
Managing DRAM latency divergence in irregular GPGPU applications
Memory controllers in modern GPUs aggressively reorder requests for high bandwidth
usage, often interleaving requests from different warps. This leads to high variance in the …
Scheduling page table walks for irregular GPU applications
Recent studies on commercial hardware demonstrated that irregular GPU applications can
bottleneck on virtual-to-physical address translations. In this work, we explore ways to …
Zorua: A holistic approach to resource virtualization in GPUs
This paper introduces a new resource virtualization framework, Zorua, that decouples the
programmer-specified resource usage of a GPU application from the actual allocation in the …
HSM: A hybrid slowdown model for multitasking GPUs
Graphics Processing Units (GPUs) are increasingly widely used in the cloud to accelerate
compute-heavy tasks. However, GPU-compute applications stress the GPU architecture in …
memif: Towards programming heterogeneous memory asynchronously
To harness a heterogeneous memory hierarchy, it is advantageous to integrate application
knowledge in guiding frequent memory moves, i.e., replicating or migrating virtual memory …
Efficient and fair multi-programming in GPUs via effective bandwidth management
Managing the thread-level parallelism (TLP) of GPGPU applications by limiting it to a certain
degree is known to be effective in improving the overall performance. However, we find that …