GrandSLAm: Guaranteeing SLAs for jobs in microservices execution frameworks

RS Kannan, L Subramanian, A Raju, J Ahn… - Proceedings of the …, 2019 - dl.acm.org
The microservice architecture has dramatically reduced user effort in adopting and
maintaining servers by providing a catalog of functions as services that can be used as …

Horus: Interference-aware and prediction-based scheduling in deep learning systems

G Yeung, D Borowiec, R Yang, A Friday… - … on Parallel and …, 2021 - ieeexplore.ieee.org
To accelerate the training of Deep Learning (DL) models, clusters of machines equipped
with hardware accelerators such as GPUs are leveraged to reduce execution time. State-of …

Managing GPU concurrency in heterogeneous architectures

O Kayiran, NC Nachiappan, A Jog… - 2014 47th annual …, 2014 - ieeexplore.ieee.org
Heterogeneous architectures consisting of general-purpose CPUs and throughput-
optimized GPUs are projected to be the dominant computing platforms for many classes of …

Anatomy of GPU memory system for multi-application execution

A Jog, O Kayiran, T Kesten, A Pattnaik… - Proceedings of the …, 2015 - dl.acm.org
As GPUs make headway in the computing landscape spanning mobile platforms,
supercomputers, cloud and virtual desktop platforms, supporting concurrent execution of …

Managing DRAM latency divergence in irregular GPGPU applications

N Chatterjee, M O'Connor, GH Loh… - SC'14: Proceedings …, 2014 - ieeexplore.ieee.org
Memory controllers in modern GPUs aggressively reorder requests for high bandwidth
usage, often interleaving requests from different warps. This leads to high variance in the …

Scheduling page table walks for irregular GPU applications

S Shin, G Cox, M Oskin, GH Loh… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org
Recent studies on commercial hardware demonstrated that irregular GPU applications can
bottleneck on virtual-to-physical address translations. In this work, we explore ways to …

Zorua: A holistic approach to resource virtualization in GPUs

N Vijaykumar, K Hsieh, G Pekhimenko… - 2016 49th Annual …, 2016 - ieeexplore.ieee.org
This paper introduces a new resource virtualization framework, Zorua, that decouples the
programmer-specified resource usage of a GPU application from the actual allocation in the …

HSM: A hybrid slowdown model for multitasking GPUs

X Zhao, M Jahre, L Eeckhout - … of the twenty-fifth international conference …, 2020 - dl.acm.org
Graphics Processing Units (GPUs) are increasingly widely used in the cloud to accelerate
compute-heavy tasks. However, GPU-compute applications stress the GPU architecture in …

memif: Towards programming heterogeneous memory asynchronously

FX Lin, X Liu - ACM SIGPLAN Notices, 2016 - dl.acm.org
To harness a heterogeneous memory hierarchy, it is advantageous to integrate application
knowledge in guiding frequent memory move, i.e., replicating or migrating virtual memory …

Efficient and fair multi-programming in GPUs via effective bandwidth management

H Wang, F Luo, M Ibrahim, O Kayiran… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
Managing the thread-level parallelism (TLP) of GPGPU applications by limiting it to a certain
degree is known to be effective in improving the overall performance. However, we find that …