Graviton: Trusted execution environments on GPUs

S Volos, K Vaswani, R Bruno - 13th USENIX Symposium on Operating …, 2018 - usenix.org
We propose Graviton, an architecture for supporting trusted execution environments on
GPUs. Graviton enables applications to offload security- and performance-sensitive kernels …

Planaria: Dynamic architecture fission for spatial multi-tenant acceleration of deep neural networks

S Ghodrati, BH Ahn, JK Kim, S Kinzer… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) have reinvigorated real-world applications that rely on
learning patterns of data and are permeating into different industries and markets. Cloud …

Prophet: Precise QoS prediction on non-preemptive accelerators to improve utilization in warehouse-scale computers

Q Chen, H Yang, M Guo, RS Kannan, J Mars… - Proceedings of the …, 2017 - dl.acm.org
Guaranteeing Quality-of-Service (QoS) of latency-sensitive applications while improving
server utilization through application co-location is important yet challenging in modern …

Heimdall: mobile GPU coordination platform for augmented reality applications

J Yi, Y Lee - Proceedings of the 26th Annual International …, 2020 - dl.acm.org
We present Heimdall, a mobile GPU coordination platform for emerging Augmented Reality
(AR) applications. Future AR apps impose an unexplored, challenging workload: i) concurrent …

Warped-slicer: Efficient intra-SM slicing through dynamic resource partitioning for GPU multiprogramming

Q Xu, H Jeon, K Kim, WW Ro… - ACM SIGARCH Computer …, 2016 - dl.acm.org
As technology scales, GPUs are forecasted to incorporate an ever-increasing amount of
computing resources to support thread-level parallelism. But even with the best effort …

MASK: Redesigning the GPU memory hierarchy to support multi-application concurrency

R Ausavarungnirun, V Miller, J Landgraf… - ACM SIGPLAN …, 2018 - dl.acm.org
Graphics Processing Units (GPUs) exploit large amounts of thread-level parallelism to
provide high instruction throughput and to efficiently hide long-latency stalls. The resulting …

EffiSha: A software framework for enabling efficient preemptive scheduling of GPU

G Chen, Y Zhao, X Shen, H Zhou - … on Principles and Practice of Parallel …, 2017 - dl.acm.org
Modern GPUs are broadly adopted in many multitasking environments, including data
centers and smartphones. However, the current support for the scheduling of multiple GPU …

Dynamic resource management for efficient utilization of multitasking GPUs

JJK Park, Y Park, S Mahlke - Proceedings of the twenty-second …, 2017 - dl.acm.org
As graphics processing units (GPUs) are broadly adopted, running multiple applications on
a GPU at the same time is beginning to attract wide attention. Recent proposals on …

G-NET: Effective GPU Sharing in NFV Systems

K Zhang, B He, J Hu, Z Wang, B Hua, J Meng… - … USENIX Symposium on …, 2018 - usenix.org
Network Function Virtualization (NFV) virtualizes software network functions to offer flexibility
in their design, management and deployment. Although GPUs have demonstrated their …

Zorua: A holistic approach to resource virtualization in GPUs

N Vijaykumar, K Hsieh, G Pekhimenko… - 2016 49th Annual …, 2016 - ieeexplore.ieee.org
This paper introduces a new resource virtualization framework, Zorua, that decouples the
programmer-specified resource usage of a GPU application from the actual allocation in the …