GPU devices for safety-critical systems: A survey
Graphics Processing Unit (GPU) devices and their associated software programming
languages and frameworks can deliver the computing performance required to facilitate the …
The sparse polyhedral framework: Composing compiler-generated inspector-executor code
Irregular applications such as big graph analysis, material simulations, molecular dynamics
simulations, and finite element analysis have performance problems due to their use of …
Rammer: Enabling holistic deep learning compiler optimizations with rTasks
Performing Deep Neural Network (DNN) computation on hardware accelerators efficiently is
challenging. Existing DNN frameworks and compilers often treat the DNN operators in a …
Planaria: Dynamic architecture fission for spatial multi-tenant acceleration of deep neural networks
Deep Neural Networks (DNNs) have reinvigorated real-world applications that rely on
learning patterns of data and are permeating into different industries and markets. Cloud …
Simultaneous multikernel GPU: Multi-tasking throughput processors via fine-grained sharing
Studies show that non-graphics programs can be less optimized for the GPU hardware,
leading to significant resource under-utilization. Sharing the GPU among multiple programs …
Heimdall: mobile GPU coordination platform for augmented reality applications
We present Heimdall, a mobile GPU coordination platform for emerging Augmented Reality
(AR) applications. Future AR apps impose an unexplored challenging workload: i) concurrent …
EffiSha: A software framework for enabling efficient preemptive scheduling of GPU
Modern GPUs are broadly adopted in many multitasking environments, including data
centers and smartphones. However, the current support for the scheduling of multiple GPU …
Hardware compute partitioning on NVIDIA GPUs
Embedded and autonomous systems are increasingly integrating AI/ML features, often
enabled by a hardware accelerator such as a GPU. As these workloads become …
Dynamic resource management for efficient utilization of multitasking GPUs
As graphics processing units (GPUs) are broadly adopted, running multiple applications on
a GPU at the same time is beginning to attract wide attention. Recent proposals on …
Locality-aware CTA clustering for modern GPUs
Cache is designed to exploit locality; however, the role of on-chip L1 data caches on modern
GPUs is often awkward. The locality among global memory requests from different SMs …