Locality-driven dynamic GPU cache bypassing
This paper presents novel cache optimizations for massively parallel, throughput-oriented
architectures like GPUs. L1 data caches (L1 D-caches) are critical resources for providing …
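A minimal sketch of the software-visible analogue of selective bypassing: stock CUDA's __ldcg() intrinsic (sm_32+) loads through L2 only, skipping L1, so a low-locality stream does not evict lines that do have reuse. The kernel and variable names below are illustrative, not from the paper.

    // A minimal sketch, assuming a streaming workload with little reuse.
    // __ldcg() loads via the global (L2) cache level, bypassing L1.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void scale(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            // Bypass L1 for this low-locality stream so it does not
            // evict cache lines that do have temporal locality.
            float v = __ldcg(&in[i]);
            out[i] = 2.0f * v;
        }
    }

    int main() {
        const int n = 1 << 20;
        float *in, *out;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&out, n * sizeof(float));
        for (int i = 0; i < n; ++i) in[i] = 1.0f;
        scale<<<(n + 255) / 256, 256>>>(in, out, n);
        cudaDeviceSynchronize();
        printf("out[0] = %f\n", out[0]);   // expect 2.0
        cudaFree(in);
        cudaFree(out);
        return 0;
    }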
Zorua: A holistic approach to resource virtualization in GPUs
This paper introduces a new resource virtualization framework, Zorua, that decouples the
programmer-specified resource usage of a GPU application from the actual allocation in the …
Warp-level divergence in GPUs: Characterization, impact, and mitigation
High throughput architectures rely on high thread-level parallelism (TLP) to hide execution
latencies. In state-of-the-art graphics processing units (GPUs), threads are organized in a grid of …
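As background for the warp-level effects the paper characterizes, the generic CUDA sketch below (not the paper's mechanism) makes warp granularity visible: a block whose size is not a multiple of the 32-thread warp leaves some lanes of its last warp inactive, wasting issue slots.

    // Generic illustration of warp granularity, not the paper's hardware.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void warpInfo() {
        int lane = threadIdx.x % warpSize;    // lane index within the warp
        unsigned active = __activemask();     // bitmask of live lanes
        if (lane == 0)
            printf("block %d, warp %d: %d of %d lanes active\n",
                   blockIdx.x, threadIdx.x / warpSize,
                   __popc(active), warpSize);
    }

    int main() {
        // 48 threads per block -> the second warp of each block runs
        // with only 16 of its 32 lanes active.
        warpInfo<<<2, 48>>>();
        cudaDeviceSynchronize();
        return 0;
    }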
Pagoda: Fine-grained GPU resource virtualization for narrow tasks
Massively multithreaded GPUs achieve high throughput by running thousands of threads in
parallel. To fully utilize the hardware, workloads spawn work to the GPU in bulk by launching …
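To make the launch-granularity problem concrete, here is a hedged sketch contrasting one kernel launch per narrow task with a single batched grid; the task kernel is a placeholder, not Pagoda's runtime.

    // A minimal sketch, assuming independent 32-thread "narrow" tasks.
    #include <cuda_runtime.h>

    __global__ void task(float* data, int offset) {
        data[offset + threadIdx.x] += 1.0f;    // one tiny task
    }

    __global__ void batched(float* data, int ntasks) {
        int t = blockIdx.x;                    // one block per task
        if (t < ntasks) data[t * 32 + threadIdx.x] += 1.0f;
    }

    int main() {
        const int ntasks = 1024;
        float* data;
        cudaMalloc(&data, ntasks * 32 * sizeof(float));
        cudaMemset(data, 0, ntasks * 32 * sizeof(float));

        // Narrow launches: ntasks kernels of 32 threads each, paying
        // per-launch overhead and leaving most SMs idle.
        for (int t = 0; t < ntasks; ++t)
            task<<<1, 32>>>(data, t * 32);
        cudaDeviceSynchronize();

        // Batched launch: one grid covering all tasks keeps the GPU busy.
        batched<<<ntasks, 32>>>(data, ntasks);
        cudaDeviceSynchronize();

        cudaFree(data);
        return 0;
    }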
Virtual thread: Maximizing thread-level parallelism beyond GPU scheduling limit
Modern GPUs require tens of thousands of concurrent threads to fully utilize the massive
amount of processing resources. However, thread concurrency in GPUs can be diminished …
CUDA-NP: Realizing nested thread-level parallelism in GPGPU applications
Parallel programs consist of a series of code sections with different thread-level parallelism
(TLP). As a result, it is rather common that a thread in a parallel program, such as a GPU …
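CUDA-NP realizes nested TLP with compiler-generated "slave" threads; as a point of comparison, stock CUDA expresses the same nesting through dynamic parallelism (sm_35+, compiled with -rdc=true and linked against cudadevrt). The sketch below is illustrative, not the paper's technique.

    // A minimal sketch of nested TLP via CUDA dynamic parallelism.
    #include <cuda_runtime.h>

    __global__ void child(float* row, int n) {
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        if (j < n) row[j] *= 2.0f;              // inner parallel loop
    }

    __global__ void parent(float* m, int n) {
        int i = threadIdx.x;                    // one parent thread per row
        if (i < n)
            child<<<(n + 63) / 64, 64>>>(&m[i * n], n);  // nested launch
    }

    int main() {
        const int n = 64;
        float* m;
        cudaMalloc(&m, n * n * sizeof(float));
        cudaMemset(m, 0, n * n * sizeof(float));
        parent<<<1, n>>>(m, n);
        cudaDeviceSynchronize();
        cudaFree(m);
        return 0;
    }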
NURA: A framework for supporting non-uniform resource accesses in GPUs
Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize
GPU resources, is still challenging. Some pieces of prior work (e.g., spatial multitasking) have …
AEML: An acceleration engine for multi-GPU load-balancing in distributed heterogeneous environment
To meet the rapidly growing computation requirements of big data and artificial intelligence,
CPU-GPU heterogeneous clusters can provide more powerful computing capacity …
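AEML itself is a distributed framework; the sketch below shows only the basic single-node CUDA pattern such a balancer builds on, splitting a workload across devices with cudaSetDevice(). The even split is an assumption: a real balancer would size chunks from measured throughput and rebalance at runtime.

    // A minimal sketch, assuming a divisible workload and equally fast devices.
    #include <cuda_runtime.h>

    __global__ void work(float* x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] = x[i] * 2.0f + 1.0f;
    }

    int main() {
        int ndev = 0;
        cudaGetDeviceCount(&ndev);
        if (ndev < 1) return 1;
        if (ndev > 16) ndev = 16;
        const int n = 1 << 22;
        const int chunk = (n + ndev - 1) / ndev;   // naive even split
        float* bufs[16] = {nullptr};

        // Kernel launches are asynchronous, so the devices compute
        // their chunks concurrently.
        for (int d = 0; d < ndev; ++d) {
            cudaSetDevice(d);
            cudaMalloc(&bufs[d], chunk * sizeof(float));
            cudaMemset(bufs[d], 0, chunk * sizeof(float));
            work<<<(chunk + 255) / 256, 256>>>(bufs[d], chunk);
        }
        for (int d = 0; d < ndev; ++d) {
            cudaSetDevice(d);
            cudaDeviceSynchronize();
            cudaFree(bufs[d]);
        }
        return 0;
    }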
Enabling efficient preemption for SIMT architectures with lightweight context switching
Context switching is a key technique enabling preemption and time-multiplexing for CPUs.
However, for single-instruction multiple-thread (SIMT) processors such as high-end graphics …
Warp-consolidation: A novel execution model for GPUs
With the unprecedented development of compute capability and extension of memory
bandwidth on modern GPUs, parallel communication and synchronization soon become a …
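The warp-consolidation idea of trading block-wide barriers for implicit warp synchrony can be approximated in stock CUDA with warp-level primitives. The sketch below replaces a shared-memory/__syncthreads reduction with __shfl_down_sync() in a one-warp block; it is generic CUDA, not the paper's full execution model.

    // A minimal sketch: when a block is a single warp, block-wide
    // barriers and shared memory give way to register shuffles.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void warpSum(const float* in, float* out) {
        float v = in[threadIdx.x];
        // Tree reduction within one warp: no shared memory and no
        // __syncthreads, only register-to-register shuffles.
        for (int off = 16; off > 0; off >>= 1)
            v += __shfl_down_sync(0xffffffffu, v, off);
        if (threadIdx.x == 0) *out = v;
    }

    int main() {
        float *in, *out;
        cudaMallocManaged(&in, 32 * sizeof(float));
        cudaMallocManaged(&out, sizeof(float));
        for (int i = 0; i < 32; ++i) in[i] = 1.0f;
        warpSum<<<1, 32>>>(in, out);   // one block == one warp
        cudaDeviceSynchronize();
        printf("sum = %f\n", *out);    // expect 32
        cudaFree(in);
        cudaFree(out);
        return 0;
    }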