Locality-driven dynamic GPU cache bypassing
This paper presents novel cache optimizations for massively parallel, throughput-oriented
architectures like GPUs. L1 data caches (L1 D-caches) are critical resources for providing …
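A minimal sketch of the software-visible analogue of selective bypassing: stock CUDA's __ldcg() intrinsic (sm_32+) loads through L2 only, skipping L1, so a low-locality stream does not evict lines that do have reuse. The kernel and variable names below are illustrative, not from the paper.

    // A minimal sketch, assuming a streaming workload with little reuse.
    // __ldcg() loads via the global (L2) cache level, bypassing L1.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void scale(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            // Bypass L1 for this low-locality stream so it does not
            // evict cache lines that do have temporal locality.
            float v = __ldcg(&in[i]);
            out[i] = 2.0f * v;
        }
    }

    int main() {
        const int n = 1 << 20;
        float *in, *out;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&out, n * sizeof(float));
        for (int i = 0; i < n; ++i) in[i] = 1.0f;
        scale<<<(n + 255) / 256, 256>>>(in, out, n);
        cudaDeviceSynchronize();
        printf("out[0] = %f\n", out[0]);   // expect 2.0
        cudaFree(in);
        cudaFree(out);
        return 0;
    }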
Zorua: A holistic approach to resource virtualization in GPUs
This paper introduces a new resource virtualization framework, Zorua, that decouples the
programmer-specified resource usage of a GPU application from the actual allocation in the …
Warp-level divergence in GPUs: Characterization, impact, and mitigation
High throughput architectures rely on high thread-level parallelism (TLP) to hide execution
latencies. In state-of-the-art graphics processing units (GPUs), threads are organized in a grid of …
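As background for the warp-level effects the paper characterizes, the generic CUDA sketch below (not the paper's mechanism) makes warp granularity visible: a block whose size is not a multiple of the 32-thread warp leaves some lanes of its last warp inactive, wasting issue slots.

    // Generic illustration of warp granularity, not the paper's hardware.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void warpInfo() {
        int lane = threadIdx.x % warpSize;    // lane index within the warp
        unsigned active = __activemask();     // bitmask of live lanes
        if (lane == 0)
            printf("block %d, warp %d: %d of %d lanes active\n",
                   blockIdx.x, threadIdx.x / warpSize,
                   __popc(active), warpSize);
    }

    int main() {
        // 48 threads per block -> the second warp of each block runs
        // with only 16 of its 32 lanes active.
        warpInfo<<<2, 48>>>();
        cudaDeviceSynchronize();
        return 0;
    }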
Pagoda: Fine-grained GPU resource virtualization for narrow tasks
Massively multithreaded GPUs achieve high throughput by running thousands of threads in
parallel. To fully utilize the hardware, workloads spawn work to the GPU in bulk by launching …
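To make the launch-granularity problem concrete, here is a hedged sketch contrasting one kernel launch per narrow task with a single batched grid; the task kernel is a placeholder, not Pagoda's runtime.

    // A minimal sketch, assuming independent 32-thread "narrow" tasks.
    #include <cuda_runtime.h>

    __global__ void task(float* data, int offset) {
        data[offset + threadIdx.x] += 1.0f;    // one tiny task
    }

    __global__ void batched(float* data, int ntasks) {
        int t = blockIdx.x;                    // one block per task
        if (t < ntasks) data[t * 32 + threadIdx.x] += 1.0f;
    }

    int main() {
        const int ntasks = 1024;
        float* data;
        cudaMalloc(&data, ntasks * 32 * sizeof(float));
        cudaMemset(data, 0, ntasks * 32 * sizeof(float));

        // Narrow launches: ntasks kernels of 32 threads each, paying
        // per-launch overhead and leaving most SMs idle.
        for (int t = 0; t < ntasks; ++t)
            task<<<1, 32>>>(data, t * 32);
        cudaDeviceSynchronize();

        // Batched launch: one grid covering all tasks keeps the GPU busy.
        batched<<<ntasks, 32>>>(data, ntasks);
        cudaDeviceSynchronize();

        cudaFree(data);
        return 0;
    }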
Virtual thread: Maximizing thread-level parallelism beyond GPU scheduling limit
Modern GPUs require tens of thousands of concurrent threads to fully utilize the massive
amount of processing resources. However, thread concurrency in GPUs can be diminished …
CUDA-NP: Realizing nested thread-level parallelism in GPGPU applications
Parallel programs consist of a series of code sections with different thread-level parallelism
(TLP). As a result, it is rather common that a thread in a parallel program, such as a GPU …
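CUDA-NP realizes nested TLP with compiler-generated "slave" threads; as a point of comparison, stock CUDA expresses the same nesting through dynamic parallelism (sm_35+, compiled with -rdc=true and linked against cudadevrt). The sketch below is illustrative, not the paper's technique.

    // A minimal sketch of nested TLP via CUDA dynamic parallelism.
    #include <cuda_runtime.h>

    __global__ void child(float* row, int n) {
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        if (j < n) row[j] *= 2.0f;              // inner parallel loop
    }

    __global__ void parent(float* m, int n) {
        int i = threadIdx.x;                    // one parent thread per row
        if (i < n)
            child<<<(n + 63) / 64, 64>>>(&m[i * n], n);  // nested launch
    }

    int main() {
        const int n = 64;
        float* m;
        cudaMalloc(&m, n * n * sizeof(float));
        cudaMemset(m, 0, n * n * sizeof(float));
        parent<<<1, n>>>(m, n);
        cudaDeviceSynchronize();
        cudaFree(m);
        return 0;
    }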
NURA: A framework for supporting non-uniform resource accesses in GPUs
Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize
GPU resources, is still challenging. Some pieces of prior work (e.g., spatial multitasking) have …
AEML: An acceleration engine for multi-GPU load-balancing in distributed heterogeneous environment
To meet the rapidly growing computation requirements of big data and artificial intelligence,
CPU-GPU heterogeneous clusters can provide more powerful computing capacity …
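AEML itself is a distributed framework; the sketch below shows only the basic single-node CUDA pattern such a balancer builds on, splitting a workload across devices with cudaSetDevice(). The even split is an assumption: a real balancer would size chunks from measured throughput and rebalance at runtime.

    // A minimal sketch, assuming a divisible workload and equally fast devices.
    #include <cuda_runtime.h>

    __global__ void work(float* x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] = x[i] * 2.0f + 1.0f;
    }

    int main() {
        int ndev = 0;
        cudaGetDeviceCount(&ndev);
        if (ndev < 1) return 1;
        if (ndev > 16) ndev = 16;
        const int n = 1 << 22;
        const int chunk = (n + ndev - 1) / ndev;   // naive even split
        float* bufs[16] = {nullptr};

        // Kernel launches are asynchronous, so the devices compute
        // their chunks concurrently.
        for (int d = 0; d < ndev; ++d) {
            cudaSetDevice(d);
            cudaMalloc(&bufs[d], chunk * sizeof(float));
            cudaMemset(bufs[d], 0, chunk * sizeof(float));
            work<<<(chunk + 255) / 256, 256>>>(bufs[d], chunk);
        }
        for (int d = 0; d < ndev; ++d) {
            cudaSetDevice(d);
            cudaDeviceSynchronize();
            cudaFree(bufs[d]);
        }
        return 0;
    }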
Enabling efficient preemption for SIMT architectures with lightweight context switching
Context switching is a key technique enabling preemption and time-multiplexing for CPUs.
However, for single-instruction multiple-thread (SIMT) processors such as high-end graphics …
Warp-consolidation: A novel execution model for GPUs
With the unprecedented development of compute capability and extension of memory
bandwidth on modern GPUs, parallel communication and synchronization soon become a …
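The warp-consolidation idea of trading block-wide barriers for implicit warp synchrony can be approximated in stock CUDA with warp-level primitives. The sketch below replaces a shared-memory/__syncthreads reduction with __shfl_down_sync() in a one-warp block; it is generic CUDA, not the paper's full execution model.

    // A minimal sketch: when a block is a single warp, block-wide
    // barriers and shared memory give way to register shuffles.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void warpSum(const float* in, float* out) {
        float v = in[threadIdx.x];
        // Tree reduction within one warp: no shared memory and no
        // __syncthreads, only register-to-register shuffles.
        for (int off = 16; off > 0; off >>= 1)
            v += __shfl_down_sync(0xffffffffu, v, off);
        if (threadIdx.x == 0) *out = v;
    }

    int main() {
        float *in, *out;
        cudaMallocManaged(&in, 32 * sizeof(float));
        cudaMallocManaged(&out, sizeof(float));
        for (int i = 0; i < 32; ++i) in[i] = 1.0f;
        warpSum<<<1, 32>>>(in, out);   // one block == one warp
        cudaDeviceSynchronize();
        printf("sum = %f\n", *out);    // expect 32
        cudaFree(in);
        cudaFree(out);
        return 0;
    }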