A survey of techniques for managing and leveraging caches in GPUs

S Mittal - Journal of Circuits, Systems, and Computers, 2014 - World Scientific
Initially introduced as special-purpose accelerators for graphics applications, graphics
processing units (GPUs) have now emerged as general purpose computing platforms for a …

Managing DRAM latency divergence in irregular GPGPU applications

N Chatterjee, M O'Connor, GH Loh… - SC'14: Proceedings …, 2014 - ieeexplore.ieee.org
Memory controllers in modern GPUs aggressively reorder requests for high bandwidth
usage, often interleaving requests from different warps. This leads to high variance in the …

Scheduling page table walks for irregular GPU applications

S Shin, G Cox, M Oskin, GH Loh… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org
Recent studies on commercial hardware demonstrated that irregular GPU applications can
bottleneck on virtual-to-physical address translations. In this work, we explore ways to …

iGPU: exception support and speculative execution on GPUs

J Menon, M De Kruijf, K Sankaralingam - ACM SIGARCH Computer …, 2012 - dl.acm.org
Since the introduction of fully programmable vertex shader hardware, GPU computing has
made tremendous advances. Exception support and speculative execution are the next …

Microarchitectural performance characterization of irregular GPU kernels

MA O'Neil, M Burtscher - 2014 IEEE International Symposium …, 2014 - ieeexplore.ieee.org
GPUs are increasingly being used to accelerate general-purpose applications, including
applications with data-dependent, irregular memory access patterns and control flow …

Top-down performance profiling on NVIDIA's GPUs

A Saiz, P Prieto, P Abad, JA Gregorio… - 2022 IEEE …, 2022 - ieeexplore.ieee.org
The rise of data-intensive algorithms, such as Machine Learning ones, has meant a strong
diversification of Graphics Processing Units (GPU) in fields with intensive Data-Level …

Architecting the last-level cache for GPUs using STT-RAM technology

MH Samavatian, M Arjomand, R Bashizade… - ACM Transactions on …, 2015 - dl.acm.org
Future GPUs should have larger L2 caches based on the current trends in VLSI technology
and GPU architectures toward increase of processing core count. Larger L2 caches …

Porting CMP benchmarks to GPUs

MD Sinclair, H Duwe, K Sankaralingam - 2011 - minds.wisconsin.edu
GPUs have become increasingly popular in recent years, in large part due to their potential
to offer a large amount of computational power at low prices. They offer massive potential …

GPGPU workload characteristics and performance analysis

S Lal, J Lucas, M Andersch… - 2014 International …, 2014 - ieeexplore.ieee.org
GPUs are much more power-efficient devices compared to CPUs, but due to several
performance bottlenecks, the performance per watt of GPUs is often much lower than what …

Study on dual-channel revenue sharing coordination mechanisms based on the free riding

W Ganfu, AI Xing-zheng… - 2009 6th International …, 2009 - ieeexplore.ieee.org
With the rapid development of e-commerce and the adoption of dual channels, free riding
becomes more prevalent than ever before and often results in channel conflict. In this paper …