Graph processing on GPUs: A survey

X Shi, Z Zheng, Y Zhou, H Jin, L He, B Liu… - ACM Computing Surveys …, 2018 - dl.acm.org
In the big data era, much real-world data can be naturally represented as graphs.
Consequently, many application domains can be modeled as graph processing. Graph …

Cloud computing landscape and research challenges regarding trust and reputation

SM Habib, S Ries, M Muhlhauser - 2010 7th International …, 2010 - ieeexplore.ieee.org
Cloud Computing is an emerging computing paradigm. It shares massively scalable, elastic
resources (e.g., data, calculations, and services) transparently among the users over a …

A modern primer on processing in memory

O Mutlu, S Ghose, J Gómez-Luna… - … computing: from devices …, 2022 - Springer
Modern computing systems are overwhelmingly designed to move data to computation. This
design choice goes directly against at least three key trends in computing that cause …

Cnvlutin: Ineffectual-neuron-free deep neural network computing

J Albericio, P Judd, T Hetherington, T Aamodt… - ACM SIGARCH …, 2016 - dl.acm.org
This work observes that a large fraction of the computations performed by Deep Neural
Networks (DNNs) are intrinsically ineffectual as they involve a multiplication where one of …

Processing data where it makes sense: Enabling in-memory computation

O Mutlu, S Ghose, J Gómez-Luna… - Microprocessors and …, 2019 - Elsevier
Today's systems are overwhelmingly designed to move data to computation. This design
choice goes directly against at least three key trends in systems that cause performance …

GPUWattch: Enabling energy optimizations in GPGPUs

J Leng, T Hetherington, A ElTantawy, S Gilani… - ACM SIGARCH …, 2013 - dl.acm.org
General-purpose GPUs (GPGPUs) are becoming prevalent in mainstream computing, and
performance per watt has emerged as a more crucial evaluation metric than peak …

Transparent offloading and mapping (TOM) enabling programmer-transparent near-data processing in GPU systems

K Hsieh, E Ebrahimi, G Kim, N Chatterjee… - ACM SIGARCH …, 2016 - dl.acm.org
Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-
chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to …

Cache-conscious wavefront scheduling

TG Rogers, M O'Connor… - 2012 45th Annual IEEE …, 2012 - ieeexplore.ieee.org
This paper studies the effects of hardware thread scheduling on cache management in
GPUs. We propose Cache-Conscious Wavefront Scheduling (CCWS), an adaptive …

OWL: Cooperative thread array aware scheduling techniques for improving GPGPU performance

A Jog, O Kayiran, N Chidambaram Nachiappan… - ACM SIGPLAN …, 2013 - dl.acm.org
Emerging GPGPU architectures, along with programming models like CUDA and OpenCL,
offer a cost-effective platform for many applications by providing high thread level …

Neither more nor less: Optimizing thread-level parallelism for GPGPUs

O Kayıran, A Jog, MT Kandemir… - Proceedings of the 22nd …, 2013 - ieeexplore.ieee.org
General-purpose graphics processing units (GPGPUs) are at their best in accelerating
computation by exploiting abundant thread-level parallelism (TLP) offered by many classes …