Understanding GPU power: A survey of profiling, modeling, and simulation methods

RA Bridges, N Imam, TM Mintz - ACM Computing Surveys (CSUR), 2016 - dl.acm.org
Modern graphics processing units (GPUs) have complex architectures that admit
exceptional performance and energy efficiency for high-throughput applications. Although …

Cloud computing landscape and research challenges regarding trust and reputation

SM Habib, S Ries, M Mühlhäuser - 2010 7th International …, 2010 - ieeexplore.ieee.org
Cloud Computing is an emerging computing paradigm. It shares massively scalable, elastic
resources (e.g., data, calculations, and services) transparently among the users over a …

Medusa: Simplified graph processing on GPUs

J Zhong, B He - IEEE Transactions on Parallel and Distributed …, 2013 - ieeexplore.ieee.org
Graphs are common data structures for many applications, and efficient graph processing is
a must for application performance. Recently, the graphics processing unit (GPU) has been …

Work-efficient parallel GPU methods for single-source shortest paths

A Davidson, S Baxter, M Garland… - 2014 IEEE 28th …, 2014 - ieeexplore.ieee.org
Finding the shortest paths from a single source to all other vertices is a fundamental method
used in a variety of higher-level graph algorithms. We present three parallel friendly and …

Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance

N Ardalani, C Lestourgeon, K Sankaralingam… - Proceedings of the 48th …, 2015 - dl.acm.org
GPUs have become prevalent and more general purpose, but GPU programming remains
challenging and time consuming for the majority of programmers. In addition, it is not always …

CRONO: A benchmark suite for multithreaded graph algorithms executing on futuristic multicores

M Ahmad, F Hijaz, Q Shi, O Khan - 2015 IEEE International …, 2015 - ieeexplore.ieee.org
Algorithms operating on a graph setting are known to be highly irregular and unstructured.
This leads to workload imbalance and data locality challenges when these algorithms are …

Bit-plane compression: Transforming data for better compression in many-core architectures

J Kim, M Sullivan, E Choukse, M Erez - ACM SIGARCH Computer …, 2016 - dl.acm.org
As key applications become more data-intensive and the computational throughput of
processors increases, the amount of data to be transferred in modern memory subsystems …

Combining dynamic & static scheduling in high-level synthesis

J Cheng, L Josipovic, GA Constantinides… - Proceedings of the …, 2020 - dl.acm.org
A central task in high-level synthesis is scheduling: the allocation of operations to clock
cycles. The classic approach to scheduling is static, in which each operation is mapped to a …

Combining HW/SW mechanisms to improve NUMA performance of multi-GPU systems

V Young, A Jaleel, E Bolotin, E Ebrahimi… - 2018 51st Annual …, 2018 - ieeexplore.ieee.org
Historically, improvement in GPU performance has been tightly coupled with transistor
scaling. As Moore's Law slows down, performance of single GPUs may ultimately plateau …

GNNMark: A benchmark suite to characterize graph neural network training on GPUs

T Baruah, K Shivdikar, S Dong, Y Sun… - … Analysis of Systems …, 2021 - ieeexplore.ieee.org
Graph Neural Networks (GNNs) have emerged as a promising class of Machine Learning
algorithms to train on non-Euclidean data. GNNs are widely used in recommender systems …