Understanding GPU power: A survey of profiling, modeling, and simulation methods
RA Bridges, N Imam, TM Mintz - ACM Computing Surveys (CSUR), 2016 - dl.acm.org
Modern graphics processing units (GPUs) have complex architectures that admit
exceptional performance and energy efficiency for high-throughput applications. Although …
Cloud computing landscape and research challenges regarding trust and reputation
Cloud Computing is an emerging computing paradigm. It shares massively scalable, elastic
resources (e.g., data, calculations, and services) transparently among the users over a …
Medusa: Simplified graph processing on GPUs
Graphs are common data structures for many applications, and efficient graph processing is
a must for application performance. Recently, the graphics processing unit (GPU) has been …
Work-efficient parallel GPU methods for single-source shortest paths
A Davidson, S Baxter, M Garland… - 2014 IEEE 28th …, 2014 - ieeexplore.ieee.org
Finding the shortest paths from a single source to all other vertices is a fundamental method
used in a variety of higher-level graph algorithms. We present three parallel friendly and …
Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance
GPUs have become prevalent and more general purpose, but GPU programming remains
challenging and time consuming for the majority of programmers. In addition, it is not always …
CRONO: A benchmark suite for multithreaded graph algorithms executing on futuristic multicores
Algorithms operating on a graph setting are known to be highly irregular and unstructured.
This leads to workload imbalance and data locality challenge when these algorithms are …
Bit-plane compression: Transforming data for better compression in many-core architectures
As key applications become more data-intensive and the computational throughput of
processors increases, the amount of data to be transferred in modern memory subsystems …
Combining dynamic & static scheduling in high-level synthesis
A central task in high-level synthesis is scheduling: the allocation of operations to clock
cycles. The classic approach to scheduling is static, in which each operation is mapped to a …
Combining HW/SW mechanisms to improve NUMA performance of multi-GPU systems
Historically, improvement in GPU performance has been tightly coupled with transistor
scaling. As Moore's Law slows down, performance of single GPUs may ultimately plateau …
GNNMark: A benchmark suite to characterize graph neural network training on GPUs
Graph Neural Networks (GNNs) have emerged as a promising class of Machine Learning
algorithms to train on non-Euclidean data. GNNs are widely used in recommender systems …