A survey of CPU-GPU heterogeneous computing techniques

S Mittal, JS Vetter - ACM Computing Surveys (CSUR), 2015 - dl.acm.org
As both CPUs and GPUs become employed in a wide range of applications, it has been
acknowledged that both of these Processing Units (PUs) have their unique features and …

Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer

T Shimokawabe, T Aoki, T Takaki, T Endo… - Proceedings of 2011 …, 2011 - dl.acm.org
The mechanical properties of metal materials largely depend on their intrinsic internal
microstructures. To develop engineering materials with the expected properties, predicting …

Design and implementation of the Linpack benchmark for single and multi-node systems based on Intel® Xeon Phi coprocessor

A Heinecke, K Vaidyanathan… - 2013 IEEE 27th …, 2013 - ieeexplore.ieee.org
Dense linear algebra has been traditionally used to evaluate the performance and efficiency
of new architectures. This trend has continued for the past half decade with the advent of …

Supporting reuse by delivering task-relevant and personalized information

Y Ye, G Fischer - Proceedings of the 24th international conference on …, 2002 - dl.acm.org
Technical, cognitive, and social factors inhibit the widespread success of systematic
software reuse. Our research is primarily concerned with the cognitive and social challenges …

An RNS Montgomery modular multiplication algorithm

JC Bajard, LS Didier, P Kornerup - IEEE Transactions on …, 1998 - ieeexplore.ieee.org
We present a new RNS modular multiplication for very large operands. The algorithm is
based on Montgomery's method adapted to mixed radix, and is performed using a residue …

An 80-fold speedup, 15.0 TFlops full GPU acceleration of non-hydrostatic weather model ASUCA production code

T Shimokawabe, T Aoki, C Muroi… - SC'10: Proceedings …, 2010 - ieeexplore.ieee.org
Regional weather forecasting demands fast simulation over fine-grained grids, resulting in
extremely memory-bottlenecked computation, a difficult problem on conventional …

Highly scalable graph search for the Graph500 benchmark

K Ueno, T Suzumura - Proceedings of the 21st international symposium …, 2012 - dl.acm.org
Graph500 is a new benchmark to rank supercomputers with a large-scale graph search
problem. We found that the provided reference implementations are not scalable in a large …

The tradeoffs of fused memory hierarchies in heterogeneous computing architectures

KL Spafford, JS Meredith, S Lee, D Li, PC Roth… - Proceedings of the 9th …, 2012 - dl.acm.org
With the rise of general purpose computing on graphics processing units (GPGPU), the
influence from consumer markets can now be seen across the spectrum of computer …

Performance characteristics of Graph500 on large-scale distributed environment

T Suzumura, K Ueno, H Sato… - 2011 IEEE …, 2011 - ieeexplore.ieee.org
Graph500 is a new benchmark for supercomputers based on large-scale graph analysis,
which is becoming an important form of analysis in many real-world applications. Graph …

Optimizing the LINPACK algorithm for large-scale PCIe-based CPU-GPU heterogeneous systems

G Tan, C Shui, Y Wang, X Yu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
There is a widening gap between GPU and other components (CPU, PCIe bus and
communication network) in heterogeneous parallel system. The gap forces us to orchestrate …