A survey of CPU-GPU heterogeneous computing techniques
As both CPUs and GPUs become employed in a wide range of applications, it has been
acknowledged that both of these Processing Units (PUs) have their unique features and …
Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer
The mechanical properties of metal materials largely depend on their intrinsic internal
microstructures. To develop engineering materials with the expected properties, predicting …
Design and implementation of the linpack benchmark for single and multi-node systems based on intel® xeon phi coprocessor
A Heinecke, K Vaidyanathan… - 2013 IEEE 27th …, 2013 - ieeexplore.ieee.org
Dense linear algebra has been traditionally used to evaluate the performance and efficiency
of new architectures. This trend has continued for the past half decade with the advent of …
Supporting reuse by delivering task-relevant and personalized information
Y Ye, G Fischer - Proceedings of the 24th international conference on …, 2002 - dl.acm.org
Technical, cognitive, and social factors inhibit the widespread success of systematic
software reuse. Our research is primarily concerned with the cognitive and social challenges …
An RNS Montgomery modular multiplication algorithm
We present a new RNS modular multiplication for very large operands. The algorithm is
based on Montgomery's method adapted to mixed radix, and is performed using a residue …
An 80-fold speedup, 15.0 TFlops full GPU acceleration of non-hydrostatic weather model ASUCA production code
T Shimokawabe, T Aoki, C Muroi… - SC'10: Proceedings …, 2010 - ieeexplore.ieee.org
Regional weather forecasting demands fast simulation over fine-grained grids, resulting in
extremely memory-bottlenecked computation, a difficult problem on conventional …
Highly scalable graph search for the graph500 benchmark
K Ueno, T Suzumura - Proceedings of the 21st international symposium …, 2012 - dl.acm.org
Graph500 is a new benchmark to rank supercomputers with a large-scale graph search
problem. We found that the provided reference implementations are not scalable in a large …
The tradeoffs of fused memory hierarchies in heterogeneous computing architectures
With the rise of general purpose computing on graphics processing units (GPGPU), the
influence from consumer markets can now be seen across the spectrum of computer …
Performance characteristics of Graph500 on large-scale distributed environment
Graph500 is a new benchmark for supercomputers based on large-scale graph analysis,
which is becoming an important form of analysis in many real-world applications. Graph …
Optimizing the LINPACK algorithm for large-scale PCIe-based CPU-GPU heterogeneous systems
G Tan, C Shui, Y Wang, X Yu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
There is a widening gap between GPU and other components (CPU, PCIe bus and
communication network) in heterogeneous parallel system. The gap forces us to orchestrate …