The analysis of a plane wave pseudopotential density functional theory code on a GPU machine

W Jia, Z Cao, L Wang, J Fu, X Chi, W Gao… - Computer Physics …, 2013 - Elsevier
Plane wave pseudopotential (PWP) density functional theory (DFT) calculation is the most
widely used material science simulation, and the PWP DFT codes are arguably the most …

Characterizing the influence of system noise on large-scale applications by simulation

T Hoefler, T Schneider… - SC'10: Proceedings of the …, 2010 - ieeexplore.ieee.org
This paper presents an in-depth analysis of the impact of system noise on large-scale
parallel application performance in realistic settings. Our analytical model shows that not …

The landscape of GPGPU performance modeling tools

S Madougou, A Varbanescu, C de Laat… - Parallel Computing, 2016 - Elsevier
GPUs are gaining fast adoption as high-performance computing architectures, mainly
because of their impressive peak performance. Yet most applications only achieve small …

An investigation of the performance portability of OpenCL

SJ Pennycook, SD Hammond, SA Wright… - Journal of Parallel and …, 2013 - Elsevier
This paper reports on the development of an MPI/OpenCL implementation of LU, an
application-level benchmark from the NAS Parallel Benchmark Suite. An account of the …

Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark

SJ Pennycook, SD Hammond, SA Jarvis… - ACM SIGMETRICS …, 2011 - dl.acm.org
We present the performance analysis of a port of the LU benchmark from the NAS Parallel
Benchmark (NPB) suite to NVIDIA's Compute Unified Device Architecture (CUDA), and …

Toward performance models of MPI implementations for understanding application scaling issues

T Hoefler, W Gropp, R Thakur, JL Träff - European MPI Users' Group …, 2010 - Springer
Designing and tuning parallel applications with MPI, particularly at large scale, requires
understanding the performance implications of different choices of algorithms and …

WARPP: a toolkit for simulating high-performance parallel scientific codes

SD Hammond, GR Mudalige, JA Smith… - Proceedings of the 2nd …, 2009 - dl.acm.org
There are a number of challenges facing the High Performance Computing (HPC)
community, including increasing levels of concurrency (threads, cores, nodes), deeper and …

MPI + OpenACC: Accelerating radiation transport mini-application, minisweep, on heterogeneous systems

R Searles, S Chandrasekaran, W Joubert… - Computer Physics …, 2019 - Elsevier
Architectures are rapidly evolving, and exascale machines are expected to offer billion-way
concurrency. We need to rethink algorithms, languages and programming models among …

An unstructured CFD mini‐application for the performance prediction of a production CFD code

AMB Owenson, SA Wright, RA Bunt… - Concurrency and …, 2020 - Wiley Online Library
Maintaining the performance of large scientific codes is a difficult task. To aid in this task, a
number of mini‐applications have been developed that are more tractable to analyze than …

An improved parallelism scheme for deterministic discrete ordinates transport

T Deakin, S McIntosh-Smith… - … Journal of High …, 2018 - journals.sagepub.com
In this paper we demonstrate techniques for increasing the node-level parallelism of a
deterministic discrete ordinates neutral particle transport algorithm on a structured mesh to …