The analysis of a plane wave pseudopotential density functional theory code on a GPU machine
Plane wave pseudopotential (PWP) density functional theory (DFT) calculation is the most
widely used material science simulation, and the PWP DFT codes are arguably the most …
Characterizing the influence of system noise on large-scale applications by simulation
This paper presents an in-depth analysis of the impact of system noise on large-scale
parallel application performance in realistic settings. Our analytical model shows that not …
The landscape of GPGPU performance modeling tools
GPUs are gaining fast adoption as high-performance computing architectures, mainly
because of their impressive peak performance. Yet most applications only achieve small …
An investigation of the performance portability of OpenCL
This paper reports on the development of an MPI/OpenCL implementation of LU, an
application-level benchmark from the NAS Parallel Benchmark Suite. An account of the …
Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark
We present the performance analysis of a port of the LU benchmark from the NAS Parallel
Benchmark (NPB) suite to NVIDIA's Compute Unified Device Architecture (CUDA), and …
Toward performance models of MPI implementations for understanding application scaling issues
Designing and tuning parallel applications with MPI, particularly at large scale, requires
understanding the performance implications of different choices of algorithms and …
WARPP: a toolkit for simulating high-performance parallel scientific codes
There are a number of challenges facing the High Performance Computing (HPC)
community, including increasing levels of concurrency (threads, cores, nodes), deeper and …
MPI+OpenACC: Accelerating radiation transport mini-application, minisweep, on heterogeneous systems
Architectures are rapidly evolving, and exascale machines are expected to offer billion-way
concurrency. We need to rethink algorithms, languages and programming models among …
An unstructured CFD mini‐application for the performance prediction of a production CFD code
AMB Owenson, SA Wright, RA Bunt… - Concurrency and …, 2020 - Wiley Online Library
Maintaining the performance of large scientific codes is a difficult task. To aid in this task, a
number of mini‐applications have been developed that are more tractable to analyze than …
An improved parallelism scheme for deterministic discrete ordinates transport
In this paper we demonstrate techniques for increasing the node-level parallelism of a
deterministic discrete ordinates neutral particle transport algorithm on a structured mesh to …