Model-based optimization of EULAG kernel on Intel Xeon Phi through load imbalancing
Load balancing is a widely accepted technique for performance optimization of scientific
applications on parallel architectures. Indeed, balanced applications do not waste processor …
Ultra-scalable CPU-MIC acceleration of mesoscale atmospheric modeling on Tianhe-2
In this work an ultra-scalable algorithm is designed and optimized to accelerate a 3D
compressible Euler atmospheric model on the CPU-MIC hybrid system of Tianhe-2. We first …
Stencil codes on a vector length agnostic architecture
Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD
capabilities, it can provide substantial performance improvements on top of widely used …
Using Arm's scalable vector extension on stencil codes
Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD
capabilities, it can provide substantial performance improvements on top of widely used …
Adaptation of MPDATA heterogeneous stencil computation to Intel Xeon Phi coprocessor
The multidimensional positive definite advection transport algorithm (MPDATA) belongs to
the group of nonoscillatory forward‐in‐time algorithms and performs a sequence of stencil …
Porting and optimization of solidification application for CPU–MIC hybrid platforms
Modern heterogeneous computing platforms have become powerful HPC solutions, which
could be applied to a wide range of real-life applications. In particular, the hybrid platforms …
Performance portable parallel programming of heterogeneous stencils across shared-memory platforms with modern Intel processors
In this work, we take up the challenge of performance portable programming of
heterogeneous stencil computations across a wide range of modern shared-memory …
Unleashing the performance of ccNUMA multiprocessor architectures in heterogeneous stencil computations
This paper meets the challenge of harnessing the heterogeneous communication
architecture of ccNUMA multiprocessors for heterogeneous stencil computations, an …
Islands-of-cores approach for harnessing SMP/NUMA architectures in heterogeneous stencil computations
SMP/NUMA systems are powerful HPC platforms which could be applied for a wide range of
real-life applications. These systems provide large capacity of shared memory, and allow …
Exploring OpenMP Accelerator Model in a real-life scientific application using hybrid CPU-MIC platforms
The main goal of this paper is the suitability assessment of the OpenMP Accelerator Model
(OMPAM) for porting a real-life scientific application to heterogeneous platforms containing a …