Preliminary performance evaluation of the Fujitsu A64FX using HPC applications

T Odajima, Y Kodama, M Tsuji… - … on cluster computing …, 2020 - ieeexplore.ieee.org
RIKEN Center for Computational Science has been installing the supercomputer Fugaku.
The Fujitsu A64FX, based on the Armv8.2-A + SVE architecture, is used in the system. In this …

A performance analysis of the first generation of HPC‐optimized Arm processors

S McIntosh‐Smith, J Price, T Deakin… - Concurrency and …, 2019 - Wiley Online Library
In this paper, we present performance results from Isambard, the first production
supercomputer to be based on Arm CPUs that have been optimized specifically for HPC …

Evaluating the effectiveness of a vector-length-agnostic instruction set

A Poenaru, S McIntosh-Smith - Euro-Par 2020: Parallel Processing: 26th …, 2020 - Springer
In this paper we evaluate the efficacy of the Arm Scalable Vector Extension (SVE) instruction
set for HPC workloads using a set of established mini-apps. Exploiting the vector capabilities …

Comparative benchmarking of the first generation of HPC-optimised Arm processors on Isambard

S McIntosh-Smith, J Price, T Deakin, A Poenaru - Cray user group, 2018 - uob-hpc.github.io
In this paper we present performance results from Isambard, the first production
supercomputer to be based on Arm CPUs that have been optimised specifically for HPC …

Reviewing the Computational Performance of Structured and Unstructured Grid Deterministic SN Transport Sweeps on Many-Core Architectures

T Deakin, S McIntosh-Smith, J Lovegrove… - … of Computational and …, 2020 - Taylor & Francis
In recent years the computer processors underpinning the large, distributed, workhorse
computers used to solve the Boltzmann transport equation have become ever more parallel …

Modern vector architectures for high-performance computing

A Poenaru - 2022 - research-information.bris.ac.uk
Recent generations of general-purpose central processing units (CPUs) for the high-
performance segment have had to adopt new approaches in order to deliver increasing …

The effects of wide vector operations on processor caches

A Poenaru, S McIntosh-Smith - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
In this paper we investigate the effects of wide vector instructions on modern processor
caches. On the one hand, contemporary processors have large, highly associative caches …

Enabling task parallelism for many-core architectures

PR Atkinson - 2021 - research-information.bris.ac.uk
The requirements placed on computer architectures from modern computational workloads
have driven constant performance improvements. In the last 15 years, the largest source of …

Multi-spectral reuse distance: Divining spatial information from temporal data

AM Cabrera, RD Chamberlain… - 2019 IEEE High …, 2019 - ieeexplore.ieee.org
The problem of efficiently feeding processing elements and finding ways to reduce data
movement is pervasive in computing. Efficient modeling of both temporal and spatial locality …

Hostile Cache Implications for Small, Dense Linear Solves

T Deakin, J Cownie, S McIntosh-Smith… - 2020 IEEE/ACM …, 2020 - ieeexplore.ieee.org
The full assembly of the stiffness matrix in finite element codes can be prohibitive in terms of
memory footprint resulting from storing that enormous matrix. An optimisation and work …