The Sol supercomputer at Arizona State University

DM Jennewein, J Lee, C Kurtz, W Dizon… - … and Experience in …, 2023 - dl.acm.org
The Sol supercomputer provides ASU researchers access to a state-of-the-art system with
an observed GPU-only HPL speed of 2.272 PetaFLOP/s. This short paper provides a …

Swarm parallelism: Training large models can be surprisingly communication-efficient

M Ryabinin, T Dettmers, M Diskin… - … on Machine Learning, 2023 - proceedings.mlr.press
Many deep learning applications benefit from using large models with billions of parameters.
Training these models is notoriously expensive due to the need for specialized HPC …

Short reasons for long vectors in HPC CPUs: a study based on RISC-V

P Vizcaino, G Ieronymakis, N Dimou… - Proceedings of the SC' …, 2023 - dl.acm.org
For years, SIMD/vector units have enhanced the capabilities of modern CPUs in High-Performance
Computing (HPC) and mobile technology. Typical commercially-available …

Method for scalable and performant GPU-accelerated simulation of multiphase compressible flow

A Radhakrishnan, H Le Berre, B Wilfong… - Computer Physics …, 2024 - Elsevier
Multiphase compressible flows are often characterized by a broad range of space and time
scales, entailing large grids and small time steps. Simulations of these flows on CPU-based …

A case study of porting HPGMG from CUDA to OpenMP target offload

C Daley, H Ahmed, S Williams, N Wright - OpenMP: Portable Multi-Level …, 2020 - Springer
The HPGMG benchmark is a non-trivial Multigrid benchmark used to evaluate system
performance. We ported this benchmark from CUDA to OpenMP target offload and added …

Application experiences on a GPU-accelerated Arm-based HPC testbed

W Elwasif, W Godoy, N Hagerty, JA Harris… - Proceedings of the …, 2023 - dl.acm.org
This paper assesses and reports the experience of ten teams working to port, validate, and
benchmark several High Performance Computing applications on a novel GPU-accelerated …

The specialized high-performance network on Anton 3

KS Shim, B Greskamp, B Towles… - … Symposium on High …, 2022 - ieeexplore.ieee.org
Molecular dynamics (MD) simulation, a computationally intensive method that provides
invaluable insights into the behavior of biomolecules, typically requires large-scale …

On the Performance Investigation of a Recursive Fast Optical Switch-Based High Performance Computing Network Architecture

F Yan, X Deng, C Yuan, B Yan… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org
We propose a novel high performance computing (HPC) network architecture based on
parallel levels distributed low radix fast optical switches (FOS). We provide a detailed …

Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators

Y Fridman, G Tamir, G Oren - International Conference on High …, 2023 - Springer
Over the last decade, most of the increase in computing power has been gained by
advances in accelerated many-core architectures, mainly in the form of GPGPUs. While …

Exploring fully offloaded GPU stream-aware message passing

N Namashivayam, K Kandalla, JB White III… - arXiv preprint arXiv …, 2023 - arxiv.org
Modern heterogeneous supercomputing systems are comprised of CPUs, GPUs, and high-speed
network interconnects. Communication libraries supporting efficient data transfers …