Effective performance portability

SL Harrell, J Kitson, R Bird… - 2018 IEEE/ACM …, 2018 - ieeexplore.ieee.org
Exascale computing brings with it diverse machine architectures and programming
approaches which challenge application developers. Applications need to perform well on a …

Scaling embedded in-situ indexing with DeltaFS

Q Zheng, CD Cranor, D Guo, GR Ganger… - … Conference for High …, 2018 - ieeexplore.ieee.org
Analysis of large-scale simulation output is a core element of scientific inquiry, but analysis
queries may experience significant I/O overhead when the data is not structured for efficient …

Tuning parallel data compression and I/O for large-scale earthquake simulation

H Tang, S Byna, NA Petersson… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
Scientific applications, such as those simulating earthquakes, the origins of the universe, etc.,
often produce massive amounts of data as high-performance computing (HPC) systems are …

TAPIOCA: An I/O library for optimized topology-aware data aggregation on large-scale supercomputers

F Tessier, V Vishwanath… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Reading and writing data efficiently from storage system is necessary for most scientific
simulations to achieve good performance at scale. Many software solutions have been …

SCTuner: An autotuner addressing dynamic I/O needs on supercomputer I/O subsystems

H Tang, B Xie, S Byna, P Carns… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
In high-performance computing (HPC), scientific applications often manage a massive
amount of data using I/O libraries. These libraries provide convenient data model …

Battle of the defaults: Extracting performance characteristics of HDF5 under production load

B Xie, H Tang, S Byna, J Hanley… - 2021 IEEE/ACM 21st …, 2021 - ieeexplore.ieee.org
Popular parallel I/O libraries, such as HDF5, provide tuning parameters to obtain superior
performance. However, the selection of effective parameters on production systems is …

Optimizing GPU-enhanced HPC system and cloud procurements for scientific workloads

RT Evans, M Cawood, SL Harrell, L Huang… - … Conference, ISC High …, 2021 - Springer
Modern GPUs are capable of sustaining floating point operation rates and memory
bandwidths that exceed those of most currently available CPUs, making them attractive …

Software-defined storage for fast trajectory queries using a DeltaFS Indexed Massive Directory

Q Zheng, G Amvrosiadis, S Kadekodi… - Proceedings of the 2nd …, 2017 - dl.acm.org
In this paper we introduce the Indexed Massive Directory, a new technique for indexing data
within DeltaFS. With its design as a scalable, server-less file system for HPC platforms …

Unleashing in-network computing on scientific workloads

D Kim, A Jain, Z Liu, G Amvrosiadis, D Hazen… - arXiv preprint arXiv …, 2020 - arxiv.org
Many recent efforts have shown that in-network computing can benefit various datacenter
applications. In this paper, we explore a relatively less-explored domain which we argue can …

Analysis of Vector Particle-In-Cell (VPIC) memory usage optimizations on cutting-edge computer architectures

N Tan, RF Bird, G Chen, SV Luedtke, BJ Albright… - Journal of …, 2022 - Elsevier
Vector Particle-In-Cell (VPIC) is one of the fastest plasma simulation codes in the
world, with particle numbers ranging from one trillion on the first petascale system …