Accelerating collective communication in data parallel training across deep learning frameworks
This work develops new techniques within Horovod, a generic communication library
supporting data parallel training across deep learning frameworks. In particular, we improve …
High-Quality I/O Bandwidth Prediction with Minimal Data via Transfer Learning Workflow
Providing a high-quality performance prediction has the potential to enhance various
aspects of a cluster, such as devising scheduling and provisioning policies, guiding …
Design and implementation of I/O performance prediction scheme on HPC systems through large-scale log analysis
Large-scale high performance computing (HPC) systems typically consist of many
thousands of CPUs and storage units used by hundreds to thousands of users …
SCTuner: An autotuner addressing dynamic I/O needs on supercomputer I/O subsystems
In high-performance computing (HPC), scientific applications often manage a massive
amount of data using I/O libraries. These libraries provide convenient data model …
AIIO: Using Artificial Intelligence for Job-Level and Automatic I/O Performance Bottleneck Diagnosis
Manually diagnosing the I/O performance bottleneck for a single application (hereinafter
referred to as the "job level") is a tedious and error-prone procedure requiring domain …
Battle of the defaults: Extracting performance characteristics of HDF5 under production load
Popular parallel I/O libraries, such as HDF5, provide tuning parameters to obtain superior
performance. However, the selection of effective parameters on production systems is …
I/O-signature-based feature analysis and classification of high-performance computing applications
The demand for high-performance computing (HPC) resources in computing fields such as
machine learning has increased significantly in recent years. Computing power has been …
I/O Behind the Scenes: Bandwidth Requirements of HPC Applications with Asynchronous I/O
I/O bandwidth is a critical resource in an HPC cluster. As with all shared resources, its
availability is impacted significantly by the users and the applications they execute. Without …
Report for the ASCR Workshop on the Management and Storage of Scientific Data
The purpose of this workshop is to identify priority research directions in the area of data
management for high-performance and scientific computing above and beyond HPC's …
Design and implementation of I/O performance prediction scheme on HPC systems
Large-scale high performance computing (HPC) systems typically consist of many
thousands of CPUs and storage units used by hundreds to thousands of users …