Improving I/O performance for exascale applications through online data layout reorganization
The applications being developed within the US Exascale Computing Project (ECP) to run
on imminent Exascale computers will generate scientific results with unprecedented fidelity …
Querying large scientific data sets with adaptable IO system ADIOS
J Gu, S Klasky, N Podhorszki, J Qiang… - … Frontiers: 4th Asian …, 2018 - library.oapen.org
When working with a large dataset, a relatively small fraction of data records are of interest
in each analysis operation. For example, while examining a billion-particle dataset from an …
Design and implementation of the Tianhe-2 data storage and management system
YT Lu, P Cheng, ZG Chen - Journal of Computer Science and Technology, 2020 - Springer
With the convergence of high-performance computing (HPC), big data and artificial
intelligence (AI), the HPC community is pushing for “triple use” systems to expedite scientific …
Optimizing the query performance of block index through data analysis and I/O modeling
Indexing technique has become an efficient tool to enable scientists to directly access the
most relevant data records. But, the time and space requirements of building and storing …
Optimizing data query performance of Bi-cluster for large-scale scientific data in supercomputers
Scientific exploration and discovery heavily rely on increasing datasets and strong
supercomputing power. Surging data pose massive data management challenges in …
Dissecting self-describing data formats to enable advanced querying of file metadata
In times of continuously growing data sizes, performing insightful analysis is increasingly
difficult. I/O libraries such as NetCDF and ADIOS2 offer options to manage additional …
UniIndex: An index and query middleware for parallel file systems
P Cheng, Y Wang, Y Lu, Y Du… - … : Practice and Experience, 2020 - Wiley Online Library
As data analysis scenarios keep increasing on high‐performance computing systems, the
ability to select a small fraction of data from a large volume of scientific data sets is vital to …
Bi-cluster: A high-performance data query framework for large-scale scientific data
Emerging scientific computing generates massive amounts of scientific data by relying on
high-performance computer systems, challenging data management and analysis. State-of …
IndexIt: Enhancing data locating services for parallel file systems
P Cheng, Y Wang, Y Lu, Y Du… - 2019 IEEE 21st …, 2019 - ieeexplore.ieee.org
While the ability to access a small fraction of data records from a large volume of scientific
datasets is vital to accelerate scientific discovery, existing parallel file systems face serious …