Improving I/O performance for exascale applications through online data layout reorganization

L Wan, A Huebl, J Gu, F Poeschel… - … on Parallel and …, 2021 - ieeexplore.ieee.org
The applications being developed within the US Exascale Computing Project (ECP) to run
on imminent Exascale computers will generate scientific results with unprecedented fidelity …

Querying large scientific data sets with adaptable IO system ADIOS

J Gu, S Klasky, N Podhorszki, J Qiang… - … Frontiers: 4th Asian …, 2018 - library.oapen.org
When working with a large dataset, a relatively small fraction of data records are of interest
in each analysis operation. For example, while examining a billion-particle dataset from an …

Design and implementation of the Tianhe-2 data storage and management system

YT Lu, P Cheng, ZG Chen - Journal of Computer Science and Technology, 2020 - Springer
With the convergence of high-performance computing (HPC), big data and artificial
intelligence (AI), the HPC community is pushing for “triple use” systems to expedite scientific …

Optimizing the query performance of block index through data analysis and I/O modeling

T Wu, J Chou, S Hao, B Dong, S Klasky… - Proceedings of the …, 2017 - dl.acm.org
Indexing techniques have become an efficient tool to enable scientists to directly access the
most relevant data records. But, the time and space requirements of building and storing …

Optimizing data query performance of Bi-cluster for large-scale scientific data in supercomputers

X Liao, Y Shen, S Li, Y Lu, Y Du, Z Chen - The Journal of Supercomputing, 2022 - Springer
Scientific exploration and discovery heavily rely on increasing datasets and strong
supercomputing power. Surging data pose massive data management challenges in …

Dissecting self-describing data formats to enable advanced querying of file metadata

K Duwe, M Kuhn - Proceedings of the 14th ACM International …, 2021 - dl.acm.org
In times of continuously growing data sizes, performing insightful analysis is increasingly
difficult. I/O libraries such as NetCDF and ADIOS2 offer options to manage additional …

UniIndex: An index and query middleware for parallel file systems

P Cheng, Y Wang, Y Lu, Y Du… - … : Practice and Experience, 2020 - Wiley Online Library
As data analysis scenarios keep increasing on high‐performance computing systems, the
ability to select a small fraction of data from a large volume of scientific data sets is vital to …

Bi-cluster: A high-performance data query framework for large-scale scientific data

Y Shen, C Peng, Y Du, Y Lu - 2019 IEEE 21st International …, 2019 - ieeexplore.ieee.org
Emerging scientific computing generates massive amounts of scientific data by relying on
high-performance computer systems, challenging data management and analysis. State-of …

IndexIt: Enhancing data locating services for parallel file systems

P Cheng, Y Wang, Y Lu, Y Du… - 2019 IEEE 21st …, 2019 - ieeexplore.ieee.org
While the ability to access a small fraction of data records from a large volume of scientific
datasets is vital to accelerate scientific discovery, existing parallel file systems face serious …