A survey on error-bounded lossy compression for scientific datasets

S Di, J Liu, K Zhao, X Liang, R Underwood… - arXiv preprint arXiv …, 2024 - arxiv.org
Error-bounded lossy compression has been effective in significantly reducing the data
storage/transfer burden while preserving the reconstructed data fidelity very well. Many error …

cuSZp2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio

Y Huang, S Di, G Li, F Cappello - … : International Conference for …, 2024 - ieeexplore.ieee.org
Existing GPU lossy compressors suffer from expensive data movement overheads,
inefficient memory access patterns, and high synchronization latency, resulting in limited …

HoSZp: An efficient homomorphic error-bounded lossy compressor for scientific data

T Agarwal, S Di, J Huang, Y Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Error-bounded lossy compression has been a critical technique to significantly reduce the
sheer amounts of simulation datasets for high-performance computing (HPC) scientific …

hZCCL: Accelerating Collective Communication with Co-Designed Homomorphic Compression

J Huang, S Di, X Yu, Y Zhai, J Liu, Z Jian… - … Conference for High …, 2024 - ieeexplore.ieee.org
As network bandwidth struggles to keep up with rapidly growing computing capabilities, the
efficiency of collective communication has become a critical challenge for exa-scale …

SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training

J Jia, C Xie, H Lu, D Wang, H Feng, C Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent years have witnessed a clear trend towards language models with an ever-
increasing number of parameters, as well as the growing training overhead and memory …

PiP-MColl: Process-in-Process-based Multi-object MPI Collectives

J Huang, K Ouyang, Y Zhai, J Liu, M Si… - 2023 IEEE …, 2023 - ieeexplore.ieee.org
In the era of exascale computing, the adoption of a large number of CPU cores and nodes
by high-performance computing (HPC) applications has made MPI collective performance …

Characterization of NCCL and Unified Memory Under Normal and Oversubscribed Memory Conditions

R Strina - 2024 - search.proquest.com
The NVIDIA Collective Communications Library (NCCL) is a multi-GPU
communication library widely used in applications such as deep learning, molecular …

FORS: Fault-adaptive Optimized Routing and Scheduling for DAQ Networks

E Stein, Q Bramas, F Pisani, T Colombo, C Pelsser - dial.uclouvain.be
Data acquisition (DAQ) networks, widely used in scientific research and industrial
applications, are composed of numerous interconnected servers, exchanging substantial …