Canary: fault-tolerant FaaS for stateful time-sensitive applications

M Arif, K Assogba, MM Rafique - … : International Conference for …, 2022 - ieeexplore.ieee.org
Function-as-a-Service (FaaS) platforms have recently gained rapid popularity. Many stateful
applications have been migrated to FaaS platforms due to their ease of deployment …

Evaluating the potential of disaggregated memory systems for HPC applications

N Ding, P Maris, HA Nam, T Groves… - Concurrency and …, 2024 - Wiley Online Library
Disaggregated memory is a promising approach that addresses the limitations of traditional
memory architectures by enabling memory to be decoupled from compute nodes and …

DDStore: Distributed data store for scalable training of graph neural networks on large atomistic modeling datasets

JY Choi, M Lupo Pasini, P Zhang, K Mehta… - Proceedings of the SC' …, 2023 - dl.acm.org
Graph neural networks (GNNs) are a class of Deep Learning models used in designing
atomistic materials for effective screening of large chemical spaces. To ensure robust …

Methodology for Evaluating the Potential of Disaggregated Memory Systems

N Ding, S Williams, HA Nam, T Groves… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org
Tightly-coupled HPC systems have rigid memory allocation and can result in expensive
memory resource underutilization. As novel memory and network technologies mature …

LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming

S Shen, L Huang, M Chrapek… - … Conference for High …, 2024 - ieeexplore.ieee.org
The shift towards high-bandwidth networks driven by AI workloads in data centers and HPC
clusters has unintentionally aggravated network latency, adversely affecting the …

Application of differential privacy to sensor data in water quality monitoring task

A Arzovs, S Parshutin, V Urbanovics, J Rubulis… - Ecological …, 2025 - Elsevier
Although differential privacy (DP) is used to obfuscate local information and avoid data
leakage, very little research exists on the neural network model performance with applied …

A Workflow Roofline Model for End-to-End Workflow Performance Analysis

N Ding, B Austin, Y Liu, N Mehta… - … Conference for High …, 2024 - ieeexplore.ieee.org
As next-generation experimental and observational instruments for scientific research are
being deployed with higher resolutions and faster data capture rates, the fundamental …

Accelerating I/O performance of ZFS-based Lustre file system in HPC environment

J Bang, C Kim, EK Byun, H Sung, J Lee… - The Journal of …, 2023 - Springer
To meet increasing data access performance demands of applications run on high-
performance computing (HPC) systems, an efficient design of HPC storage file system is …

Collective Communication Performance Evaluation for Distributed Deep Learning Training

S Lee, J Lee - Applied Sciences, 2024 - mdpi.com
In distributed deep learning, the improper use of the collective communication library can
lead to a decline in deep learning performance due to increased communication time …

Preprocessing pipeline optimization for scientific deep learning workloads

KZ Ibrahim, L Oliker - 2022 IEEE International Parallel and …, 2022 - ieeexplore.ieee.org
Newly developed machine learning technology is promising to profoundly impact high-
performance computing, with the potential to significantly accelerate scientific discoveries …