Task-level resilience: checkpointing vs. supervision
J Posner, L Reitz, C Fohry - International Journal of Networking and …, 2022 - jstage.jst.go.jp
With the advent of exascale computing, issues such as application irregularity and
permanent hardware failure are growing in importance. Irregularity is often addressed by …
Assessing the use cases of persistent memory in high-performance scientific computing
As the High Performance Computing (HPC) world moves towards the Exa-Scale era, huge
amounts of data should be analyzed, manipulated and stored. In the traditional stor …
Checkpointing vs. supervision resilience approaches for dynamic independent tasks
J Posner, L Reitz, C Fohry - 2021 IEEE International Parallel …, 2021 - ieeexplore.ieee.org
With the advent of exascale computing, issues such as application irregularity and
permanent hardware failure are growing in importance. Irregularity is often addressed by …
Application-based fault tolerance for numerical linear algebra at large scale
DA Torres González - European Conference on Parallel Processing, 2021 - Springer
Large scale architectures provide us with high computing power, but as the size of the
systems grows, computation units are more likely to fail. Fault-tolerant mechanisms have …
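[CITATION][C] Performance improvement using a multi-checkpointing technique in micro-batch streaming systems
박규리, 박성용 - Proceedings of the Korean Institute of Information Scientists and Engineers (KIISE) conference, 2023 - dbpia.co.kr
Abstract: In today's big-data environments, LSM-tree-based key-value stores have been adopted as the
state store of streaming systems for stateful real-time stream processing. In micro-batch streaming systems, …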