The landscape of exascale research: A data-driven literature analysis

S Heldens, P Hijma, BV Werkhoven… - ACM Computing …, 2020 - dl.acm.org
The next generation of supercomputers will break the exascale barrier. Soon we will have
systems capable of at least one quintillion (billion billion) floating-point operations per …

Analysis and prioritization the effective factors on increasing farmers resilience under climate change and drought

S Javadinejad, R Dara, F Jafary - Agricultural research, 2021 - Springer
California is severely exposed to drought and damage due to the climate change and
drought belt, which has a major impact on agriculture. So, after the drought crisis, there are …

Correcting soft errors online in fast Fourier transform

X Liang, J Chen, D Tao, S Li, P Wu, H Li… - Proceedings of the …, 2017 - dl.acm.org
While many algorithm-based fault tolerance (ABFT) schemes have been proposed to detect
soft errors offline in the fast Fourier transform (FFT) after computation finishes, none of the …
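The snippet refers to checksum-based ABFT detection for the FFT. As a rough illustration of the general idea only (not the scheme from the paper), the sketch below relies on the linearity of the DFT. For an all-ones checksum vector c, c^T F = [N, 0, ..., 0], so the outputs of a correct transform must satisfy sum(X) = N * x[0]. The naive `dft` and the helper name `checksum_ok` are hypothetical, chosen for illustration. The check runs after the transform finishes, which makes it "offline" in the snippet's sense, and it cannot see corruptions that cancel in the sum.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def checksum_ok(x, X, tol=1e-9):
    """Offline ABFT-style check (illustrative assumption, not the paper's method).

    With an all-ones checksum vector c, c^T F = [N, 0, ..., 0], so a correct
    transform satisfies sum(X) == N * x[0] up to rounding error.
    """
    n = len(x)
    lhs = sum(X)
    rhs = n * x[0]
    # Tolerance scales with the magnitude of the outputs to absorb rounding.
    return abs(lhs - rhs) <= tol * (1 + sum(abs(v) for v in X))

x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 2.5, 0.5]
X = dft(x)
print(checksum_ok(x, X))   # a clean transform passes
X[3] += 0.5                # simulate a soft error in one output
print(checksum_ok(x, X))   # the corrupted output is flagged
```

A single all-ones vector keeps the example short. Practical schemes typically use weighted checksums so that more error patterns are caught and the faulty element can be located.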

Resilience and fault tolerance in high-performance computing for numerical weather and climate prediction

T Benacchio, L Bonaventura… - … Journal of High …, 2021 - journals.sagepub.com
Progress in numerical weather and climate prediction accuracy greatly depends on the
growth of the available computing power. As the number of cores in top computing facilities …

Anatomy of high-performance GEMM with online fault tolerance on GPUs

S Wu, Y Zhai, J Liu, J Huang, Z Jian, B Wong… - Proceedings of the 37th …, 2023 - dl.acm.org
General Matrix Multiplication (GEMM) is a crucial algorithm for various applications such as
machine learning and scientific computing since an efficient GEMM implementation is …
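"Online fault tolerance" for GEMM typically builds on the classic row/column checksum idea from algorithm-based fault tolerance. The sketch below is a minimal illustration under that assumption and does not reproduce the GPU kernels in the paper. It uses the identities e^T C = (e^T A) B and C e = A (B e) to detect a corrupted entry of C = A B, then locates and corrects a single error at the intersection of the mismatching row and column. The function names are hypothetical, and the fixed absolute tolerance assumes small, well-scaled inputs.

```python
def matmul(A, B):
    """Reference C = A @ B for lists of lists."""
    k, n = len(B), len(B[0])
    return [[sum(row[p] * B[p][j] for p in range(k)) for j in range(n)] for row in A]

def abft_check_and_correct(A, B, C, tol=1e-9):
    """Checksum-based detection and single-error correction (illustrative sketch).

    Returns "ok", "corrected" (C is fixed in place), or "uncorrectable".
    """
    m, k, n = len(A), len(B), len(B[0])
    # Expected column sums of C: (e^T A) B
    colsum_A = [sum(A[i][p] for i in range(m)) for p in range(k)]
    exp_col = [sum(colsum_A[p] * B[p][j] for p in range(k)) for j in range(n)]
    # Expected row sums of C: A (B e)
    rowsum_B = [sum(B[p][j] for j in range(n)) for p in range(k)]
    exp_row = [sum(A[i][p] * rowsum_B[p] for p in range(k)) for i in range(m)]
    # Differences between observed and expected checksums
    col_diff = [sum(C[i][j] for i in range(m)) - exp_col[j] for j in range(n)]
    row_diff = [sum(C[i][j] for j in range(n)) - exp_row[i] for i in range(m)]
    bad_cols = [j for j, d in enumerate(col_diff) if abs(d) > tol]
    bad_rows = [i for i, d in enumerate(row_diff) if abs(d) > tol]
    if not bad_rows and not bad_cols:
        return "ok"
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        # A single faulty entry sits where the bad row and bad column meet;
        # the row mismatch equals the size of the error.
        i, j = bad_rows[0], bad_cols[0]
        C[i][j] -= row_diff[i]
        return "corrected"
    return "uncorrectable"

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul(A, B)
print(abft_check_and_correct(A, B, C))  # "ok"
C[1][0] += 5                            # inject a soft error
print(abft_check_and_correct(A, B, C))  # "corrected"; C[1][0] is restored
```

The checksums cost O(mk + kn + mn) extra work against O(mkn) for the multiplication itself. That asymmetry is why checksum-based schemes are attractive for large GEMMs.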

FT-BLAS: A high performance BLAS implementation with online fault tolerance

Y Zhai, E Giem, Q Fan, K Zhao, J Liu… - Proceedings of the ACM …, 2021 - dl.acm.org
Basic Linear Algebra Subprograms (BLAS) is a core library in scientific computing and
machine learning. This paper presents FT-BLAS, a new implementation of BLAS routines …

LetGo: A lightweight continuous framework for HPC applications under failures

B Fang, Q Guan, N Debardeleben… - Proceedings of the 26th …, 2017 - dl.acm.org
Requirements for reliability, low power consumption, and performance place complex and
conflicting demands on the design of high-performance computing (HPC) systems. Fault …

Failure recovery in resilient X10

D Grove, SS Hamouda, B Herta, A Iyengar… - ACM Transactions on …, 2019 - dl.acm.org
Cloud computing has made the resources needed to execute large-scale in-memory
distributed computations widely available. Specialized programming models, e.g., …

A comparison of application-level fault tolerance schemes for task pools

J Posner, L Reitz, C Fohry - Future Generation Computer Systems, 2020 - Elsevier
Fault tolerance is an important requirement for successful program execution on exascale
systems. The common approach, checkpointing, regularly saves a program's state, such that …
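The snippet describes checkpointing as periodically saving a program's state so that execution can resume after a failure. The sketch below illustrates that generic idea for a simple task pool. It is not any of the application-level schemes compared in the paper. A dict named `store` stands in for stable storage, and the task (summing squares), `checkpoint_every`, and `fail_after` are all hypothetical parameters for illustration. Pending tasks and the partial result are saved together, so a restart never double-counts finished work. Tasks completed after the last checkpoint are simply re-executed.

```python
import json

def run_task_pool(tasks, checkpoint_every, store, fail_after=None):
    """Sum the squares of integer tasks, checkpointing pool state to `store`.

    store:      dict used as a hypothetical stand-in for stable storage.
    fail_after: if set, raise after this many tasks in this run (simulated crash).
    """
    if "ckpt" in store:
        state = json.loads(store["ckpt"])            # resume from last checkpoint
    else:
        state = {"pending": list(tasks), "result": 0}
    done_this_run = 0
    while state["pending"]:
        t = state["pending"].pop(0)
        state["result"] += t * t
        done_this_run += 1
        if done_this_run % checkpoint_every == 0:
            # Save pending tasks and partial result atomically as one record.
            store["ckpt"] = json.dumps(state)
        if fail_after is not None and done_this_run == fail_after:
            raise RuntimeError("simulated process failure")
    store.pop("ckpt", None)                          # finished: discard checkpoint
    return state["result"]

store = {}
try:
    run_task_pool(range(1, 11), 3, store, fail_after=7)  # checkpoints after 3 and 6 tasks
except RuntimeError:
    pass
print(run_task_pool(range(1, 11), 3, store))             # resumes at task 7, prints 385
```

Choosing `checkpoint_every` trades checkpoint overhead against the amount of work lost, and recomputed, after each failure.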

Classification based survey of image registration methods

K Sharma, A Goyal - 2013 Fourth International Conference on …, 2013 - ieeexplore.ieee.org
Image registration technique is useful for variety of applications ranging from surveillance to
image mosaicing where task is to match two or more pictures taken, for example, at different …