FT-CNN: Algorithm-based fault tolerance for convolutional neural networks

K Zhao, S Di, S Li, X Liang, Y Zhai… - … on Parallel and …, 2020 - ieeexplore.ieee.org
Convolutional neural networks (CNNs) are becoming more and more important for solving
challenging and critical problems in many fields. CNN inference applications have been …

Arithmetic-intensity-guided fault tolerance for neural network inference on GPUs

J Kosaian, KV Rashmi - Proceedings of the International Conference for …, 2021 - dl.acm.org
Neural networks (NNs) are increasingly employed in safety-critical domains and in
environments prone to unreliability (e.g., soft errors), such as on spacecraft. Therefore, it is …

Anatomy of high-performance GEMM with online fault tolerance on GPUs

S Wu, Y Zhai, J Liu, J Huang, Z Jian, B Wong… - Proceedings of the 37th …, 2023 - dl.acm.org
General Matrix Multiplication (GEMM) is a crucial algorithm for various applications such as
machine learning and scientific computing since an efficient GEMM implementation is …

FT K-Means: A High-Performance K-Means on GPU with Fault Tolerance

S Wu, Y Ding, Y Zhai, J Liu, J Huang… - 2024 IEEE …, 2024 - ieeexplore.ieee.org
K-means is a widely used algorithm in clustering, however, its efficiency is primarily
constrained by the computational cost of distance computing. Existing implementations …

Comparative of advanced sorting algorithms (quick sort, heap sort, merge sort, intro sort, radix sort) based on time and memory usage

M Marcellino, DW Pratama… - 2021 1st International …, 2021 - ieeexplore.ieee.org
Every algorithm has its own best-case as well as its worst-case scenario, so it is difficult to
determine the best sorting algorithm just by its Big-O. Not only that, the amount of memory …

FT-BLAS: a high performance BLAS implementation with online fault tolerance

Y Zhai, E Giem, Q Fan, K Zhao, J Liu… - Proceedings of the ACM …, 2021 - dl.acm.org
Basic Linear Algebra Subprograms (BLAS) is a core library in scientific computing and
machine learning. This paper presents FT-BLAS, a new implementation of BLAS routines …

Towards end-to-end SDC detection for HPC applications equipped with lossy compression

S Li, S Di, K Zhao, X Liang, Z Chen… - … Conference on Cluster …, 2020 - ieeexplore.ieee.org
Data reduction techniques have been widely demanded and used by large-scale high
performance computing (HPC) applications because of vast volumes of data to be produced …

Improving energy saving of one-sided matrix decompositions on CPU-GPU heterogeneous systems

J Chen, X Liang, K Zhao, HZ Sabzi, L Bhuyan… - Proceedings of the 28th …, 2023 - dl.acm.org
One-sided dense matrix decompositions (e.g., Cholesky, LU, and QR) are the key
components in scientific computing in many different fields. Although their design has been …

FT-BLAS: A Fault Tolerant High Performance BLAS Implementation on x86 CPUs

Y Zhai, E Giem, K Zhao, J Liu, J Huang… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Basic Linear Algebra Subprograms (BLAS) serve as a foundational library for scientific
computing and machine learning. In this article, we present a new BLAS implementation, FT …

ApproxABFT: Approximate algorithm-based fault tolerance for vision transformers

X Xue, C Liu, H Huang, B Liu, Y Wang, B Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Vision Transformers (ViTs) with outstanding performance have become a popular backbone of
deep learning models for the mainstream vision tasks including classification, object …