FT-CNN: Algorithm-based fault tolerance for convolutional neural networks
Convolutional neural networks (CNNs) are becoming more and more important for solving
challenging and critical problems in many fields. CNN inference applications have been …
Arithmetic-intensity-guided fault tolerance for neural network inference on GPUs
Neural networks (NNs) are increasingly employed in safety-critical domains and in
environments prone to unreliability (e.g., soft errors), such as on spacecraft. Therefore, it is …
Anatomy of high-performance GEMM with online fault tolerance on GPUs
General Matrix Multiplication (GEMM) is a crucial algorithm for various applications such as
machine learning and scientific computing since an efficient GEMM implementation is …
FT K-Means: A High-Performance K-Means on GPU with Fault Tolerance
K-means is a widely used algorithm in clustering; however, its efficiency is primarily
constrained by the computational cost of distance computing. Existing implementations …
Comparative of advanced sorting algorithms (quick sort, heap sort, merge sort, intro sort, radix sort) based on time and memory usage
M Marcellino, DW Pratama… - 2021 1st International …, 2021 - ieeexplore.ieee.org
Every algorithm has its own best-case as well as its worst-case scenario, so it is difficult to
determine the best sorting algorithm just by its Big-O. Not only that, the amount of memory …
FT-BLAS: a high performance BLAS implementation with online fault tolerance
Basic Linear Algebra Subprograms (BLAS) is a core library in scientific computing and
machine learning. This paper presents FT-BLAS, a new implementation of BLAS routines …
Towards end-to-end SDC detection for HPC applications equipped with lossy compression
Data reduction techniques have been widely demanded and used by large-scale high
performance computing (HPC) applications because of vast volumes of data to be produced …
Improving energy saving of one-sided matrix decompositions on CPU-GPU heterogeneous systems
One-sided dense matrix decompositions (e.g., Cholesky, LU, and QR) are the key
components in scientific computing in many different fields. Although their design has been …
FT-BLAS: A Fault Tolerant High Performance BLAS Implementation on x86 CPUs
Basic Linear Algebra Subprograms (BLAS) serve as a foundational library for scientific
computing and machine learning. In this article, we present a new BLAS implementation, FT …
ApproxABFT: Approximate algorithm-based fault tolerance for vision transformers
Vision Transformers (ViTs) with outstanding performance have become a popular backbone of
deep learning models for mainstream vision tasks including classification, object …