Preconditioners for Krylov subspace methods: An overview
When simulating a mechanism from science or engineering, or an industrial process, one is
frequently required to construct a mathematical model, and then resolve this model …
Communication lower bounds and optimal algorithms for numerical linear algebra
The traditional metric for the efficiency of a numerical algorithm has been the number of
arithmetic operations it performs. Technological trends have long been reducing the time to …
Krylov subspace methods: principles and analysis
The mathematical theory of Krylov subspace methods with a focus on solving systems of
linear algebraic equations is given a detailed treatment in this principles-based book …
Communication-optimal parallel and sequential QR and LU factorizations
We present parallel and sequential dense QR factorization algorithms that are both optimal
(up to polylogarithmic factors) in the amount of communication they perform and just as …
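The communication-avoiding idea behind such QR algorithms can be illustrated with a tall-skinny QR (TSQR) reduction. The sketch below is a minimal, single-process illustration under stated assumptions: it uses one flat reduction level over row blocks (rather than a reduction tree), computes only the R factor, and the function name `tsqr_r` and its `num_blocks` parameter are hypothetical. It is not the paper's implementation.

```python
import numpy as np


def tsqr_r(A, num_blocks=4):
    """R factor of a tall-skinny A via a one-level (flat) TSQR reduction.

    Hypothetical helper. A is split into row blocks; each block is
    QR-factored independently (in parallel these would run on different
    processors), then the stacked n-by-n local R factors are QR-factored
    once more. Only the small R factors would need to be communicated,
    never the tall blocks of A.
    """
    n = A.shape[1]
    blocks = np.array_split(A, num_blocks, axis=0)
    if any(blk.shape[0] < n for blk in blocks):
        raise ValueError("each row block needs at least n rows")
    # Local factorizations: no communication required.
    local_rs = [np.linalg.qr(blk, mode="r") for blk in blocks]
    # Reduction step: factor the stacked (num_blocks*n)-by-n R factors.
    return np.linalg.qr(np.vstack(local_rs), mode="r")


rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))
R = tsqr_r(A, num_blocks=8)
R_ref = np.linalg.qr(A, mode="r")
# R is unique only up to the sign of each row, so compare R^T R with A^T A
# and compare entrywise magnitudes against a direct QR.
print(np.allclose(R.T @ R, A.T @ A), np.allclose(np.abs(R), np.abs(R_ref)))
```

A production variant would replace the flat reduction with a binary (or other) reduction tree and could also keep the implicit Q factor.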
Fast stencil-code computation on a wafer-scale processor
The performance of CPU-based and GPU-based systems is often low for PDE codes, where
large, sparse, and often structured systems of linear equations must be solved. Iterative …
A heuristic clustering-based task deployment approach for load balancing using Bayes theorem in cloud environment
To address the current problems that most physical hosts in the cloud data center are so
overloaded that the load of the whole cloud data center becomes imbalanced, and that existing load …
AmgX: A library for GPU accelerated algebraic multigrid and preconditioned iterative methods
The solution of large sparse linear systems arises in many applications, such as
computational fluid dynamics and oil reservoir simulation. In realistic cases the matrices are …
Hiding global synchronization latency in the preconditioned conjugate gradient algorithm
Scalability of Krylov subspace methods suffers from costly global synchronization steps that
arise in dot-products and norm calculations on parallel machines. In this work, a modified …
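The global synchronization points described above can be located in a textbook preconditioned conjugate gradient loop. The sketch below is standard PCG, not the modified algorithm the paper proposes; the Jacobi (diagonal) preconditioner, the dense test matrix, and the function name `pcg` are assumptions for illustration. It counts the inner products and norms that would each need a global all-reduce if the vectors were distributed across processes.

```python
import numpy as np


def pcg(A, b, m_inv_diag, tol=1e-8, max_iter=500):
    """Textbook preconditioned CG with a diagonal (Jacobi) preconditioner.

    Hypothetical helper. Returns (x, iterations, reductions), where
    `reductions` counts inner products and norms that would each require a
    global all-reduce, i.e. a synchronization point, on a parallel machine.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    z = m_inv_diag * r              # preconditioner apply: purely local
    p = z.copy()
    rz = r @ z                      # global reduction: <r, z>
    b_norm = np.linalg.norm(b)      # global reduction: ||b||
    reductions = 2
    for k in range(1, max_iter + 1):
        # Matrix-vector product: for a distributed sparse matrix this needs
        # only neighbor communication, not a global reduction.
        Ap = A @ p
        alpha = rz / (p @ Ap)       # global reduction: <p, Ap>
        x += alpha * p
        r -= alpha * Ap
        res = np.linalg.norm(r)     # global reduction: ||r|| for the stopping test
        reductions += 2
        if res <= tol * b_norm:
            return x, k, reductions
        z = m_inv_diag * r
        rz_new = r @ z              # global reduction: <r, z>
        reductions += 1
        beta = rz_new / rz
        rz = rz_new
        p = z + beta * p
    return x, max_iter, reductions


# Usage: 1D Laplacian (tridiagonal, symmetric positive definite), stored densely.
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, iters, reductions = pcg(A, b, 1.0 / np.diag(A))
print(iters, reductions, np.linalg.norm(A @ x - b))
```

Each iteration of this loop blocks on three separate reductions, and each one depends on the result of the preceding vector update. Restructuring the recurrences so that fewer, non-blocking reductions can overlap with the matrix-vector product and preconditioner is the general direction suggested by the paper's title; the details of its modified algorithm are not reproduced here.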
Dark memory and accelerator-rich system optimization in the dark silicon era
Unlike traditional dark silicon works that attack the computing logic, this article focuses
on the memory part, which dissipates most of the energy for memory-bound CPU …
Block Gram-Schmidt algorithms and their stability properties
Block Gram-Schmidt algorithms serve as essential kernels in many scientific
computing applications, but for many commonly used variants, a rigorous treatment of their …
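As an illustration of what a block Gram-Schmidt kernel computes, the sketch below implements block classical Gram-Schmidt (BCGS): each new block of columns is projected against all previously orthogonalized blocks with matrix-matrix products, then orthogonalized internally. Assumptions: the intra-block step uses NumPy's Householder-based `np.linalg.qr`, the optional second projection pass is one simple reorthogonalization choice among the many variants such papers analyze, and the name `block_cgs` is hypothetical.

```python
import numpy as np


def block_cgs(X, block_size, reorth=True):
    """Block classical Gram-Schmidt: X (m x n, m >= n) -> Q, R with X = Q R.

    Hypothetical helper. Inter-block projections use block inner products
    (BLAS-3 style); the intra-block step is a Householder QR via numpy.
    """
    m, n = X.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        W = X[:, start:stop].astype(float)  # copy of the current block
        if start > 0:
            Q_prev = Q[:, :start]
            # Project out all previous blocks with one block inner product.
            S = Q_prev.T @ W
            W -= Q_prev @ S
            if reorth:
                # Second projection pass to reduce loss of orthogonality.
                S2 = Q_prev.T @ W
                W -= Q_prev @ S2
                S += S2
            R[:start, start:stop] = S
        # Intra-block orthogonalization of the projected block.
        Q_k, R_kk = np.linalg.qr(W)
        Q[:, start:stop] = Q_k
        R[start:stop, start:stop] = R_kk
    return Q, R


rng = np.random.default_rng(1)
X = rng.standard_normal((200, 12))
Q, R = block_cgs(X, block_size=4)
print(np.linalg.norm(Q @ R - X), np.linalg.norm(Q.T @ Q - np.eye(12)))
```

Because X = Q_prev (S + S2) + Q_k R_kk for each block, accumulating both projection coefficients into R keeps the factorization exact; the stability of such variants for ill-conditioned X is exactly what the abstract says is often left untreated.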