Gram‐Schmidt orthogonalization: 100 years and more

SJ Leon, Å Björck, W Gander - Numerical Linear Algebra with …, 2013 - Wiley Online Library

SUMMARY In 1907, Erhard Schmidt published a paper in which he introduced an
orthogonalization algorithm that has since become known as the classical Gram‐Schmidt …

Cited by 200 · All 10 versions
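The classical Gram‐Schmidt algorithm named in the summary above orthogonalizes the columns of A one at a time, subtracting from each column its projections onto the orthonormal vectors already computed. The following is a minimal pure-Python sketch for illustration only, not code from the cited paper; the function name classical_gram_schmidt and the list-of-columns input format are assumptions, and the input columns are assumed to be linearly independent.

```python
import math


def classical_gram_schmidt(columns):
    """Classical Gram-Schmidt (illustrative sketch): return (Q, R) with A = QR.

    `columns` holds the n columns of A, each a list of m floats, assumed
    linearly independent. Q is returned as a list of n orthonormal columns,
    R as an n x n upper-triangular matrix stored as a list of rows.
    """
    n = len(columns)
    q = []
    r = [[0.0] * n for _ in range(n)]
    for j, a in enumerate(columns):
        # Classical variant: every coefficient r_ij = q_i . a_j uses the
        # ORIGINAL column a_j (modified Gram-Schmidt would instead project
        # the partially orthogonalized vector).
        for i in range(j):
            r[i][j] = sum(qi * ai for qi, ai in zip(q[i], a))
        # Subtract all projections from a copy of a_j.
        v = list(a)
        for i in range(j):
            v = [vk - r[i][j] * qk for vk, qk in zip(v, q[i])]
        # Normalize what remains; it is orthogonal to q_0 .. q_{j-1}.
        r[j][j] = math.sqrt(sum(vk * vk for vk in v))
        q.append([vk / r[j][j] for vk in v])
    return q, r
```

For example, classical_gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]) orthonormalizes two vectors in R^3, with R[0][0] equal to √2. Because all coefficients for a column are computed from the original a_j, they can be formed together as one matrix-vector product, at the price of a greater loss of orthogonality in floating point than the modified variant.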

[PDF] psu.edu

Towards dense linear algebra for hybrid GPU accelerated manycore systems

S Tomov, J Dongarra, M Baboulin - Parallel Computing, 2010 - Elsevier

We highlight the trends leading to the increased appeal of using hybrid multicore+GPU
systems for high performance computing. We present a set of techniques that can be used to …

Cited by 611 · All 23 versions

[PDF] arxiv.org

Communication-optimal parallel and sequential QR and LU factorizations

J Demmel, L Grigori, M Hoemmen, J Langou - SIAM Journal on Scientific …, 2012 - SIAM

We present parallel and sequential dense QR factorization algorithms that are both optimal
(up to polylogarithmic factors) in the amount of communication they perform and just as …

Cited by 559 · All 29 versions

[PDF] ieee.org

System identification at the extreme edge for network load reduction in vibration-based monitoring

F Zonzini, V Dertimanis, E Chatzi… - IEEE Internet of Things …, 2022 - ieeexplore.ieee.org

Mechanical complexity, wide dimensions, and big data volume may hamper the
implementation of Internet of Things (IoT)-enabled structural health monitoring (SHM) …

Cited by 24 · All 4 versions

[PDF] acm.org

Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters

Y Liu, N Ding, P Sao, S Williams, XS Li - Proceedings of the International …, 2023 - dl.acm.org

This paper presents a unified communication optimization framework for sparse triangular
solve (SpTRSV) algorithms on CPU and GPU clusters. The framework builds upon a 3D …

Cited by 5 · All 6 versions
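SpTRSV solves L x = b for a sparse triangular matrix L. The cited paper concerns the communication pattern of distributed SpTRSV on CPU and GPU clusters; the sketch below shows only the sequential kernel such solvers build on, forward substitution over a lower-triangular matrix in CSR format. It is an illustrative assumption, not the paper's 3D algorithm, and the function name sptrsv_lower_csr is hypothetical (the indptr/indices/data arguments follow the usual CSR layout).

```python
def sptrsv_lower_csr(indptr, indices, data, b):
    """Solve L x = b for sparse lower-triangular L in CSR format (sketch).

    Row i's nonzeros are data[indptr[i]:indptr[i + 1]], located in columns
    indices[indptr[i]:indptr[i + 1]]. Every row must store a nonzero
    diagonal entry.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j == i:
                diag = data[k]
            elif j < i:
                # x[j] is already known because j < i (forward substitution).
                acc -= data[k] * x[j]
            else:
                raise ValueError("entry above the diagonal: not lower triangular")
        if diag is None or diag == 0.0:
            raise ValueError(f"missing or zero diagonal in row {i}")
        x[i] = acc / diag
    return x
```

For L = [[2, 0, 0], [1, 1, 0], [0, 3, 4]] (indptr [0, 1, 3, 5], indices [0, 0, 1, 1, 2], data [2, 1, 1, 3, 4]) and b = [2, 3, 11], the sketch returns [1.0, 2.0, 1.25]. Each x[i] depends only on the x[j], j < i, stored in row i, so rows with no mutual dependencies can be solved concurrently; in a distributed setting, x[j] values owned by other processes must be communicated, which is the cost a communication-optimization framework targets.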

[PDF] psu.edu

Optimizing Halley's iteration for computing the matrix polar decomposition

Y Nakatsukasa, Z Bai, F Gygi - SIAM Journal on Matrix Analysis and …, 2010 - SIAM

We introduce a dynamically weighted Halley (DWH) iteration for computing the polar
decomposition of a matrix, and we prove that the new method is globally and asymptotically …

Cited by 88 · All 16 versions

[PDF] u-tokyo.ac.jp

CholeskyQR2: a simple and communication-avoiding algorithm for computing a tall-skinny QR factorization on a large-scale parallel system

T Fukaya, Y Nakatsukasa… - 2014 5th workshop …, 2014 - ieeexplore.ieee.org

Designing communication-avoiding algorithms is crucial for high performance computing on
a large-scale parallel system. The TSQR algorithm is a communication-avoiding algorithm …

Cited by 56 · All 7 versions

[PDF] researchgate.net

Accelerating the LOBPCG method on GPUs using a blocked sparse matrix vector product

H Anzt, S Tomov, JJ Dongarra - SpringSim (HPS), 2015 - researchgate.net

This paper presents a heterogeneous CPU-GPU implementation for a sparse iterative
eigensolver–the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG). For …

Cited by 52 · All 7 versions

[PDF] acm.org

Algorithm 980: Sparse QR factorization on the GPU

SN Yeralan, TA Davis, WM Sid-Lakhdar… - ACM Transactions on …, 2017 - dl.acm.org

Sparse matrix factorization involves a mix of regular and irregular computation, which is a
particular challenge when trying to obtain high-performance on the highly parallel general …

Cited by 49 · All 3 versions

[PDF] sagepub.com

On the performance and energy efficiency of sparse linear algebra on GPUs

H Anzt, S Tomov, J Dongarra - The International Journal of …, 2017 - journals.sagepub.com

In this paper we unveil some performance and energy efficiency frontiers for sparse
computations on GPU-based supercomputers. We compare the resource efficiency of …

Cited by 31 · All 7 versions
