- Academic Search

A survey of CPU-GPU heterogeneous computing techniques

S Mittal, JS Vetter - ACM Computing Surveys (CSUR), 2015 - dl.acm.org

As both CPUs and GPUs become employed in a wide range of applications, it has been
acknowledged that both of these Processing Units (PUs) have their unique features and …

Cited by: 735; 7 versions

Kernel methods through the roof: handling billions of points efficiently

G Meanti, L Carratino, L Rosasco… - Advances in Neural …, 2020 - proceedings.neurips.cc

Kernel methods provide an elegant and principled approach to nonparametric learning, but
so far could hardly be used in large scale problems, since naïve implementations scale …

Cited by: 135; 6 versions

Dense linear algebra solvers for multicore with GPU accelerators

S Tomov, R Nath, H Ltaief… - 2010 IEEE International …, 2010 - ieeexplore.ieee.org

Solving dense linear systems of equations is a fundamental problem in scientific computing.
Numerical simulations involving complex systems represented in terms of unknown …

Cited by: 352; 25 versions

PCBDDC: a class of robust dual-primal methods in PETSc

S Zampini - SIAM Journal on Scientific Computing, 2016 - SIAM

A class of preconditioners based on balancing domain decomposition by constraints
methods is introduced in the Portable, Extensible Toolkit for Scientific Computation (PETSc) …

Cited by: 92; 5 versions

Keeneland: Bringing heterogeneous GPU computing to the computational science community

JS Vetter, R Glassbrook, J Dongarra, K Schwan… - Computing in Science …, 2011 - netlib.org

The Keeneland project—named for a historic thoroughbred horse racing track in Lexington,
Kentucky—is a five-year Track 2D grant awarded by the US National Science Foundation …

Cited by: 155; 16 versions

Data-aware task scheduling on multi-accelerator based platforms

C Augonnet, J Clet-Ortega, S Thibault… - 2010 IEEE 16th …, 2010 - ieeexplore.ieee.org

To fully tap into the potential of heterogeneous machines composed of multicore processors
and multiple accelerators, simple offloading approaches in which the main trunk of the …

Cited by: 126; 13 versions

Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark

SJ Pennycook, SD Hammond, SA Jarvis… - ACM SIGMETRICS …, 2011 - dl.acm.org

We present the performance analysis of a port of the LU benchmark from the NAS Parallel
Benchmark (NPB) suite to NVIDIA's Compute Unified Device Architecture (CUDA), and …

Cited by: 99; 15 versions

Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing

S Tomov, R Nath, J Dongarra - Parallel Computing, 2010 - Elsevier

We present a Hessenberg reduction (HR) algorithm for hybrid systems of homogeneous
multicore with GPU accelerators that can exceed 25× the performance of the corresponding …

Cited by: 91; 17 versions

Multifrontal factorization of sparse SPD matrices on GPUs

T George, V Saxena, A Gupta, A Singh… - … Parallel & Distributed …, 2011 - ieeexplore.ieee.org

Solving large sparse linear systems is often the most computationally intensive component
of many scientific computing applications. In the past, sparse multifrontal direct factorization …

Cited by: 60; 9 versions

A guide for achieving high performance with very small matrices on GPU: a case study of batched LU and Cholesky factorizations

A Haidar, A Abdelfattah, M Zounon… - … on Parallel and …, 2017 - ieeexplore.ieee.org

We present a high-performance GPU kernel with a substantial speedup over vendor
libraries for very small matrix computations. In addition, we discuss most of the challenges …

Cited by: 26; 12 versions
