Google Scholar

A survey of CPU-GPU heterogeneous computing techniques

S Mittal, JS Vetter - ACM Computing Surveys (CSUR), 2015 - dl.acm.org

As both CPUs and GPUs become employed in a wide range of applications, it has been
acknowledged that both of these Processing Units (PUs) have their unique features and …

Cited by 744; Web of Science: 244

Kernel methods through the roof: handling billions of points efficiently

G Meanti, L Carratino, L Rosasco… - Advances in Neural …, 2020 - proceedings.neurips.cc

Kernel methods provide an elegant and principled approach to nonparametric learning, but
so far could hardly be used in large scale problems, since naïve implementations scale …

Cited by 136

Dense linear algebra solvers for multicore with GPU accelerators

S Tomov, R Nath, H Ltaief… - 2010 IEEE International …, 2010 - ieeexplore.ieee.org

Solving dense linear systems of equations is a fundamental problem in scientific computing.
Numerical simulations involving complex systems represented in terms of unknown …

Cited by 355

PCBDDC: a class of robust dual-primal methods in PETSc

S Zampini - SIAM Journal on Scientific Computing, 2016 - SIAM

A class of preconditioners based on balancing domain decomposition by constraints
methods is introduced in the Portable, Extensible Toolkit for Scientific Computation (PETSc) …

Cited by 93; Web of Science: 54

Keeneland: Bringing heterogeneous GPU computing to the computational science community

JS Vetter, R Glassbrook, J Dongarra, K Schwan… - Computing in Science …, 2011 - netlib.org

The Keeneland project—named for a historic thoroughbred horse racing track in Lexington,
Kentucky—is a five-year Track 2D grant awarded by the US National Science Foundation …

Cited by 156; Web of Science: 64

Data-aware task scheduling on multi-accelerator based platforms

C Augonnet, J Clet-Ortega, S Thibault… - 2010 IEEE 16th …, 2010 - ieeexplore.ieee.org

To fully tap into the potential of heterogeneous machines composed of multicore processors
and multiple accelerators, simple offloading approaches in which the main trunk of the …

Cited by 131

Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark

SJ Pennycook, SD Hammond, SA Jarvis… - ACM SIGMETRICS …, 2011 - dl.acm.org

We present the performance analysis of a port of the LU benchmark from the NAS Parallel
Benchmark (NPB) suite to NVIDIA's Compute Unified Device Architecture (CUDA), and …

Cited by 104

Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing

S Tomov, R Nath, J Dongarra - Parallel Computing, 2010 - Elsevier

We present a Hessenberg reduction (HR) algorithm for hybrid systems of homogeneous
multicore with GPU accelerators that can exceed 25× the performance of the corresponding …

Cited by 92; Web of Science: 35

Multifrontal factorization of sparse SPD matrices on GPUs

T George, V Saxena, A Gupta, A Singh… - … Parallel & Distributed …, 2011 - ieeexplore.ieee.org

Solving large sparse linear systems is often the most computationally intensive component
of many scientific computing applications. In the past, sparse multifrontal direct factorization …

Cited by 60

Implementing directed acyclic graphs with the heterogeneous system architecture

S Puthoor, AM Aji, S Che, M Daga, W Wu… - Proceedings of the 9th …, 2016 - dl.acm.org

Achieving optimal performance on heterogeneous computing systems requires a
programming model that supports the execution of asynchronous, multi-stream, and out-of …

Cited by 30

A scalable high performant Cholesky factorization for multicore with GPU accelerators
