Google Scholar

Benchmarking GPUs to tune dense linear algebra

V Volkov, JW Demmel - SC'08: Proceedings of the 2008 ACM …, 2008 - ieeexplore.ieee.org

We present performance results for dense linear algebra using recent NVIDIA GPUs. Our
matrix-matrix multiply routine (GEMM) runs up to 60% faster than the vendor's …

Cited by 1166 · All 17 versions

[PDF] manchester.ac.uk

Towards dense linear algebra for hybrid GPU accelerated manycore systems

S Tomov, J Dongarra, M Baboulin - Parallel Computing, 2010 - Elsevier

We highlight the trends leading to the increased appeal of using hybrid multicore+GPU
systems for high performance computing. We present a set of techniques that can be used to …

Cited by 619 · All 23 versions

[PDF] hal.science

Xkaapi: A runtime system for data-flow task programming on heterogeneous architectures

T Gautier, JVF Lima, N Maillard… - 2013 IEEE 27th …, 2013 - ieeexplore.ieee.org

Most recent HPC platforms have heterogeneous nodes composed of multi-core CPUs and
accelerators, like GPUs. Programming such nodes is typically based on a combination of …

Cited by 276 · All 20 versions

Achieving a single compute device image in OpenCL for multiple GPUs

J Kim, H Kim, JH Lee, J Lee - ACM Sigplan Notices, 2011 - dl.acm.org

In this paper, we propose an OpenCL framework that combines multiple GPUs and treats
them as a single compute device. Providing a single virtual compute device image to the …

Cited by 194 · All 4 versions

[PDF] upc.edu

An extension of the StarSs programming model for platforms with multiple GPUs

E Ayguadé, RM Badia, FD Igual, J Labarta… - Euro-Par 2009 Parallel …, 2009 - Springer

While general-purpose homogeneous multi-core architectures are becoming ubiquitous,
there are clear indications that, for a number of important applications, a better …

Cited by 223 · All 16 versions

[PDF] berkeley.edu

[PDF] LU, QR and Cholesky factorizations using vector capabilities of GPUs

V Volkov, J Demmel - 2008 - eecs.berkeley.edu

We present performance results for dense linear algebra using the 8-series NVIDIA GPUs.
Our matrix-matrix multiply routine (GEMM) runs 60% faster than the vendor implementation …

Cited by 240 · All 7 versions

[PDF] psu.edu

Communication-avoiding QR decomposition for GPUs

M Anderson, G Ballard, J Demmel… - 2011 IEEE International …, 2011 - ieeexplore.ieee.org

We describe an implementation of the Communication-Avoiding QR (CAQR) factorization
that runs entirely on a single graphics processor (GPU). We show that the reduction in …

Cited by 134 · All 15 versions

[PDF] upc.edu

Overlapping communication and computation by using a hybrid MPI/SMPSs approach

V Marjanović, J Labarta, E Ayguadé… - Proceedings of the 24th …, 2010 - dl.acm.org

Communication overhead is one of the dominant factors affecting performance in high-end
computing systems. To reduce the negative impact of communication, programmers overlap …

Cited by 135 · All 7 versions

[PDF] hal.science

Hierarchical DAG scheduling for hybrid distributed systems

W Wu, A Bouteiller, G Bosilca… - 2015 IEEE …, 2015 - ieeexplore.ieee.org

Accelerator-enhanced computing platforms have drawn a lot of attention due to their
massive peak computational capacity. Despite significant advances in the programming …

Cited by 96 · All 18 versions

[PDF] psu.edu

The libflame library for dense matrix computations

FG Van Zee, E Chan, RA Van de Geijn… - … in science & …, 2009 - ieeexplore.ieee.org

Researchers from the Formal Linear Algebra Method Environment (Flame) project have
developed new methodologies for analyzing, designing, and implementing linear algebra …

Cited by 136 · All 16 versions

Solving dense linear systems on platforms with multiple hardware accelerators

Benchmarking GPUs to tune dense linear algebra