- Academic Search

Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org

In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

Cited by: 67

Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers

A Haidar, S Tomov, J Dongarra… - … Conference for High …, 2018 - ieeexplore.ieee.org

Low-precision floating-point arithmetic is a powerful tool for accelerating scientific computing
applications, especially those in artificial intelligence. Here, we present an investigation …

Cited by: 275

Measuring energy and power with PAPI

VM Weaver, M Johnson… - 2012 41st …, 2012 - ieeexplore.ieee.org

Energy and power consumption are becoming critical metrics in the design and usage of
high performance systems. We have extended the Performance API (PAPI) analysis library …

Cited by: 279

A survey on techniques for cooperative CPU-GPU computing

K Raju, NN Chiplunkar - Sustainable Computing: Informatics and Systems, 2018 - Elsevier

Graphical Processing Unit provides massive parallelism due to the presence of
hundreds of cores. Usage of GPUs for general purpose computation (GPGPU) has resulted …

Cited by: 40

Design and implementation of the Linpack benchmark for single and multi-node systems based on Intel® Xeon Phi coprocessor

A Heinecke, K Vaidyanathan… - 2013 IEEE 27th …, 2013 - ieeexplore.ieee.org

Dense linear algebra has been traditionally used to evaluate the performance and efficiency
of new architectures. This trend has continued for the past half decade with the advent of …

Cited by: 221

The singular value decomposition: Anatomy of optimizing an algorithm for extreme scale

J Dongarra, M Gates, A Haidar, J Kurzak, P Luszczek… - SIAM Review, 2018 - SIAM

The computation of the singular value decomposition, or SVD, has a long history with many
improvements over the years, both in its implementations and algorithmically. Here, we …

Cited by: 115

Simulating low precision floating-point arithmetic

NJ Higham, S Pranesh - SIAM Journal on Scientific Computing, 2019 - SIAM

The half-precision (fp16) floating-point format, defined in the 2008 revision of the IEEE
standard for floating-point arithmetic, and a more recently proposed half-precision format …

Cited by: 89

Kernel Tuner: A search-optimizing GPU code auto-tuner

B van Werkhoven - Future Generation Computer Systems, 2019 - Elsevier

A very common problem in GPU programming is that some combination of thread block
dimensions and other code optimization parameters, like tiling or unrolling factors, results in …

Cited by: 91

Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems

A Haidar, H Bayraktar, S Tomov… - … of the Royal …, 2020 - royalsocietypublishing.org

Double-precision floating-point arithmetic (FP64) has been the de facto standard for
engineering and scientific simulations for several decades. Problem complexity and the …

Cited by: 74

A hybridization methodology for high-performance linear algebra software for GPUs

E Agullo, C Augonnet, J Dongarra, H Ltaief… - GPU Computing Gems …, 2012 - Elsevier

This chapter presents a hybridization methodology for the development
of high-performance linear algebra software for graphics processing units (GPUs). The …

Cited by: 164
