Accelerating reduction and scan using tensor core units

A Dakkak, C Li, J Xiong, I Gelado, W Hwu - Proceedings of the ACM …, 2019 - dl.acm.org
Driven by deep learning, there has been a surge of specialized processors for matrix
multiplication, referred to as Tensor Core Units (TCUs). These TCUs are capable of …

A scalable, numerically stable, high-performance tridiagonal solver using GPUs

LW Chang, JA Stratton, HS Kim… - SC'12: Proceedings of …, 2012 - ieeexplore.ieee.org
In this paper, we present a scalable, numerically stable, high-performance tridiagonal solver.
The solver is based on the SPIKE algorithm for partitioning a large matrix into small …

Gender identification using frontal facial images

A Jain, J Huang, S Fang - 2005 IEEE international conference …, 2005 - ieeexplore.ieee.org
Computer vision and pattern recognition systems play an important role in our lives by
means of automated face detection, face and gesture recognition, and estimation of gender …

FUNWAVE‐GPU: Multiple‐GPU acceleration of a Boussinesq‐type wave model

Y Yuan, F Shi, JT Kirby, F Yu - Journal of Advances in Modeling …, 2020 - Wiley Online Library
This paper documents development of a multiple‐Graphics Processing Unit (GPU) version
of FUNWAVE‐Total Variation Diminishing (TVD), an open‐source model for solving the fully …

A hybrid parallel solving algorithm on GPU for quasi-tridiagonal system of linear equations

K Li, W Yang, K Li - IEEE Transactions on Parallel and …, 2016 - ieeexplore.ieee.org
There are some quasi-tridiagonal system of linear equations arising from numerical
simulations, and some solving algorithms encounter great challenge on solving quasi …

cuThomasBatch and cuThomasVBatch, CUDA routines to compute batch of tridiagonal systems on NVIDIA GPUs

P Valero‐Lara, I Martínez‐Pérez… - Concurrency and …, 2018 - Wiley Online Library
The solving of tridiagonal systems is one of the most computationally expensive parts in
many applications, so that multiple studies have explored the use of NVIDIA GPUs to …

A thermal study of a new oil well plugging & abandonment operation

E dos Santos Magalhães, MJS de Lemos - International Journal of Thermal …, 2020 - Elsevier
When the oil and gas extraction in a well is over or ended by wellbore issues, a Plugging
and Abandoning (P&A) operation is required. The usual method is the well cementation. A …

Fast finite difference Poisson solvers on heterogeneous architectures

P Valero-Lara, A Pinelli, M Prieto-Matias - Computer Physics …, 2014 - Elsevier
In this paper we propose and evaluate a set of new strategies for the solution of three
dimensional separable elliptic problems on CPU–GPU platforms. The numerical solution of …

Finpar: A parallel financial benchmark

C Andreetta, V Bégot, J Berthold, M Elsman… - ACM Transactions on …, 2016 - dl.acm.org
Commodity many-core hardware is now mainstream, but parallel programming models are
still lagging behind in efficiently utilizing the application parallelism. There are (at least) two …

NVIDIA GPUs scalability to solve multiple (batch) tridiagonal systems implementation of cuThomasBatch

P Valero-Lara, I Martínez-Pérez, R Sirvent… - … Conference on Parallel …, 2017 - Springer
The solving of tridiagonal systems is one of the most computationally expensive parts in
many applications, so that multiple studies have explored the use of NVIDIA GPUs to …