Fully self-consistent finite-temperature GW in Gaussian Bloch orbitals for solids

CN Yeh, S Iskakov, D Zgid, E Gull - Physical Review B, 2022 - APS
We present algorithmic and implementation details for the fully self-consistent finite-
temperature GW method in Gaussian Bloch orbitals for solids. Our implementation is based …

Efficient exascale discretizations: High-order finite element methods

T Kolev, P Fischer, M Min, J Dongarra… - … Journal of High …, 2021 - journals.sagepub.com
Efficient exploitation of exascale architectures requires rethinking of the numerical
algorithms used in many large-scale applications. These architectures favor algorithms that …

Solving high-dimensional parabolic PDEs using the tensor train format

L Richter, L Sallandt, N Nüsken - … Conference on Machine …, 2021 - proceedings.mlr.press
High-dimensional partial differential equations (PDEs) are ubiquitous in economics, science
and engineering. However, their numerical treatment poses formidable challenges since …
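
The title names the tensor-train (TT) format as the central tool. As a quick point of reference (a generic sketch, not the authors' PDE solver), the standard TT-SVD construction of a tensor train from a dense array can be written in a few lines of NumPy; the function name tt_svd and the tolerance are illustrative choices.

    import numpy as np

    def tt_svd(tensor, rel_tol=1e-10):
        """Decompose a dense ndarray into tensor-train (TT) cores via sequential truncated SVDs."""
        dims = tensor.shape
        cores, rank = [], 1
        mat = tensor.reshape(rank * dims[0], -1)
        for k in range(len(dims) - 1):
            u, s, vt = np.linalg.svd(mat, full_matrices=False)
            # Keep only singular values above the relative tolerance.
            new_rank = max(1, int(np.sum(s > rel_tol * s[0])))
            cores.append(u[:, :new_rank].reshape(rank, dims[k], new_rank))
            mat = (s[:new_rank, None] * vt[:new_rank]).reshape(new_rank * dims[k + 1], -1)
            rank = new_rank
        cores.append(mat.reshape(rank, dims[-1], 1))
        return cores

    # Usage: a rank-1 4D tensor compresses to TT ranks of 1; check the reconstruction error.
    x = np.einsum('i,j,k,l->ijkl', *(np.random.rand(n) for n in (4, 5, 6, 7)))
    cores = tt_svd(x)
    recon = cores[0]
    for core in cores[1:]:
        recon = np.tensordot(recon, core, axes=([-1], [0]))
    print([c.shape for c in cores], np.linalg.norm(recon.reshape(x.shape) - x))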

Performance, design, and autotuning of batched GEMM for GPUs

A Abdelfattah, A Haidar, S Tomov… - … Conference, ISC High …, 2016 - Springer
The general matrix-matrix multiplication (GEMM) is the most important numerical kernel in
dense linear algebra, and is the key component for obtaining high performance in most …
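
For context on what a "batched GEMM" computes (independent of the GPU kernels and autotuning this paper is about), the operation is simply many independent small matrix-matrix products issued as one call. A minimal NumPy sketch with made-up batch size and dimensions:

    import numpy as np

    # Hypothetical sizes: 10,000 independent 8x16 times 16x8 products, the small-matrix
    # regime where a batched kernel beats launching one GEMM per matrix.
    batch, m, k, n = 10_000, 8, 16, 8
    rng = np.random.default_rng(0)
    A = rng.standard_normal((batch, m, k))
    B = rng.standard_normal((batch, k, n))

    # One batched call: matmul broadcasts over the leading batch dimension,
    # computing C[i] = A[i] @ B[i] for every i.
    C = A @ B

    # Equivalent loop formulation, for comparison.
    C_loop = np.stack([A[i] @ B[i] for i in range(batch)])
    assert np.allclose(C, C_loop)
    print(C.shape)  # (10000, 8, 8)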

Acceleration of tensor-product operations for high-order finite element methods

K Świrydowicz, N Chalmers… - … Journal of High …, 2019 - journals.sagepub.com
This article is devoted to graphics processing unit (GPU) kernel optimization and
performance analysis of three tensor-product operations arising in finite element methods …
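
The tensor-product operations in question have a sum-factorization structure: a 1D basis matrix is applied along each coordinate direction of the element-local data in turn, rather than one large 3D operator all at once. A rough NumPy sketch of that structure (sizes and names are illustrative, not the kernels analyzed in the article):

    import numpy as np

    # Illustrative sizes: degree p -> (p+1) nodes and q quadrature points per direction.
    p, q, n_elems = 7, 9, 512
    rng = np.random.default_rng(1)
    B = rng.standard_normal((q, p + 1))                       # 1D interpolation matrix
    u = rng.standard_normal((n_elems, p + 1, p + 1, p + 1))   # nodal values on hex elements

    # Applying the full 3D operator directly costs O(q^3 (p+1)^3) per element; applying the
    # 1D matrix one direction at a time costs O(q (p+1)^3 + q^2 (p+1)^2 + q^3 (p+1)).
    v = np.einsum('ai,eijk->eajk', B, u)   # contract x-direction
    v = np.einsum('bj,eajk->eabk', B, v)   # contract y-direction
    v = np.einsum('ck,eabk->eabc', B, v)   # contract z-direction
    print(v.shape)  # (512, 9, 9, 9): values at the tensor-product quadrature points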

GPU algorithms for efficient exascale discretizations

A Abdelfattah, V Barra, N Beams, R Bleile, J Brown… - Parallel Computing, 2021 - Elsevier
In this paper we describe the research and development activities in the Center for Efficient
Exascale Discretization within the US Exascale Computing Project, targeting state-of-the-art …

Fast batched matrix multiplication for small sizes using half-precision arithmetic on GPUs

A Abdelfattah, S Tomov… - 2019 IEEE international …, 2019 - ieeexplore.ieee.org
Matrix multiplication (GEMM) is the most important operation in dense linear algebra.
Because it is a compute-bound operation that is rich in data reuse, many applications from …
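
To see the precision trade-off that motivates mixing FP16 inputs with higher-precision accumulation (a generic illustration, not this paper's GPU kernels), one can compare an all-half-precision batched product against a double-precision reference in NumPy:

    import numpy as np

    # Hypothetical batch of small square matrices, as in the batched-GEMM setting.
    batch, n = 1000, 16
    rng = np.random.default_rng(2)
    A64 = rng.standard_normal((batch, n, n))
    B64 = rng.standard_normal((batch, n, n))

    C64 = A64 @ B64                                          # FP64 reference
    C16 = A64.astype(np.float16) @ B64.astype(np.float16)    # inputs rounded to FP16

    rel_err = np.linalg.norm(C16.astype(np.float64) - C64) / np.linalg.norm(C64)
    print(f"relative error of the FP16 batched product vs FP64: {rel_err:.2e}")
    # GPU tensor cores take FP16 inputs but can accumulate in FP32, which recovers
    # much of the lost accuracy while keeping the FP16 throughput advantage.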

High-performance matrix-matrix multiplications of very small matrices

I Masliah, A Abdelfattah, A Haidar, S Tomov… - Euro-Par 2016: Parallel …, 2016 - Springer
The use of the general dense matrix-matrix multiplication (GEMM) is fundamental for
obtaining high performance in many scientific computing applications. GEMMs for small …