Fully self-consistent finite-temperature GW in Gaussian Bloch orbitals for solids
We present algorithmic and implementation details for the fully self-consistent finite-
temperature GW method in Gaussian Bloch orbitals for solids. Our implementation is based …
Efficient exascale discretizations: High-order finite element methods
Efficient exploitation of exascale architectures requires rethinking of the numerical
algorithms used in many large-scale applications. These architectures favor algorithms that …
Solving high-dimensional parabolic PDEs using the tensor train format
High-dimensional partial differential equations (PDEs) are ubiquitous in economics, science
and engineering. However, their numerical treatment poses formidable challenges since …
Performance, design, and autotuning of batched GEMM for GPUs
A Abdelfattah, A Haidar, S Tomov… - … Conference, ISC High …, 2016 - Springer
The general matrix-matrix multiplication (GEMM) is the most important numerical kernel in
dense linear algebra, and is the key component for obtaining high performance in most …
Acceleration of tensor-product operations for high-order finite element methods
K Świrydowicz, N Chalmers… - … Journal of High …, 2019 - journals.sagepub.com
This article is devoted to graphics processing unit (GPU) kernel optimization and
performance analysis of three tensor-product operations arising in finite element methods …
GPU algorithms for efficient exascale discretizations
In this paper we describe the research and development activities in the Center for Efficient
Exascale Discretization within the US Exascale Computing Project, targeting state-of-the-art …
Tensor networks for lattice gauge theories beyond one dimension: a roadmap
Tensor network methods are a class of numerical tools and algorithms to study many-body
quantum systems in and out of equilibrium, based on tailored variational wave functions …
Fast batched matrix multiplication for small sizes using half-precision arithmetic on GPUs
A Abdelfattah, S Tomov… - 2019 IEEE international …, 2019 - ieeexplore.ieee.org
Matrix multiplication (GEMM) is the most important operation in dense linear algebra.
Because it is a compute-bound operation that is rich in data reuse, many applications from …
High-performance matrix-matrix multiplications of very small matrices
The use of the general dense matrix-matrix multiplication (GEMM) is fundamental for
obtaining high performance in many scientific computing applications. GEMMs for small …
From continuous-time formulations to discretization schemes: tensor trains and robust regression for BSDEs and parabolic PDEs
The numerical approximation of partial differential equations (PDEs) poses formidable
challenges in high dimensions since classical grid-based methods suffer from the so-called …