Multi-GPU thermal lattice Boltzmann simulations using OpenACC and MPI
A Xu, BT Li - International Journal of Heat and Mass Transfer, 2023 - Elsevier
We assess the performance of the hybrid Open Accelerator (OpenACC) and Message
Passing Interface (MPI) approach for multi-graphics processing units (GPUs) accelerated …
Particle-resolved thermal lattice Boltzmann simulation using OpenACC on multi-GPUs
A Xu, BT Li - International Journal of Heat and Mass Transfer, 2024 - Elsevier
We utilize the Open Accelerator (OpenACC) approach for graphics processing unit
(GPU) accelerated particle-resolved thermal lattice Boltzmann (LB) simulation. We adopt the …
Massively parallel lattice–Boltzmann codes on large GPU clusters
This paper describes a massively parallel code for a state-of-the-art thermal lattice–
Boltzmann method. Our code has been carefully optimized for performance on one GPU and …
Beyond moments: relativistic lattice Boltzmann methods for radiative transport in computational astrophysics
We present a new method for the numerical solution of the radiative-transfer equation (RTE)
in multidimensional scenarios commonly encountered in computational astrophysics. The …
Evaluation of DVFS techniques on modern HPC processors and accelerators for energy‐aware applications
Energy efficiency is becoming increasingly important for computing systems, in particular for
large scale High Performance Computing (HPC) facilities. In this work, we evaluate, from a …
Performance and power analysis of HPC workloads on heterogeneous multi-node clusters
Performance analysis tools allow application developers to identify and characterize the
inefficiencies that cause performance degradation in their codes, allowing for application …
Fast kinetic simulator for relativistic matter
Relativistic kinetic theory is ubiquitous to several fields of modern physics, finding
application at large scales in systems in astrophysical contexts, all of the way down to …
Optimization of lattice Boltzmann simulations on heterogeneous computers
High-performance computing systems are more and more often based on accelerators.
Computing applications targeting those systems often follow a host-driven approach, in …
ThunderX2 performance and energy-efficiency for HPC workloads
In the last years, the energy efficiency of HPC systems is increasingly becoming of
paramount importance for environmental, technical, and economical reasons. Several …
Performance portability study for massively parallel computational fluid dynamics application on scalable heterogeneous architectures
Patient-specific hemodynamic simulations have the potential to greatly improve both the
diagnosis and treatment of a variety of vascular diseases. Portability will enable wider …