Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

Massively parallel lattice–Boltzmann codes on large GPU clusters

E Calore, A Gabbana, J Kraus, E Pellegrini… - Parallel Computing, 2016 - Elsevier
This paper describes a massively parallel code for a state-of-the-art thermal
lattice–Boltzmann method. Our code has been carefully optimized for performance on one GPU and …

A new Neumann boundary condition scheme for the thermal lattice Boltzmann method

IT Martins, VA Matsuda, L Cabezas-Gómez - … Communications in Heat and …, 2024 - Elsevier
In this paper we propose a new scheme for implementing the Neumann boundary condition
(BC) with the thermal Lattice Boltzmann Method (LBM). It consists in transforming the wall …

Evaluation of DVFS techniques on modern HPC processors and accelerators for energy‐aware applications

E Calore, A Gabbana, SF Schifano… - Concurrency and …, 2017 - Wiley Online Library
Energy efficiency is becoming increasingly important for computing systems, in particular for
large scale High Performance Computing (HPC) facilities. In this work, we evaluate, from a …

Characterization of petrophysical properties using pore-network and lattice-Boltzmann modelling: Choice of method and image sub-volume size

N Alyafei, TJ Mckay, TI Solling - Journal of Petroleum Science and …, 2016 - Elsevier
The invention and progression of micro-CT scanning technology has significantly improved
the quality and resolution of tomographic images. It is now possible to fully resolve simpler …

Performance and portability of accelerated lattice Boltzmann applications with OpenACC

E Calore, A Gabbana, J Kraus… - Concurrency and …, 2016 - Wiley Online Library
An increasingly large number of HPC systems rely on heterogeneous architectures
combining traditional multi‐core CPUs with power efficient accelerators. Designing efficient …

Performance and power analysis of HPC workloads on heterogeneous multi-node clusters

F Mantovani, E Calore - Journal of Low Power Electronics and …, 2018 - mdpi.com
Performance analysis tools allow application developers to identify and characterize the
inefficiencies that cause performance degradation in their codes, allowing for application …

Optimization of lattice Boltzmann simulations on heterogeneous computers

E Calore, A Gabbana, SF Schifano… - … Journal of High …, 2019 - journals.sagepub.com
High-performance computing systems are more and more often based on accelerators.
Computing applications targeting those systems often follow a host-driven approach, in …

Early experience on porting and running a Lattice Boltzmann code on the Xeon-Phi co-processor

G Crimi, F Mantovani, M Pivanti, SF Schifano… - Procedia Computer …, 2013 - Elsevier
In this paper we report on our early experience on porting, optimizing and benchmarking a
Lattice Boltzmann (LB) code on the Xeon-Phi co-processor, the first generally available …

Physically based visual simulation of the Lattice Boltzmann method on the GPU: a survey

O Navarro-Hinojosa, S Ruiz-Loza… - The Journal of …, 2018 - Springer
The rapid increase in performance, programmability, and availability of graphics processing
units (GPUs) has made them a compelling platform for computationally demanding tasks in …