Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

MIMD programs execution support on SIMD machines: a holistic survey

D Mustafa, R Alkhasawneh, F Obeidat… - IEEE Access, 2024 - ieeexplore.ieee.org
The Single Instruction Multiple Data (SIMD) architecture, supported by various high-
performance computing platforms, efficiently utilizes data-level parallelism. The SIMD model …

A benchmark set of highly-efficient CUDA and OpenCL kernels and its dynamic autotuning with Kernel Tuning Toolkit

F Petrovič, D Střelák, J Hozzová, J Ol'ha… - Future Generation …, 2020 - Elsevier
In recent years, the heterogeneity of both commodity and supercomputers hardware has
increased sharply. Accelerators, such as GPUs or Intel Xeon Phi co-processors, are often …

cuFINUFFT: a load-balanced GPU library for general-purpose nonuniform FFTs

Y Shih, G Wright, J Andén, J Blaschke… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
Nonuniform fast Fourier transforms dominate the computational cost in many applications
including image reconstruction and signal processing. We thus present a general-purpose …

Advances in Xmipp for cryo–electron microscopy: From Xmipp to Scipion

D Strelak, A Jiménez-Moreno, JL Vilas… - Molecules, 2021 - mdpi.com
Xmipp is an open-source software package consisting of multiple programs for processing
data originating from electron microscopy and electron tomography, designed and managed …

A survey of performance tuning techniques and tools for parallel applications

D Mustafa - IEEE Access, 2022 - ieeexplore.ieee.org
Automatic parallelization of sequential programs combined with auto-tuning is an alternative
to manual parallelization. With wider research directions and the increased number of …

Using hardware performance counters to speed up autotuning convergence on GPUs

J Filipovič, J Hozzová, A Nezarat, J Ol'ha… - Journal of Parallel and …, 2022 - Elsevier
Nowadays, GPU accelerators are commonly used to speed up general-purpose computing
tasks on a variety of hardware. However, due to the diversity of GPU architectures and …

Estimating resource budgets to ensure autotuning efficiency

J Olha, J Hozzová, M Antol, J Filipovič - Parallel Computing, 2025 - Elsevier
Many state-of-the-art HPC applications rely on autotuning to maintain peak performance.
Autotuning allows a program to be re-optimized for new hardware, settings, or input–even …

Umpalumpa: a framework for efficient execution of complex image processing workloads on heterogeneous nodes

D Střelák, D Myška, F Petrovič, J Polák, J Ol'ha… - Computing, 2023 - Springer
Modern computers are typically heterogeneous devices—besides the standard central
processing unit (CPU), they commonly include an accelerator such as a graphics processing …

Leveraging the Hardware Resources to Accelerate cryo-EM Reconstruction of RELION on the New Sunway Supercomputer

J Xu, J Fu, L Gan, Y Chen, Z Sun, Z Huang… - ACM Transactions on …, 2024 - dl.acm.org
The fast development of biomolecular structure determination has enabled the fine-grained
study of objects in the micro-world, such as proteins and RNAs. The world is benefited …