Programming languages for data-Intensive HPC applications: A systematic mapping study

V Amaral, B Norberto, M Goulão, M Aldinucci… - Parallel Computing, 2020 - Elsevier
A major challenge in modelling and simulation is the need to combine expertise in both
software technologies and a given scientific domain. When High-Performance Computing …

Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: programming productivity, performance, and energy consumption

S Memeti, L Li, S Pllana, J Kołodziej… - Proceedings of the 2017 …, 2017 - dl.acm.org
Many modern parallel computing systems are heterogeneous at their node level. Such
nodes may comprise general purpose CPUs and accelerators (such as, GPU, or Intel Xeon …

Medical data processing and analysis for remote health and activities monitoring

S Vitabile, M Marks, D Stojanovic, S Pllana… - … and Simulation for …, 2019 - library.oapen.org
Recent developments in sensor technology, wearable computing, Internet of Things (IoT),
and wireless communication have given rise to research in ubiquitous healthcare and …

Load balancing in a changing world: dealing with heterogeneity and performance variability

M Boyer, K Skadron, S Che, N Jayasena - Proceedings of the ACM …, 2013 - dl.acm.org
Fully utilizing the power of modern heterogeneous systems requires judiciously dividing
work across all of the available computational devices. Existing approaches for partitioning …

Autotune: A plugin-driven approach to the automatic tuning of parallel applications

R Miceli, G Civario, A Sikora, E César, M Gerndt… - Applied Parallel and …, 2013 - Springer
Performance analysis and tuning is an important step in programming multicore- and
manycore-based parallel architectures. While there are several tools to help developers …

A review of machine learning and meta-heuristic methods for scheduling parallel computing systems

S Memeti, S Pllana, A Binotto, J Kołodziej… - Proceedings of the …, 2018 - dl.acm.org
Optimized software execution on parallel computing systems demands consideration of
many parameters at run-time. Determining the optimal set of parameters in a given …

Executing an operating system on processors having different instruction set architectures

MR McDonald, EJ Plondke, P Potoplyak… - US Patent …, 2019 - Google Patents
An apparatus includes a first processor having a first instruction set and a second processor
having a second instruction set that is different than the first instruction set. The apparatus …

Algorithmic skeletons and parallel design patterns in mainstream parallel programming

M Danelutto, G Mencagli, M Torquati… - International Journal of …, 2021 - Springer
This paper discusses the impact of structured parallel programming methodologies in state-
of-the-art industrial and research parallel programming frameworks. We first recap the main …

CHAOS: a parallelization scheme for training convolutional neural networks on Intel Xeon Phi

A Viebke, S Memeti, S Pllana, A Abraham - The Journal of …, 2019 - Springer
Deep learning is an important component of Big Data analytic tools and intelligent
applications, such as self-driving cars, computer vision, speech recognition, or precision …

The READEX formalism for automatic tuning for energy efficiency

J Schuchart, M Gerndt, PG Kjeldsberg, M Lysaght… - Computing, 2017 - Springer
Energy efficiency is an important aspect of future exascale systems, mainly due to rising
energy cost. Although High performance computing (HPC) applications are compute centric …