Affinity-based thread and data mapping in shared memory systems

M Diener, EHM Cruz, MAZ Alves, POA Navaux… - ACM Computing …, 2016 - dl.acm.org
Shared memory architectures have recently experienced a large increase in thread-level
parallelism, leading to complex memory hierarchies with multiple cache memory levels and …

Multi-objective co-optimization of FlexRay-based distributed control systems

D Roy, L Zhang, W Chang, D Goswami… - 2016 IEEE Real …, 2016 - ieeexplore.ieee.org
Recently, research on control and architecture co-design has been drawing increasingly
more attention. This is because these techniques integrate the design of the controllers and …

DeLoc: a locality and memory-congestion-aware task mapping method for modern NUMA systems

M Agung, MA Amrizal, R Egawa, H Takizawa - IEEE Access, 2020 - ieeexplore.ieee.org
The mapping of tasks to processor cores, called task mapping, is crucial to achieving
scalable performance on multicore processors. On modern NUMA (non-uniform memory …

Process affinity, metrics and impact on performance: An empirical study

C Bordage, E Jeannot - … Symposium on Cluster, Cloud and Grid …, 2018 - ieeexplore.ieee.org
Process placement, also called topology mapping, is a well-known strategy to improve
parallel program execution by reducing the communication cost between processes. It …

A Low-Level Virtual Machine Just-In-Time Prototype for Running an Energy-Saving Hardware-Aware Mapping Algorithm on C/C++ Applications That Use Pthreads

I Știrb, GR Gillich - Energies, 2023 - mdpi.com
Low-Level Virtual Machine (LLVM) compiler infrastructure is a useful tool for building just-in-
time (JIT) compilers, besides its reliable front end represented by a clang compiler and its …

Using NAS Parallel Benchmarks to evaluate HPC performance in clouds

TK Okada, A Goldman… - 2016 IEEE 15th …, 2016 - ieeexplore.ieee.org
Cloud computing is a reality nowadays, however there are few studies trying to understand
what happens in the actual cloud infrastructures for HPC applications. The focus of this study …

Optimizing performance and energy across problem sizes through a search space exploration and machine learning

L Scravaglieri, M Popov, LL Pilla, A Guermouche… - Journal of Parallel and …, 2023 - Elsevier
HPC systems expose configuration options to assist optimization. Configurations such as
parallelism, thread and data mapping, or prefetching have been explored but with a limited …

NUMA-BTDM: A thread mapping algorithm for balanced data locality on NUMA systems

I Ştirb - 2016 17th International Conference on Parallel and …, 2016 - ieeexplore.ieee.org
Optimizing for Non-Uniform Memory Access (NUMA) systems could be considered
inappropriate because hardware architecture aware optimizations are not portable. On the …

NUMA-BTLP: A static algorithm for thread classification

I Ştirb - 2018 5th International Conference on Control, Decision …, 2018 - ieeexplore.ieee.org
Despite NUMA aware optimizations are often considered not portable, this paper states that
extending a compiler, supporting compilation of parallel APIs, with NUMA-aware …

Predicting the soft error vulnerability of parallel applications using machine learning

I Öz, S Arslan - International Journal of Parallel Programming, 2021 - Springer
With the widespread use of the multicore systems having smaller transistor sizes, soft errors
become an important issue for parallel program execution. Fault injection is a prevalent …