Affinity-based thread and data mapping in shared memory systems
Shared memory architectures have recently experienced a large increase in thread-level
parallelism, leading to complex memory hierarchies with multiple cache memory levels and …
Multi-objective co-optimization of FlexRay-based distributed control systems
Recently, research on control and architecture co-design has been drawing increasingly
more attention. This is because these techniques integrate the design of the controllers and …
DeLoc: a locality and memory-congestion-aware task mapping method for modern NUMA systems
The mapping of tasks to processor cores, called task mapping, is crucial to achieving
scalable performance on multicore processors. On modern NUMA (non-uniform memory …
Process affinity, metrics and impact on performance: An empirical study
C Bordage, E Jeannot - … Symposium on Cluster, Cloud and Grid …, 2018 - ieeexplore.ieee.org
Process placement, also called topology mapping, is a well-known strategy to improve
parallel program execution by reducing the communication cost between processes. It …
A Low-Level Virtual Machine Just-In-Time Prototype for Running an Energy-Saving Hardware-Aware Mapping Algorithm on C/C++ Applications That Use Pthreads
Low-Level Virtual Machine (LLVM) compiler infrastructure is a useful tool for building just-in-
time (JIT) compilers, besides its reliable front end represented by a clang compiler and its …
Using NAS Parallel Benchmarks to evaluate HPC performance in clouds
Cloud computing is a reality nowadays, however there are few studies trying to understand
what happens in the actual cloud infrastructures for HPC applications. The focus of this study …
Optimizing performance and energy across problem sizes through a search space exploration and machine learning
HPC systems expose configuration options to assist optimization. Configurations such as
parallelism, thread and data mapping, or prefetching have been explored but with a limited …
NUMA-BTDM: A thread mapping algorithm for balanced data locality on NUMA systems
I Ştirb - 2016 17th International Conference on Parallel and …, 2016 - ieeexplore.ieee.org
Optimizing for Non-Uniform Memory Access (NUMA) systems could be considered
inappropriate because hardware architecture aware optimizations are not portable. On the …
NUMA-BTLP: A static algorithm for thread classification
I Ştirb - 2018 5th International Conference on Control, Decision …, 2018 - ieeexplore.ieee.org
Despite NUMA aware optimizations are often considered not portable, this paper states that
extending a compiler, supporting compilation of parallel APIs, with NUMA-aware …
Predicting the soft error vulnerability of parallel applications using machine learning
With the widespread use of the multicore systems having smaller transistor sizes, soft errors
become an important issue for parallel program execution. Fault injection is a prevalent …