Affinity-based thread and data mapping in shared memory systems
Shared memory architectures have recently experienced a large increase in thread-level
parallelism, leading to complex memory hierarchies with multiple cache memory levels and …
memif: Towards Programming Heterogeneous Memory Asynchronously
To harness a heterogeneous memory hierarchy, it is advantageous to integrate application
knowledge in guiding frequent memory moves, i.e., replicating or migrating virtual memory …
Modeling and optimizing NUMA effects and prefetching with machine learning
Both NUMA thread/data placement and hardware prefetcher configuration have significant
impacts on HPC performance. Optimizing both together leads to a large and complex design …
DR-BW: identifying bandwidth contention in NUMA architectures with supervised learning
Non-Uniform Memory Access (NUMA) architectures are widely used in mainstream multi-
socket computer systems to scale memory bandwidth. Without a NUMA-aware design …
NumaMMA: NUMA memory analyzer
Non Uniform Memory Access (NUMA) architectures are nowadays common for running High-
Performance Computing (HPC) applications. In such architectures, several distinct physical …
Adapt burstable containers to variable CPU resources
In the cloud-native age, container technology, referred to as OS-level virtualization, is
increasingly adopted to deploy cloud applications. Compared with virtual machines …
Data and thread placement in NUMA architectures: A statistical learning approach
Nowadays, NUMA architectures are common in compute-intensive systems. Achieving high
performance for multi-threaded applications requires both a careful placement of threads on …
Reducing data movement on large shared memory systems by exploiting computation dependencies
Shared memory systems are becoming increasingly complex as they typically integrate
several storage devices. That brings different access latencies or bandwidth rates …
Swing to SWT and back: Patterns for API migration by wrapping
TT Bartolomei, K Czarnecki… - 2010 IEEE International …, 2010 - ieeexplore.ieee.org
Evolving requirements may necessitate API migration: re-engineering an application to
replace its dependence on one API with the dependence on another API for the same …
Locality vs. balance: Exploring data mapping policies on NUMA systems
M Diener, EHM Cruz… - 2015 23rd Euromicro …, 2015 - ieeexplore.ieee.org
In parallel architectures that have a Non-Uniform Memory Access (NUMA) behavior, the
mapping of memory pages to NUMA nodes influences the performance of parallel …