Affinity-based thread and data mapping in shared memory systems

M Diener, EHM Cruz, MAZ Alves, POA Navaux… - ACM Computing …, 2016 - dl.acm.org
Shared memory architectures have recently experienced a large increase in thread-level
parallelism, leading to complex memory hierarchies with multiple cache memory levels and …

A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications

L Chen, S Liu, C Wang, H Ma, Y Qiao, Z Wang… - … USENIX Symposium on …, 2024 - usenix.org
With rapid advances in network hardware, far memory has gained a great deal of traction
due to its ability to break the memory capacity wall. Existing far memory systems fall into one …

memif: Towards Programming Heterogeneous Memory Asynchronously

FX Lin, X Liu - ACM SIGPLAN Notices, 2016 - dl.acm.org
To harness a heterogeneous memory hierarchy, it is advantageous to integrate application
knowledge in guiding frequent memory move, i.e., replicating or migrating virtual memory …

Locality-centric data and threadblock management for massive GPUs

M Khairy, V Nikiforov, D Nellans… - 2020 53rd Annual IEEE …, 2020 - ieeexplore.ieee.org
Recent work has shown that building GPUs with hundreds of SMs in a single monolithic chip
will not be practical due to slowing growth in transistor density, low chip yields, and …

Modeling and optimizing NUMA effects and prefetching with machine learning

I Sánchez Barrera, D Black-Schaffer, M Casas… - Proceedings of the 34th …, 2020 - dl.acm.org
Both NUMA thread/data placement and hardware prefetcher configuration have significant
impacts on HPC performance. Optimizing both together leads to a large and complex design …

Efficient thread/page/parallelism autotuning for NUMA systems

M Popov, A Jimborean, D Black-Schaffer - Proceedings of the ACM …, 2019 - dl.acm.org
Current multi-socket systems have complex memory hierarchies with significant
Non-Uniform Memory Access (NUMA) effects: memory performance depends on the location of …

DR-BW: identifying bandwidth contention in NUMA architectures with supervised learning

H Xu, S Wen, A Gimenez, T Gamblin… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Non-Uniform Memory Access (NUMA) architectures are widely used in mainstream
multi-socket computer systems to scale memory bandwidth. Without a NUMA-aware design …

NUBA: Non-uniform bandwidth GPUs

X Zhao, M Jahre, Y Tang, G Zhang… - Proceedings of the 28th …, 2023 - dl.acm.org
The parallel execution model of GPUs enables scaling to hundreds of thousands of threads,
which is a key capability that many modern high-performance applications exploit. GPU …

Swing to SWT and back: Patterns for API migration by wrapping

TT Bartolomei, K Czarnecki… - 2010 IEEE International …, 2010 - ieeexplore.ieee.org
Evolving requirements may necessitate API migration: re-engineering an application to
replace its dependence on one API with the dependence on another API for the same …

Page migration support for disaggregated non-volatile memories

VR Kommareddy, SD Hammond, C Hughes… - Proceedings of the …, 2019 - dl.acm.org
As demands for memory-intensive applications continue to grow, the memory capacity of
each computing node is expected to grow at a similar pace. In high-performance computing …