- Academic Search

Affinity-based thread and data mapping in shared memory systems

M Diener, EHM Cruz, MAZ Alves, POA Navaux… - ACM Computing …, 2016 - dl.acm.org

Shared memory architectures have recently experienced a large increase in thread-level
parallelism, leading to complex memory hierarchies with multiple cache memory levels and …

Cited by 54; 6 versions

Argobots: A lightweight low-level threading and tasking framework

S Seo, A Amer, P Balaji, C Bordage… - … on Parallel and …, 2017 - ieeexplore.ieee.org

In the past few decades, a number of user-level threading and tasking models have been
proposed in the literature to address the shortcomings of OS-level threads, primarily with …

Cited by 158; 17 versions

memif: Towards Programming Heterogeneous Memory Asynchronously

FX Lin, X Liu - ACM SIGPLAN Notices, 2016 - dl.acm.org

To harness a heterogeneous memory hierarchy, it is advantageous to integrate application
knowledge in guiding frequent memory move, i.e., replicating or migrating virtual memory …

Cited by 78; 4 versions

A tool to analyze the performance of multithreaded programs on NUMA architectures

X Liu, J Mellor-Crummey - ACM SIGPLAN Notices, 2014 - dl.acm.org

Almost all of today's microprocessors contain memory controllers and directly attach to
memory. Modern multiprocessor systems support non-uniform memory access (NUMA): it is …

Cited by 91; 3 versions

Learning intermediate representations using graph neural networks for NUMA and prefetchers optimization

A TehraniJamsaz, M Popov, A Dutta… - 2022 IEEE …, 2022 - ieeexplore.ieee.org

There is a large space of NUMA and hardware prefetcher configurations that can
significantly impact the performance of an application. Previous studies have demonstrated …

Cited by 18; 8 versions

Locality-centric data and threadblock management for massive GPUs

M Khairy, V Nikiforov, D Nellans… - 2020 53rd Annual IEEE …, 2020 - ieeexplore.ieee.org

Recent work has shown that building GPUs with hundreds of SMs in a single monolithic chip
will not be practical due to slowing growth in transistor density, low chip yields, and …

Cited by 35; 8 versions

Modeling and optimizing NUMA effects and prefetching with machine learning

I Sánchez Barrera, D Black-Schaffer, M Casas… - Proceedings of the 34th …, 2020 - dl.acm.org

Both NUMA thread/data placement and hardware prefetcher configuration have significant
impacts on HPC performance. Optimizing both together leads to a large and complex design …

Cited by 37; 3 versions

Efficient thread/page/parallelism autotuning for NUMA systems

M Popov, A Jimborean, D Black-Schaffer - Proceedings of the ACM …, 2019 - dl.acm.org

Current multi-socket systems have complex memory hierarchies with significant Non-
Uniform Memory Access (NUMA) effects: memory performance depends on the location of …

Cited by 39; 4 versions

Scalable task parallelism for NUMA: A uniform abstraction for coordinated scheduling and memory management

A Drebes, A Pop, K Heydemann, A Cohen… - Proceedings of the 2016 …, 2016 - dl.acm.org

Dynamic task-parallel programming models are popular on shared-memory systems,
promising enhanced scalability, load balancing and locality. Yet these promises are …

Cited by 44; 9 versions

NumaMMA: NUMA memory analyzer

F Trahay, M Selva, L Morel, K Marquet - Proceedings of the 47th …, 2018 - dl.acm.org

Non Uniform Memory Access (NUMA) architectures are nowadays common for running High-
Performance Computing (HPC) applications. In such architectures, several distinct physical …

Cited by 34; 6 versions
