Data locality in high performance computing, big data, and converged systems: An analysis of the cutting edge and a future system architecture

S Usman, R Mehmood, I Katib, A Albeshri - Electronics, 2022 - mdpi.com
Big data has revolutionized science and technology leading to the transformation of our
societies. High-performance computing (HPC) provides the necessary computational power …

Improving the efficiency of graph algorithm executions on high‐performance computing

MK Moori, HMG de A. Rocha… - Concurrency and …, 2023 - Wiley Online Library
The growing need for extracting information from large graphs has been pushing the
development of parallel graph algorithms. However, the highly irregular structure of the real …

Exposing the locality of heterogeneous memory architectures to HPC applications

B Goglin - Proceedings of the Second International Symposium …, 2016 - dl.acm.org
High-performance computing requires a deep knowledge of the hardware platform to fully
exploit its computing power. The performance of data transfer between cores and memory is …

Smart resource allocation of concurrent execution of parallel applications

VS da Silva, AGD Nogueira, EC de Lima… - Concurrency and …, 2023 - Wiley Online Library
Thread‐level parallelism (TLP) has been widely exploited to optimize computational
resource usage in high‐performance systems. However, as many applications do not scale …

Towards the structural modeling of the topology of next-generation heterogeneous cluster nodes with hwloc

B Goglin - 2016 - inria.hal.science
Parallel computing platforms are increasingly complex, with multiple cores, shared caches,
and NUMA memory interconnects, as well as asymmetric I/O access. Upcoming …

UPCBLAS: a library for parallel matrix computations in Unified Parallel C

J González‐Domínguez, MJ Martín… - Concurrency and …, 2012 - Wiley Online Library
The popularity of Partitioned Global Address Space (PGAS) languages has
increased during the last years thanks to their high programmability and performance …

On the overhead of topology discovery for locality-aware scheduling in HPC

B Goglin - 2017 25th Euromicro International Conference on …, 2017 - ieeexplore.ieee.org
The increasing complexity of parallel computing platforms requires a deep knowledge of the
hardware and of the application needs. Locality is a key criterion for performance optimization. It …

Analyzing the energy efficiency of the memory subsystem in multicore processors

S Catalan, JG Dominguez, R Mayo… - 2014 IEEE International …, 2014 - ieeexplore.ieee.org
In this paper we analyze the energy overhead incurred when operating with data stored in
different levels of the memory subsystem (cache levels and DDR chips) of current multicore …

Solving dense linear systems on accelerated multicore architectures

A Rémy - 2015 - theses.hal.science
In this PhD thesis, we study algorithms and implementations to accelerate the solution of
dense linear systems by using hybrid architectures with multicore processors and …

Locality optimization on a NUMA architecture for hybrid LU factorization

A Rémy, M Baboulin, M Sosonkina… - … and Engineering (CSE), 2014 - ebooks.iospress.nl
We study the impact of non-uniform memory accesses (NUMA) on the solution of dense
general linear systems using an LU factorization algorithm. In particular we illustrate how an …