Data locality in high performance computing, big data, and converged systems: An analysis of the cutting edge and a future system architecture
Big data has revolutionized science and technology leading to the transformation of our
societies. High-performance computing (HPC) provides the necessary computational power …
Improving the efficiency of graph algorithm executions on high‐performance computing
MK Moori, HMG de A. Rocha… - Concurrency and …, 2023 - Wiley Online Library
The growing need for extracting information from large graphs has been pushing the
development of parallel graph algorithms. However, the highly irregular structure of the real …
Exposing the locality of heterogeneous memory architectures to HPC applications
B Goglin - Proceedings of the Second International Symposium …, 2016 - dl.acm.org
High-performance computing requires a deep knowledge of the hardware platform to fully
exploit its computing power. The performance of data transfer between cores and memory is …
Smart resource allocation of concurrent execution of parallel applications
VS da Silva, AGD Nogueira, EC de Lima… - Concurrency and …, 2023 - Wiley Online Library
Thread‐level parallelism (TLP) has been widely exploited to optimize computational
resource usage in high‐performance systems. However, as many applications do not scale …
Towards the structural modeling of the topology of next-generation heterogeneous cluster nodes with hwloc
B Goglin - 2016 - inria.hal.science
Parallel computing platforms are increasingly complex, with multiple cores, shared caches,
and NUMA memory interconnects, as well as asymmetric I/O access. Upcoming …
UPCBLAS: a library for parallel matrix computations in Unified Parallel C
SUMMARY The popularity of Partitioned Global Address Space (PGAS) languages has
increased during the last years thanks to their high programmability and performance …
On the overhead of topology discovery for locality-aware scheduling in HPC
B Goglin - 2017 25th Euromicro International Conference on …, 2017 - ieeexplore.ieee.org
The increasing complexity of parallel computing platforms requires a deep knowledge of the
hardware and of the application needs. Locality is a key criterion for performance optimization. It …
Analyzing the energy efficiency of the memory subsystem in multicore processors
In this paper we analyze the energy overhead incurred when operating with data stored in
different levels of the memory subsystem (cache levels and DDR chips) of current multicore …
Solving dense linear systems on accelerated multicore architectures
A Rémy - 2015 - theses.hal.science
In this PhD thesis, we study algorithms and implementations to accelerate the solution of
dense linear systems by using hybrid architectures with multicore processors and …
Locality optimization on a NUMA architecture for hybrid LU factorization
We study the impact of non-uniform memory accesses (NUMA) on the solution of dense
general linear systems using an LU factorization algorithm. In particular we illustrate how an …