Fmi: Fast and cheap message passing for serverless functions

M Copik, R Böhringer, A Calotoiu… - Proceedings of the 37th …, 2023 - dl.acm.org
Serverless functions provide elastic scaling and a fine-grained billing model, making
Function-as-a-Service (FaaS) an attractive programming model. However, for distributed …

Optimizing mpi collectives on shared memory multi-cores

J Peng, J Fang, J Liu, M Xie, Y Dai, B Yang… - Proceedings of the …, 2023 - dl.acm.org
Message Passing Interface (MPI) programs often experience performance slowdowns due
to collective communication operations, like broadcasting and reductions. As modern CPUs …

Node-aware improvements to allreduce

A Bienz, L Olson, W Gropp - 2019 IEEE/ACM Workshop on …, 2019 - ieeexplore.ieee.org
The MPI_Allreduce collective operation is a core kernel of many parallel codebases,
particularly for reductions over a single value per process. The commonly used allreduce …
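The commonly used allreduce referred to above is typically implemented with a recursive-doubling exchange for small messages. As a minimal illustrative sketch (not the node-aware variant from the paper), the pattern can be simulated in plain Python without MPI, modeling each round's pairwise exchange directly:

```python
# Simulated recursive-doubling allreduce over a power-of-two number of
# "ranks", each holding one value. Messages are modeled as reads of the
# peer's current partial sum; after log2(n) rounds every rank has the
# global sum. This is an illustration of the classic algorithm, not the
# node-aware scheme proposed in the paper above.

def allreduce_recursive_doubling(values):
    """Return a list in which every rank ends up with the global sum."""
    n = len(values)
    assert n > 0 and (n & (n - 1)) == 0, "power-of-two rank count for simplicity"
    buf = list(values)
    step = 1
    while step < n:
        # In round k, rank r exchanges its partial sum with rank r XOR 2^k.
        buf = [buf[r] + buf[r ^ step] for r in range(n)]
        step <<= 1
    return buf

print(allreduce_recursive_doubling([1, 2, 3, 4]))  # every rank holds 10
```

Node-aware variants restructure these exchanges so that most rounds stay within a node's shared memory and only aggregated values cross the network.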

CAB-MPI: Exploring interprocess work-stealing towards balanced MPI communication

K Ouyang, M Si, A Hori, Z Chen… - … Conference for High …, 2020 - ieeexplore.ieee.org
Load balance is essential for high-performance applications. Unbalanced communication
can cause severe performance degradation, even in computation-balanced BSP …

Impact of cache coherence on the performance of shared-memory based MPI primitives: a case study for broadcast on Intel Xeon Scalable processors

G Katevenis, M Ploumidis, M Marazakis - Proceedings of the 52nd …, 2023 - dl.acm.org
Recent processor advances have made feasible HPC nodes with high core counts, capable
of hosting tens or even hundreds of processes. Therefore, designing MPI collective …

A framework for hierarchical single-copy MPI collectives on multicore nodes

G Katevenis, M Ploumidis… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Collective operations are widely used by MPI applications to realize their communication
patterns. Their efficiency is crucial for both performance and scalability of parallel …

Encrypted all-reduce on multi-core clusters

M Gavahi, A Naser, C Wu, MS Lahijani… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
We consider the encrypted all-reduce operation on multi-core clusters. We derive
performance bounds for the encrypted all-reduce operation and develop efficient algorithms …

Shared memory based MPI broadcast algorithms for NUMA systems

M Kurnosov, E Tokmasheva - Russian Supercomputing Days, 2020 - Springer
The MPI_Bcast collective communication operation is used by many scientific applications and
tends to limit overall parallel application scalability. This paper investigates the design and …
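Broadcast algorithms like those studied above are commonly built on a binomial tree, in which the set of ranks holding the data doubles each round. A minimal sketch of that communication schedule (an illustration of the textbook pattern, not the NUMA-aware shared-memory design from the paper; root fixed at rank 0 for simplicity):

```python
# Compute the per-round (sender, receiver) schedule of a binomial-tree
# broadcast among ranks 0..n-1, rooted at rank 0. Each round, every rank
# that already holds the data forwards it one "step" further, so the
# message reaches all n ranks in ceil(log2(n)) rounds.

def bcast_binomial_schedule(n):
    """Return a list of rounds; each round is a list of (sender, receiver)."""
    have = {0}          # ranks that currently hold the broadcast data
    rounds = []
    step = 1
    while step < n:
        sends = [(r, r + step) for r in sorted(have)
                 if r + step < n and r + step not in have]
        have.update(recv for _, recv in sends)
        rounds.append(sends)
        step <<= 1
    return rounds

print(bcast_binomial_schedule(8))
# 3 rounds: [(0,1)], [(0,2),(1,3)], [(0,4),(1,5),(2,6),(3,7)]
```

On a NUMA node, the cost of each (sender, receiver) edge depends on whether the pair shares a socket or cache domain, which is why shared-memory broadcast designs reorder or flatten parts of this tree.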

MPI Allgather Utilizing CXL Shared Memory Pool in Multi-Node Computing Systems

H Ahn, S Kim, Y Park, W Han, S Ahn… - … Conference on Big …, 2024 - ieeexplore.ieee.org
In Artificial Intelligence (AI) and high-performance computing (HPC), growing data and
model sizes require distributed processing across multiple nodes due to single-node …

PiP-MColl: Process-in-Process-based Multi-object MPI Collectives

J Huang, K Ouyang, Y Zhai, J Liu, M Si… - 2023 IEEE …, 2023 - ieeexplore.ieee.org
In the era of exascale computing, the adoption of a large number of CPU cores and nodes
by high-performance computing (HPC) applications has made MPI collective performance …