Fmi: Fast and cheap message passing for serverless functions

M Copik, R Böhringer, A Calotoiu… - Proceedings of the 37th …, 2023 - dl.acm.org
Serverless functions provide elastic scaling and a fine-grained billing model, making
Function-as-a-Service (FaaS) an attractive programming model. However, for distributed …

Optimizing mpi collectives on shared memory multi-cores

J Peng, J Fang, J Liu, M Xie, Y Dai, B Yang… - Proceedings of the …, 2023 - dl.acm.org
Message Passing Interface (MPI) programs often experience performance slowdowns due
to collective communication operations, like broadcasting and reductions. As modern CPUs …

Node-aware improvements to allreduce

A Bienz, L Olson, W Gropp - 2019 IEEE/ACM Workshop on …, 2019 - ieeexplore.ieee.org
The MPI_Allreduce collective operation is a core kernel of many parallel codebases,
particularly for reductions over a single value per process. The commonly used allreduce …
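The commonly used allreduce referred to above is typically implemented with a recursive-doubling exchange for small messages. As a minimal illustrative sketch (not the node-aware variant from the paper), the pattern can be simulated in plain Python without MPI, modeling each round's pairwise exchange directly:

```python
# Simulated recursive-doubling allreduce over a power-of-two number of
# "ranks", each holding one value. Messages are modeled as reads of the
# peer's current partial sum; after log2(n) rounds every rank has the
# global sum. This is an illustration of the classic algorithm, not the
# node-aware scheme proposed in the paper above.

def allreduce_recursive_doubling(values):
    """Return a list in which every rank ends up with the global sum."""
    n = len(values)
    assert n > 0 and (n & (n - 1)) == 0, "power-of-two rank count for simplicity"
    buf = list(values)
    step = 1
    while step < n:
        # In round k, rank r exchanges its partial sum with rank r XOR 2^k.
        buf = [buf[r] + buf[r ^ step] for r in range(n)]
        step <<= 1
    return buf

print(allreduce_recursive_doubling([1, 2, 3, 4]))  # every rank holds 10
```

Node-aware variants restructure these exchanges so that most rounds stay within a node's shared memory and only aggregated values cross the network.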

CAB-MPI: Exploring interprocess work-stealing towards balanced MPI communication

K Ouyang, M Si, A Hori, Z Chen… - … Conference for High …, 2020 - ieeexplore.ieee.org
Load balance is essential for high-performance applications. Unbalanced communication
can cause severe performance degradation, even in computation-balanced BSP …

Impact of cache coherence on the performance of shared-memory based MPI primitives: a case study for broadcast on Intel Xeon Scalable processors

G Katevenis, M Ploumidis, M Marazakis - Proceedings of the 52nd …, 2023 - dl.acm.org
Recent processor advances have made feasible HPC nodes with high core counts, capable
of hosting tens or even hundreds of processes. Therefore, designing MPI collective …

A framework for hierarchical single-copy MPI collectives on multicore nodes

G Katevenis, M Ploumidis… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Collective operations are widely used by MPI applications to realize their communication
patterns. Their efficiency is crucial for both performance and scalability of parallel …

Encrypted all-reduce on multi-core clusters

M Gavahi, A Naser, C Wu, MS Lahijani… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
We consider the encrypted all-reduce operation on multi-core clusters. We derive
performance bounds for the encrypted all-reduce operation and develop efficient algorithms …

Shared memory based MPI broadcast algorithms for NUMA systems

M Kurnosov, E Tokmasheva - Russian Supercomputing Days, 2020 - Springer
The MPI_Bcast collective communication operation is used by many scientific applications and
tends to limit overall parallel application scalability. This paper investigates the design and …
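Broadcast algorithms like those studied above are commonly built on a binomial tree, in which the set of ranks holding the data doubles each round. A minimal sketch of that communication schedule (an illustration of the textbook pattern, not the NUMA-aware shared-memory design from the paper; root fixed at rank 0 for simplicity):

```python
# Compute the per-round (sender, receiver) schedule of a binomial-tree
# broadcast among ranks 0..n-1, rooted at rank 0. Each round, every rank
# that already holds the data forwards it one "step" further, so the
# message reaches all n ranks in ceil(log2(n)) rounds.

def bcast_binomial_schedule(n):
    """Return a list of rounds; each round is a list of (sender, receiver)."""
    have = {0}          # ranks that currently hold the broadcast data
    rounds = []
    step = 1
    while step < n:
        sends = [(r, r + step) for r in sorted(have)
                 if r + step < n and r + step not in have]
        have.update(recv for _, recv in sends)
        rounds.append(sends)
        step <<= 1
    return rounds

print(bcast_binomial_schedule(8))
# 3 rounds: [(0,1)], [(0,2),(1,3)], [(0,4),(1,5),(2,6),(3,7)]
```

On a NUMA node, the cost of each (sender, receiver) edge depends on whether the pair shares a socket or cache domain, which is why shared-memory broadcast designs reorder or flatten parts of this tree.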

MPI Allgather Utilizing CXL Shared Memory Pool in Multi-Node Computing Systems

H Ahn, S Kim, Y Park, W Han, S Ahn… - … Conference on Big …, 2024 - ieeexplore.ieee.org
In Artificial Intelligence (AI) and high-performance computing (HPC), growing data and
model sizes require distributed processing across multiple nodes due to single-node …

PiP-MColl: Process-in-Process-based Multi-object MPI Collectives

J Huang, K Ouyang, Y Zhai, J Liu, M Si… - 2023 IEEE …, 2023 - ieeexplore.ieee.org
In the era of exascale computing, the adoption of a large number of CPU cores and nodes
by high-performance computing (HPC) applications has made MPI collective performance …