- Academic Search

An in-depth analysis of the Slingshot interconnect

D De Sensi, S Di Girolamo… - … Conference for High …, 2020 - ieeexplore.ieee.org

The interconnect is one of the most critical components in large scale computing systems,
and its impact on the performance of applications is going to increase with the system size …

Cited by 147 - All 35 versions

[PDF] acm.org

A large-scale study of MPI usage in open-source HPC applications

I Laguna, R Marshall, K Mohror, M Ruefenacht… - Proceedings of the …, 2019 - dl.acm.org

Understanding the state-of-the-practice in MPI usage is paramount for many aspects of
supercomputing, including optimizing the communication of HPC applications and informing …

Cited by 108 - All 3 versions

[PDF] arxiv.org

Flare: Flexible in-network allreduce

D De Sensi, S Di Girolamo, S Ashkboos, S Li… - Proceedings of the …, 2021 - dl.acm.org

The allreduce operation is one of the most commonly used communication routines in
distributed applications. To improve its bandwidth and to reduce network traffic, this …

Cited by 51 - All 27 versions

[PDF] arxiv.org

PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices

SU Noh, J Hong, C Lim, S Park, J Kim… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org

Recent dual in-line memory modules (DIMMs) are starting to support processing-in-memory
(PIM) by associating their memory banks with processing elements (PEs), allowing …

Cited by 3 - All 6 versions

[PDF] arxiv.org

Near-optimal wafer-scale reduce

P Luczynski, L Gianinazzi, P Iff, L Wilson… - Proceedings of the 33rd …, 2024 - dl.acm.org

Efficient Reduce and AllReduce communication collectives are a critical cornerstone of
high-performance computing (HPC) applications. We present the first systematic investigation of …

Cited by 3 - All 21 versions

[PDF] acm.org

gZCCL: Compression-accelerated collective communication framework for GPU clusters

J Huang, S Di, X Yu, Y Zhai, J Liu, Y Huang… - Proceedings of the 38th …, 2024 - dl.acm.org

GPU-aware collective communication has become a major bottleneck for modern computing
platforms as GPU computing power rapidly rises. A traditional approach is to directly …

Cited by 8 - All 7 versions

[PDF] wiley.com

Understanding the use of message passing interface in exascale proxy applications

N Sultana, M Rüfenacht, A Skjellum… - Concurrency and …, 2021 - Wiley Online Library

Summary: The Exascale Computing Project (ECP) focuses on the development of future
exascale-capable applications. Most ECP applications use the message passing interface …

Cited by 34 - All 4 versions

[PDF] arxiv.org

RAMP: a flat nanosecond optical network and MPI operations for distributed deep learning systems

A Ottino, J Benjamin, G Zervas - Optical Switching and Networking, 2024 - Elsevier

Distributed deep learning (DDL) systems strongly depend on network performance. Current
electronic packet switched (EPS) network architectures and technologies suffer from …

Cited by 14 - All 5 versions

[PDF] osti.gov

Characterization and identification of HPC applications at leadership computing facility

Z Liu, R Lewis, R Kettimuthu, K Harms… - Proceedings of the 34th …, 2020 - dl.acm.org

High Performance Computing (HPC) is an important method for scientific discovery via
large-scale simulation, data analysis, or artificial intelligence. Leadership-class supercomputers …

Cited by 31 - All 4 versions

[PDF] usenix.org

Swing: Short-cutting rings for higher bandwidth allreduce

D De Sensi, T Bonato, D Saam, T Hoefler - 21st USENIX Symposium on …, 2024 - usenix.org

The allreduce collective operation accounts for a significant fraction of the runtime of
workloads running on distributed systems. One factor determining its performance is the …

Cited by 5 - All 23 versions

Characterization of MPI usage on a production supercomputer
