Google Scholar

Scalable hierarchical aggregation protocol (SHArP): A hardware architecture for efficient data reduction

RL Graham, D Bureddy, P Lui… - … in HPC (COMHPC), 2016 - ieeexplore.ieee.org

Increased system size and a greater reliance on utilizing system parallelism to achieve
computational needs, requires innovative system architectures to meet the simulation …

Cited by 169 · Related articles · All 7 versions

[PDF] acm.org

An evaluation of the CORAL interconnects

C Zimmer, S Atchley, R Pankajakshan… - Proceedings of the …, 2019 - dl.acm.org

The US Department of Energy deployed the Summit and Sierra supercomputers with the
latest state-of-the-art network interconnect technology in 2018 and both systems entered …

Cited by 30 · Related articles · All 3 versions

[PDF] academia.edu

Hierarchical redesign of classic MPI reduction algorithms

K Hasanov, A Lastovetsky - The Journal of Supercomputing, 2017 - Springer

Optimization of MPI collective communication operations has been an active research topic
since the advent of MPI in 1990s. Many general and architecture-specific collective …

Cited by 39 · Related articles · All 8 versions
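
The hierarchical idea named in this title can be illustrated with a minimal Python sketch. It assumes a two-level scheme in which p processes are split into G contiguous groups: each group first reduces to a group root, then the G group roots reduce their partial results. Ranks are simulated as list entries rather than real MPI processes, the function names are hypothetical, and this is not presented as the authors' algorithm or implementation.

```python
from functools import reduce
from operator import add
from typing import Callable, List, TypeVar

T = TypeVar("T")


def flat_reduce(values: List[T], op: Callable[[T, T], T]) -> T:
    """Flat (one-level) reduction: combine every rank's contribution in rank order."""
    return reduce(op, values)


def hierarchical_reduce(values: List[T], num_groups: int, op: Callable[[T, T], T]) -> T:
    """Hypothetical two-level reduction: reduce inside each group, then across groups.

    values[i] stands for the contribution of simulated rank i.
    Ranks are split into num_groups contiguous, non-empty groups of near-equal size.
    """
    p = len(values)
    if not 1 <= num_groups <= p:
        raise ValueError("num_groups must be between 1 and the number of ranks")
    # Contiguous block partition: group g owns ranks bounds[g] .. bounds[g + 1] - 1.
    bounds = [p * g // num_groups for g in range(num_groups + 1)]
    groups = [values[bounds[g]:bounds[g + 1]] for g in range(num_groups)]
    # Phase 1: each group reduces its members' values to one partial result.
    partials = [flat_reduce(group, op) for group in groups]
    # Phase 2: the group roots reduce the partial results, still in rank order.
    return flat_reduce(partials, op)


if __name__ == "__main__":
    ranks = list(range(16))
    print(hierarchical_reduce(ranks, 4, add))  # 120, same as flat_reduce(ranks, add)
    print(hierarchical_reduce(list("abcdefg"), 3, add))  # abcdefg (order preserved)
```

Because the groups are contiguous and partial results are combined in rank order, the result matches a flat reduction for any associative operator (MPI likewise requires reduction operators to be associative); the string example makes the ordering visible. In a real MPI program the two phases would typically run on separate communicators, for example ones created with `MPI_Comm_split`.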

[PDF] researchgate.net

Efficient process arrival pattern aware collective communication for deep learning

P Alizadeh, A Sojoodi, Y Hassan Temucin… - Proceedings of the 29th …, 2022 - dl.acm.org

MPI collective communication operations are used extensively in parallel applications. As
such, researchers have been investigating how to improve their performance and scalability …

Cited by 9 · Related articles · All 4 versions

[PDF] arxiv.org

Tascade: Hardware support for atomic-free, asynchronous and efficient reduction trees

M Orenes-Vera, E Tureci, D Wentzlaff… - arxiv preprint arxiv …, 2023 - arxiv.org

Graph search and sparse data-structure traversal workloads contain challenging irregular
memory patterns on global data structures that need to be modified atomically. Distributed …

Cited by 2 · Related articles · All 2 versions

[PDF] academia.edu

Energy-efficient collective reduce and allreduce operations on distributed GPUs

L Oden, B Klenk, H Fröning - 2014 14th IEEE/ACM …, 2014 - ieeexplore.ieee.org

GPUs gain high popularity in High Performance Computing, due to their massive parallelism
and high performance per Watt. Despite their popularity, data transfer between multiple …

Cited by 30 · Related articles · All 5 versions
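
To make the difference between the two operations in this title concrete, here is a minimal Python sketch of their semantics only, assuming each simulated rank contributes a vector that is combined elementwise. In a reduce, only the root rank receives the result; in an allreduce, every rank receives it, which is logically equivalent to a reduce followed by a broadcast. The helper names are hypothetical, no GPU or network transfer is modeled, and the sketch does not reflect the energy-efficient designs evaluated in the paper.

```python
from functools import reduce
from typing import Callable, List, Optional

Vector = List[float]


def elementwise_sum(a: Vector, b: Vector) -> Vector:
    """Combine two equal-length contributions element by element."""
    return [x + y for x, y in zip(a, b)]


def reduce_to_root(contribs: List[Vector], op: Callable[[Vector, Vector], Vector],
                   root: int = 0) -> List[Optional[Vector]]:
    """Reduce: the combined vector is delivered only to the root rank; other ranks get None."""
    result = reduce(op, contribs)
    return [result if rank == root else None for rank in range(len(contribs))]


def allreduce(contribs: List[Vector], op: Callable[[Vector, Vector], Vector]) -> List[Vector]:
    """Allreduce: reduce to rank 0, then broadcast a copy of the result to every rank."""
    result = reduce_to_root(contribs, op, root=0)[0]
    assert result is not None  # rank 0 is the root, so it holds the result
    return [list(result) for _ in contribs]


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(reduce_to_root(data, elementwise_sum, root=1))  # [None, [9.0, 12.0], None]
    print(allreduce(data, elementwise_sum))  # [[9.0, 12.0], [9.0, 12.0], [9.0, 12.0]]
```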

Unified collective communication (ucc): An unified library for cpu, gpu, and dpu collectives

MG Venkata, V Petrov, S Lebedev… - … IEEE Symposium on …, 2024 - ieeexplore.ieee.org

Unified Collective Communication (UCC) is an API and library implementation of collective
communication operations. The goal of UCC is to provide a unified API and library serving …

Cited by 1 · Related articles · All 2 versions

[PDF] russianscdays.org

Designing a Parallel Programs on the Base of the Conception of Q-Determinant

V Aleeva - … : 4th Russian Supercomputing Days, RuSCDays 2018 …, 2019 - Springer

The paper describes a design method of parallel programs for numerical algorithms based
on their representation in the form of Q-determinant. The result of the method is Q-effective …

Cited by 14 · Related articles · All 5 versions

High-Performance Computing Using Application of Q-determinant of Numerical Algorithms

VN Aleeva, RZ Aleev - 2018 Global Smart Industry Conference …, 2018 - ieeexplore.ieee.org

The conception of Q-determinant is one of the approaches to parallelizing numerical
algorithms. The basic notion of the conception is Q-determinant of the algorithm. Here Q is …

Cited by 10 · Related articles · All 2 versions

Unified Collective Communication (UCC): A Unified Library for CPU, GPU, and DPU Collectives

M GorentlaVenkata, V Petrov, S Lebedev… - IEEE Micro, 2025 - ieeexplore.ieee.org

Unified Collective Communication (UCC) is an API and library implementation of collective
communication operations. The goal of UCC is to provide a unified API and library serving …

Related articles · All 2 versions

Optimizing blocking and nonblocking reduction operations for multicore systems: Hierarchical...
