A scalable approach based on deep learning for big data time series forecasting

JF Torres, A Galicia, A Troncoso… - Integrated Computer …, 2018 - content.iospress.com
This paper presents a method based on deep learning to deal with big data time series
forecasting. The deep feed forward neural network provided by the H2O big data analysis …

ShuffleNet: An application of generalized perfect shuffles to multihop lightwave networks

MG Hluchyj, MJ Karol - Journal of Lightwave Technology, 1991 - ieeexplore.ieee.org
A multihop wavelength-division multiplexing (WDM) approach, referred to as ShuffleNet, for
achieving concurrency in distributed lightwave networks is proposed. A ShuffleNet can be …

FpgaNIC: An FPGA-based versatile 100Gb SmartNIC for GPUs

Z Wang, H Huang, J Zhang, F Wu… - 2022 USENIX Annual …, 2022 - usenix.org
Given that network bandwidth is growing far faster than the compute capacity of the host
CPU, which by default processes network packets, the SmartNIC has been …

GPU-aware MPI on RDMA-enabled clusters: Design, implementation and evaluation

H Wang, S Potluri, D Bureddy… - … on Parallel and …, 2013 - ieeexplore.ieee.org
Designing high-performance and scalable applications on GPU clusters requires tackling
several challenges. The key challenge is the separate host memory and device memory …
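
The entry above refers to GPU-aware (CUDA-aware) MPI, where MPI calls accept GPU device pointers directly so the library can move data between GPUs without an explicit staging copy through host memory. Below is a minimal illustrative sketch in C, assuming a CUDA-aware MPI build (e.g. MVAPICH2-GDR or Open MPI with CUDA support) and at least two ranks; it is not code from the cited paper.

#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 1 << 20;                 /* 1M floats, illustrative size */
    float *dbuf;
    cudaMalloc((void **)&dbuf, N * sizeof(float));

    /* With a CUDA-aware MPI, the device pointer is passed directly; the
       library moves the data GPU-to-GPU (e.g. via GPUDirect RDMA) without
       an explicit staging copy through host memory. */
    if (rank == 0)
        MPI_Send(dbuf, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(dbuf, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(dbuf);
    MPI_Finalize();
    return 0;
}

With a non-CUDA-aware MPI, the same transfer would require copying to a host buffer on the sender and back to device memory on the receiver, which is exactly the overhead these designs aim to remove.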

Efficient large message broadcast using NCCL and CUDA-aware MPI for deep learning

AA Awan, K Hamidouche, A Venkatesh… - Proceedings of the 23rd …, 2016 - dl.acm.org
Emerging paradigms like High Performance Data Analytics (HPDA) and Deep Learning (DL)
pose at least two new design challenges for existing MPI runtimes. First, these paradigms …
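
Since this entry concerns large broadcasts of GPU-resident data (e.g. model parameters) using NCCL bootstrapped over MPI, a short hedged sketch of that pattern may be useful; the buffer size and the rank-to-GPU mapping are illustrative assumptions, and the code is not taken from the cited work.

#include <mpi.h>
#include <nccl.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Assumption: one GPU per MPI rank, chosen by rank modulo device count. */
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    cudaSetDevice(rank % ndev);

    /* Rank 0 creates the NCCL unique id; MPI distributes it to all ranks. */
    ncclUniqueId id;
    if (rank == 0) ncclGetUniqueId(&id);
    MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

    ncclComm_t comm;
    ncclCommInitRank(&comm, nranks, id, rank);

    /* Device buffer standing in for model parameters to be broadcast. */
    const size_t count = 1 << 24;          /* 16M floats, illustrative size */
    float *dbuf;
    cudaMalloc((void **)&dbuf, count * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    /* Root (rank 0) broadcasts directly from GPU memory to all other GPUs. */
    ncclBroadcast(dbuf, dbuf, count, ncclFloat, 0, comm, stream);
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(dbuf);
    ncclCommDestroy(comm);
    MPI_Finalize();
    return 0;
}

In this pattern MPI is used only to distribute the NCCL unique id; the broadcast itself runs on the GPUs via ncclBroadcast on a CUDA stream.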

Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters

R Shi, S Potluri, K Hamidouche… - … Conference on High …, 2014 - ieeexplore.ieee.org
An increasing number of MPI applications are being ported to take advantage of the compute
power offered by GPUs. Data movement on GPU clusters continues to be the major …

Adaptive and hierarchical large message all-to-all communication algorithms for large-scale dense GPU systems

KS Khorassani, CH Chu, QG Anthony… - 2021 IEEE/ACM 21st …, 2021 - ieeexplore.ieee.org
In recent years, GPU-enhanced clusters have become more prevalent in High-Performance
Computing (HPC), leading to a demand for more efficient multi-GPU communication. This …

Exploring GPU-to-GPU communication: Insights into supercomputer interconnects

D De Sensi, L Pichetti, F Vella… - … Conference for High …, 2024 - ieeexplore.ieee.org
Multi-GPU nodes are increasingly common in the rapidly evolving landscape of exascale
supercomputers. On these systems, GPUs on the same node are connected through …
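
The entry above studies intra-node GPU-to-GPU links (e.g. NVLink or PCIe). A minimal CUDA sketch of the kind of point-to-point bandwidth measurement such studies rely on is shown below; the device IDs and transfer size are assumptions, and a real benchmark would average over many repetitions.

#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    /* Measure GPU0 -> GPU1 bandwidth; assumes at least two GPUs in the node. */
    const size_t bytes = 256UL << 20;      /* 256 MiB transfer */
    void *src, *dst;

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    int can_access = 0;
    cudaDeviceCanAccessPeer(&can_access, 0, 1);
    if (can_access) cudaDeviceEnablePeerAccess(1, 0);   /* allow direct peer path */

    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    cudaSetDevice(0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpyPeer(dst, 1, src, 0, bytes);  /* copy over NVLink/PCIe */
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("GPU0 -> GPU1: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaFree(src);
    cudaSetDevice(1);
    cudaFree(dst);
    return 0;
}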

Performance evaluation of MPI libraries on GPU-enabled OpenPOWER architectures: Early experiences

KS Khorassani, CH Chu, H Subramoni… - … Computing: ISC High …, 2019 - Springer
The advent of Graphics Processing Unit (GPU)-enabled OpenPOWER architectures is
empowering the advancement of various High-Performance Computing (HPC) …

Efficient process arrival pattern aware collective communication for deep learning

P Alizadeh, A Sojoodi, Y Hassan Temucin… - Proceedings of the 29th …, 2022 - dl.acm.org
MPI collective communication operations are used extensively in parallel applications. As
such, researchers have been investigating how to improve their performance and scalability …