TopoOpt: Co-optimizing network topology and parallelization strategy for distributed training jobs

W Wang, M Khazraee, Z Zhong, M Ghobadi… - … USENIX Symposium on …, 2023 - usenix.org
We propose TopoOpt, a novel direct-connect fabric for deep neural network (DNN) training
workloads. TopoOpt co-optimizes the distributed training process across three dimensions …

Communication optimization strategies for distributed deep neural network training: A survey

S Ouyang, D Dong, Y Xu, L Xiao - Journal of Parallel and Distributed …, 2021 - Elsevier
Recent trends in high-performance computing and deep learning have led to the
proliferation of studies on large-scale deep neural network training. However, the frequent …

Communication optimization algorithms for distributed deep learning systems: A survey

E Yu, D Dong, X Liao - IEEE Transactions on Parallel and …, 2023 - ieeexplore.ieee.org
Deep learning's widespread adoption in various fields has made distributed training across
multiple computing nodes essential. However, frequent communication between nodes can …

Communication-efficient ADMM-based distributed algorithms for sparse training

G Wang, Y Lei, Y Qiu, L Lou, Y Li - Neurocomputing, 2023 - Elsevier
In large-scale distributed machine learning (DML), the synchronization efficiency of the
distributed algorithm becomes a critical factor that affects the training time of machine …
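For orientation, the sketch below is a minimal single-process simulation of consensus ADMM for L1-regularized (sparse) least squares, split across "workers". It is not the algorithm proposed in the paper; the problem, the variable names (A_parts, b_parts, rho, lam), and the parameter values are illustrative assumptions. It only shows the communication pattern ADMM-style methods rely on: local solves, one averaging step (an allreduce), and a soft-thresholding consensus update.

```python
# Minimal consensus-ADMM sketch for sparse least squares (illustration only,
# not the paper's algorithm; all names and constants are assumptions).
import numpy as np

def soft_threshold(v, k):
    """Elementwise soft-thresholding: proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def consensus_admm(A_parts, b_parts, lam=0.1, rho=1.0, iters=100):
    n_workers = len(A_parts)
    d = A_parts[0].shape[1]
    x = np.zeros((n_workers, d))   # local primal variables
    u = np.zeros((n_workers, d))   # scaled dual variables
    z = np.zeros(d)                # global consensus (sparse) variable
    # Pre-factorize each worker's local system (A_i^T A_i + rho I).
    facts = [np.linalg.inv(A.T @ A + rho * np.eye(d)) for A in A_parts]
    for _ in range(iters):
        # Local updates (done in parallel in a real deployment).
        for i, (A, b) in enumerate(zip(A_parts, b_parts)):
            x[i] = facts[i] @ (A.T @ b + rho * (z - u[i]))
        # Global update: average local variables, then soft-threshold.
        # This averaging is the only communication step (an allreduce).
        x_bar = x.mean(axis=0) + u.mean(axis=0)
        z = soft_threshold(x_bar, lam / (rho * n_workers))
        # Dual updates.
        u += x - z
    return z

# Toy usage: 4 workers, each holding a slice of the data.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
x_true = np.zeros(50); x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(200)
A_parts, b_parts = np.split(A, 4), np.split(b, 4)
z = consensus_admm(A_parts, b_parts)
print("nonzeros in solution:", int((np.abs(z) > 1e-3).sum()))
```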

A Generic, High-Performance, Compression-Aware Framework for Data Parallel DNN Training

H Wu, S Wang, Y Bai, C Li, Q Zhou, J Yi… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Gradient compression is a promising approach to alleviating the communication bottleneck
in data parallel deep neural network (DNN) training by significantly reducing the data …
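As context for this entry, the following is a minimal sketch of one widely used gradient-compression scheme, top-k sparsification with error feedback. It is a generic illustration rather than the framework described in the paper, and the 1% compression ratio is an arbitrary assumption.

```python
# Top-k gradient sparsification with error feedback (generic illustration,
# not the paper's framework; compression ratio is an assumed placeholder).
import numpy as np

def topk_compress(grad, ratio=0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries."""
    k = max(1, int(grad.size * ratio))
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

class ErrorFeedbackCompressor:
    """Accumulates the entries dropped by compression and re-adds them at the
    next step, so compression error is delayed rather than lost."""
    def __init__(self, size):
        self.residual = np.zeros(size)

    def step(self, grad, ratio=0.01):
        corrected = grad + self.residual
        idx, values = topk_compress(corrected, ratio)
        sparse = np.zeros_like(corrected)
        sparse[idx] = values
        self.residual = corrected - sparse   # remember what was dropped
        return idx, values                   # only these are communicated

# Toy usage: one worker compressing a single gradient tensor.
rng = np.random.default_rng(0)
comp = ErrorFeedbackCompressor(size=10_000)
grad = rng.standard_normal(10_000)
idx, values = comp.step(grad)
print(f"sent {values.size} of {grad.size} entries "
      f"({values.size / grad.size:.1%} of the data)")
```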

HSAC-ALADMM: an asynchronous lazy ADMM algorithm based on hierarchical sparse allreduce communication

D Wang, Y Lei, J Xie, G Wang - The Journal of Supercomputing, 2021 - Springer
The distributed alternating direction method of multipliers (ADMM) is an effective algorithm
for solving large-scale optimization problems. However, its high communication cost limits its …
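To illustrate only the "hierarchical allreduce" part of this entry, here is a minimal single-process simulation of a two-level allreduce: workers first reduce within their node-local group, group leaders reduce across groups, and the result is broadcast back. This is not the paper's asynchronous lazy ADMM algorithm; the group size and the use of dense vectors are assumptions.

```python
# Two-level (hierarchical) allreduce simulation (communication-pattern sketch only).
import numpy as np

def hierarchical_allreduce(worker_grads, group_size):
    """worker_grads: list of equal-length gradient vectors, one per worker."""
    n = len(worker_grads)
    assert n % group_size == 0, "workers must split evenly into groups"
    # Stage 1: intra-group reduction (cheap, e.g. shared memory within a node).
    group_sums = [
        np.sum(worker_grads[g:g + group_size], axis=0)
        for g in range(0, n, group_size)
    ]
    # Stage 2: inter-group reduction among one leader per group (the costly network hop).
    total = np.sum(group_sums, axis=0)
    # Stage 3: broadcast the reduced result back to every worker.
    return [total.copy() for _ in range(n)]

# Toy usage: 8 workers, 4 per node.
rng = np.random.default_rng(0)
grads = [rng.standard_normal(16) for _ in range(8)]
out = hierarchical_allreduce(grads, group_size=4)
assert np.allclose(out[0], np.sum(grads, axis=0))
```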

COFFEE: Cross-Layer Optimization for Fast and Efficient Executions of Sinkhorn-Knopp Algorithm on HPC Systems

C Sun, H Luo, H Jiang, J Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In this paper, we present COFFEE, cross-layer optimization for fast and efficient executions
of the Sinkhorn-Knopp (SK) algorithm on HPC systems with clusters of compute nodes by …
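For reference, the baseline Sinkhorn-Knopp iteration that COFFEE targets is short: alternately normalize rows and columns of a positive matrix until it is approximately doubly stochastic. The sketch below shows only this plain algorithm, not COFFEE's cross-layer HPC implementation; the tolerance and iteration cap are assumed values.

```python
# Plain Sinkhorn-Knopp iteration (baseline algorithm only; tolerance and
# iteration cap are assumptions).
import numpy as np

def sinkhorn_knopp(M, max_iters=1000, tol=1e-8):
    """Scale a matrix with positive entries toward doubly stochastic form."""
    A = np.asarray(M, dtype=float)
    for _ in range(max_iters):
        A = A / A.sum(axis=1, keepdims=True)   # row normalization
        A = A / A.sum(axis=0, keepdims=True)   # column normalization
        if np.abs(A.sum(axis=1) - 1.0).max() < tol:
            break
    return A

# Toy usage.
rng = np.random.default_rng(0)
M = rng.random((5, 5)) + 0.1   # strictly positive entries
D = sinkhorn_knopp(M)
print(D.sum(axis=0))  # columns sum to ~1
print(D.sum(axis=1))  # rows sum to ~1
```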

Modeling and Simulation of Collective Algorithms on HPC Network Topologies using Structural Simulation Toolkit

SP Chenna, M Steyer, N Kumar… - SC24-W: Workshops …, 2024 - ieeexplore.ieee.org
In the last decade, DL training has emerged as an HPC-scale workload running on large
clusters, the size of the largest supercomputers on the Top500 list. The dominant …
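As a rough counterpart to the simulation work in this entry, the sketch below evaluates the standard analytical alpha-beta (latency-bandwidth) cost model for two common allreduce algorithms. It is not the Structural Simulation Toolkit model from the paper; alpha, beta, process count, and message size are assumed placeholder values.

```python
# Alpha-beta cost model for ring vs. binary-tree allreduce (analytical sketch;
# all parameter values are assumptions).
import math

def ring_allreduce_time(p, n_bytes, alpha, beta):
    """Ring allreduce: 2(p-1) steps, each moving n/p bytes."""
    return 2 * (p - 1) * alpha + 2 * (p - 1) / p * n_bytes * beta

def tree_allreduce_time(p, n_bytes, alpha, beta):
    """Non-pipelined binary-tree reduce + broadcast: ~2*log2(p) steps, each moving n bytes."""
    return 2 * math.ceil(math.log2(p)) * (alpha + n_bytes * beta)

# Toy comparison: 64 ranks, 100 MB gradients, 5 us latency, 10 GB/s per-link bandwidth.
p, n = 64, 100e6
alpha, beta = 5e-6, 1 / 10e9
print(f"ring: {ring_allreduce_time(p, n, alpha, beta) * 1e3:.1f} ms")
print(f"tree: {tree_allreduce_time(p, n, alpha, beta) * 1e3:.1f} ms")
```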

Error Permissive Computing: for Post Moore's Computer System Design

R Takano, T Hirofuchi, M Wahib, TT Nguyen… - error-permissive-computing.github.io
We are exploring a new concept of error permissive computing that improves the capability
and capacity while drastically reducing power consumption. More specifically, we …