Communication-efficient distributed deep learning: A comprehensive survey

Z Tang, S Shi, W Wang, B Li, X Chu - arXiv preprint arXiv:2003.06307, 2020 - arxiv.org
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …

PyTorch distributed: Experiences on accelerating data parallel training

S Li, Y Zhao, R Varma, O Salpekar, P Noordhuis… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents the design, implementation, and evaluation of the PyTorch distributed
data parallel module. PyTorch is a widely-adopted scientific computing package used in …

A unified architecture for accelerating distributed DNN training in heterogeneous GPU/CPU clusters

Y Jiang, Y Zhu, C Lan, B Yi, Y Cui, C Guo - 14th USENIX Symposium on …, 2020 - usenix.org
Data center clusters that run DNN training jobs are inherently heterogeneous. They have
GPUs and CPUs for computation and network bandwidth for distributed training. However …

Transparent GPU sharing in container clouds for deep learning workloads

B Wu, Z Zhang, Z Bai, X Liu, X Jin - 20th USENIX Symposium on …, 2023 - usenix.org
Containers are widely used for resource management in datacenters. A common practice to
support deep learning (DL) training in container clouds is to statically bind GPUs to …

TopoOpt: Co-optimizing network topology and parallelization strategy for distributed training jobs

W Wang, M Khazraee, Z Zhong, M Ghobadi… - … USENIX Symposium on …, 2023 - usenix.org
We propose TopoOpt, a novel direct-connect fabric for deep neural network (DNN) training
workloads. TopoOpt co-optimizes the distributed training process across three dimensions …

ZeRO++: Extremely efficient collective communication for giant model training

G Wang, H Qin, SA Jacobs, C Holmes… - arXiv preprint arXiv …, 2023 - arxiv.org
Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large language
models on massive GPU clusters due to its ease of use, efficiency, and good scalability …

Accelerating distributed MoE training and inference with Lina

J Li, Y Jiang, Y Zhu, C Wang, H Xu - 2023 USENIX Annual Technical …, 2023 - usenix.org
Scaling model parameters improves model quality at the price of high computation
overhead. Sparsely activated models, usually in the form of Mixture of Experts (MoE) …

Communication-efficient large-scale distributed deep learning: A comprehensive survey

F Liang, Z Zhang, H Lu, V Leung, Y Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid growth in the volume of data sets, models, and devices in the domain of deep
learning, there is increasing attention on large-scale distributed deep learning. In contrast to …

Efficient sparse collective communication and its application to accelerate distributed deep learning

J Fei, CY Ho, AN Sahu, M Canini, A Sapio - Proceedings of the 2021 …, 2021 - dl.acm.org
Efficient collective communication is crucial to parallel-computing applications such as
distributed training of large-scale recommendation systems and natural language …

On optimizing the communication of model parallelism

Y Zhuang, L Zheng, Z Li, E Xing, Q Ho… - Proceedings of …, 2023 - proceedings.mlsys.org
We study a novel and important communication pattern in large-scale model-parallel deep
learning (DL), which we call cross-mesh resharding. This pattern emerges when the two …