Communication-efficient distributed deep learning: A comprehensive survey

Z Tang, S Shi, W Wang, B Li, X Chu - arXiv preprint arXiv:2003.06307, 2020 - arxiv.org
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …

Sharper convergence guarantees for asynchronous SGD for distributed and federated learning

A Koloskova, SU Stich, M Jaggi - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the asynchronous stochastic gradient descent algorithm for distributed training
over $n$ workers that might be heterogeneous. In this algorithm, workers compute …
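As a rough illustration of the setting this entry describes, the sketch below simulates asynchronous SGD on a toy least-squares problem: each worker computes a gradient at a possibly stale copy of the model, and the server applies updates in whatever order they arrive. The data, step size, and scheduling are invented for illustration and are not the paper's setup.

```python
import numpy as np

# Minimal simulation of asynchronous SGD (a sketch, not the paper's exact setup):
# each of n workers computes a stochastic gradient on a possibly stale copy of x,
# and the server applies updates as they arrive, i.e. with delays.

rng = np.random.default_rng(0)
n_workers, dim, steps, lr = 4, 10, 200, 0.05
x = np.zeros(dim)                                               # shared model held by the server
A = [rng.standard_normal((20, dim)) for _ in range(n_workers)]  # per-worker data (illustrative)
b = [rng.standard_normal(20) for _ in range(n_workers)]

# stale_x[i] is the model copy worker i last read from the server
stale_x = [x.copy() for _ in range(n_workers)]

for t in range(steps):
    i = rng.integers(n_workers)        # the worker whose gradient arrives at step t
    # gradient of 0.5*||A_i x - b_i||^2 / m_i, evaluated at the stale copy
    g = A[i].T @ (A[i] @ stale_x[i] - b[i]) / len(b[i])
    x -= lr * g                        # server update with a delayed gradient
    stale_x[i] = x.copy()              # worker i pulls the current model and restarts
```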

Stochastic gradient descent under Markovian sampling schemes

M Even - International Conference on Machine Learning, 2023 - proceedings.mlr.press
We study a variation of vanilla stochastic gradient descent where the optimizer only has
access to a Markovian sampling scheme. These schemes encompass applications that …
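For intuition, here is a minimal sketch of the kind of scheme this abstract refers to: the data index follows a random walk on a ring (a simple Markov chain), so consecutive stochastic gradients are correlated rather than drawn i.i.d. The quadratic objective and the particular chain are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of SGD driven by a Markovian sampling scheme (illustrative assumptions):
# instead of drawing data indices i.i.d., the index follows a random walk on a
# ring of data points, so consecutive stochastic gradients are correlated.

rng = np.random.default_rng(1)
m, dim, steps, lr = 50, 5, 500, 0.02
A = rng.standard_normal((m, dim))
b = rng.standard_normal(m)

x = np.zeros(dim)
i = 0                                   # current state of the Markov chain
for t in range(steps):
    i = (i + rng.choice([-1, 1])) % m   # one step of the chain: move to a neighbouring index
    g = A[i] * (A[i] @ x - b[i])        # gradient of 0.5*(a_i^T x - b_i)^2
    x -= lr * g
```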

FusionAI: Decentralized training and deploying LLMs with massive consumer-level GPUs

Z Tang, Y Wang, X He, L Zhang, X Pan, Q Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
The rapid growth of memory and computation requirements of large language models
(LLMs) has outpaced the development of hardware, hindering people who lack large-scale …

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

F Liang, Z Zhang, H Lu, V Leung, Y Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid growth in the volume of data sets, models, and devices in the domain of deep
learning, there is increasing attention on large-scale distributed deep learning. In contrast to …

AsGrad: A sharp unified analysis of asynchronous-SGD algorithms

R Islamov, M Safaryan… - … Conference on Artificial …, 2024 - proceedings.mlr.press
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting,
where each worker has its own computation and communication speeds, as well as data …

Optimal time complexities of parallel stochastic optimization methods under a fixed computation model

A Tyurin, P Richtárik - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Parallelization is a popular strategy for improving the performance of methods. Optimization
methods are no exception: the design of efficient parallel optimization methods and tight …

Asynchronous federated reinforcement learning with policy gradient updates: Algorithm design and convergence analysis

G Lan, DJ Han, A Hashemi, V Aggarwal… - arXiv preprint arXiv …, 2024 - arxiv.org
To improve the efficiency of reinforcement learning, we propose a novel asynchronous
federated reinforcement learning framework termed AFedPG, which constructs a global …
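As a hedged sketch of asynchronous federated policy-gradient updates in general (not AFedPG's specific algorithm or analysis), the toy example below has several agents run REINFORCE on their own two-armed bandits while a server applies each agent's gradient as soon as it arrives; the bandit environments, policy parameterization, and step size are invented for illustration.

```python
import numpy as np

# Toy asynchronous federated policy gradient (illustrative only, not AFedPG):
# each agent estimates a REINFORCE gradient on its own two-armed bandit, and the
# server updates the global policy parameter as soon as any gradient arrives.

rng = np.random.default_rng(2)
n_agents, rounds, lr = 3, 300, 0.1
theta = 0.0                                       # global policy parameter (logit of arm 1)
arm_means = [(0.2, 0.8), (0.3, 0.7), (0.1, 0.9)]  # heterogeneous local environments

for t in range(rounds):
    k = rng.integers(n_agents)                # agent whose update arrives at time t
    p = 1.0 / (1.0 + np.exp(-theta))          # probability of pulling arm 1
    a = int(rng.random() < p)                 # sampled action
    r = rng.normal(arm_means[k][a], 0.1)      # reward from agent k's environment
    grad = r * (a - p)                        # REINFORCE gradient estimate for a Bernoulli policy
    theta += lr * grad                        # server-side asynchronous update
```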

Queuing dynamics of asynchronous Federated Learning

L Leconte, M Jonckheere… - International …, 2024 - proceedings.mlr.press
We study asynchronous federated learning mechanisms with nodes having potentially
different computational speeds. In such an environment, each node is allowed to work on …

CO2: Efficient distributed training with full communication-computation overlap

W Sun, Z Qin, W Sun, S Li, D Li, X Shen, Y Qiao… - arXiv preprint arXiv …, 2024 - arxiv.org
The fundamental success of large language models hinges upon the efficacious
implementation of large-scale distributed training techniques. Nevertheless, building a vast …
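The sketch below illustrates the general idea of overlapping communication with computation that this title refers to: synchronization of one step's gradients runs in a background thread while the next step's local computation proceeds. The compute and communicate functions are placeholders standing in for a forward/backward pass and an all-reduce; this is not CO2's actual implementation.

```python
import threading, time

# Toy illustration of communication-computation overlap (placeholder functions,
# not a real training loop): the sync of step t's gradients runs in a background
# thread while step t+1's local computation proceeds on the main thread.

def compute(step):
    time.sleep(0.05)                 # stand-in for forward/backward of this step
    return f"grads_{step}"

def communicate(grads):
    time.sleep(0.05)                 # stand-in for an all-reduce / parameter sync

pending = None                       # thread syncing the previous step's gradients
for step in range(5):
    grads = compute(step)            # local computation for the current step
    if pending is not None:
        pending.join()               # previous sync must finish before starting a new one
    pending = threading.Thread(target=communicate, args=(grads,))
    pending.start()                  # overlap: sync runs while the next step computes
if pending is not None:
    pending.join()
```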