Communication-efficient distributed deep learning: A comprehensive survey
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …
Sharper convergence guarantees for asynchronous SGD for distributed and federated learning
We study the asynchronous stochastic gradient descent algorithm for distributed training
over $n$ workers that might be heterogeneous. In this algorithm, workers compute …
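A minimal way to picture this setting, and not the paper's actual algorithm or analysis, is a simulated asynchronous SGD loop in which each of the $n$ workers returns a gradient computed at a stale parameter snapshot and the server applies it immediately; the quadratic objective, exponential compute times, and worker speeds below are all illustrative assumptions.

```python
import heapq
import numpy as np

rng = np.random.default_rng(0)
dim, n_workers, n_updates, lr = 10, 4, 200, 0.05

# Toy objective: f(x) = 0.5 * ||x - x_star||^2, so the stochastic gradient is
# (x - x_star) plus noise. This stands in for a real loss.
x_star = rng.normal(size=dim)
def stoch_grad(x):
    return (x - x_star) + 0.1 * rng.normal(size=dim)

# Each worker has its own (assumed) mean compute time -> heterogeneous speeds.
mean_time = np.array([1.0, 1.5, 2.0, 4.0])

x = np.zeros(dim)                      # server parameters
events = []                            # (finish_time, worker_id, stale_params)
for w in range(n_workers):
    heapq.heappush(events, (rng.exponential(mean_time[w]), w, x.copy()))

for _ in range(n_updates):
    finish, w, x_stale = heapq.heappop(events)
    g = stoch_grad(x_stale)            # gradient at the (possibly stale) snapshot
    x -= lr * g                        # server applies it immediately
    # The worker restarts from the *current* parameters.
    heapq.heappush(events, (finish + rng.exponential(mean_time[w]), w, x.copy()))

print("final suboptimality:", 0.5 * np.sum((x - x_star) ** 2))
```

In this picture, slower workers return gradients with larger staleness, which is the quantity such convergence guarantees have to control.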
Stochastic gradient descent under Markovian sampling schemes
M. Even, International Conference on Machine Learning, 2023, proceedings.mlr.press
We study a variation of vanilla stochastic gradient descent where the optimizer only has
access to a Markovian sampling scheme. These schemes encompass applications that …
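As a hedged illustration of the access model only (not the paper's method or results), the sketch below runs SGD on a least-squares problem where the sample index follows a random walk on a ring graph, so consecutive samples are correlated rather than drawn i.i.d.; the graph, data, and step size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, dim, n_steps, lr = 20, 5, 2000, 0.02

# Least-squares data: f_i(x) = 0.5 * (a_i^T x - b_i)^2.
A = rng.normal(size=(n_samples, dim))
x_true = rng.normal(size=dim)
b = A @ x_true + 0.05 * rng.normal(size=n_samples)

# Markovian sampling: the sample index follows a random walk on a ring graph,
# so the optimizer never gets to choose samples independently.
neighbors = {i: [(i - 1) % n_samples, (i + 1) % n_samples] for i in range(n_samples)}

x = np.zeros(dim)
i = 0                                   # current state of the Markov chain
for _ in range(n_steps):
    g = (A[i] @ x - b[i]) * A[i]        # gradient of f_i at x
    x -= lr * g
    i = rng.choice(neighbors[i])        # next state = a neighbor of the current state

print("distance to x_true:", np.linalg.norm(x - x_true))
```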
FusionAI: Decentralized training and deploying LLMs with massive consumer-level GPUs
The rapid growth of memory and computation requirements of large language models
(LLMs) has outpaced the development of hardware, hindering people who lack large-scale …
Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey
With the rapid growth in the volume of data sets, models, and devices in the domain of deep
learning, there is increasing attention on large-scale distributed deep learning. In contrast to …
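One representative communication-reduction primitive that surveys of this kind cover is top-k gradient sparsification with error feedback; the sketch below is a generic single-node illustration under assumed shapes and step sizes, not a technique attributed to this particular survey.

```python
import numpy as np

def topk_compress(grad, k):
    """Keep only the k largest-magnitude entries; zero out the rest."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse = np.zeros_like(grad)
    sparse[idx] = grad[idx]
    return sparse

rng = np.random.default_rng(2)
dim, k, lr, n_steps = 1000, 20, 0.1, 300

x_star = rng.normal(size=dim)
x = np.zeros(dim)
error = np.zeros(dim)                   # error-feedback memory of dropped entries

for _ in range(n_steps):
    grad = (x - x_star) + 0.05 * rng.normal(size=dim)   # toy stochastic gradient
    compensated = grad + error          # add back what was dropped previously
    sent = topk_compress(compensated, k)  # only these k values get "communicated"
    error = compensated - sent          # remember what was dropped this time
    x -= lr * sent

print("suboptimality:", 0.5 * np.sum((x - x_star) ** 2))
```

Only the k retained coordinates would be transmitted per step; the error-feedback buffer re-injects the dropped mass so the compressed iterates still track full-gradient SGD.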
AsGrad: A sharp unified analysis of asynchronous-SGD algorithms
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting,
where each worker has its own computation and communication speeds, as well as data …
Optimal time complexities of parallel stochastic optimization methods under a fixed computation model
Parallelization is a popular strategy for improving the performance of methods. Optimization
methods are no exception: design of efficient parallel optimization methods and tight …
Asynchronous federated reinforcement learning with policy gradient updates: Algorithm design and convergence analysis
To improve the efficiency of reinforcement learning, we propose a novel asynchronous
federated reinforcement learning framework termed AFedPG, which constructs a global …
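The snippet above only names the framework, so the following is a deliberately simplified, assumed picture of asynchronous federated policy-gradient aggregation rather than AFedPG itself: each agent computes a REINFORCE-style gradient for a toy two-armed bandit at a possibly stale copy of the global policy, and the server applies updates as they arrive. The bandit, the softmax policy, and the update rule are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_agents, n_rounds, lr = 4, 500, 0.05
reward_mean = np.array([0.2, 0.8])      # assumed two-armed bandit

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def policy_gradient(theta):
    """One REINFORCE sample: reward * grad log pi(a | theta)."""
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    r = reward_mean[a] + 0.1 * rng.normal()
    grad_log = -pi
    grad_log[a] += 1.0
    return r * grad_log

theta = np.zeros(2)                     # global policy parameters
snapshots = [theta.copy() for _ in range(n_agents)]  # stale copies held by agents

for _ in range(n_rounds):
    w = rng.integers(n_agents)          # an arbitrary agent finishes next
    g = policy_gradient(snapshots[w])   # gradient at that agent's stale parameters
    theta += lr * g                     # server ascends immediately
    snapshots[w] = theta.copy()         # agent pulls the fresh global policy

print("final policy:", softmax(theta))
```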
Queuing dynamics of asynchronous Federated Learning
We study asynchronous federated learning mechanisms with nodes having potentially
different computational speeds. In such an environment, each node is allowed to work on …
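A hedged way to see the queuing effect described here is to simulate nodes with different mean compute times and record each contribution's staleness, i.e., how many global updates happened while the node was working; the exponential service times and speed values below are assumptions for illustration.

```python
import heapq
import numpy as np

rng = np.random.default_rng(4)
n_nodes, n_updates = 5, 10000
mean_time = np.array([1.0, 1.0, 2.0, 4.0, 8.0])   # assumed heterogeneous speeds

version = 0                                        # global model version at the server
events = []                                        # (finish_time, node, version_started_from)
for i in range(n_nodes):
    heapq.heappush(events, (rng.exponential(mean_time[i]), i, version))

staleness = {i: [] for i in range(n_nodes)}
for _ in range(n_updates):
    t, i, v_start = heapq.heappop(events)
    staleness[i].append(version - v_start)         # updates applied while node i worked
    version += 1                                   # server absorbs node i's contribution
    heapq.heappush(events, (t + rng.exponential(mean_time[i]), i, version))

for i in range(n_nodes):
    print(f"node {i}: mean staleness = {np.mean(staleness[i]):.2f}, "
          f"updates contributed = {len(staleness[i])}")
```

Fast nodes contribute most of the updates with low staleness, while slow nodes contribute rarely but with high staleness, which is the imbalance such queuing analyses quantify.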
CO2: Efficient distributed training with full communication-computation overlap
The fundamental success of large language models hinges upon the efficacious
implementation of large-scale distributed training techniques. Nevertheless, building a vast …
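The abstract does not spell out the mechanism here, so the sketch below only illustrates the general idea of hiding a (simulated, slow) all-reduce behind the next round of local SGD steps, with the averaged model merged one round late; the toy objective, the 0.5 mixing weight, and the threading setup are assumptions and not CO2's exact algorithm.

```python
import threading
import time
import numpy as np

rng = np.random.default_rng(5)
n_workers, dim, local_steps, rounds, lr = 4, 8, 5, 20, 0.1
x_star = rng.normal(size=dim)

models = [np.zeros(dim) for _ in range(n_workers)]   # one local model per worker
result = {}                                          # filled by the background "all-reduce"

def slow_allreduce(snapshot):
    time.sleep(0.01)                                 # stand-in for network latency
    result["avg"] = np.mean(snapshot, axis=0)

for _ in range(rounds):
    # Start averaging the models as they were at the *start* of this round ...
    comm = threading.Thread(target=slow_allreduce, args=([m.copy() for m in models],))
    comm.start()
    # ... while every worker keeps doing local SGD steps, so computation overlaps communication.
    for w in range(n_workers):
        for _ in range(local_steps):
            g = (models[w] - x_star) + 0.1 * rng.normal(size=dim)
            models[w] -= lr * g
    comm.join()
    # Merge the one-round-delayed average into each local model.
    for w in range(n_workers):
        models[w] = 0.5 * (models[w] + result["avg"])

print("consensus error:", max(np.linalg.norm(m - np.mean(models, axis=0)) for m in models))
print("suboptimality:", np.linalg.norm(np.mean(models, axis=0) - x_star))
```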