Communication-efficient distributed deep learning: A comprehensive survey

Z Tang, S Shi, W Wang, B Li, X Chu - arXiv preprint arXiv:2003.06307, 2020 - arxiv.org
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …

LoRA-FA: Memory-efficient low-rank adaptation for large language models fine-tuning

L Zhang, L Zhang, S Shi, X Chu, B Li - arXiv preprint arXiv:2308.03303, 2023 - arxiv.org
The low-rank adaptation (LoRA) method can greatly reduce the number of trainable
parameters for fine-tuning large language models (LLMs); however, it still requires …
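
The snippet above refers to low-rank adapters that shrink the set of trainable parameters. Below is a minimal sketch of that idea in PyTorch, assuming a single linear layer; freezing the down-projection A while training only B follows the LoRA-FA title, and the rank, scaling, and initialization are illustrative assumptions rather than the paper's settings.

```python
# Minimal LoRA-style adapter sketch; rank, scaling, and init are illustrative.
# Freezing lora_A and training only lora_B follows the LoRA-FA idea in the title.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        # Pretrained weight: frozen during fine-tuning.
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        # Low-rank update W + (alpha/rank) * B @ A; only B receives gradients here.
        self.lora_A = nn.Parameter(0.01 * torch.randn(rank, in_features), requires_grad=False)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        return x @ self.weight.T + (x @ self.lora_A.T) @ self.lora_B.T * self.scaling

layer = LoRALinear(1024, 1024, rank=8)
y = layer(torch.randn(2, 1024))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # only lora_B: 1024 * 8 = 8192
```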

FusionAI: Decentralized training and deploying LLMs with massive consumer-level GPUs

Z Tang, Y Wang, X He, L Zhang, X Pan, Q Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
The rapid growth of memory and computation requirements of large language models
(LLMs) has outpaced the development of hardware, hindering people who lack large-scale …

Distributed Learning in Intelligent Transportation Systems: A Survey

Q Li, W Zhou, X Zheng - Information, 2024 - mdpi.com
The development of artificial intelligence (AI) and self-driving technology is expected to
enhance intelligent transportation systems (ITSs) by improving road safety and mobility …

FusionLLM: A decentralized LLM training system on geo-distributed GPUs with adaptive compression

Z Tang, X Kang, Y Yin, X Pan, Y Wang, X He… - arXiv preprint arXiv …, 2024 - arxiv.org
To alleviate hardware scarcity in training large deep neural networks (DNNs), particularly
large language models (LLMs), we present FusionLLM, a decentralized training system …

Sparse Gradient Communication with AlltoAll for Accelerating Distributed Deep Learning

J Peng, Z Li, S Shi, B Li - … of the 53rd International Conference on …, 2024 - dl.acm.org
Synchronous stochastic gradient descent (S-SGD) with data parallelism has become a de facto
approach to training large-scale deep neural networks (DNNs) on multi-GPU systems …
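
The entry above pairs top-k gradient sparsification with an AlltoAll exchange. The sketch below only simulates the sender side on one worker, assuming the parameter vector is partitioned evenly across workers and that the selected (index, value) pairs are bucketed by owner before an all-to-all collective; this bucketing layout is a generic pattern, not necessarily the paper's exact scheme.

```python
# Sender-side sketch: top-k sparsification, then bucketing of (index, value)
# pairs by destination worker as an all-to-all collective would exchange them.
# The even partitioning and bucketing layout are assumptions, not the paper's scheme.
import torch

world_size = 4
grad = torch.randn(1_000_000)                 # local dense gradient on this worker
k = grad.numel() // 100                       # keep the top 1% by magnitude

_, indices = torch.topk(grad.abs(), k)        # positions of the k largest entries
values = grad[indices]                        # their signed values

# Bucket each selected entry by the worker that owns its slice of the parameters;
# these buckets are what an all-to-all would ship instead of the full dense tensor.
chunk = (grad.numel() + world_size - 1) // world_size
owner = indices // chunk
send_buckets = [(indices[owner == r], values[owner == r]) for r in range(world_size)]

for r, (idx, val) in enumerate(send_buckets):
    print(f"to worker {r}: {idx.numel()} of {k} selected entries")
```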

Preserving Near-Optimal Gradient Sparsification Cost for Scalable Distributed Deep Learning

D Yoon, S Oh - 2024 IEEE 24th International Symposium on …, 2024 - ieeexplore.ieee.org
Communication overhead is a major obstacle to scaling distributed training systems.
Gradient sparsification is a potential optimization approach to reduce the communication …
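
Since the snippet stops at how sparsification cuts communication, the short sketch below shows the residual (error-feedback) accumulation that usually accompanies top-k sparsification so that dropped gradient entries are not lost; this is generic background for the technique, not the cost analysis contributed by the paper, and the value of k is an arbitrary choice.

```python
# Top-k sparsification with error feedback: entries not sent this step are
# accumulated locally and added back before the next selection.
# The value of k and the synthetic gradients are arbitrary illustrations.
import torch

def sparsify_with_feedback(grad, residual, k):
    corrected = grad + residual                  # re-inject previously dropped mass
    _, idx = torch.topk(corrected.abs(), k)      # pick the k largest entries
    sparse = torch.zeros_like(corrected)
    sparse[idx] = corrected[idx]                 # only these values are communicated
    return sparse, corrected - sparse            # new residual carries the rest

residual = torch.zeros(1_000_000)
for step in range(3):
    g = torch.randn(1_000_000)
    sparse, residual = sparsify_with_feedback(g, residual, k=10_000)
    print(f"step {step}: sent {(sparse != 0).sum().item()} nonzeros, "
          f"residual norm {residual.norm():.1f}")
```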

Near-Lossless Gradient Compression for Data-Parallel Distributed DNN Training

X Li, C Guo, K Qian, M Zhang, M Yang… - Proceedings of the 2024 …, 2024 - dl.acm.org
Data parallelism has become a cornerstone in scaling up the training of deep neural
networks (DNNs). However, the communication overhead associated with synchronizing …
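
As a generic illustration of the lossy-but-bounded trade-off that "near-lossless" gradient compression targets, the sketch below quantizes a gradient tensor to int8 with a per-tensor scale and measures the reconstruction error; it is a stand-in example, not the compression scheme proposed in the paper.

```python
# Generic lossy gradient compression example (per-tensor int8 quantization).
# Shown only to illustrate the size/error trade-off; not the paper's method.
import torch

grad = torch.randn(1_000_000)

scale = grad.abs().max() / 127.0                                    # per-tensor scale
q = torch.clamp((grad / scale).round(), -127, 127).to(torch.int8)   # 4x smaller than fp32
deq = q.to(torch.float32) * scale                                   # receiver-side reconstruction

rel_err = (grad - deq).norm() / grad.norm()
print(f"relative reconstruction error: {rel_err:.4e}")
```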

FedSSA: Reducing Overhead of Additive Cryptographic Methods in Federated Learning With Sketch

Z Ou, S Han, Q Zeng, Q Huang - 2024 IEEE 32nd International …, 2024 - ieeexplore.ieee.org
Federated Learning (FL) has been applied across diverse domains as a powerful technique
but faces critical challenges in privacy protection. Secure aggregation and additive …
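
The entry above combines additive secure aggregation with sketching. Below is a generic Count Sketch of a client update, assuming all clients share the same hash seeds so their sketch tables can be summed under additive masking; the table sizes, the heavy-hitter-style update, and the median estimator are illustrative choices, not FedSSA's construction.

```python
# Generic Count Sketch of a client update: hash each coordinate into a small
# table with a random sign, so only the table needs to pass through additive
# secure aggregation. Sizes and the synthetic update are illustrative only.
import torch

def count_sketch(vec, rows=5, width=4096, seed=0):
    g = torch.Generator().manual_seed(seed)       # shared hashes across clients
    buckets = torch.randint(0, width, (rows, vec.numel()), generator=g)
    signs = torch.randint(0, 2, (rows, vec.numel()), generator=g) * 2 - 1
    table = torch.zeros(rows, width)
    for r in range(rows):
        table[r].index_add_(0, buckets[r], signs[r] * vec)
    return table, buckets, signs

def estimate(table, buckets, signs):
    # Median over rows recovers an approximation of each coordinate.
    per_row = torch.stack([signs[r] * table[r, buckets[r]] for r in range(table.shape[0])])
    return per_row.median(dim=0).values

update = torch.zeros(100_000)
hot = torch.randperm(100_000)[:50]
update[hot] = 10.0 * torch.randn(50)              # a few dominant coordinates
table, buckets, signs = count_sketch(update)
approx = estimate(table, buckets, signs)
print(f"compression: {update.numel() / table.numel():.1f}x, "
      f"relative error: {((update - approx).norm() / update.norm()):.3f}")
```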

Efficient Federated Learning Via Low-Rank Gradient Compression for Intelligent Transportation System

Q Li, X Ma, T Xiao, Y Zhu, R Cai - 2024 Cross Strait Radio …, 2024 - ieeexplore.ieee.org
Within the realm of intelligent transportation systems, be it for autonomous vehicles or other
applications, the window of opportunity for executing distributed learning is constrained …
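
To make "low-rank gradient compression" concrete, here is a PowerSGD-style sketch that approximates a gradient matrix with two thin factors after one power iteration; the synthetic near-low-rank gradient, the rank, and the single-iteration scheme are assumptions for illustration and may differ from the factorization and feedback details used in the paper.

```python
# PowerSGD-style rank-r compression sketch: one power iteration produces two
# thin factors P and Q that are communicated instead of the full gradient.
# The synthetic near-low-rank gradient and rank=4 are illustrative assumptions.
import torch

def low_rank_compress(grad_matrix, rank=4):
    m, n = grad_matrix.shape
    q = torch.randn(n, rank)
    p = grad_matrix @ q                       # (m, rank)
    p, _ = torch.linalg.qr(p)                 # orthonormalize the left factor
    q = grad_matrix.T @ p                     # (n, rank)
    return p, q                               # send p and q, not grad_matrix

# Real gradients are only approximately low rank; emulate that structure here.
grad = torch.randn(1024, 4) @ torch.randn(4, 1024) + 0.01 * torch.randn(1024, 1024)
p, q = low_rank_compress(grad, rank=4)
approx = p @ q.T
sent = p.numel() + q.numel()
print(f"sent {sent} values instead of {grad.numel()} "
      f"({grad.numel() / sent:.0f}x reduction), "
      f"relative error {((grad - approx).norm() / grad.norm()):.3f}")
```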