The evolution of distributed systems for graph neural networks and their origin in graph processing and deep learning: A survey

J Vatter, R Mayer, HA Jacobsen - ACM Computing Surveys, 2023 - dl.acm.org
Graph neural networks (GNNs) are an emerging research field. This specialized deep
neural network architecture is capable of processing graph structured data and bridges the …
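For readers unfamiliar with the architecture this survey covers, a single message-passing layer can be sketched in a few lines. The mean aggregation, weight names, and ReLU below are illustrative assumptions, not notation from the survey itself.

```python
import numpy as np

def gnn_layer(H, A, W, b):
    """One illustrative message-passing layer (mean aggregation over neighbors).

    H: (n, d) node features, A: (n, n) 0/1 adjacency matrix,
    W: (d, d_out) weights, b: (d_out,) bias -- all assumed names.
    """
    deg = A.sum(axis=1, keepdims=True).clip(min=1)  # guard isolated nodes
    msgs = (A @ H) / deg                            # mean of neighbor features
    return np.maximum(0.0, msgs @ W + b)            # linear transform + ReLU

# Toy usage: 3 nodes on a path graph, 4-dim input features, 8-dim output.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.random.randn(3, 4)
H_next = gnn_layer(H, A, np.random.randn(4, 8), np.zeros(8))
```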

Efficient sparse collective communication and its application to accelerate distributed deep learning

J Fei, CY Ho, AN Sahu, M Canini, A Sapio - Proceedings of the 2021 …, 2021 - dl.acm.org
Efficient collective communication is crucial to parallel-computing applications such as
distributed training of large-scale recommendation systems and natural language …
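As a rough illustration of sparse collective communication, each worker can transmit only its largest gradient coordinates as (index, value) pairs, which the aggregation step then merges. This is a generic sketch, not the paper's protocol; real systems merge non-zero blocks in the network rather than densifying on one node.

```python
import numpy as np

def sparsify(grad, k):
    """Keep the k largest-magnitude entries (illustrative top-k sparsifier)."""
    idx = np.argsort(np.abs(grad))[-k:]
    return idx, grad[idx]

def sparse_aggregate(sparse_grads, dim):
    """Sum sparse (indices, values) contributions from all workers."""
    total = np.zeros(dim)
    for idx, vals in sparse_grads:
        np.add.at(total, idx, vals)   # scatter-add each worker's non-zeros
    return total

# Toy usage: 4 workers, 1000-dim gradient, top-10 coordinates per worker.
dim, k = 1000, 10
grads = [np.random.randn(dim) for _ in range(4)]
agg = sparse_aggregate([sparsify(g, k) for g in grads], dim)
```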

Unlocking the power of inline floating-point operations on programmable switches

Y Yuan, O Alama, J Fei, J Nelson, DRK Ports… - … USENIX Symposium on …, 2022 - usenix.org
The advent of switches with programmable dataplanes has enabled the rapid development
of new network functionality, as well as providing a platform for acceleration of a broad …

Distributed artificial intelligence: Taxonomy, review, framework, and reference architecture

N Janbi, I Katib, R Mehmood - Intelligent Systems with Applications, 2023 - Elsevier
Artificial intelligence (AI) research and market have grown rapidly in the last few years, and
this trend is expected to continue with many potential advancements and innovations in this …

Time-correlated sparsification for communication-efficient federated learning

E Ozfatura, K Ozfatura, D Gündüz - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
Federated learning (FL) enables multiple clients to collaboratively train a shared model, with
the help of a parameter server (PS), without disclosing their local datasets. However, due to …
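The basic federated learning loop the abstract refers to can be sketched as clients taking local steps and a parameter server averaging the results. The least-squares client objective and function names below are assumptions for illustration, not this paper's method (which adds time-correlated sparsification on top of such a loop).

```python
import numpy as np

def local_update(weights, data, lr=0.1):
    """Illustrative client step: one gradient step on a least-squares loss."""
    X, y = data
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def fl_round(global_w, client_data):
    """One federated round: clients train locally, the PS averages the results."""
    client_ws = [local_update(global_w.copy(), d) for d in client_data]
    return np.mean(client_ws, axis=0)   # FedAvg-style aggregation

# Toy usage: 3 clients holding private (X, y) shards, 5-dim model, 10 rounds.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(3)]
w = np.zeros(5)
for _ in range(10):
    w = fl_round(w, clients)
```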

Gemini: Fast failure recovery in distributed training with in-memory checkpoints

Z Wang, Z Jia, S Zheng, Z Zhang, X Fu… - Proceedings of the 29th …, 2023 - dl.acm.org
Large deep learning models have recently garnered substantial attention from both
academia and industry. Nonetheless, frequent failures are observed during large model …
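The core idea of in-memory checkpointing can be sketched as keeping recent training state in RAM (in Gemini's case, replicated in peer machines' CPU memory) so recovery avoids slow remote storage. The dictionary standing in for a peer's memory below is an assumption for illustration, not Gemini's actual replication protocol.

```python
import copy

class InMemoryCheckpointer:
    """Minimal sketch of checkpointing training state to (remote) RAM."""
    def __init__(self):
        self.peer_memory = {}            # stand-in for another host's memory

    def save(self, step, state):
        self.peer_memory[step] = copy.deepcopy(state)

    def restore_latest(self):
        step = max(self.peer_memory)     # most recent surviving checkpoint
        return step, copy.deepcopy(self.peer_memory[step])

# Toy usage: checkpoint periodically, then recover after a simulated failure.
ckpt = InMemoryCheckpointer()
ckpt.save(100, {"weights": [0.1, 0.2], "momentum": [0.0, 0.0]})
ckpt.save(200, {"weights": [0.3, 0.1], "momentum": [0.01, 0.02]})
step, state = ckpt.restore_latest()      # resumes from step 200
```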

Orion: Interference-aware, Fine-grained GPU Sharing for ML Applications

F Strati, X Ma, A Klimovic - … of the Nineteenth European Conference on …, 2024 - dl.acm.org
GPUs are critical for maximizing the throughput-per-Watt of deep neural network (DNN)
applications. However, DNN applications often underutilize GPUs, even when using large …

PipeSwitch: Fast pipelined context switching for deep learning applications

Z Bai, Z Zhang, Y Zhu, X Jin - 14th USENIX Symposium on Operating …, 2020 - usenix.org
Deep learning (DL) workloads include throughput-intensive training tasks and latency-
sensitive inference tasks. The dominant practice today is to provision dedicated GPU …

Dynamic and adaptive fault-tolerant asynchronous federated learning using volunteer edge devices

JÁ Morell, E Alba - Future Generation Computer Systems, 2022 - Elsevier
The number of devices, from smartphones to IoT hardware, interconnected via the Internet is
growing all the time. These devices produce a large amount of data that cannot be analyzed …

On the utility of gradient compression in distributed training systems

S Agarwal, H Wang, S Venkataraman… - Proceedings of …, 2022 - proceedings.mlsys.org
A rich body of prior work has highlighted the existence of communication bottlenecks in
synchronous data-parallel training. To alleviate these bottlenecks, a long line of recent …
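A representative gradient-compression scheme of the kind this paper evaluates is top-k sparsification with error feedback, where dropped coordinates are remembered and re-added on the next step. The class and parameter names below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

class TopKCompressor:
    """Illustrative top-k gradient compressor with error feedback."""
    def __init__(self, dim, k):
        self.k = k
        self.residual = np.zeros(dim)    # error accumulated from dropped entries

    def compress(self, grad):
        corrected = grad + self.residual
        idx = np.argsort(np.abs(corrected))[-self.k:]
        vals = corrected[idx]
        self.residual = corrected.copy()
        self.residual[idx] = 0.0         # keep only what was not transmitted
        return idx, vals

# Toy usage: compress a 1000-dim gradient to 1% of its coordinates.
comp = TopKCompressor(dim=1000, k=10)
idx, vals = comp.compress(np.random.randn(1000))
```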