Demystifying parallel and distributed deep learning: An in-depth concurrency analysis

T Ben-Nun, T Hoefler - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …
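
The snippet only hints at the range of techniques this survey analyzes; as one concrete illustration, below is a minimal NumPy sketch of synchronous data-parallel SGD, the baseline scheme in which simulated workers average their local gradients each step. The sharding, toy model, and learning rate are assumptions for the example, not details from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem sharded across 4 simulated workers.
X, y = rng.normal(size=(400, 10)), rng.normal(size=400)
shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))

w = np.zeros(10)
lr = 0.1
for step in range(100):
    # Each worker computes a gradient on its own data shard ...
    grads = [Xi.T @ (Xi @ w - yi) / len(yi) for Xi, yi in shards]
    # ... and the allreduce-style average is applied as the global update.
    w -= lr * np.mean(grads, axis=0)

print("final training loss:", np.mean((X @ w - y) ** 2))
```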

Scalable deep learning on distributed infrastructures: Challenges, techniques, and tools

R Mayer, HA Jacobsen - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Deep Learning (DL) has had immense success in the recent past, leading to state-of-the-
art results in various domains, such as image recognition and natural language processing …

Feddc: Federated learning with non-iid data via local drift decoupling and correction

L Gao, H Fu, L Li, Y Chen, M Xu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Federated learning (FL) allows multiple clients to collectively train a high-performance
global model without sharing their private data. However, the key challenge in federated …
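
FedDC's specific drift decoupling and correction terms are not spelled out in this snippet; the sketch below only shows the generic FedAvg-style loop (local training on private data plus example-count-weighted averaging at the server) that such methods modify to handle non-IID clients. Client data, model, and hyperparameters are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def local_sgd(w_global, X, y, lr=0.05, epochs=5):
    """One client's local training on its private (X, y); toy least-squares model."""
    w = w_global.copy()
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

# Non-IID toy data: each client's labels come from a shifted linear model.
clients = []
for shift in (-1.0, 0.0, 2.0):
    X = rng.normal(size=(50, 5))
    clients.append((X, X @ (np.ones(5) + shift)))

w_global = np.zeros(5)
for rnd in range(20):
    local_models = [local_sgd(w_global, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    # Server aggregates by example-count-weighted averaging (FedAvg).
    w_global = np.average(local_models, axis=0, weights=sizes)
```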

See through gradients: Image batch recovery via gradinversion

H Yin, A Mallya, A Vahdat, JM Alvarez… - Proceedings of the …, 2021 - openaccess.thecvf.com
Training deep neural networks requires gradient estimation from data batches to update
parameters. Gradients per parameter are averaged over a set of data and this has been …
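
As a worked illustration of the "averaged over a set of data" point, here is a small NumPy sketch showing per-example gradients of a linear layer being collapsed into one shared batch gradient; the rank-one structure of a single example's gradient is the kind of signal batch-inversion attacks exploit. The model and data are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Linear model; batch of 8 examples with squared-error loss.
W = rng.normal(size=(3, 5))
X = rng.normal(size=(8, 5))
Y = rng.normal(size=(8, 3))

# Per-example gradients of the loss w.r.t. W ...
per_example = [np.outer(W @ x - t, x) for x, t in zip(X, Y)]
# ... are collapsed into one averaged gradient before being shared.
shared_grad = np.mean(per_example, axis=0)

# A single example's gradient is rank one and proportional to the input
# itself, which is the leakage that batch-inversion attacks disentangle.
print("rank of a one-example gradient:", np.linalg.matrix_rank(per_example[0]))
```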

Cafe: Catastrophic data leakage in vertical federated learning

X Jin, PY Chen, CY Hsu, CM Yu… - Advances in Neural …, 2021 - proceedings.neurips.cc
Recent studies show that private training data can be leaked through the gradient-sharing
mechanism deployed in distributed machine learning systems, such as federated learning …
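
This snippet does not describe CAFE's attack itself; the sketch below is only a toy NumPy emulation of the vertical federated learning setting it targets, where two parties hold disjoint feature columns and exchange per-example gradient signals rather than raw data. All names, shapes, and hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two parties hold disjoint feature columns of the same 32 examples.
Xa, Xb = rng.normal(size=(32, 4)), rng.normal(size=(32, 6))
y = rng.normal(size=32)
wa, wb = np.zeros(4), np.zeros(6)

lr = 0.1
for step in range(50):
    # Each party contributes only its partial prediction, not its raw features.
    logits = Xa @ wa + Xb @ wb
    residual = logits - y              # per-example gradient signal, held by the label owner
    # The residual is shared back so each party can update its own parameters;
    # this exchanged signal is the channel such leakage attacks study.
    wa -= lr * Xa.T @ residual / len(y)
    wb -= lr * Xb.T @ residual / len(y)
```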

Scaffold: Stochastic controlled averaging for federated learning

SP Karimireddy, S Kale, M Mohri… - International …, 2020 - proceedings.mlr.press
Federated learning is a key scenario in modern large-scale machine learning where the
data remains distributed over a large number of clients and the task is to learn a centralized …
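
Below is a minimal NumPy sketch of the SCAFFOLD-style corrected local update, assuming the commonly cited form of the algorithm: each local gradient step is corrected by the difference between the server and client control variates, followed by the option-II control-variate refresh. The toy gradient function and hyperparameters are illustrative.

```python
import numpy as np

def scaffold_local_update(w_global, c_global, c_client, grad_fn,
                          lr=0.1, local_steps=10):
    """SCAFFOLD-style local training on one client."""
    w = w_global.copy()
    for _ in range(local_steps):
        # Control-variate correction counteracts client drift on non-IID data.
        w -= lr * (grad_fn(w) - c_client + c_global)
    # Option-II control-variate refresh.
    c_client_new = c_client - c_global + (w_global - w) / (local_steps * lr)
    return w, c_client_new

# Example: a client whose local optimum sits away from the global one.
grad_fn = lambda w: w - np.array([2.0, -1.0])   # gradient of a local quadratic
w_new, c_new = scaffold_local_update(np.zeros(2), np.zeros(2), np.zeros(2), grad_fn)
```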

Deep leakage from gradients

L Zhu, Z Liu, S Han - Advances in neural information …, 2019 - proceedings.neurips.cc
Passing gradients is a widely used scheme in modern multi-node learning systems (e.g.,
distributed training, collaborative learning). For a long time, people believed that …
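
A minimal PyTorch sketch of the gradient-matching idea behind this line of work, assuming the usual setup: the attacker optimizes a dummy input and a soft dummy label so that the gradient they induce matches the shared one. The model size, optimizer settings, and iteration count are arbitrary choices for the example (soft-label targets in CrossEntropyLoss require a recent PyTorch).

```python
import torch

# Tiny model whose shared gradient we will try to invert.
model = torch.nn.Linear(8, 2)
loss_fn = torch.nn.CrossEntropyLoss()

# "Victim" data and the gradient that would be shared in distributed training.
x_true, y_true = torch.randn(1, 8), torch.tensor([1])
true_grads = [g.detach() for g in torch.autograd.grad(
    loss_fn(model(x_true), y_true), model.parameters())]

# Attacker: optimize a dummy input and soft label to match the shared gradient.
x_dummy = torch.randn(1, 8, requires_grad=True)
y_dummy = torch.randn(1, 2, requires_grad=True)   # logits for a soft label
opt = torch.optim.LBFGS([x_dummy, y_dummy])

def closure():
    opt.zero_grad()
    dummy_loss = loss_fn(model(x_dummy), torch.softmax(y_dummy, dim=-1))
    dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(),
                                      create_graph=True)
    match = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    match.backward()
    return match

for _ in range(30):
    opt.step(closure)

print("reconstruction error:", (x_dummy.detach() - x_true).norm().item())
```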

Towards efficient and scalable sharpness-aware minimization

Y Liu, S Mai, X Chen, CJ Hsieh… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Recently, Sharpness-Aware Minimization (SAM), which connects the geometry of the loss
landscape and generalization, has demonstrated a significant performance boost …
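
This paper is about making SAM cheaper and more scalable; for reference, here is a NumPy sketch of the base two-step SAM update it starts from (ascend to an approximately worst-case nearby point, then apply the gradient taken there to the original weights), on a toy least-squares loss. The perturbation radius and learning rate are illustrative.

```python
import numpy as np

def loss_grad(w, X, y):
    """Gradient of a least-squares loss; stands in for a network's backward pass."""
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(4)
X, y = rng.normal(size=(64, 10)), rng.normal(size=64)
w, lr, rho = np.zeros(10), 0.1, 0.05

for step in range(100):
    g = loss_grad(w, X, y)
    # Step 1: ascend to the (approximately) sharpest nearby point ...
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Step 2: ... and apply the gradient taken there to the original weights.
    w -= lr * loss_grad(w + eps, X, y)
```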

Large batch optimization for deep learning: Training BERT in 76 minutes

Y You, J Li, S Reddi, J Hseu, S Kumar… - arXiv preprint arXiv …, 2019 - arxiv.org
Training large deep neural networks on massive datasets is computationally very
challenging. There has been a recent surge of interest in using large-batch stochastic …
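
Below is a NumPy sketch of a single-layer LAMB-style step as it is commonly described: Adam moments plus a layer-wise trust ratio between the weight norm and the update norm, which is what keeps updates proportionate when the batch size and learning rate are scaled up. The hyperparameters and the zero-norm fallback are implementation assumptions, not values from the paper.

```python
import numpy as np

def lamb_step(w, grad, m, v, step, lr=1e-3, betas=(0.9, 0.999),
              eps=1e-6, weight_decay=0.01):
    """One LAMB-style update for a single layer (parameter tensor)."""
    # Adam-style first and second moments with bias correction.
    m = betas[0] * m + (1 - betas[0]) * grad
    v = betas[1] * v + (1 - betas[1]) * grad ** 2
    m_hat = m / (1 - betas[0] ** step)
    v_hat = v / (1 - betas[1] ** step)
    update = m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w
    # Layer-wise trust ratio: scale the step by ||w|| / ||update||.
    norm_w, norm_u = np.linalg.norm(w), np.linalg.norm(update)
    trust = norm_w / norm_u if norm_w > 0 and norm_u > 0 else 1.0
    return w - lr * trust * update, m, v

w, m, v = np.ones(4), np.zeros(4), np.zeros(4)
w, m, v = lamb_step(w, grad=np.array([0.1, -0.2, 0.3, 0.0]), m=m, v=v, step=1)
```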

Scaling distributed machine learning with in-network aggregation

A Sapio, M Canini, CY Ho, J Nelson, P Kalnis… - … USENIX Symposium on …, 2021 - usenix.org
Training machine learning models in parallel is an increasingly important workload. We
accelerate distributed parallel training by designing a communication primitive that uses a …
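
The paper implements gradient aggregation on programmable switches; the sketch below is only a host-side NumPy emulation of the streaming idea, with workers contributing fixed-size chunks into a single aggregation slot whose sum is then returned to everyone. It omits the integer quantization, pipelined slots, and switch dataplane that the real system relies on.

```python
import numpy as np

rng = np.random.default_rng(5)

NUM_WORKERS, CHUNK = 4, 256
grads = [rng.normal(size=4096).astype(np.float32) for _ in range(NUM_WORKERS)]

# The "switch" only holds one chunk-sized aggregation slot at a time:
# workers stream chunk i, the slot sums it, and the result is sent back.
aggregated = np.empty_like(grads[0])
for start in range(0, grads[0].size, CHUNK):
    slot = np.zeros(CHUNK, dtype=np.float32)
    for g in grads:
        slot += g[start:start + CHUNK]          # per-worker contribution
    aggregated[start:start + CHUNK] = slot      # result "multicast" to all workers

assert np.allclose(aggregated, np.sum(grads, axis=0), atol=1e-4)
```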