Demystifying parallel and distributed deep learning: An in-depth concurrency analysis

T Ben-Nun, T Hoefler - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …

Scalable deep learning on distributed infrastructures: Challenges, techniques, and tools

R Mayer, HA Jacobsen - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Deep Learning (DL) has had an immense success in the recent past, leading to state-of-the-
art results in various domains, such as image recognition and natural language processing …

Towards demystifying serverless machine learning training

J Jiang, S Gan, Y Liu, F Wang, G Alonso… - Proceedings of the …, 2021 - dl.acm.org
The appeal of serverless (FaaS) has triggered a growing interest in how to use it in data-
intensive applications such as ETL, query processing, or machine learning (ML). Several …

Sancus: staleness-aware communication-avoiding full-graph decentralized training in large-scale graph neural networks

J Peng, Z Chen, Y Shao, Y Shen, L Chen… - Proceedings of the VLDB …, 2022 - dl.acm.org
Graph neural networks (GNNs) have emerged due to their success at modeling graph data.
Yet, it is challenging for GNNs to efficiently scale to large graphs. Thus, distributed GNNs …

HetPipe: Enabling large DNN training on (whimpy) heterogeneous GPU clusters through integration of pipelined model parallelism and data parallelism

JH Park, G Yun, MY Chang, NT Nguyen, S Lee… - 2020 USENIX Annual …, 2020 - usenix.org
Deep Neural Network (DNN) models have continuously been growing in size in order to
improve the accuracy and quality of the models. Moreover, for training of large DNN models …

A comprehensive empirical study of heterogeneity in federated learning

AM Abdelmoniem, CY Ho… - IEEE Internet of …, 2023 - ieeexplore.ieee.org
Federated learning (FL) is becoming a popular paradigm for collaborative learning over
distributed, private data sets owned by nontrusting entities. FL has seen successful …

Federated neural collaborative filtering

V Perifanis, PS Efraimidis - Knowledge-Based Systems, 2022 - Elsevier
In this work, we present a federated version of the state-of-the-art Neural Collaborative
Filtering (NCF) approach for item recommendations. The system, named FedNCF, enables …

Refl: Resource-efficient federated learning

AM Abdelmoniem, AN Sahu, M Canini… - Proceedings of the …, 2023 - dl.acm.org
Federated Learning (FL) enables distributed training by learners using local data, thereby
enhancing privacy and reducing communication. However, it presents numerous challenges …

A survey on automatic parameter tuning for big data processing systems

H Herodotou, Y Chen, J Lu - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of
configuration parameters controlling parallelism, I/O behavior, memory settings, and …

Database meets deep learning: Challenges and opportunities

W Wang, M Zhang, G Chen, HV Jagadish, BC Ooi… - ACM Sigmod …, 2016 - dl.acm.org
Deep learning has recently become very popular on account of its incredible success in
many complex data-driven applications, including image classification and speech …