Heterogeneous federated learning: State-of-the-art and research challenges

M Ye, X Fang, B Du, PC Yuen, D Tao - ACM Computing Surveys, 2023 - dl.acm.org
Federated learning (FL) has drawn increasing attention owing to its potential use in large-
scale industrial applications. Existing FL works mainly focus on model homogeneous …

Towards demystifying serverless machine learning training

J Jiang, S Gan, Y Liu, F Wang, G Alonso… - Proceedings of the …, 2021 - dl.acm.org
The appeal of serverless (FaaS) has triggered a growing interest on how to use it in data-
intensive applications such as ETL, query processing, or machine learning (ML). Several …

Gaia: Geo-Distributed machine learning approaching LAN speeds

K Hsieh, A Harlap, N Vijaykumar, D Konomis… - … USENIX symposium on …, 2017 - usenix.org
Machine learning (ML) is widely used to derive useful information from large-scale data
(such as user activities, pictures, and videos) generated at increasingly rapid rates, all over …

Challenges, applications and design aspects of federated learning: A survey

KMJ Rahman, F Ahmed, N Akhter, M Hasan… - IEEE …, 2021 - ieeexplore.ieee.org
Federated learning (FL) is a new technology that has been a hot research topic. It enables
the training of an algorithm across multiple decentralized edge devices or servers holding …

Poseidon: An efficient communication architecture for distributed deep learning on GPU clusters

H Zhang, Z Zheng, S Xu, W Dai, Q Ho, X Liang… - 2017 USENIX Annual …, 2017 - usenix.org
Deep learning models can take weeks to train on a single GPU-equipped machine,
necessitating scaling out DL training to a GPU-cluster. However, current distributed DL …

Scalable deep learning on distributed infrastructures: Challenges, techniques, and tools

R Mayer, HA Jacobsen - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Deep Learning (DL) has had an immense success in the recent past, leading to state-of-the-
art results in various domains, such as image recognition and natural language processing …

Petuum: A new platform for distributed machine learning on big data

EP Xing, Q Ho, W Dai, JK Kim, J Wei, S Lee… - Proceedings of the 21th …, 2015 - dl.acm.org
How can one build a distributed framework that allows efficient deployment of a wide
spectrum of modern advanced machine learning (ML) programs for industrial-scale …