Scalable deep learning on distributed infrastructures: Challenges, techniques, and tools

R Mayer, HA Jacobsen - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Deep Learning (DL) has had immense success in the recent past, leading to state-of-the-
art results in various domains, such as image recognition and natural language processing …

A generic communication scheduler for distributed DNN training acceleration

Y Peng, Y Zhu, Y Chen, Y Bao, B Yi, C Lan… - Proceedings of the 27th …, 2019 - dl.acm.org
We present ByteScheduler, a generic communication scheduler for distributed DNN training
acceleration. ByteScheduler is based on our principled analysis that partitioning and …
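The snippet above is truncated, but it points at the general idea of communication scheduling: partitioning gradient tensors into small chunks and sending chunks belonging to front layers first, so the next iteration's forward pass can start sooner. The sketch below illustrates that idea only; the chunk size, the priority rule, and the send_chunk transport stub are assumptions for illustration, not ByteScheduler's actual design or API.

```python
import heapq
import numpy as np

def partition(tensor, chunk_elems):
    # Split a flat gradient tensor into fixed-size chunks (chunk size is an assumed knob).
    flat = tensor.ravel()
    return [flat[i:i + chunk_elems] for i in range(0, flat.size, chunk_elems)]

def send_chunk(layer_idx, part_idx, chunk):
    # Stand-in for the real transport (e.g. a push to a parameter server or an all-reduce slice).
    print(f"sending layer {layer_idx} chunk {part_idx} ({chunk.size} elems)")

def schedule_push(grads_by_layer, chunk_elems=65536):
    # Front layers get higher priority: their updated weights are needed first by the
    # next iteration's forward pass, so their chunks are transmitted first.
    queue = []
    for layer_idx, grad in enumerate(grads_by_layer):
        for part_idx, chunk in enumerate(partition(grad, chunk_elems)):
            heapq.heappush(queue, (layer_idx, part_idx, chunk))  # smaller key pops first
    while queue:
        layer_idx, part_idx, chunk = heapq.heappop(queue)
        send_chunk(layer_idx, part_idx, chunk)

if __name__ == "__main__":
    grads = [np.random.randn(1 << 17).astype(np.float32) for _ in range(4)]
    schedule_push(grads)
```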

Firecaffe: near-linear acceleration of deep neural network training on compute clusters

FN Iandola, MW Moskewicz… - Proceedings of the …, 2016 - openaccess.thecvf.com
Long training times for high-accuracy deep neural networks (DNNs) impede research into
new DNN architectures and slow the development of high-accuracy DNNs. In this paper we …

SiP-ML: high-bandwidth optical network interconnects for machine learning training

M Khani, M Ghobadi, M Alizadeh, Z Zhu… - Proceedings of the …, 2021 - dl.acm.org
This paper proposes optical network interconnects as a key enabler for building high-
bandwidth ML training clusters with strong scaling properties. Our design, called SiP-ML …

CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters

S Rajasekaran, M Ghobadi, A Akella - 21st USENIX Symposium on …, 2024 - usenix.org
We present CASSINI, a network-aware job scheduler for machine learning (ML) clusters.
CASSINI introduces a novel geometric abstraction to consider the communication pattern of …

KungFu: Making training in distributed machine learning adaptive

L Mai, G Li, M Wagenländer, K Fertakis… - … USENIX Symposium on …, 2020 - usenix.org
When using distributed machine learning (ML) systems to train models on a cluster of worker
machines, users must configure a large number of parameters: hyper-parameters (e.g. the …

Ekko: A Large-Scale deep learning recommender system with Low-Latency model update

C Sima, Y Fu, MK Sit, L Guo, X Gong, F Lin… - … USENIX Symposium on …, 2022 - usenix.org
Deep Learning Recommender Systems (DLRSs) need to update models at low latency, thus
promptly serving new users and content. Existing DLRSs, however, fail to do so. They …

Preemptive all-reduce scheduling for expediting distributed DNN training

Y Bao, Y Peng, Y Chen, C Wu - IEEE INFOCOM 2020-IEEE …, 2020 - ieeexplore.ieee.org
Data-parallel training is widely used for scaling DNN training over large datasets, using the
parameter server or all-reduce architecture. Communication scheduling has been promising …
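As background for the data-parallel setup this snippet describes, the sketch below shows a bare-bones all-reduce architecture in PyTorch: each worker computes gradients on its own data shard, the gradients are summed across workers and averaged, and only then does the optimizer step run. This is a generic illustration of synchronous data parallelism, not the preemptive scheduling technique the paper proposes.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def allreduce_gradients(model):
    # Sum gradients across all workers, then divide to obtain the average.
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = torch.nn.Linear(16, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    for step in range(3):
        x, y = torch.randn(8, 16), torch.randn(8, 1)  # each rank trains on its own shard
        loss = torch.nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        allreduce_gradients(model)  # synchronize gradients before the optimizer step
        opt.step()
        if rank == 0:
            print(f"step {step} loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)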

SmartPC: Hierarchical pace control in real-time federated learning system

L Li, H Xiong, Z Guo, J Wang… - 2019 IEEE Real-Time …, 2019 - ieeexplore.ieee.org
Federated Learning is a technique for learning AI models through the collaboration of a
large number of resource-constrained mobile devices, while preserving data privacy. Instead …
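For readers unfamiliar with the federated setting the snippet introduces, the following is a minimal FedAvg-style sketch in Python/NumPy: each client runs local gradient steps on its private data, only model weights are sent to the server, and the server averages them weighted by local dataset size. The linear model, learning rate, and round count are illustrative assumptions; SmartPC's hierarchical pace control itself is not shown.

```python
import numpy as np

def local_update(weights, data, lr=0.1, epochs=1):
    # One client's local gradient descent on its private data (linear model, squared loss).
    w = weights.copy()
    X, y = data
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    # Server aggregates client models weighted by local dataset size (FedAvg-style).
    # Raw data never leaves the clients; only model weights are exchanged.
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    local_models = [local_update(global_w, c) for c in clients]
    return np.average(local_models, axis=0, weights=sizes)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = rng.normal(size=4)
    clients = []
    for _ in range(5):
        X = rng.normal(size=(50, 4))
        clients.append((X, X @ true_w + 0.1 * rng.normal(size=50)))
    w = np.zeros(4)
    for _ in range(20):
        w = federated_round(w, clients)
    print("recovered weights:", np.round(w, 2))
```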

Communication optimization strategies for distributed deep neural network training: A survey

S Ouyang, D Dong, Y Xu, L Xiao - Journal of Parallel and Distributed …, 2021 - Elsevier
Recent trends in high-performance computing and deep learning have led to the
proliferation of studies on large-scale deep neural network training. However, the frequent …