Communication-efficient distributed deep learning: A comprehensive survey
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …
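To make the baseline concrete, here is a minimal sketch of the synchronous data-parallel pattern such surveys analyze: every worker computes local gradients, then gradients are averaged with an allreduce before the optimizer step. The torch.distributed calls are real PyTorch APIs, but the training-loop names are illustrative and assume the process group is already initialized.

```python
# Minimal sketch of synchronous data-parallel SGD: each worker computes
# local gradients, then gradients are averaged across workers before the
# optimizer step. Assumes torch.distributed is already initialized (e.g.
# via torchrun); model/data names are illustrative.
import torch
import torch.distributed as dist

def train_step(model, optimizer, loss_fn, inputs, targets):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    # Communication step: average gradients over all workers.
    world = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world
    optimizer.step()
    return loss.item()
```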
Deep gradient compression: Reducing the communication bandwidth for distributed training
Large-scale distributed training requires significant communication bandwidth for gradient exchange, which limits the scalability of multi-node training and requires expensive high …
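The snippet names the bottleneck (gradient exchange); a core mechanism the DGC paper builds on is top-k sparsification with local residual accumulation. The sketch below shows only that idea under assumed names (TopKCompressor, ratio); it omits DGC's momentum correction and warm-up refinements.

```python
# Sketch of the core idea behind gradient compression schemes like DGC:
# transmit only the largest-magnitude gradient entries and accumulate the
# rest locally as a residual for later rounds.
import torch

class TopKCompressor:
    def __init__(self, ratio=0.01):
        self.ratio = ratio          # fraction of entries transmitted
        self.residual = None        # locally accumulated, unsent gradient

    def compress(self, grad):
        flat = grad.flatten()
        if self.residual is None:
            self.residual = torch.zeros_like(flat)
        acc = flat + self.residual
        k = max(1, int(self.ratio * acc.numel()))
        _, indices = torch.topk(acc.abs(), k)
        values = acc[indices]
        self.residual = acc.clone()
        self.residual[indices] = 0.0  # keep only what was not sent
        return indices, values        # sparse payload to communicate
```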
Tutel: Adaptive mixture-of-experts at scale
Sparsely-gated mixture-of-experts (MoE) has been widely adopted to scale deep learning
models to trillion-plus parameters with fixed computational cost. The algorithmic …
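A minimal sketch of the sparsely-gated routing that gives MoE its fixed computational cost: each token is dispatched to only k of E experts, so parameters grow with E while per-token compute does not. Names and shapes are illustrative assumptions, not Tutel's implementation.

```python
# Top-k expert routing: score each token against E experts, keep the k
# best, and combine their outputs weighted by the renormalized gates.
import torch
import torch.nn.functional as F

def topk_gate(tokens, gate_weight, experts, k=2):
    # tokens: (n, d); gate_weight: (d, E); experts: list of E modules.
    logits = tokens @ gate_weight                   # (n, E) routing scores
    probs = F.softmax(logits, dim=-1)
    topv, topi = torch.topk(probs, k, dim=-1)       # k experts per token
    topv = topv / topv.sum(dim=-1, keepdim=True)    # renormalize weights
    out = torch.zeros_like(tokens)
    for slot in range(k):
        for e, expert in enumerate(experts):
            mask = topi[:, slot] == e
            if mask.any():
                out[mask] += topv[mask, slot:slot+1] * expert(tokens[mask])
    return out
```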
The pyramid match kernel: Discriminative classification with sets of image features
Discriminative learning is challenging when examples are sets of features, and the sets vary
in cardinality and lack any sort of meaningful ordering. Kernel-based classification methods …
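For intuition, here is a sketch of the pyramid match idea on 1-D feature sets: histogram both sets at several resolutions, count matches via histogram intersection, and weight matches found at finer levels more heavily. The actual kernel handles d-dimensional features; the function name and parameters here are assumptions for illustration.

```python
# Pyramid match over 1-D feature sets: bins double in width at each
# coarser level, and only matches that first appear at a level are
# credited, with weight halving as resolution coarsens.
import numpy as np

def pyramid_match(x, y, levels=4, lo=0.0, hi=1.0):
    score, prev = 0.0, 0.0
    for i in range(levels):                 # finest level first
        bins = 2 ** (levels - i)            # bin count halves each level
        hx, _ = np.histogram(x, bins=bins, range=(lo, hi))
        hy, _ = np.histogram(y, bins=bins, range=(lo, hi))
        inter = np.minimum(hx, hy).sum()    # matches at this resolution
        score += (inter - prev) / (2 ** i)  # credit only new matches
        prev = inter
    return score
```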
Optimization of collective communication operations in MPICH
R Thakur, R Rabenseifner… - The International Journal …, 2005 - journals.sagepub.com
We describe our work on improving the performance of collective communication operations
in MPICH for clusters connected by switched networks. For each collective operation, we …
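One algorithm this line of work uses for short-message allreduce is recursive doubling: in round r, rank p exchanges its partial sum with rank p XOR 2^r, finishing in log2(P) rounds. The sketch below simulates it over in-memory buffers for a power-of-two rank count; the function name and test values are illustrative.

```python
import numpy as np

def allreduce_recursive_doubling(bufs):
    P = len(bufs)                       # number of ranks, power of two
    data = [b.copy() for b in bufs]
    step = 1
    while step < P:
        new = [None] * P
        for p in range(P):
            partner = p ^ step          # pairwise exchange partner
            new[p] = data[p] + data[partner]
        data, step = new, step * 2
    return data                         # every rank holds the full sum

sums = allreduce_recursive_doubling([np.ones(4) * r for r in range(4)])
assert all(np.array_equal(s, np.full(4, 6.0)) for s in sums)
```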
The analysis of a plane wave pseudopotential density functional theory code on a GPU machine
Plane wave pseudopotential (PWP) density functional theory (DFT) calculation is the most
widely used material science simulation, and the PWP DFT codes are arguably the most …
[BOOK][B] High performance visualization: Enabling extreme-scale scientific insight
Visualization and analysis tools, techniques, and algorithms have undergone a rapid
evolution in recent decades to accommodate explosive growth in data size and complexity …
Performance analysis of MPI collective operations
Previous studies of application usage show that the performance of collective
communications is critical for high-performance computing. Despite active research in the …
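As a rough illustration of how such studies measure collectives, here is a minimal timing sketch using mpi4py (a real binding, though this particular benchmark layout and its message sizes are assumptions):

```python
# Time MPI_Allreduce over a range of message sizes; run with e.g.
# `mpiexec -n 8 python bench.py`. Reports the max time over ranks.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
for n in (2**10, 2**15, 2**20):         # message sizes in doubles
    send = np.ones(n)
    recv = np.empty(n)
    comm.Barrier()                      # align ranks before timing
    t0 = MPI.Wtime()
    comm.Allreduce(send, recv, op=MPI.SUM)
    elapsed = comm.allreduce(MPI.Wtime() - t0, op=MPI.MAX)
    if comm.rank == 0:
        print(f"{n:>8} doubles: {elapsed * 1e6:.1f} us (max over ranks)")
```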
Optimization of collective reduction operations
R Rabenseifner - Computational Science-ICCS 2004: 4th International …, 2004 - Springer
A five-year profiling study in production mode at the University of Stuttgart has shown that more than
40% of the execution time of Message Passing Interface (MPI) routines is spent in the …
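The reduction scheme associated with this work combines a reduce-scatter phase with an allgather, which is bandwidth-optimal for long vectors. The sketch below simulates the two phases over in-memory buffers; a real implementation performs each phase with log2(P) pairwise exchanges, and all names here are illustrative.

```python
import numpy as np

def allreduce_reduce_scatter(bufs):
    P = len(bufs)
    chunks = [np.array_split(b, P) for b in bufs]   # P slices per rank
    # Phase 1 (reduce-scatter): rank p ends up owning the sum of slice p.
    owned = [sum(chunks[q][p] for q in range(P)) for p in range(P)]
    # Phase 2 (allgather): every rank collects all reduced slices.
    full = np.concatenate(owned)
    return [full.copy() for _ in range(P)]

out = allreduce_reduce_scatter([np.full(8, float(r)) for r in range(4)])
assert np.array_equal(out[0], np.full(8, 6.0))
```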
A unified coded deep neural network training strategy based on generalized polydot codes
This paper has two main contributions. First, we propose a novel coding technique, Generalized PolyDot, for matrix-vector products that advances on existing techniques for …
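To illustrate coded computation in this spirit, here is a simplified polynomial-coded matrix-vector product: row blocks of A are encoded as evaluations of a matrix polynomial, so A @ b can be recovered from any m of n worker results, tolerating n - m stragglers. This is the basic polynomial-code idea, not the paper's Generalized PolyDot construction; all names and parameters are assumptions.

```python
import numpy as np

def coded_matvec(A, b, m=2, n=4):
    blocks = np.split(A, m)                       # m row blocks of A
    xs = np.arange(1, n + 1, dtype=float)         # evaluation points
    # Encode: worker i holds sum_j blocks[j] * xs[i]**j.
    encoded = [sum(B * x**j for j, B in enumerate(blocks)) for x in xs]
    results = {i: encoded[i] @ b for i in range(n)}   # workers compute
    # Decode from any m results (here: the first m "fast" workers).
    fast = sorted(results)[:m]
    V = np.vander(xs[fast], m, increasing=True)   # Vandermonde system
    Y = np.stack([results[i] for i in fast])      # (m, rows_per_block)
    decoded = np.linalg.solve(V, Y)               # row j = blocks[j] @ b
    return decoded.reshape(-1)

A = np.arange(16.0).reshape(4, 4)
b = np.ones(4)
assert np.allclose(coded_matvec(A, b), A @ b)
```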