A survey of power and energy predictive models in HPC systems and applications

K O'Brien, I Pietri, R Reddy, A Lastovetsky… - ACM Computing …, 2017 - dl.acm.org

Power and energy efficiency are now critical concerns in extreme-scale high-performance
scientific computing. Many extreme-scale computing systems today (for example: Top500) …

Cited by 116

Performance analysis of MPI collective operations

J Pješivac-Grbović, T Angskun, G Bosilca, GE Fagg… - Cluster …, 2007 - Springer

Previous studies of application usage show that the performance of collective
communications is critical for high-performance computing. Despite active research in the …

Cited by 427

SparCML: High-performance sparse communication for machine learning

C Renggli, S Ashkboos, M Aghagolzadeh… - Proceedings of the …, 2019 - dl.acm.org

Applying machine learning techniques to the quickly growing data in science and industry
requires highly-scalable algorithms. Large datasets are most commonly processed "data …

Cited by 154

Optimization of collective reduction operations

R Rabenseifner - Computational Science-ICCS 2004: 4th International …, 2004 - Springer

A 5-year-profiling in production mode at the University of Stuttgart has shown that more than
40% of the execution time of Message Passing Interface (MPI) routines is spent in the …

Cited by 270

Parallel matrix factorization for recommender systems

HF Yu, CJ Hsieh, S Si, IS Dhillon - Knowledge and Information Systems, 2014 - Springer

Matrix factorization, when the matrix has missing values, has become one of the leading
techniques for recommender systems. To handle web-scale datasets with millions of users …

Cited by 181

Optimization of MPI collective communication on BlueGene/L systems

G Almási, P Heidelberger, CJ Archer… - Proceedings of the 19th …, 2005 - dl.acm.org

BlueGene/L is currently the world's fastest supercomputer. It consists of a large number of
low power dual-processor compute nodes interconnected by high speed torus and collective …

Cited by 252

Spectral embedded generalized mean based k-nearest neighbors clustering with s-distance

KK Sharma, A Seal - Expert Systems with Applications, 2021 - Elsevier

The spectral clustering algorithm is extensively employed in different aspects, especially in
the field of pattern recognition. However, the efficient construction of the neighborhood …

Cited by 53

DeAR: Accelerating distributed deep learning with fine-grained all-reduce pipelining

L Zhang, S Shi, X Chu, W Wang, B Li… - 2023 IEEE 43rd …, 2023 - ieeexplore.ieee.org

Communication scheduling has been shown to be effective in accelerating distributed
training, which enables all-reduce communications to be overlapped with backpropagation …

Cited by 15

NUMA-aware shared-memory collective communication for MPI

S Li, T Hoefler, M Snir - … of the 22nd international symposium on High …, 2013 - dl.acm.org

As the number of cores per node keeps increasing, it becomes increasingly important for
MPI to leverage shared memory for intranode communication. This paper investigates the …

Cited by 121

MPI support for multi-core architectures: Optimized shared memory collectives

RL Graham, G Shipman - Recent Advances in Parallel Virtual Machine …, 2008 - Springer

With local core counts on the rise, taking advantage of shared-memory to optimize collective
operations can improve performance. We study several on-host shared memory optimized …

Cited by 144
