Versatile, scalable, and accurate simulation of distributed applications and platforms

H Casanova, A Giersch, A Legrand, M Quinson… - Journal of Parallel and …, 2014 - Elsevier
The study of parallel and distributed applications and platforms, whether in the cluster, grid,
peer-to-peer, volunteer, or cloud computing domain, often mandates empirical evaluation of …

A survey of communication performance models for high-performance computing

JA Rico-Gallego, JC Díaz-Martín… - ACM Computing …, 2019 - dl.acm.org
This survey aims to present the state of the art in analytic communication performance
models, providing sufficiently detailed descriptions of particularly noteworthy efforts …

Characterizing the influence of system noise on large-scale applications by simulation

T Hoefler, T Schneider… - SC'10: Proceedings of the …, 2010 - ieeexplore.ieee.org
This paper presents an in-depth analysis of the impact of system noise on large-scale
parallel application performance in realistic settings. Our analytical model shows that not …

JDeodorant: Identification and removal of type-checking bad smells

N Tsantalis, T Chaikalis… - 2008 12th European …, 2008 - ieeexplore.ieee.org
In this demonstration, we present an Eclipse plug-in that automatically identifies type-
checking bad smells in Java source code, and resolves them by applying the "replace …

Hiding global synchronization latency in the preconditioned conjugate gradient algorithm

P Ghysels, W Vanroose - Parallel Computing, 2014 - Elsevier
Scalability of Krylov subspace methods suffers from costly global synchronization steps that
arise in dot-products and norm calculations on parallel machines. In this work, a modified …

Using automated performance modeling to find scalability bugs in complex codes

A Calotoiu, T Hoefler, M Poke, F Wolf - Proceedings of the International …, 2013 - dl.acm.org
Many parallel applications suffer from latent performance limitations that may prevent them
from scaling to larger machine sizes. Often, such scalability bugs manifest themselves only …

Astra-sim 2.0: Modeling hierarchical networks and disaggregated systems for large-model training at scale

W Won, T Heo, S Rashidi, S Sridharan… - … Analysis of Systems …, 2023 - ieeexplore.ieee.org
As deep learning models and input data continue to scale at an unprecedented rate, it has
become inevitable to move towards distributed training platforms to fit the models and …

sPIN: High-performance streaming Processing in the Network

T Hoefler, S Di Girolamo, K Taranov, RE Grant… - Proceedings of the …, 2017 - dl.acm.org
Optimizing communication performance is imperative for large-scale computing because
communication overheads limit the strong scalability of parallel applications. Today's …

Hiding global communication latency in the GMRES algorithm on massively parallel machines

P Ghysels, TJ Ashby, K Meerbergen… - SIAM Journal on Scientific …, 2013 - SIAM
In the generalized minimal residual method (GMRES), the global all-to-all communication
required in each iteration for orthogonalization and normalization of the Krylov base vectors …

Astra-sim: Enabling SW/HW co-design exploration for distributed DL training platforms

S Rashidi, S Sridharan, S Srinivasan… - … Analysis of Systems …, 2020 - ieeexplore.ieee.org
Modern Deep Learning systems heavily rely on distributed training over high-performance
accelerator (e.g., TPU, GPU)-based hardware platforms. Examples today include Google's …