Versatile, scalable, and accurate simulation of distributed applications and platforms

H Casanova, A Giersch, A Legrand, M Quinson… - Journal of Parallel and …, 2014 - Elsevier
The study of parallel and distributed applications and platforms, whether in the cluster, grid,
peer-to-peer, volunteer, or cloud computing domain, often mandates empirical evaluation of …

A survey of communication performance models for high-performance computing

JA Rico-Gallego, JC Díaz-Martín… - ACM Computing …, 2019 - dl.acm.org
This survey aims to present the state of the art in analytic communication performance
models, providing sufficiently detailed descriptions of particularly noteworthy efforts …

Characterizing the influence of system noise on large-scale applications by simulation

T Hoefler, T Schneider… - SC'10: Proceedings of the …, 2010 - ieeexplore.ieee.org
This paper presents an in-depth analysis of the impact of system noise on large-scale
parallel application performance in realistic settings. Our analytical model shows that not …

JDeodorant: Identification and removal of type-checking bad smells

N Tsantalis, T Chaikalis… - 2008 12th European …, 2008 - ieeexplore.ieee.org
In this demonstration, we present an Eclipse plug-in that automatically identifies type-
checking bad smells in Java source code, and resolves them by applying the "replace …

Hiding global synchronization latency in the preconditioned conjugate gradient algorithm

P Ghysels, W Vanroose - Parallel Computing, 2014 - Elsevier
Scalability of Krylov subspace methods suffers from costly global synchronization steps that
arise in dot-products and norm calculations on parallel machines. In this work, a modified …

Using automated performance modeling to find scalability bugs in complex codes

A Calotoiu, T Hoefler, M Poke, F Wolf - Proceedings of the International …, 2013 - dl.acm.org
Many parallel applications suffer from latent performance limitations that may prevent them
from scaling to larger machine sizes. Often, such scalability bugs manifest themselves only …

Astra-sim 2.0: Modeling hierarchical networks and disaggregated systems for large-model training at scale

W Won, T Heo, S Rashidi, S Sridharan… - … Analysis of Systems …, 2023 - ieeexplore.ieee.org
As deep learning models and input data continue to scale at an unprecedented rate, it has
become inevitable to move towards distributed training platforms to fit the models and …

sPIN: High-performance streaming Processing in the Network

T Hoefler, S Di Girolamo, K Taranov, RE Grant… - Proceedings of the …, 2017 - dl.acm.org
Optimizing communication performance is imperative for large-scale computing because
communication overheads limit the strong scalability of parallel applications. Today's …

Hiding global communication latency in the GMRES algorithm on massively parallel machines

P Ghysels, TJ Ashby, K Meerbergen… - SIAM Journal on Scientific …, 2013 - SIAM
In the generalized minimal residual method (GMRES), the global all-to-all communication
required in each iteration for orthogonalization and normalization of the Krylov base vectors …

Astra-sim: Enabling SW/HW co-design exploration for distributed DL training platforms

S Rashidi, S Sridharan, S Srinivasan… - … Analysis of Systems …, 2020 - ieeexplore.ieee.org
Modern Deep Learning systems heavily rely on distributed training over high-performance
accelerator (e.g., TPU, GPU)-based hardware platforms. Examples today include Google's …