Google Scholar

An in-depth analysis of the slingshot interconnect

D De Sensi, S Di Girolamo… - … Conference for High …, 2020 - ieeexplore.ieee.org

The interconnect is one of the most critical components in large scale computing systems,
and its impact on the performance of applications is going to increase with the system size …

Cited by 150 · 35 versions

[PDF] osti.gov

The lightweight distributed metric service: a scalable infrastructure for continuous monitoring of large scale computing systems and applications

A Agelastos, B Allan, J Brandt… - SC'14: Proceedings …, 2014 - ieeexplore.ieee.org

Understanding how resources of High Performance Compute platforms are utilized by
applications both individually and as a composite is key to application and platform …

Cited by 306 · 9 versions

[PDF] osti.gov

Diagnosing performance variations in HPC applications using machine learning

O Tuncer, E Ates, Y Zhang, A Turk, J Brandt… - … Conference, ISC High …, 2017 - Springer

With the growing complexity and scale of high performance computing (HPC) systems,
application performance variation has become a significant challenge in efficient and …

Cited by 139 · 12 versions

[PDF] ieee.org

An integrated tutorial on InfiniBand, verbs, and MPI

P MacArthur, Q Liu, RD Russell… - … Surveys & Tutorials, 2017 - ieeexplore.ieee.org

This tutorial presents the details of the interconnection network utilized in many high
performance computing (HPC) systems today. “InfiniBand” is the hardware interconnect …

Cited by 37 · 2 versions

[PDF] arxiv.org

Gossipgrad: Scalable deep learning using gossip communication based asynchronous gradient descent

J Daily, A Vishnu, C Siegel, T Warfel… - arXiv preprint arXiv …, 2018 - arxiv.org

In this paper, we present GossipGraD - a gossip communication protocol based Stochastic
Gradient Descent (SGD) algorithm for scaling Deep Learning (DL) algorithms on large-scale …

Cited by 115 · 4 versions

[PDF] osti.gov

Online diagnosis of performance variation in HPC systems using machine learning

O Tuncer, E Ates, Y Zhang, A Turk… - … on Parallel and …, 2018 - ieeexplore.ieee.org

As the size and complexity of high performance computing (HPC) systems grow in line with
advancements in hardware and software technology, HPC systems increasingly suffer from …

Cited by 93 · 9 versions

[PDF] usenix.org

Is big data performance reproducible in modern cloud networks?

A Uta, A Custura, D Duplyakin, I Jimenez… - … USENIX symposium on …, 2020 - usenix.org

Performance variability has been acknowledged as a problem for over a decade by cloud
practitioners and performance engineers. Yet, our survey of top systems conferences …

Cited by 84 · 25 versions

[PDF] github.io

Watch out for the bully! job interference study on dragonfly network

X Yang, J Jenkins, M Mubarak… - SC'16: Proceedings of …, 2016 - ieeexplore.ieee.org

High-radix, low-diameter dragonfly networks will be a common choice in next-generation
supercomputers. Preliminary studies show that random job placement with adaptive routing …

Cited by 111 · 6 versions

[PDF] acm.org

Run-to-run variability on Xeon Phi based Cray XC systems

S Chunduri, K Harms, S Parker, V Morozov… - Proceedings of the …, 2017 - dl.acm.org

The increasing complexity of HPC systems has introduced new sources of variability, which
can contribute to significant differences in run-to-run performance of applications. With …

Cited by 99 · 5 versions

[PDF] osti.gov

Analyzing network health and congestion in dragonfly-based supercomputers

A Bhatele, N Jain, Y Livnat, V Pascucci… - 2016 IEEE …, 2016 - ieeexplore.ieee.org

The dragonfly topology is a popular choice for building high-radix, low-diameter, hierarchical
networks with high-bandwidth links. On Cray installations of the dragonfly network, job …

Cited by 107 · 4 versions

Saved to My library

There goes the neighborhood: performance degradation due to nearby jobs

An in-depth analysis of the slingshot interconnect
