- Academic Search

An in-depth analysis of the slingshot interconnect

D De Sensi, S Di Girolamo… - … Conference for High …, 2020 - ieeexplore.ieee.org

The interconnect is one of the most critical components in large scale computing systems,
and its impact on the performance of applications is going to increase with the system size …

Cited by 150

Frontier: exploring exascale

S Atchley, C Zimmer, J Lange, D Bernholdt… - Proceedings of the …, 2023 - dl.acm.org

As the US Department of Energy (DOE) computing facilities began deploying petascale
systems in 2008, DOE was already setting its sights on exascale. In that year, DARPA …

Cited by 56

Wfbench: Automated generation of scientific workflow benchmarks

T Coleman, H Casanova, K Maheshwari… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org

The prevalence of scientific workflows with high computational demands calls for their
execution on various distributed computing platforms, including large-scale leadership-class …

Cited by 19

Exploring GPU-to-GPU communication: Insights into supercomputer interconnects

D De Sensi, L Pichetti, F Vella… - … Conference for High …, 2024 - ieeexplore.ieee.org

Multi-GPU nodes are increasingly common in the rapidly evolving landscape of exascale
supercomputers. On these systems, GPUs on the same node are connected through …

Cited by 1

Disruptive changes in field equation modeling: A simple interface for wafer scale engines

M Woo, T Jordan, R Schreiber, I Sharapov… - arXiv preprint arXiv …, 2022 - arxiv.org

We present a high-level and accessible Application Programming Interface (API) for the
solution of field equations on the Cerebras Systems Wafer-Scale Engine (WSE) with over …

Cited by 15

Runtime Performance Anomaly Diagnosis in Production HPC Systems Using Active Learning

B Aksar, E Sencan, B Schwaller, O Aaziz… - … on Parallel and …, 2024 - ieeexplore.ieee.org

With the increasing scale and complexity of High-Performance Computing (HPC) systems,
performance variations in applications caused by anomalies have become significant …

Cited by 3

Understanding hot interconnects with an extensive benchmark survey

Y Li, H Qi, G Lu, F Jin, Y Guo, X Lu - BenchCouncil Transactions on …, 2022 - Elsevier

Understanding the designs and performance characterizations of hot interconnects on
modern data center and high-performance computing (HPC) clusters is a fruitful research …

Cited by 11

Quantifying the impact of network congestion on application performance and network metrics

Y Zhang, T Groves, B Cook, NJ Wright… - … on Cluster Computing …, 2020 - ieeexplore.ieee.org

In modern high-performance computing (HPC) systems, network congestion is an important
factor that contributes to performance degradation. However, how network congestion …

Cited by 16

Live forensics for HPC systems: A case study on distributed storage systems

S Jha, S Cui, SS Banerjee, T Xu, J Enos… - … Conference for High …, 2020 - ieeexplore.ieee.org

Large-scale high-performance computing systems frequently experience a wide range of
failure modes, such as reliability failures (e.g., hang or crash), and resource overload-related …

Cited by 18

An optimisation of allreduce communication in message-passing systems

A Jocksch, N Ohana, E Lanti, E Koutsaniti… - Parallel Computing, 2021 - Elsevier

Collective communication, namely the pattern allreduce in message-passing systems, is
optimised based on measurements at the installation time of the library. The algorithms used …

Cited by 6
