Google Scholar

Toward exascale resilience: 2014 update

F Cappello, G Al, W Gropp, S Kale, B Kramer… - … and Innovations: an …, 2014 - dl.acm.org

Resilience is a major roadblock for HPC executions on future exascale systems. These
systems will typically gather millions of CPU cores running up to a billion threads …

Cited by 436 · All 15 versions

[PDF] acm.org

The landscape of exascale research: A data-driven literature analysis

S Heldens, P Hijma, BV Werkhoven… - ACM Computing …, 2020 - dl.acm.org

The next generation of supercomputers will break the exascale barrier. Soon we will have
systems capable of at least one quintillion (billion billion) floating-point operations per …

Cited by 68 · All 7 versions

[PDF] osti.gov

The future of scientific workflows

E Deelman, T Peterka, I Altintas… - … Journal of High …, 2018 - journals.sagepub.com

Today's computational, experimental, and observational sciences rely on computations that
involve many related tasks. The success of a scientific mission often hinges on the computer …

Cited by 240 · All 12 versions

FTI: High performance fault tolerance interface for hybrid systems

L Bautista-Gomez, S Tsuboi, D Komatitsch… - Proceedings of 2011 …, 2011 - dl.acm.org

Large scientific applications deployed on current petascale systems expend a significant
amount of their execution time dumping checkpoint files to remote storage. New fault tolerant …

Cited by 452 · All 9 versions

[PDF] springer.com

A survey of fault tolerance mechanisms and checkpoint/restart implementations for high performance computing systems

IP Egwutuoha, D Levy, B Selic, S Chen - The Journal of Supercomputing, 2013 - Springer

In recent years, High Performance Computing (HPC) systems have been shifting
from expensive massively parallel architectures to clusters of commodity PCs to take …

Cited by 340 · All 11 versions

[PDF] psu.edu

Post-failure recovery of MPI communication capability: Design and rationale

W Bland, A Bouteiller, T Herault… - … Journal of High …, 2013 - journals.sagepub.com

As supercomputers are entering an era of massive parallelism where the frequency of faults
is increasing, the MPI Standard remains distressingly vague on the consequence of failures …

Cited by 269 · All 9 versions

[PDF] osti.gov

Evaluating the viability of process replication reliability for exascale systems

K Ferreira, J Stearley, JH Laros III, R Oldfield… - Proceedings of 2011 …, 2011 - dl.acm.org

As high-end computing machines continue to grow in size, issues such as fault tolerance
and reliability limit application scalability. Current techniques to ensure progress across …

Cited by 332 · All 12 versions

[PDF] hal.science

A 5D gyrokinetic full-f global semi-Lagrangian code for flux-driven ion turbulence simulations

V Grandgirard, J Abiteboul, J Bigot… - Computer physics …, 2016 - Elsevier

This paper addresses non-linear gyrokinetic simulations of ion temperature gradient (ITG)
turbulence in tokamak plasmas. The electrostatic GYSELA code is one of the few …

Cited by 187 · All 12 versions

[PDF] utk.edu

Algorithm-based fault tolerance for dense matrix factorizations

P Du, A Bouteiller, G Bosilca, T Herault… - ACM SIGPLAN Notices, 2012 - dl.acm.org

Dense matrix factorizations, such as LU, Cholesky and QR, are widely used for scientific
applications that require solving systems of linear equations, eigenvalues and linear least …

Cited by 190 · All 10 versions

[PDF] uiuc.edu

A scalable double in-memory checkpoint and restart scheme towards exascale

G Zheng, X Ni, LV Kalé - IEEE/IFIP International Conference on …, 2012 - ieeexplore.ieee.org

As the size of supercomputers increases, the probability of system failure grows
substantially, posing an increasingly significant challenge for scalability. It is important to …

Cited by 164 · All 9 versions
