Google Scholar

The landscape of exascale research: A data-driven literature analysis

S Heldens, P Hijma, BV Werkhoven… - ACM Computing …, 2020 - dl.acm.org

The next generation of supercomputers will break the exascale barrier. Soon we will have
systems capable of at least one quintillion (billion billion) floating-point operations per …

Cited by 69

A checkpoint of research on parallel i/o for high-performance computing

FZ Boito, EC Inacio, JL Bez, POA Navaux… - ACM Computing …, 2018 - dl.acm.org

We present a comprehensive survey on parallel I/O in the high-performance computing
(HPC) context. This is an important field for HPC because of the historic gap between …

Cited by 57

Performance optimality or reproducibility: that is the question

T Patki, JJ Thiagarajan, A Ayala, TZ Islam - Proceedings of the …, 2019 - dl.acm.org

The era of extremely heterogeneous supercomputing brings with itself the devil of increased
performance variation and reduced reproducibility. There is a lack of understanding in the …

Cited by 29

A systematic survey on fault-tolerant solutions for distributed data analytics: Taxonomy, comparison, and future directions

S Isukapalli, SN Srirama - Computer Science Review, 2024 - Elsevier

Fault tolerance is becoming increasingly important for upcoming exascale systems,
supporting distributed data processing, due to the expected decrease in the Mean Time …

EReinit: Scalable and efficient fault‐tolerance for bulk‐synchronous MPI applications

S Chakraborty, I Laguna, M Emani… - Concurrency and …, 2020 - Wiley Online Library

Scientists from many different fields have been developing Bulk‐Synchronous MPI
applications to simulate and study a wide variety of scientific phenomena. Since failure rates …

Cited by 35

Exploring energy saving opportunities in fault tolerant HPC systems

M Morán, J Balladini, D Rexachs, E Rucci - Journal of Parallel and …, 2024 - Elsevier

Nowadays, improving the energy efficiency of high-performance computing (HPC) systems
is one of the main drivers in scientific and technological research. As large-scale HPC …

Cited by 3

Prediction of energy consumption by checkpoint/restart in HPC

M Morán, J Balladini, D Rexachs, E Luque - IEEE Access, 2019 - ieeexplore.ieee.org

The fault tolerance method most used today in high-performance computing (HPC) is
coordinated checkpointing. This, like any other fault tolerance method, adds additional …

Cited by 14

Optimizing checkpoint intervals for reduced energy use in exascale systems

D Dauwe, R Jhaveri, S Pasricha… - 2017 Eighth …, 2017 - ieeexplore.ieee.org

In today's high performance computing (HPC) systems, the probability of applications
experiencing failures has increased significantly with the increase in the number of system …

Cited by 19

Fault-tolerant regularity-based real-time virtual resources

AMK Cheng, G Dai, PK Paluri, M Ansari… - 2019 IEEE 25th …, 2019 - ieeexplore.ieee.org

Many safety-critical applications employ embedded real-time systems where both timing and
fault tolerance requirements must be continually satisfied. The Regularity-based Resource …

Cited by 12

Exploiting Efficiency Opportunities Based on Workloads with Electron on Heterogeneous Clusters

R DelValle, P Kaushik, A Jain, J Hartog… - Proceedings of the 10th …, 2017 - dl.acm.org

Resource Management tools for large-scale clusters and data centers typically schedule
resources based on task requirements specified in terms of processor, memory, and disk …

Cited by 8

Power-check: An energy-efficient checkpointing framework for HPC clusters

The landscape of exascale research: A data-driven literature analysis
