Exascale computing and big data

DA Reed, J Dongarra - Communications of the ACM, 2015 - dl.acm.org
Communications of the ACM, July 2015, Vol. 58, No. 7, contributed articles.
Illustration by Peter Bollinger. DOI:10.1145/2699414 …

Convergence of Nanotechnology and Machine Learning: The State of the Art, Challenges, and Perspectives

A Tripathy, AY Patne, S Mohapatra… - International Journal of …, 2024 - mdpi.com
Nanotechnology and machine learning (ML) are rapidly emerging fields with numerous real-
world applications in medicine, materials science, computer engineering, and data …

Evaluating the viability of process replication reliability for exascale systems

K Ferreira, J Stearley, JH Laros III, R Oldfield… - Proceedings of 2011 …, 2011 - dl.acm.org
As high-end computing machines continue to grow in size, issues such as fault tolerance
and reliability limit application scalability. Current techniques to ensure progress across …

A survey on software methods to improve the energy efficiency of parallel computing

C Jin, BR de Supinski, D Abramson… - … Journal of High …, 2017 - journals.sagepub.com
Energy consumption is one of the top challenges for achieving the next generation of
supercomputing. Codesign of hardware and software is critical for improving energy …

Deep learning for in situ data compression of large turbulent flow simulations

A Glaws, R King, M Sprague - Physical Review Fluids, 2020 - APS
As the size of turbulent flow simulations continues to grow, in situ data compression is
becoming increasingly important for visualization, analysis, and restart checkpointing. For …

Exploring automatic, online failure recovery for scientific applications at extreme scales

M Gamell, DS Katz, H Kolla, J Chen… - SC'14: Proceedings …, 2014 - ieeexplore.ieee.org
Application resilience is a key challenge that must be addressed in order to realize the
exascale vision. Process/node failures, an important class of failures, are typically handled …

Synthetic fingerprint-database generation

R Cappelli, D Maio, D Maltoni - 2002 International Conference …, 2002 - ieeexplore.ieee.org
This work complements our previous efforts in generating realistic fingerprint images for test
purposes. The main variability which characterizes the acquisition of a fingerprint through an …

MCREngine: A scalable checkpointing system using data-aware aggregation and compression

TZ Islam, K Mohror, S Bagchi, A Moody… - SC'12: Proceedings …, 2012 - ieeexplore.ieee.org
High performance computing (HPC) systems use checkpoint-restart to tolerate failures.
Typically, applications store their states in checkpoints on a parallel file system (PFS). As …

DASH: A C++ PGAS library for distributed data structures and parallel algorithms

K Fürlinger, T Fuchs… - 2016 IEEE 18th …, 2016 - ieeexplore.ieee.org
We present DASH, a C++ template library that offers distributed data structures and parallel
algorithms and implements a compiler-free PGAS (partitioned global address space) …

Compiler-directed lightweight checkpointing for fine-grained guaranteed soft error recovery

Q Liu, C Jung, D Lee, D Tiwari - SC'16: Proceedings of the …, 2016 - ieeexplore.ieee.org
This paper presents Bolt, a compiler-directed soft error recovery scheme, that provides fine-
grained and guaranteed recovery without excessive performance and hardware overhead …