Exascale computing and big data
Communications of the ACM, July 2015, Vol. 58, No. 7, p. 56. Contributed articles. Illustration by Peter Bollinger. DOI:10.1145/2699414 …
Convergence of Nanotechnology and Machine Learning: The State of the Art, Challenges, and Perspectives
Nanotechnology and machine learning (ML) are rapidly emerging fields with numerous real-
world applications in medicine, materials science, computer engineering, and data …
Evaluating the viability of process replication reliability for exascale systems
As high-end computing machines continue to grow in size, issues such as fault tolerance
and reliability limit application scalability. Current techniques to ensure progress across …
A survey on software methods to improve the energy efficiency of parallel computing
Energy consumption is one of the top challenges for achieving the next generation of
supercomputing. Codesign of hardware and software is critical for improving energy …
Deep learning for in situ data compression of large turbulent flow simulations
As the size of turbulent flow simulations continues to grow, in situ data compression is
becoming increasingly important for visualization, analysis, and restart checkpointing. For …
Exploring automatic, online failure recovery for scientific applications at extreme scales
Application resilience is a key challenge that must be addressed in order to realize the
exascale vision. Process/node failures, an important class of failures, are typically handled …
Synthetic fingerprint-database generation
This work complements our previous efforts in generating realistic fingerprint images for test
purposes. The main variability which characterizes the acquisition of a fingerprint through an …
MCREngine: A scalable checkpointing system using data-aware aggregation and compression
High performance computing (HPC) systems use checkpoint-restart to tolerate failures.
Typically, applications store their states in checkpoints on a parallel file system (PFS). As …
DASH: A C++ PGAS library for distributed data structures and parallel algorithms
We present DASH, a C++ template library that offers distributed data structures and parallel
algorithms and implements a compiler-free PGAS (partitioned global address space) …
Compiler-directed lightweight checkpointing for fine-grained guaranteed soft error recovery
This paper presents Bolt, a compiler-directed soft error recovery scheme, that provides fine-
grained and guaranteed recovery without excessive performance and hardware overhead …