- Academic Search

R Canal, C Hernandez, R Tornero, A Cilardo… - ACM Computing …, 2020 - dl.acm.org

Performance and power constraints come together with Complementary Metal Oxide
Semiconductor technology scaling in future Exascale systems. Technology scaling makes …

Cited by 36

[BOOK] Parallel programming for modern high performance computing systems

P Czarnul - 2018 - books.google.com

In view of the growing presence and popularity of multicore and manycore processors,
accelerators, and coprocessors, as well as clusters using such computing devices, the …

Cited by 54

[PDF] oapen.org

[PDF] Static analysis-based approaches for secure software development

M Siavvas, E Gelenbe, D Kehagias… - Security in Computer …, 2018 - library.oapen.org

Software security is a matter of major concern for software development enterprises that
wish to deliver highly secure software products to their customers. Static analysis is …

Cited by 53

[PDF] arxiv.org

CRUM: Checkpoint-restart support for CUDA's unified memory

R Garg, A Mohan, M Sullivan… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org

Unified Virtual Memory (UVM) was recently introduced with CUDA version 8 and the Pascal
GPU. The older CUDA programming style is akin to older large-memory UNIX applications …

Cited by 44

[PDF] upc.edu

Checkpoint restart support for heterogeneous hpc applications

K Parasyris, K Keller… - 2020 20th IEEE/ACM …, 2020 - ieeexplore.ieee.org

As we approach the era of exa-scale computing, fault tolerance is of growing importance.
The increasing number of cores as well as the increased complexity of modern …

Cited by 27

[PDF] arxiv.org

CRAC: checkpoint-restart architecture for CUDA with streams and UVM

T Jain, G Cooperman - SC20: International Conference for High …, 2020 - ieeexplore.ieee.org

The share of the top 500 supercomputers with NVIDIA GPUs is now over 25% and continues
to grow. While fault tolerance is a critical issue for supercomputing, there does not currently …

Cited by 26

[PDF] acm.org

MANA for MPI: MPI-agnostic network-agnostic transparent checkpointing

R Garg, G Price, G Cooperman - … of the 28th international symposium on …, 2019 - dl.acm.org

Transparently checkpointing MPI for fault tolerance and load balancing is a long-standing
problem in HPC. The problem has been complicated by the need to provide checkpoint …

Cited by 31

[PDF] sjtu.edu.cn

Asymmetric resilience: Exploiting task-level idempotency for transient error recovery in accelerator-based systems

J Leng, A Buyuktosunoglu, R Bertran… - … Symposium on High …, 2020 - ieeexplore.ieee.org

Accelerators make the task of building systems that are resilient against transient errors like
voltage noise and soft errors hard. Architects integrate accelerators into the system as black …

Cited by 26

[PDF] researchgate.net

Distributed configuration, authorization and management in the cloud-based internet of things

M Henze, B Wolters, R Matzutt… - 2017 IEEE Trustcom …, 2017 - ieeexplore.ieee.org

Network-based deployments within the Internet of Things increasingly rely on the cloud-
controlled federation of individual networks to configure, authorize, and manage devices …

Cited by 40

[PDF] googleapis.com

Capturing snapshots of offload applications on many-core coprocessors

CH Li, G Coviello, S Chakradhar, A Rezaei - US Patent 10,678,550, 2020 - Google Patents

Methods are provided. A method includes capturing a snap shot of an offload process being
executed by one or more many-core processors. The offload process is in signal …

Cited by 41

CheCL: Transparent checkpointing and process migration of OpenCL applications

Predictive reliability and fault management in exascale systems: State of the art and perspectives
