A comparison of virtualization technologies for HPC

JP Walters, V Chaudhary, M Cha… - 22nd International …, 2008 - ieeexplore.ieee.org
Virtualization is a common strategy for improving the utilization of existing computing
resources, particularly within data centers. However, its use for high performance computing …

A scalable double in-memory checkpoint and restart scheme towards exascale

G Zheng, X Ni, LV Kalé - IEEE/IFIP International Conference on …, 2012 - ieeexplore.ieee.org
As the size of supercomputers increases, the probability of system failure grows
substantially, posing an increasingly significant challenge for scalability. It is important to …

[PDF] Collaboro: a collaborative (meta) modeling tool

JLC Izquierdo, J Cabot - PeerJ Computer Science, 2016 - peerj.com
Motivation: Scientists increasingly rely on intelligent information systems to help them in their
daily tasks, in particular for managing research objects, like publications or datasets. The …

ACR: Automatic checkpoint/restart for soft and hard error protection

X Ni, E Meneses, N Jain, LV Kalé - Proceedings of the international …, 2013 - dl.acm.org
As machines increase in scale, many researchers have predicted that failure rates will
correspondingly increase. Soft errors do not inhibit execution, but may silently generate …

A framework for elastic execution of existing MPI programs

A Raveendran, T Bicer… - 2011 IEEE International …, 2011 - ieeexplore.ieee.org
There is a clear trend towards using cloud resources in the scientific or the HPC community,
with a key attraction of cloud being the elasticity it offers. In executing HPC applications on a …

[BOOK] Parallel science and engineering applications: The Charm++ approach

LV Kale, A Bhatele - 2016 - books.google.com
This book highlights the use of Charm++ in a wide variety of scientific and engineering
fields. It emphasizes the adaptivity, asynchrony, and message-driven execution of Charm++ …

Predicting the performance impact of different fat-tree configurations

N Jain, A Bhatele, LH Howell, D Böhme… - Proceedings of the …, 2017 - dl.acm.org
The fat-tree topology is one of the most commonly used network topologies in HPC systems.
Vendors support several options that can be configured when deploying fat-tree networks on …

A study on communication issues for systems-on-chip

CA Zeferino, ME Kreutz, L Carro… - … . 15th Symposium on …, 2002 - ieeexplore.ieee.org
Present-day cores composing a system-on-chip might be interconnected by means of either
dedicated channels or shared buses. Nevertheless, future systems will have strong …

Avoiding hot-spots on two-level direct networks

A Bhatele, N Jain, WD Gropp, LV Kale - Proceedings of 2011 …, 2011 - dl.acm.org
A low-diameter, fast interconnection network is going to be a prerequisite for building
exascale machines. A two-level direct network has been proposed by several groups as a …

[PDF] Charm++ and AMPI: Adaptive runtime strategies via migratable objects

LV Kale, G Zheng - … for Parallel and Distributed Adaptive Applications, 2009 - academia.edu
Parallel programming is certainly more difficult than sequential programming because of the
additional issues one has to deal with in a parallel program. One has to decide what …