Predictive reliability and fault management in exascale systems: State of the art and perspectives

R Canal, C Hernandez, R Tornero, A Cilardo… - ACM Computing …, 2020 - dl.acm.org
Performance and power constraints come together with Complementary Metal Oxide
Semiconductor technology scaling in future Exascale systems. Technology scaling makes …

DARE: High-performance state machine replication on RDMA networks

M Poke, T Hoefler - Proceedings of the 24th International Symposium on …, 2015 - dl.acm.org
The increasing amount of data that needs to be collected and analyzed requires large-scale
datacenter architectures that are naturally more susceptible to faults of single components …

Cost optimization of secure routing with untrusted devices in software defined networking

A Yazdinejad, RM Parizi, A Dehghantanha… - Journal of Parallel and …, 2020 - Elsevier
Over the years, switches and network routers have been compromised frequently, and a lot
of vulnerabilities have occurred in network infrastructure. Secure routing (SR) is one of the …

Scalable deadlock-free deterministic minimal-path routing engine for InfiniBand-based Dragonfly networks

G Maglione-Mathey, P Yebenes… - … on Parallel and …, 2017 - ieeexplore.ieee.org
Dragonfly topologies are gathering great interest nowadays as one of the most promising
interconnect options for High-Performance Computing (HPC) systems. However …

HyperX topology: First at-scale implementation and comparison to the fat-tree

J Domke, S Matsuoka, IR Ivanov, Y Tsushima… - Proceedings of the …, 2019 - dl.acm.org
The de-facto standard topology for modern HPC systems and data-centers are Folded Clos
networks, commonly known as Fat-Trees. The number of network endpoints in these …

Scheduling-aware routing for supercomputers

J Domke, T Hoefler - SC'16: Proceedings of the International …, 2016 - ieeexplore.ieee.org
The interconnection network has a large influence on total cost, application performance,
energy consumption, and overall system efficiency of a supercomputer. Unfortunately …

A secure MANET routing protocol with resilience against Byzantine behaviours of malicious or selfish nodes

C Crepeau, CR Davis… - … Conference on Advanced …, 2007 - ieeexplore.ieee.org
Secure routing in mobile ad hoc networks (MANETs) has emerged as an important MANET
research area. MANETs, by virtue of the fact that they are wireless networks, are more …

Routing on the dependency graph: A new approach to deadlock-free high-performance routing

J Domke, T Hoefler, S Matsuoka - Proceedings of the 25th ACM …, 2016 - dl.acm.org
Lossless interconnection networks are omnipresent in high performance computing
systems, data centers and network-on-chip architectures. Such networks require efficient …

From flops to bytes: disruptive change in high-performance computing towards the post-Moore era

S Matsuoka, H Amano, K Nakajima, K Inoue… - Proceedings of the …, 2016 - dl.acm.org
Slowdown and inevitable end in exponential scaling of processor performance, the end of
the so-called "Moore's Law" is predicted to occur around 2025–2030 timeframe. Because …

Preliminary performance analysis of multi-rail fat-tree networks

N Wolfe, M Mubarak, N Jain, J Domke… - 2017 17th IEEE/ACM …, 2017 - ieeexplore.ieee.org
Among the low-diameter, high-radix networks being deployed in next-generation HPC
systems, dual-rail fat-tree networks are a promising approach. Adding additional …