An in-depth analysis of the Slingshot interconnect

D De Sensi, S Di Girolamo… - … Conference for High …, 2020 - ieeexplore.ieee.org
The interconnect is one of the most critical components in large scale computing systems,
and its impact on the performance of applications is going to increase with the system size …

Flare: Flexible in-network allreduce

D De Sensi, S Di Girolamo, S Ashkboos, S Li… - Proceedings of the …, 2021 - dl.acm.org
The allreduce operation is one of the most commonly used communication routines in
distributed applications. To improve its bandwidth and to reduce network traffic, this …

Noise in the clouds: Influence of network performance variability on application scalability

D De Sensi, T De Matteis, K Taranov… - Proceedings of the …, 2022 - dl.acm.org
Cloud computing represents an appealing opportunity for cost-effective deployment of HPC
workloads on the best-fitting hardware. However, although cloud and on-premise HPC …

Study of workload interference with intelligent routing on Dragonfly

Y Kang, X Wang, Z Lan - SC22: International Conference for …, 2022 - ieeexplore.ieee.org
Dragonfly interconnect is a crucial network technology for supercomputers. To support
exascale systems, network resources are shared such that links and routers are not …

The case of performance variability on Dragonfly-based systems

A Bhatele, JJ Thiagarajan, T Groves… - 2020 IEEE …, 2020 - ieeexplore.ieee.org
Performance of a parallel code running on a large supercomputer can vary significantly from
one run to another even when the executable and its input parameters are left unchanged …

Mitigating network noise on Dragonfly networks through application-aware routing

D De Sensi, S Di Girolamo, T Hoefler - Proceedings of the International …, 2019 - dl.acm.org
System noise can negatively impact the performance of HPC systems, and the
interconnection network is one of the main factors contributing to this problem. To mitigate …

Exploring GPU-to-GPU communication: Insights into supercomputer interconnects

D De Sensi, L Pichetti, F Vella… - … Conference for High …, 2024 - ieeexplore.ieee.org
Multi-GPU nodes are increasingly common in the rapidly evolving landscape of exascale
supercomputers. On these systems, GPUs on the same node are connected through …

HyperX topology: First at-scale implementation and comparison to the fat-tree

J Domke, S Matsuoka, IR Ivanov, Y Tsushima… - Proceedings of the …, 2019 - dl.acm.org
The de-facto standard topology for modern HPC systems and data-centers are Folded Clos
networks, commonly known as Fat-Trees. The number of network endpoints in these …

Workload interference prevention with intelligent routing and flexible job placement on Dragonfly

Y Kang, X Wang, Z Lan - Proceedings of the 2023 ACM SIGSIM …, 2023 - dl.acm.org
Dragonfly is an indispensable interconnect topology for exascale HPC systems. To link tens
of thousands of compute nodes at a reasonable cost, Dragonfly shares network resources …

A deep reinforcement learning-based optimization approach for containerized microservice scheduling in Hybrid Fog/Cloud environments

A Kallel, M Rekik, M Khemakhem - Engineering Applications of Artificial …, 2025 - Elsevier
The deployment of microservices in Hybrid Fog/Cloud (HFC) environments for Internet of
Things (IoT) applications presents a significant challenge in efficiently scheduling …