SiP-ML: high-bandwidth optical network interconnects for machine learning training

M Khani, M Ghobadi, M Alizadeh, Z Zhu… - Proceedings of the …, 2021 - dl.acm.org
This paper proposes optical network interconnects as a key enabler for building high-
bandwidth ML training clusters with strong scaling properties. Our design, called SiP-ML …

Software-defined “hardware” infrastructures: A survey on enabling technologies and open research directions

A Roozbeh, J Soares, GQ Maguire… - … Surveys & Tutorials, 2018 - ieeexplore.ieee.org
This paper provides an overview of software-defined “hardware” infrastructures (SDHI).
SDHI builds upon the concept of hardware (HW) resource disaggregation. HW resource …

Topologies in distributed machine learning: Comprehensive survey, recommendations and future directions

L Liu, P Zhou, G Sun, X Chen, T Wu, H Yu, M Guizani - Neurocomputing, 2024 - Elsevier
With the widespread use of distributed machine learning (DML), many IT companies have
established networks dedicated to DML. Different communication architectures of DML have …

Orion: A distributed file system for Non-Volatile main memory and RDMA-Capable networks

J Yang, J Izraelevitz, S Swanson - 17th USENIX Conference on File and …, 2019 - usenix.org
High-performance, byte-addressable non-volatile main memories (NVMMs) force system
designers to rethink trade-offs throughout the system stack, often leading to dramatic …

Aquila: A unified, low-latency fabric for datacenter networks

D Gibson, H Hariharan, E Lance, M McLaren… - … USENIX Symposium on …, 2022 - usenix.org
Datacenter workloads have evolved from the data intensive, loosely-coupled workloads of
the past decade to more tightly coupled ones, wherein ultra-low latency communication is …

ThymesisFlow: A software-defined, HW/SW co-designed interconnect stack for rack-scale memory disaggregation

C Pinto, D Syrivelis, M Gazzetti… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
With cloud providers constantly seeking the best infrastructure trade-off between
performance delivered to customers and overall energy/utilization efficiency of their data …

Shoal: A network architecture for disaggregated racks

V Shrivastav, A Valadarsky, H Ballani, P Costa… - … USENIX Symposium on …, 2019 - usenix.org
Disaggregated racks comprise a dense cluster of separate pools of compute, memory and
storage blades, all inter-connected through an internal network within a single rack …

Shale: A practical, scalable oblivious reconfigurable network

D Amir, N Saran, T Wilson, R Kleinberg… - Proceedings of the …, 2024 - dl.acm.org
Circuit-switched technologies have long been proposed for handling high-throughput traffic
in datacenter networks, but recent developments in nanosecond-scale reconfiguration have …

Uniform-Cost Multi-Path Routing for Reconfigurable Data Center Networks

J Li, H Gong, F De Marchi, A Gong, Y Lei… - Proceedings of the …, 2024 - dl.acm.org
Reconfigurable data center networks (RDCNs) are arising as a promising data center
network (DCN) design in the post-Moore's law era. However, the constantly reconfigured …

A tale of two topologies: Exploring convertible data center network architectures with flat-tree

Y Xia, XS Sun, S Dzinamarira, D Wu… - Proceedings of the …, 2017 - dl.acm.org
This paper promotes convertible data center network architectures, which can dynamically
change the network topology to combine the benefits of multiple architectures. We propose …