Rethinking machine learning collective communication as a multi-commodity flow problem

X Liu, B Arzani, SKR Kakarla, L Zhao, V Liu… - Proceedings of the …, 2024 - dl.acm.org
Cloud operators utilize collective communication optimizers to enhance the efficiency of the
single-tenant, centrally managed training clusters they manage. However, current optimizers …

Teal: Learning-accelerated optimization of WAN traffic engineering

Z Xu, FY Yan, R Singh, JT Chiu, AM Rush… - Proceedings of the ACM …, 2023 - dl.acm.org
The rapid expansion of global cloud wide-area networks (WANs) has posed a challenge for
commercial optimization engines to efficiently solve network traffic engineering (TE) …

FIGRET: Fine-Grained Robustness-Enhanced Traffic Engineering

X Liu, S Zhao, Y Cui, X Wang - Proceedings of the ACM SIGCOMM 2024 …, 2024 - dl.acm.org
Traffic Engineering (TE) is critical for improving network performance and reliability. A key
challenge in TE is the management of sudden traffic bursts. Existing TE schemes either do …

Comparing Task Graph Scheduling Algorithms: An Adversarial Approach

J Coleman, B Krishnamachari - arXiv preprint arXiv:2403.07120, 2024 - arxiv.org
Scheduling a task graph representing an application over a heterogeneous network of
computers is a fundamental problem in distributed computing. It is known to be not only NP …

Preprocess your Paths - Speeding up Linear Programming-based Optimization for Segment Routing Traffic Engineering

A Brundiers, T Schüller… - 2024 IFIP Networking …, 2024 - ieeexplore.ieee.org
Many state-of-the-art Segment Routing (SR) Traffic Engineering (TE) algorithms rely on
Linear Program (LP)-based optimization. However, the poor scalability of the latter and the …

End-to-End Performance Analysis of Learning-enabled Systems

P Namyar, M Schapira, R Govindan, S Segarra… - Proceedings of the 23rd …, 2024 - dl.acm.org
We propose a performance analysis tool for learning-enabled systems that allows operators
to uncover potential performance issues before deploying DNNs in their systems. The tools …

Zeal: Rethinking Large-Scale Resource Allocation with "Decouple and Decompose"

Z Xu, FY Yan, M Yu - arXiv preprint arXiv:2412.11447, 2024 - arxiv.org
Resource allocation is fundamental for cloud systems to ensure efficient resource sharing
among tenants. However, the scale of such optimization problems has outgrown the …

PhD Forum Abstract: Cooperative Problem-Solving with Systems of Constrained Mobile Agents

JR Coleman - Proceedings of the 22nd International Conference on …, 2023 - dl.acm.org
Cooperative mobile robot systems have great potential to solve real-world problems but are,
in practice, limited by their physical capabilities, the environment in which they operate, and …

Timely and Efficient Resource Management in Networked Systems

Z Xu - 2024 - search.proquest.com
Resource management is ubiquitous in networked systems and ensures the effective
sharing of resources among various demands. Examples include traffic engineering, cluster …

Traffic Engineering Analysis of Middle Point Selection in Segment Routing

Mİ Yüksel, M Sayıt - 2024 32nd Signal Processing and …, 2024 - ieeexplore.ieee.org
As Internet usage increases day by day, efficient use of the infrastructure deployed by
Internet service providers has gained great importance. Traffic engineering applications …