Comprehensive review on congestion detection, alleviation, and control for IoT networks

P Anitha, HS Vimala, J Shreyas - Journal of Network and Computer …, 2024 - Elsevier
Abstract Context: The Internet of Things (IoT) comprises various computing devices that
operate on a non-standard platform and can connect to wireless networks to transmit data …

RDMA over Ethernet for distributed training at Meta scale

A Gangidi, R Miao, S Zheng, SJ Bondu… - Proceedings of the …, 2024 - dl.acm.org
The rapid growth in both computational density and scale in AI models in recent years
motivates the construction of an efficient and reliable dedicated network infrastructure. This …

CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters

S Rajasekaran, M Ghobadi, A Akella - 21st USENIX Symposium on …, 2024 - usenix.org
We present CASSINI, a network-aware job scheduler for machine learning (ML) clusters.
CASSINI introduces a novel geometric abstraction to consider the communication pattern of …

Evaluating Transport Layer Congestion Control Algorithms: A Comprehensive Survey

CL Vielhaus, C von Lengerke, V Latzko… - … Surveys & Tutorials, 2025 - ieeexplore.ieee.org
Congestion Control (CC) limits the transmission rate of data packets in the transport layer
protocol, e.g., in TCP and QUIC, of a sending endpoint so as to mitigate the congestion in …
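As a concrete illustration of how a sender "limits the transmission rate", here is a minimal sketch of classic additive-increase/multiplicative-decrease (AIMD) window control, in the spirit of TCP Reno's congestion-avoidance phase. It is a simplified, assumed model (one update per round trip, no slow start, no timeouts) and is not any specific algorithm from the survey; the function names and default parameters are hypothetical.

```python
def aimd_update(cwnd: float, loss: bool, add: float = 1.0,
                mult: float = 0.5, min_cwnd: float = 1.0) -> float:
    """One per-round-trip AIMD step (hypothetical helper).

    On a loss signal, shrink the congestion window multiplicatively;
    otherwise grow it by a fixed number of segments.
    """
    if loss:
        # Multiplicative decrease, never dropping below the floor.
        return max(min_cwnd, cwnd * mult)
    # Additive increase: probe for more bandwidth.
    return cwnd + add


def run_trace(losses: list[bool], cwnd: float = 1.0) -> list[float]:
    """Apply aimd_update once per round trip and record the window."""
    history = []
    for loss in losses:
        cwnd = aimd_update(cwnd, loss)
        history.append(cwnd)
    return history


if __name__ == "__main__":
    # Three loss-free round trips, one loss, then recovery.
    print(run_trace([False, False, False, True, False]))
```

Starting from a window of 1 segment, the trace grows to 4, halves to 2 on the loss, and resumes additive growth; the resulting sawtooth is what lets competing AIMD senders converge toward a shared rate.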

Slowdown as a metric for congestion control fairness

A Zapletal, F Kuipers - Proceedings of the 22nd ACM Workshop on Hot …, 2023 - dl.acm.org
The conventional definition of fairness in congestion control is flow rate fairness. However,
Internet users typically care about flow completion times (FCTs) and flow rate fairness does …
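To make the contrast between rate fairness and completion-time-based metrics concrete, the sketch below computes Jain's fairness index over flow rates (a standard flow-rate fairness measure, not taken from this paper) and a per-flow slowdown. Slowdown is assumed here to be the measured FCT divided by the ideal FCT on an otherwise idle path (base RTT plus serialization time), a common definition in datacenter work; the paper's exact definition may differ. The `Flow` type and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    size_bytes: int   # bytes the flow must deliver
    fct_s: float      # measured flow completion time, in seconds


def jain_index(rates: list[float]) -> float:
    """Jain's fairness index: 1.0 when all rates are equal, 1/n at worst."""
    n = len(rates)
    return sum(rates) ** 2 / (n * sum(r * r for r in rates))


def slowdown(flow: Flow, link_bps: float, base_rtt_s: float) -> float:
    """Measured FCT relative to the FCT the flow would see alone (assumed definition)."""
    ideal_fct_s = base_rtt_s + flow.size_bytes * 8 / link_bps
    return flow.fct_s / ideal_fct_s


if __name__ == "__main__":
    print(jain_index([500e6, 500e6]))            # equal rates
    print(slowdown(Flow(125_000, 0.002), 1e9, 0.0))
```

Two flows at equal rates score a perfect Jain index, yet a short flow that only ever receives half the link still finishes twice as slowly as it would alone (slowdown 2); a rate-based metric does not surface that cost.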

Crux: GPU-efficient communication scheduling for deep learning training

J Cao, Y Guan, K Qian, J Gao, W Xiao, J Dong… - Proceedings of the …, 2024 - dl.acm.org
Deep learning training (DLT), e.g., large language model (LLM) training, has become one of
the most important services in multitenant cloud computing. By deeply studying in …

Revisiting congestion control for lossless Ethernet

Y Zhang, Q Meng, C Hu, F Ren - 21st USENIX Symposium on …, 2024 - usenix.org
Congestion control is a key enabler for lossless Ethernet at scale. In this paper, we revisit
this classic topic from a new perspective, i.e., understanding and exploiting the intrinsic …

Green with envy: Unfair congestion control algorithms can be more energy efficient

S Arslan, S Renganathan, B Spang - … of the 22nd ACM Workshop on Hot …, 2023 - dl.acm.org
Despite 40 years of active research on congestion control, there has been little or no
consideration of how it impacts the energy usage of end-hosts or networking equipment …

MLTCP: A Distributed Technique to Approximate Centralized Flow Scheduling For Machine Learning

S Rajasekaran, S Narang, AA Zabreyko… - Proceedings of the 23rd …, 2024 - dl.acm.org
This paper argues that congestion control protocols in machine learning datacenters sit at a
sweet spot between centralized and distributed flow scheduling solutions. We present …

BlockLLM: Multi-tenant finer-grained serving for large language models

B Hu, J Li, L Xu, M Lee, A Jajoo, GW Kim, H Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
The increasing demand for Large Language Models (LLMs) across various applications has
led to a significant shift in the design of deep learning serving systems. Deploying LLMs …