Load balancing in data center networks: A survey

J Zhang, FR Yu, S Wang, T Huang… - … Surveys & Tutorials, 2018 - ieeexplore.ieee.org
Data center networks usually employ the scale-out model to provide high bisection
bandwidth for applications. A large amount of data is required to be transferred frequently …

Networking for big data: A survey

S Yu, M Liu, W Dou, X Liu… - … Communications Surveys & …, 2016 - ieeexplore.ieee.org
Complementary to the fancy big data applications, networking for big data is an
indispensable supporting platform for these applications in practice. This emerging research …

HPCC: High precision congestion control

Y Li, R Miao, HH Liu, Y Zhuang, F Feng… - Proceedings of the …, 2019 - dl.acm.org
Congestion control (CC) is the key to achieving ultra-low latency, high bandwidth and
network stability in high-speed networks. From years of experience operating large-scale …

Fast distributed inference serving for large language models

B Wu, Y Zhong, Z Zhang, S Liu, F Liu, Y Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) power a new generation of interactive AI applications
exemplified by ChatGPT. The interactive nature of these applications demands low latency …

Tiresias: A GPU cluster manager for distributed deep learning

J Gu, M Chowdhury, KG Shin, Y Zhu, M Jeon… - … USENIX Symposium on …, 2019 - usenix.org
Deep learning (DL) training jobs bring some unique challenges to existing cluster
managers, such as unpredictable training times, an all-or-nothing execution model, and …

Homa: A receiver-driven low-latency transport protocol using network priorities

B Montazeri, Y Li, M Alizadeh… - Proceedings of the 2018 …, 2018 - dl.acm.org
Homa is a new transport protocol for datacenter networks. It provides exceptionally low
latency, especially for workloads with a high volume of very short messages, and it also …

HULA: Scalable load balancing using programmable data planes

N Katta, M Hira, C Kim, A Sivaraman… - Proceedings of the …, 2016 - dl.acm.org
Datacenter networks employ multi-rooted topologies (e.g., Leaf-Spine, Fat-Tree) to provide
large bisection bandwidth. These topologies use a large degree of multipathing, and need a …

NetLLM: Adapting large language models for networking

D Wu, X Wang, Y Qiao, Z Wang, J Jiang, S Cui… - Proceedings of the …, 2024 - dl.acm.org
Many networking tasks now employ deep learning (DL) to solve complex prediction and
optimization problems. However, current design philosophy of DL-based algorithms entails …

Shinjuku: Preemptive Scheduling for μsecond-scale Tail Latency

K Kaffes, T Chong, JT Humphries, A Belay… - … USENIX Symposium on …, 2019 - usenix.org
The recently proposed dataplanes for microsecond scale applications, such as IX and
ZygOS, use non-preemptive policies to schedule requests to cores. For the many real-world …

Bolt: Sub-RTT congestion control for Ultra-Low latency

S Arslan, Y Li, G Kumar, N Dukkipati - 20th USENIX Symposium on …, 2023 - usenix.org
Data center networks are inclined towards increasing line rates to 200Gbps and beyond to
satisfy the performance requirements of applications such as NVMe and distributed ML. With …