Datacenter traffic control: Understanding techniques and tradeoffs
M Noormohammadpour… - … Surveys & Tutorials, 2017 - ieeexplore.ieee.org
Datacenters provide cost-effective and flexible access to scalable compute and storage
resources necessary for today's cloud computing needs. A typical datacenter is made up of …
Congestion control in named data networking–a survey
As a typical Information Centric Networking, Named Data Networking (NDN) has
attracted wide research attentions in recent years. NDN evolves today's host-centric network …
A unified architecture for accelerating distributed DNN training in heterogeneous GPU/CPU clusters
Data center clusters that run DNN training jobs are inherently heterogeneous. They have
GPUs and CPUs for computation and network bandwidth for distributed training. However …
Swift: Delay is simple and effective for congestion control in the datacenter
We report on experiences with Swift congestion control in Google datacenters. Swift targets
an end-to-end delay by using AIMD control, with pacing under extreme congestion. With …
HPCC: High precision congestion control
Congestion control (CC) is the key to achieving ultra-low latency, high bandwidth and
network stability in high-speed networks. From years of experience operating large-scale …
Azure accelerated networking: SmartNICs in the public cloud
Modern cloud architectures rely on each server running its own networking stack to
implement policies such as tunneling for virtual networks, security, and load balancing …
A cloud-scale acceleration architecture
Hyperscale datacenter providers have struggled to balance the growing need for
specialized hardware (efficiency) with the economic benefits of homogeneity …
Homa: A receiver-driven low-latency transport protocol using network priorities
Homa is a new transport protocol for datacenter networks. It provides exceptionally low
latency, especially for workloads with a high volume of very short messages, and it also …
An exhaustive survey on P4 programmable data plane switches: Taxonomy, applications, challenges, and future trends
Traditionally, the data plane has been designed with fixed functions to forward packets using
a small set of protocols. This closed-design paradigm has limited the capability of the …
MegaScale: Scaling large language model training to more than 10,000 GPUs
We present the design, implementation and engineering experience in building and
deploying MegaScale, a production system for training large language models (LLMs) at the …