A Holistic View of AI-driven Network Incident Management
We discuss the potential improvements large language models (LLMs) can provide in incident
management and how they can overhaul the ways operators conduct incident management …
Running BGP in Data Centers at Scale
Border Gateway Protocol (BGP) forms the foundation for routing in the Internet. More
recently, BGP has made serious inroads into data centers on account of its scalability …
A Social Network Under Social Distancing: Risk-Driven Backbone Management During COVID-19 and Beyond
As the COVID-19 pandemic reshapes our social landscape, its lessons have far-reaching
implications on how online service providers manage their infrastructure to mitigate risks …
A composition framework for change management
Change management has been a long-standing challenge for network operations. The large
scale and diversity of networks, their complex dependencies, and continuous evolution …
CAPA: An Architecture For Operating Cluster Networks With High Availability
Management operations are a major source of outages for networks. A number of best
practices designed to reduce and mitigate such outages are well known, but their …
Boosting bandwidth availability over inter-DC WAN
Inter-DataCenter Wide Area Network (Inter-DC WAN) that connects geographically
distributed data centers is becoming one of the most critical network infrastructures. Due to …
Klotski: Efficient and Safe Network Migration of Large Production Datacenters
This paper presents the design, implementation, evaluation, and deployment of Meta's
production network migration system. We first introduce the network migration problem for …
RADiCe: A Risk Analysis Framework for Data Centers
Datacenter service providers face engineering and operational challenges involving
numerous risk aspects. Bad decisions can result in financial penalties, competitive …
Occam: A Programming System for Reliable Network Management
The complexity of large networks makes their management a daunting task. State-of-the-art
network management tools use workflow systems for automation, but they do not adequately …
Achieving high availability in inter-DC WAN traffic engineering
Inter-DataCenter Wide Area Network (Inter-DC WAN) that connects geographically
distributed data centers is becoming one of the most critical network infrastructures. Due to …