Title: Network Control and Management in the 100x100 Architecture
1 Network Control and Management in the 100x100 Architecture
2 The Role of Network Control and Management
- Many different network environments
- Data center networks, enterprise/campus
- Access, backbone networks
- Many different technologies
- Longest-prefix routing, label switching
- IP, MPLS, ATM, optical circuits
- Many different policies
- Routing, reachability, transit, traffic engineering, robustness
- The control plane software binds these elements together and defines the network
3 Control Plane: The Key Leverage Point
- Great potential: the control plane determines the behavior of the network
- Reaction to events, reachability, services
- Great Opportunities
- A radical clean-slate control plane can be deployed
- Agnostic to packet format: IPv4/v6, Ethernet
- No changes to end-system software
- Control plane is the nexus of network evolution
- Changing the control plane logic can smooth
transitions in network technologies and
architectures
4 100x100 Project Themes
5 A Clean-slate Design
- What are the fundamental causes of outages?
- How to reduce/simplify the software in networks?
- Control logic is software: no reason it should be hard to update, but how to avoid complexity pitfalls?
- What functionality needs to be distributed, and what can be centralized?
- What would a RISC router look like?
- Leverage technology trends
- CPU and link speeds are growing faster than the number of switches
6 Three Principles for Network Control and Management
- Network-level Objectives
- Express goals explicitly
- Security policies, QoS, egress point selection
- Do not bury goals in box-specific configuration
[Diagram: network-level objectives (reachability matrix, traffic engineering rules) feed the management logic]
7 Three Principles for Network Control and Management
- Network-wide Views
- Design network to provide timely, accurate info
- Topology, traffic, resource limitations
- Give logic the inputs it needs
[Diagram: the management logic reads network-wide state in addition to the objectives (reachability matrix, traffic engineering rules)]
8 Three Principles for Network Control and Management
- Direct Control
- Allow logic to directly set forwarding state
- FIB entries, packet filters, queuing parameters
- Logic computes the desired network state, then directly implements it (see the sketch after this slide)
[Diagram: the management logic reads state from the network and writes forwarding state directly back to it]
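The following is a minimal, hypothetical sketch (not from the talk) of how the three principles fit together: an explicit reachability objective, a network-wide view as input, and directly computed filtering state as output. The class names, zone labels, and helper function are illustrative assumptions.

# Minimal sketch (illustrative only) of the three principles:
#   1. network-level objectives stated explicitly (a reachability matrix),
#   2. a network-wide view as the input to the logic,
#   3. direct control: the logic itself produces the state to be written.
from dataclasses import dataclass

@dataclass
class View:
    # network-wide view: adjacency list and the zone each router attaches
    adjacency: dict   # router -> list of neighboring routers
    zone: dict        # router -> "FO" (front office) or "DC" (data center)

@dataclass
class Objectives:
    # explicit network-level objective: which zones may talk to which
    reachability: dict  # (src_zone, dst_zone) -> bool

def desired_filters(view: View, objectives: Objectives) -> dict:
    """Compute, per router, which source zones to drop (the state to write)."""
    filters = {}
    for router, dst_zone in view.zone.items():
        blocked = {src for (src, dst), ok in objectives.reachability.items()
                   if dst == dst_zone and not ok}
        filters[router] = blocked
    return filters

if __name__ == "__main__":
    view = View(adjacency={"R1": ["R2"], "R2": ["R1"]},
                zone={"R1": "DC", "R2": "FO"})
    obj = Objectives(reachability={("FO", "DC"): False, ("DC", "FO"): True,
                                   ("FO", "FO"): True, ("DC", "DC"): True})
    print(desired_filters(view, obj))   # e.g. {'R1': {'FO'}, 'R2': set()}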
9 Overview of the 4D Architecture
[Diagram: the 4D planes (Decision, Dissemination, Discovery, Data), driven by network-level objectives and network-wide views and exercising direct control over the data plane]
- Decision Plane
- All management logic implemented on centralized servers that make all decisions
- Decision Elements use views to compute data plane state that meets objectives, then directly write this state to routers
10 Overview of the 4D Architecture
[Diagram: the 4D planes (Decision, Dissemination, Discovery, Data), driven by network-level objectives and network-wide views and exercising direct control over the data plane]
- Dissemination Plane
- Provides a robust communication channel to each router, and robustness is the only goal!
- May run over the same links as user data, but logically separate and independently controlled (see the sketch below)
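The prototype described later uses source routes for this channel. As a rough illustration (hypothetical code, not the prototype's implementation), a dissemination message can carry its remaining hop list explicitly, so delivering it needs no routing-protocol state on the switches, and robustness comes from precomputing alternate source routes:

# Illustrative sketch: source-routed delivery of a control message.
# Each message carries its explicit path; switches need no routing state.

def deliver(message: bytes, source_route: list, links_up: set) -> bool:
    """Walk the explicit hop list; fail if any hop's link is down."""
    for hop in range(len(source_route) - 1):
        if (source_route[hop], source_route[hop + 1]) not in links_up:
            return False                    # this path is broken
    return True                             # message reached the last hop

def deliver_robust(message: bytes, candidate_routes: list, links_up: set) -> bool:
    """Try precomputed alternate source routes until one works."""
    return any(deliver(message, route, links_up) for route in candidate_routes)

# Example: the primary route fails on the (B, C) link; the backup via D succeeds.
links = {("DE", "A"), ("A", "B"), ("A", "D"), ("D", "C")}
routes = [["DE", "A", "B", "C"], ["DE", "A", "D", "C"]]
assert deliver_robust(b"write-FIB", routes, links)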
11 Overview of the 4D Architecture
[Diagram: the 4D planes (Decision, Dissemination, Discovery, Data), driven by network-level objectives and network-wide views and exercising direct control over the data plane]
- Discovery Plane
- Each router discovers its own resources and its local environment
- E.g., the identity of its immediate neighbors
12 Overview of the 4D Architecture
[Diagram: the 4D planes (Decision, Dissemination, Discovery, Data), driven by network-level objectives and network-wide views and exercising direct control over the data plane]
- Data Plane
- Spatially distributed routers/switches
- Ideally exposes interface to tables in hardware
- Can deploy with today's technology
13 Concerns and Challenges
- How does the 4D simplify the problem?
- How will communication between routers and DEs survive failures in the network?
- Can a robust dissemination plane be built?
- Latency means the DE's view of the network lags behind reality. Will the control loop be stable?
- What is the overhead to/from the DEs?
- What happens in a network partition?
14 Fundamental Problem: Wrong Abstractions
[Diagram: management-plane tools (shell scripts, traffic engineering, planning tools, databases) drive configs, SNMP, netflow, and dial-up modems; control-plane elements (OSPF, link metrics, routing policies) compute FIBs; the data plane holds FIBs and packet filters]
- Management Plane
- Figure out what is happening in the network
- Decide how to change it
- Control Plane
- Multiple routing processes on each router
- Each router with a different configuration program
- Huge number of control knobs: metrics, ACLs, policy
- Data Plane
- Distributed routers
- Forwarding, filtering, queueing
- Based on FIB or labels
15 Good Abstractions Reduce Complexity
[Diagram: today's stack (management plane, configs, control plane, FIBs/ACLs, data plane) beside the 4D stack (decision plane, dissemination plane, FIBs/ACLs, data plane)]
- All decision-making logic lifted out of the control plane
- Eliminates duplicate logic in the management plane
- Dissemination plane provides robust communication
to/from data plane switches
16 4D Separates Distributed Computing Issues from Networking Issues
- Distributed computing issues → protocols and network architecture
- Overhead
- Resiliency
- Scalability
- Networking issues → management logic
- Traffic engineering and service provisioning
- Egress point selection
- Reachability control (VPNs)
- Precomputation of backup paths
17 4D Can Leverage Network Structure
- Decision plane logic can be specialized for the structure of each physical network
- Distributed protocols must be prepared for arbitrary topology graphs
- 4D enables network logic specialized differently for access and for backbone
- Advantages
- Faster route computations
- Retain flexibility to evolve network as needed
- Support transition to 100x100 architecture
18 The Feasibility of the 4D Architecture
- We designed and built a prototype of the 4D Architecture
- The 4D Architecture permits many designs; the prototype is a single, simple design point
- Decision plane
- Contains logic to simultaneously compute routes and enforce the reachability matrix
- Multiple Decision Elements per network, using a simple election protocol to pick the master (see the sketch below)
- Dissemination plane
- Uses source routes to direct control messages
- Extremely simple, but can route around failed data links
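The election protocol is not detailed here; as a rough sketch under the common assumptions that each DE has a unique ID and periodically hears heartbeats from its peers, the live DE with the smallest ID can act as master (hypothetical code, not the prototype's protocol):

# Sketch of a trivial master election among Decision Elements (DEs):
# every DE broadcasts heartbeats; the live DE with the lowest ID is master.
import time

HEARTBEAT_TIMEOUT = 1.0   # seconds of silence before a peer is presumed dead

class Elector:
    def __init__(self, my_id: int):
        self.my_id = my_id
        self.last_heard = {}              # peer id -> time of last heartbeat

    def on_heartbeat(self, peer_id: int) -> None:
        self.last_heard[peer_id] = time.monotonic()

    def live_peers(self) -> set:
        now = time.monotonic()
        return {p for p, t in self.last_heard.items()
                if now - t < HEARTBEAT_TIMEOUT}

    def i_am_master(self) -> bool:
        # Master = smallest ID among the DEs believed alive (including self).
        return self.my_id <= min(self.live_peers(), default=self.my_id)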
19 Evaluation of the 4D Prototype
- Evaluated using Emulab (www.emulab.net)
- Linux PCs used as routers (650-800 MHz)
- Tested on 9 enterprise network topologies
(10-100 routers each)
Example network with 49 switches and 5 DEs
20 Performance of the 4D Prototype
- Trivial prototype has performance comparable to well-tuned production networks
- Recovers from a single link failure in < 300 ms
- < 1 s response is considered excellent
- Survives failure of master Decision Element
- New DE takes control within 1 s
- No disruption unless second fault occurs
- Gracefully handles complete network partitions
- Less than 1.5 s of outage
21 Future Work
- Scalability
- Evaluate over 1-10K switches, 10-100K routes
- Networks with backbone-like propagation delays
- Structuring decision logic
- Arbitrate among multiple, potentially competing objectives
- Unify control when some logic takes longer than others
- Protocol improvements
- Better dissemination and discovery planes
- Deployment in today's networks
- Data center, enterprise, campus, backbone (RCP)
22 Themes of Network Control and Management
- Holistic Design
- Many different technologies, a few common problems
- Find the right abstractions; exploit commonality
- Clean Slate
- How much autonomy do routers/switches need?
- New principles for controlling networks
- Eliminate duplicate logic
- Leverage Network Structure
- Many different types of networks exist, each with different objectives and topologies
- Separate networking issues from distributed systems issues
23 Recent Results
- G. Xie, J. Zhan, D. A. Maltz, H. Zhang, A. Greenberg, G. Hjalmtysson, J. Rexford, "On Static Reachability Analysis of IP Networks," Proceedings of IEEE INFOCOM 2005, Orlando, FL, March 2005.
- J. Rexford, A. Greenberg, G. Hjalmtysson, D. A. Maltz, A. Myers, G. Xie, J. Zhan, H. Zhang, "Network-Wide Decision Making: Toward a Wafer-Thin Control Plane," Proceedings of ACM HotNets-III, San Diego, CA, November 2004.
- D. A. Maltz, J. Zhan, G. Xie, G. Hjalmtysson, A. Greenberg, H. Zhang, "Routing Design in Operational Networks: A Look from the Inside," Proceedings of ACM SIGCOMM 2004, Portland, Oregon, 2004.
- D. A. Maltz, J. Zhan, G. Xie, H. Zhang, G. Hjalmtysson, A. Greenberg, J. Rexford, "Structure Preserving Anonymization of Router Configuration Data," Proceedings of the ACM/USENIX Internet Measurement Conference (IMC 2004), Sicily, Italy, 2004.
24 Questions?
25 Fundamental Problem: Wrong Abstractions
- interface Ethernet0
- ip address 6.2.5.14 255.255.255.128
- interface Serial1/0.5 point-to-point
- ip address 6.2.2.85 255.255.255.252
- ip access-group 143 in
- frame-relay interface-dlci 28
- router ospf 64
- redistribute connected subnets
- redistribute bgp 64780 metric 1 subnets
- network 66.251.75.128 0.0.0.127 area 0
- router bgp 64780
- redistribute ospf 64 match route-map 8aTzlvBrbaW
- neighbor 66.253.160.68 remote-as 12762
- neighbor 66.253.160.68 distribute-list 4 in
- access-list 143 deny 1.1.0.0/16
- access-list 143 permit any
- route-map 8aTzlvBrbaW deny 10
- match ip address 4
- route-map 8aTzlvBrbaW permit 20
- match ip address 7
- ip route 10.2.2.1/16 10.2.1.7
26 Fundamental Problem: Wrong Abstractions
[Figure: size of configuration files in a single enterprise network (881 routers); y-axis: lines in the config file (0-2000), x-axis: router ID sorted by file size (0-881)]
27 Fundamental Problem: Wrong Abstractions
[Diagram: management-plane tools (shell scripts, traffic engineering, planning tools, databases) drive configs, SNMP, netflow, and dial-up modems; control-plane elements (OSPF, link metrics, routing policies) compute FIBs; the data plane holds FIBs and packet filters]
- Management Plane
- Figure out what is happening in the network
- Decide how to change it
- Control Plane
- Multiple routing processes on each router
- Each router with a different configuration program
- Huge number of control knobs: metrics, ACLs, policy
- Data Plane
- Distributed routers
- Forwarding, filtering, queueing
- Based on FIB or labels
28 Good Abstractions Reduce Complexity
[Diagram: today's stack (management plane, configs, control plane, FIBs/ACLs, data plane) beside the 4D stack (decision plane, dissemination plane, FIBs/ACLs, data plane)]
- All decision-making logic lifted out of the control plane
- Eliminates duplicate logic in the management plane
- Dissemination plane provides robust communication
to/from data plane switches
29 Fundamental Problem: Conflating Distributed Systems Issues with Networking Issues
[Diagram: three routing processes exchanging routes for destination D; the FIB entries send D toward the left]
- Distributed systems concern: resiliency to link failures
- Solution: multiple paths through the routing-process graph
30 Fundamental Problem: Conflating Distributed Systems Issues with Networking Issues
[Diagram: after a link failure, the routing processes re-converge and some FIB entries for D switch to the right]
- Distributed systems concern: resiliency to link failures
- Solution: multiple paths through the routing-process graph
31 Fundamental Problem: Conflating Distributed Systems Issues with Networking Issues
[Diagram: the same routing processes and FIB entries for D, now used to implement policy]
- Networking concern: implement resource or security policy
- Solution: restrict the flow of routing information, filter routes, summarize/aggregate routes
32 4D Separates Distributed Computing Issues from Networking Issues
- Distributed computing issues → protocols and network architecture
- Overhead
- Resiliency
- Scalability
- Networking issues → management logic
- Traffic engineering and service provisioning
- Egress point selection
- Reachability control (VPNs)
- Precomputation of backup paths
33 4D Can Leverage Network Structure
- Decision plane logic can be specialized for the structure of each physical network
- Distributed protocols must be prepared for arbitrary topology graphs
- 4D enables network logic specialized differently for access and for backbone
- Advantages
- Faster route computations
- Retain flexibility to evolve network as needed
- Support transition to 100x100 architecture
34 Fundamental Problem: Computing Configurations is Intractable
- Computing configuration files that cause the control plane to compute the desired forwarding states is intractable
- NP-hard in many cases
- Requires a predictive model of control plane behavior
- Configuration files form a program that defines a set of forwarding states
- Very hard to create a program that permits only desired states and doesn't transit through bad ones
[Diagram: the set of forwarding states allowed by the configs; auto-adaptation can lead to/through bad states, while planned responses avoid them]
35 Direct Control Provides Complete Control
- Zero device-specific configuration
- Supports many models for pushing routes
- Trivial push: convergence requires time for all updates to be received and applied, same as today
- Synchronized update: updates propagated, but not applied until an agreed time in the future; clock skew defines the convergence time (see the sketch below)
- Controlled state trajectory: the DE serializes updates to avoid all incorrect transient states
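As an illustration of the synchronized-update model only (hypothetical code, not the prototype), the DE can stamp each batch of FIB updates with an apply-at time far enough in the future to cover propagation delay and clock skew, and every switch holds the batch until that instant:

# Sketch of the "synchronized update" push model: updates are distributed
# ahead of time and applied simultaneously at an agreed future instant.
import time

PROPAGATION_BOUND = 0.2   # assumed worst-case time to reach every switch (s)
CLOCK_SKEW_BOUND = 0.05   # assumed worst-case clock skew between switches (s)

def stamp_batch(fib_updates: dict) -> dict:
    """DE side: attach an apply-at time that every switch can safely meet."""
    apply_at = time.time() + PROPAGATION_BOUND + CLOCK_SKEW_BOUND
    return {"apply_at": apply_at, "updates": fib_updates}

def switch_apply(batch: dict, install) -> None:
    """Switch side: hold the batch, then install all entries at apply_at."""
    delay = batch["apply_at"] - time.time()
    if delay > 0:
        time.sleep(delay)                   # wait out the remaining time
    for destination, next_hop in batch["updates"].items():
        install(destination, next_hop)      # write the FIB entry

# Convergence time is bounded by PROPAGATION_BOUND + CLOCK_SKEW_BOUND.
batch = stamp_batch({"10.2.0.0/16": "R4"})
switch_apply(batch, lambda dst, nh: print(dst, "->", nh))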
36 4D and Today's Networks
- The 4D architecture and principles apply to today's networks as well as 100x100
- Data center networks
- Access/backbone networks
- Greater expressivity in determining behavior
- Behavior of butterfly graph gadgets under failure
- Selection of traffic egress points
37 4D Supports Network Evolution and Expansion
- Decision logic can be upgraded as needed
- No need to update distributed protocols implemented in software on every switch
- Decision Elements can be upgraded as needed
- Network expansion requires upgrades only to DEs,
not every switch
38 Three Key Questions
- Is there any transition path to deploy the 4D architecture?
- Is the 4D architecture feasible?
- Does the 4D architecture have more expressive power than today's approaches to network control and management?
39 Deployment of the 4D Architecture
- Pre-existing industry trend toward separating router hardware from software
- IETF ForCES, GSMP, GMPLS
- SoftRouter [Lakshman, HotNets 2004]
- Incremental deployment path exists
- Individual networks can upgrade to 4D and gain benefits
- Small enterprise networks have the most to gain
- No changes to end-systems required
40 Reachability Example
[Diagram: routers R1-R5 connecting the Chicago (chi) and New York (nyc) sites, each site with a data center and a front office]
- Two locations, each with a data center and a front office
- All routers exchange routes over all links
41 Reachability Example
[Diagram: the same topology, with the four zones labeled chi-DC, chi-FO, nyc-DC, and nyc-FO]
42 Reachability Example
[Diagram: packet filter 'Drop nyc-FO ->; Permit' at the chi data center router and 'Drop chi-FO ->; Permit' at the nyc data center router]
43 Reachability Example
[Diagram: the same filters, with a new short-cut link added between the two data centers]
- A new short-cut link added between data centers
- Intended for backup traffic between centers
44 Reachability Example
[Diagram: the same filters; with the short-cut link, packets from chi-FO can reach nyc-DC without crossing either filter]
- Oops: the new link lets packets violate the security policy!
- Routing changed, but...
- Packet filters don't update automatically
45 Prohibiting Packets from chi-FO to nyc-DC
46 Reachability Example
[Diagram: the same topology, with additional packet filters placed to plug the hole]
- Typical response: add more packet filters to plug the holes in the security policy
47 Reachability Example
[Diagram: filters 'Drop nyc-FO ->' and 'Drop chi-FO ->' in place; now consider a link failure]
- Packet filters have surprising consequences
- Consider a link failure
- chi-FO and nyc-FO still connected
48 Reachability Example
[Diagram: after the link failure, the surviving path between chi-FO and nyc-FO passes through a 'Drop' filter]
- The network has less survivability than its topology suggests
- chi-FO and nyc-FO are still connected
- But the packet filter means no data can flow!
- Probing the network won't predict this problem
49 Allowing Packets from chi-FO to nyc-FO
50 (No Transcript)
51 (No Transcript)
52 Packet Filters Implement Policy
- Packet filters are used extensively throughout networks
- Protect routers from attack
- Implement the reachability matrix (see the placement sketch below)
- Define which hosts can communicate
- Localize traffic, particularly multicast
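A decision plane that holds both the reachability matrix and the topology can place such filters automatically. The following is a minimal sketch (hypothetical code, reusing the chi/nyc zones from the earlier example): forbidden source zones are dropped at the router attaching each destination zone, so the policy holds no matter how routing moves the traffic.

# Sketch: derive packet filters from an explicit reachability matrix.
# Filters sit at the edge of each destination zone, so adding links or
# rerouting (as in the chi/nyc example) cannot open a policy hole.

def place_filters(attachments: dict, reachability: dict) -> dict:
    """attachments: zone -> attaching router; returns router -> source zones to drop."""
    filters = {router: set() for router in attachments.values()}
    for dst_zone, edge_router in attachments.items():
        for (src_zone, dz), permitted in reachability.items():
            if dz == dst_zone and not permitted:
                filters[edge_router].add(src_zone)   # drop src_zone -> dst_zone here
    return filters

attachments = {"chi-DC": "R1", "chi-FO": "R2", "nyc-DC": "R5", "nyc-FO": "R4"}
reachability = {("nyc-FO", "chi-DC"): False, ("chi-FO", "nyc-DC"): False,
                ("chi-FO", "nyc-FO"): True, ("nyc-FO", "chi-FO"): True}
print(place_filters(attachments, reachability))
# e.g. {'R1': {'nyc-FO'}, 'R2': set(), 'R5': {'chi-FO'}, 'R4': set()}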
53 Mechanisms for Action at a Distance
[Diagram: routers R1, R2, R3, each with a routing process and a FIB; R1 advertises route A tagged '12', and R3 tests for the tag]
- Policy is often implemented by tagging routes on one router
- And testing for the tag at another router (see the sketch below)
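A rough sketch of this pattern (hypothetical code, not router configuration): one router attaches a tag when it exports a route, and a different router's import policy acts on the tag rather than on anything it can observe locally.

# Sketch of "action at a distance": R1 tags a route it advertises, and
# R3's import policy reacts to the tag, not to local information.
from dataclasses import dataclass, field

@dataclass
class Route:
    prefix: str
    next_hop: str
    tags: set = field(default_factory=set)

def export_with_tag(route: Route, tag: str) -> Route:
    """R1: attach a tag (a BGP-community-like marker) when exporting."""
    return Route(route.prefix, route.next_hop, route.tags | {tag})

def import_policy(route: Route, blocked_tag: str):
    """R3: silently drop any route carrying the blocked tag."""
    return None if blocked_tag in route.tags else route

advertised = export_with_tag(Route("10.1.0.0/16", "R1"), tag="12")
print(import_policy(advertised, blocked_tag="12"))   # None: filtered at R3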
54 Multiple Interacting Routing Processes
[Diagram: a client and a server connected through multiple interacting routing processes]
55 The Routing Instance Graph of an 881-Router Network
56 Reconvergence Time Under Single Link Failure
57 Reconvergence Time When Master DE Crashes
58 Reconvergence Time When Network Partitions
59 Reconvergence Time When Network Partitions
60 Systems of Systems
- Systems are designed as components to be used in larger systems, in different contexts, for different purposes, interacting with different components
- Example: OSPF and BGP are complex systems in their own right, yet they are components in a network's routing system, interacting with each other, with packet filters, and with management tools
- Complex configuration to enable flexibility
- The glue has tremendous impact on network performance
- State of the art: multiple interacting distributed programs written in 'assembly language'
- Lack of an intellectual framework to understand global behavior
61 Many Implementations Possible
- Single redundant decision engine
- Multiple decision engines
- Hot stand-by
- Divide the network to share load
- Distributed decision engines
- Up to one per router
- Choice can be based on reliability requirements
- Dissemination plane can be in-band, or leverage out-of-band (OOB) links
- Less need for distributed solutions (harder to reason about)
- More focus on networking issues, less on distributed protocols
62 Direct Expression Enables New Algorithms
[Diagram: the single shortest path toward destination D that OSPF computes]
- OSPF normally calculates a single path to each destination D
- OSPF allows load-balancing only over equal-cost paths, to avoid loops
- Using ECMP requires careful engineering of link weights
[Diagram: multiple paths toward destination D computed by the decision plane]
- A Decision Plane with a network-wide view can compute multiple paths (see the sketch below)
- Backup paths installed for free!
- Bounded stretch, bounded fan-in
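One way a decision plane with a network-wide view could compute several loop-free next hops per destination is sketched below (hypothetical code; the bounded-stretch, bounded-fan-in algorithm mentioned above is not reproduced here). Any neighbor strictly closer to the destination can never form a loop, so backup next hops fall out of the same shortest-path computation.

# Sketch: multipath next-hop computation from a network-wide view.
# A neighbor strictly closer to D can never loop, so every such neighbor
# is a usable (primary or backup) next hop toward D.
import heapq

def distances(adjacency: dict, dst: str) -> dict:
    """Dijkstra from dst; adjacency: node -> {neighbor: link weight}."""
    dist, heap = {dst: 0}, [(0, dst)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue
        for neighbor, weight in adjacency[node].items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

def multipath_next_hops(adjacency: dict, dst: str) -> dict:
    dist = distances(adjacency, dst)
    return {node: [n for n in adjacency[node] if dist[n] < dist[node]]
            for node in adjacency if node != dst}

adj = {"A": {"B": 1, "C": 1}, "B": {"A": 1, "D": 1},
       "C": {"A": 1, "D": 1}, "D": {"B": 1, "C": 1}}
print(multipath_next_hops(adj, "D"))   # A gets both B and C as next hops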
63 Slides under Development
64 Supporting Network Evolution
- Logic for controlling the network needs to change over time
- Traffic engineering rules
- Interactions with other networks
- Service characteristics
- Upgrades to field-deployed network equipment must be avoided
- Very high cost
- Software upgrades often require hardware upgrades
(more CPU or memory)
65 Supporting Network Evolution Today
- Today's solution
- Vendors stuff their routers with software implementing all possible features
- Multiple routing protocols
- Multiple signaling protocols (RSVP, CR-LDP)
- Each feature controlled by parameters set at configuration time to achieve late binding
- Feature creep creates a configuration nightmare
- Tremendous complexity in syntax and semantics
- Mis-interactions between features are common
- Our goal: separate decision-making logic from the field-deployed devices
66 Supporting Network Expansion
- Networks are constantly growing
- New routers/switches/links added
- Old equipment rarely removed
- Adding a new switch can cause old equipment to become overloaded
- CPU/memory demands on each device should not scale up with network size
67 Supporting Network Expansion Today
- Routers run a link-state routing protocol
- The size of the link-state database scales with the number of routers
- Expanding the network can exceed the memory limits of old routers
- Today's solution
- Monitor resources on all routers
- Predict the approach of exhaustion, and then:
- Global upgrade
- Re-architect the routing design to add summarization, route aggregation, information hiding
- Our goal: make demands scale with the hardware (e.g., the number of interfaces)
68 Supporting Remote Devices
- Maintaining communication with all network devices is critical for network management
- Diagnosis of problems
- Monitoring status and network health
- Updating configuration or software
- The chicken or the egg:
- Cannot send a device its configuration/management information until it can communicate
- The device cannot communicate until it is correctly configured
69 Supporting Remote Devices Today
- Today's solution
- Use the PSTN as the management network of last resort
- Connect the console of remote routers to a phone modem
- Can't be used for customer premises equipment (CPE): DSL/cable modems, integrated access devices (IADs)
- In a converged network, the PSTN is decommissioned
- Our goal: preserve management communication to any device that is not physically partitioned, regardless of configuration state
70 Network Control and Management Today
- State everywhere!
- Dynamic state in FIBs
- Configured state in settings, policies, packet filters
- Programmed state in magic constants, timers
- Many dependencies between bits of state
- State updated in an uncoordinated, decentralized way!
- Data Plane
- Distributed routers
- Forwarding, filtering, queueing
- Based on FIB or labels
71 Network Control and Management Today
- Logic everywhere!
- Path Computation built into routing protocols
- Routing Policy distributed across the routers
- Packet filters placed by tools in the management plane
- No way to arbitrate inconsistencies between logic!
- State everywhere!
- Dynamic state in FIBs
- Configured state in settings, policies, packet filters
- Programmed state in magic constants, timers
- Many dependencies between bits of state
- State updated in an uncoordinated, decentralized way!
- Data Plane
- Distributed routers
- Forwarding, filtering, queueing
- Based on FIB or labels
72 A Study of Operational Production Networks
- How complicated/simple are real control planes?
- What is the structure of the distributed system?
- Use reverse-engineering methodology
- There are few or no documents
- The ones that exist are out-of-date
- Anonymized configuration files for 31 active networks (>8,000 configuration files)
- 6 Tier-1 and Tier-2 Internet backbone networks
- 25 enterprise networks
- Sizes between 10 and 1,200 routers
- 4 enterprise networks significantly larger than
the backbone networks
73 Learning from Ethernet Evolution Experience
Current Implementations: Everything Changed Except the Name and Framing
[Diagram: hubs replaced by switches; Ethernet concentrator, router, servers, and a WAN uplink]
- Switched solution
- Little use for collision domains
- Servers, routers at 10x station speed
- 10/100/1000 Mbps, 10 Gig coming; copper and fiber
74 Ethernet: Re-inventing the Wheel
- Becoming as service-rich and complex as IP
- Traffic engineering
- Reachability control and traffic isolation
(VLANs) - QoS (802.1q)
- Ethernet networks are rediscovering the problems and solutions faced by IP networks
- Is there commonality to exploit?
- Switches/routers are all fundamentally table-driven (see the sketch below)
- Destination addresses, MPLS labels, VLANs, circuit IDs
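If all of these devices are fundamentally table-driven, one can imagine a single write interface that the decision plane uses regardless of technology. The sketch below is purely illustrative (not an existing API); only the key type differs between IP prefixes, MPLS labels, and VLAN IDs.

# Sketch: one uniform table-write interface over technology-specific keys.
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class IpPrefix:
    value: str        # e.g. "10.2.0.0/16", matched by longest prefix

@dataclass(frozen=True)
class MplsLabel:
    value: int        # exact-match label

@dataclass(frozen=True)
class VlanId:
    value: int        # exact-match VLAN tag

Key = Union[IpPrefix, MplsLabel, VlanId]

class ForwardingTable:
    """One write interface the decision plane can use on any table-driven box."""
    def __init__(self):
        self.entries = {}

    def write(self, key: Key, action: str) -> None:
        self.entries[key] = action          # decision plane writes directly

table = ForwardingTable()
table.write(IpPrefix("10.2.0.0/16"), "forward out port 3")
table.write(MplsLabel(42), "swap to label 17, forward out port 1")
table.write(VlanId(100), "flood within VLAN 100")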
75 Control/Management Needs of the 100x100 Network Architecture
- Control/management creates the logical network from the physical network
- Supports the architecture and end-to-end view of the 100x100 network
- Access Network
- Logical level: an aggregation tree between the CPE and a Regional Node
- Physical level: a network with redundant links and multiple Regional Nodes
- Backbone Network
- Logical level: a full mesh of links among Regional Nodes
- Physical level: a sparse graph of fiber routes constrained by geography