Title: Traffic Engineering for ISP Networks
1Traffic Engineering for ISP Networks
- Jennifer Rexford
- Computer Science Department
- Princeton University
- http://www.cs.princeton.edu/~jrex
2A Challenge in ISP Backbone Networks
- Finding a good way to route the data packets
- Given the current network topology and offered traffic
- For good performance and efficient use of resources
3Why Is the Problem Hard?
- IP traffic varies, and the service is best effort
- The offered traffic is not known in advance
- The resources in the network are not reserved
- The routers do not adapt on their own
- Load-sensitive routing is not widely deployed
- Due to control overhead and stability challenges
- Routing protocols were not designed to be managed
- At best indirect control over the flow of traffic
- Fine-grain traffic measurements often unavailable
- E.g., only have coarse-grain link load statistics
4In This Talk
- TE with traditional IP routing protocols
- Shortest-path protocols with configurable link weights
- Two main research challenges
- Optimization: tuning link weights to the offered traffic
- Tomography: inferring the offered traffic from link load
- Deployed solutions in AT&T's U.S. backbone
- Our experiences working with the network operators
- And how we improved the tools over time
- Ongoing research on traffic management
5Optimization: Tuning Link Weights
6Routing Inside an Internet Service Provider
- Routers flood information to learn the topology
- Routers determine next hop to reach other routers
- By computing shortest paths based on the link weights
- Routers forward packets via the next hop link(s)
[Figure: example topology with a weight on each link]
7Link Weights Control the Flow of Traffic
- Routers compute paths
- Shortest paths as sum of link weights
- Operators set the link weights
- To control where the traffic goes
[Figure: the same topology with one link weight changed, shifting the shortest path]
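The effect of a weight change on routing can be sketched with a small Dijkstra computation. This is only an illustration: the four-router topology, the router names, and the helper `shortest_path` are assumptions made for the example, not the topology shown on the slide.

```python
import heapq

def shortest_path(weights, src, dst):
    """Return (cost, path) for the least-cost path under the given link weights."""
    # weights maps a directed link (u, v) to its configured weight.
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((v, w))
    heap = [(0, src, [src])]
    done = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, w in adj.get(node, []):
            if nxt not in done:
                heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

# Hypothetical four-router topology (an assumption for illustration).
weights = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 2, ("C", "D"): 2}
before = shortest_path(weights, "A", "D")

# The operator raises one weight; the routers recompute and the traffic moves.
weights[("A", "B")] = 5
after = shortest_path(weights, "A", "D")

print(before)
print(after)
```

In this sketch the A-to-D traffic initially follows A, B, D; after the weight on link (A, B) is raised, the path through C becomes the shortest.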
8Heuristics for Setting the Link Weights
- Proportional to physical distance
- Cross-country links have higher weights than local ones
- Minimizes end-to-end propagation delay
- Inversely proportional to link capacity
- Smaller weights for higher-bandwidth links
- Attracts more traffic to links with more capacity
- Tuned based on the offered traffic
- Network-wide optimization of weights based on traffic
- Directly minimizes key metrics like max link utilization
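As a rough sketch of the inverse-capacity heuristic, the function below divides a reference bandwidth by each link's capacity and rounds to an integer weight. The function name, the choice of the fastest link as the reference, the 16-bit upper bound, and the capacities are all assumptions made for illustration.

```python
def inverse_capacity_weights(capacity_mbps, reference_mbps=None, max_weight=65535):
    """Assign each link an integer weight inversely proportional to its capacity."""
    if reference_mbps is None:
        # Assumption: the fastest link in the network gets weight 1.
        reference_mbps = max(capacity_mbps.values())
    weights = {}
    for link, cap in capacity_mbps.items():
        w = round(reference_mbps / cap)
        # Clamp to a valid positive integer weight.
        weights[link] = min(max(w, 1), max_weight)
    return weights

# Hypothetical link capacities in Mbps.
capacity = {("A", "B"): 10000, ("B", "C"): 2500, ("A", "C"): 622}
weights = inverse_capacity_weights(capacity)
print(weights)
```

Higher-bandwidth links end up with smaller weights, so shortest paths tend to prefer them.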
9Why Are the Link Weights Static?
- Strawman alternative: load-sensitive routing
- Link metrics based on traffic load
- Flood dynamic metrics as they change
- Adapt automatically to changes in offered load
- Reasons why this is typically not done
- Delay-based routing unsuccessful in the early days
- Oscillation as routers adapt to out-of-date information
- Most Internet transfers are very short-lived
- Research and standards work continues
- but operators have to work with what they have
10Big Picture: Measure, Model, and Control
[Figure: the operational network is measured to obtain the offered traffic and the topology/configuration; these feed a network-wide "what if" model, whose output drives control, i.e., changes to the network]
11Traffic Engineering in an ISP Backbone
- Topology
- Connectivity and capacity of routers and links
- Traffic matrix
- Offered load between points in the network
- Link weights
- Configurable weights for shortest-path routing
- Performance objective
- Balanced load, low latency, service level agreements
- Question: Given the topology and traffic matrix in an IP network, which link weights should be used?
12Key Ingredients of Our Approach
- Measurement
- Topology: monitoring of the routing protocols
- Traffic matrix: widely deployed traffic measurement
- Network-wide models
- Representations of topology and traffic
- What-if models of shortest-path routing
- Network optimization
- Efficient algorithms to find good configurations
- Operational experience to identify key constraints
13Formalizing the Optimization Problem
- Input: graph $G(R,L)$
- $R$ is the set of routers
- $L$ is the set of unidirectional links
- $c_l$ is the capacity of link $l$
- Input: traffic matrix
- $M_{i,j}$ is the traffic load from router $i$ to router $j$
- Output: setting of the link weights
- $w_l$ is the weight on unidirectional link $l$
- $P_{i,j,l}$ is the fraction of traffic from $i$ to $j$ traversing link $l$
14Multiple Shortest Paths With Even Splitting
[Figure: values of $P_{i,j,l}$ when traffic splits evenly over multiple shortest paths]
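A minimal sketch of how the fractions $P_{i,j,l}$ could be computed for one source-destination pair under even splitting: each router divides the traffic it holds equally among the outgoing links that lie on a shortest path. The helper names, the five-router topology, and the restriction to positive integer weights are assumptions made for this example.

```python
import heapq
from collections import defaultdict

def distances_to(dst, nodes, weights):
    """Dijkstra on the reversed graph: cost from every router to dst."""
    rev = defaultdict(list)
    for (u, v), w in weights.items():
        rev[v].append((u, w))
    dist = {n: float("inf") for n in nodes}
    dist[dst] = 0
    heap = [(0, dst)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue
        for u, w in rev[v]:
            if d + w < dist[u]:
                dist[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return dist

def split_fractions(nodes, weights, src, dst):
    """Return {link: fraction of src->dst traffic on that link} with even splitting."""
    dist = distances_to(dst, nodes, weights)
    arriving = defaultdict(float)
    arriving[src] = 1.0
    frac = defaultdict(float)
    # Visit routers farthest from dst first, so a router's incoming share is
    # complete before it is divided (assumes positive integer weights).
    for u in sorted(nodes, key=lambda n: dist[n], reverse=True):
        if u == dst or arriving[u] == 0 or dist[u] == float("inf"):
            continue
        next_links = [(a, b) for (a, b), w in weights.items()
                      if a == u and dist[b] + w == dist[u]]
        for link in next_links:
            share = arriving[u] / len(next_links)
            frac[link] += share
            arriving[link[1]] += share
    return dict(frac)

# Hypothetical topology: A has two equal-cost next hops, and so does B.
nodes = ["A", "B", "C", "D", "E"]
weights = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 2,
           ("C", "D"): 2, ("B", "E"): 1, ("E", "D"): 1}
fractions = split_fractions(nodes, weights, "A", "D")
print(fractions)
```

Note that the split is even per router, not per path: here half of the traffic goes through C, while the half that reaches B is divided again between B's two equal-cost next hops.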
15Defining the Objective Function
- Computing the link utilization
- Link load: $u_l = \sum_{i,j} M_{i,j} P_{i,j,l}$
- Utilization: $u_l / c_l$
- Objective functions
- $\min \max_l (u_l / c_l)$
- $\min \sum_l f(u_l / c_l)$
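The two objectives can be evaluated directly once the routing fractions are known. The sketch below takes the traffic matrix and the fractions as plain dictionaries. The numbers are made up, and the quadratic penalty is only a stand-in for the convex function f, which the slide does not specify.

```python
def link_loads(traffic, fractions):
    """u_l = sum over (i, j) of M[i, j] * P[i, j, l]."""
    loads = {}
    for (i, j), volume in traffic.items():
        for link, share in fractions.get((i, j), {}).items():
            loads[link] = loads.get(link, 0.0) + volume * share
    return loads

def max_utilization(loads, capacity):
    """Objective 1: the largest u_l / c_l over all links."""
    return max(loads.get(link, 0.0) / cap for link, cap in capacity.items())

def total_penalty(loads, capacity, f):
    """Objective 2: the sum over links of f(u_l / c_l)."""
    return sum(f(loads.get(link, 0.0) / cap) for link, cap in capacity.items())

def quadratic_penalty(x):
    # Hypothetical convex penalty, chosen only for illustration.
    return x * x

# Made-up traffic matrix M and routing fractions P.
M = {("A", "D"): 80.0, ("B", "D"): 40.0}
P = {
    ("A", "D"): {("A", "B"): 0.5, ("A", "C"): 0.5, ("B", "D"): 0.5, ("C", "D"): 0.5},
    ("B", "D"): {("B", "D"): 1.0},
}
capacity = {("A", "B"): 100.0, ("A", "C"): 100.0, ("B", "D"): 100.0, ("C", "D"): 100.0}

loads = link_loads(M, P)
worst = max_utilization(loads, capacity)
penalty = total_penalty(loads, capacity, quadratic_penalty)
print(loads, worst, penalty)
```

Link (B, D) carries traffic from both demands, so it determines the maximum utilization in this example.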
16Complexity of the Optimization Problem
- NP-hard optimization problem
- No efficient algorithm to find the link weights
- Even for the simple convex objective functions
- Why can't we just do multi-commodity flow?
- E.g., solve the multi-commodity flow problem
- and the link weights pop out as the dual
- Because IP routers cannot split arbitrarily over ties
- What are the implications?
- Have to resort to searching through weight settings
17Optimization Based on Local Search
- Start with an initial setting of the link weights
- E.g., same integer weight on every link
- E.g., weights inversely proportional to link capacity
- E.g., existing weights in the operational network
- Compute the objective function
- Compute the all-pairs shortest paths to get $P_{i,j,l}$
- Apply the traffic matrix $M_{i,j}$ to get link loads $u_l$
- Evaluate the objective function from the $u_l / c_l$
- Generate a new setting of the link weights
- Repeat
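The loop above can be sketched as a simple hill-climbing search. This is not the search used in the deployed tool, whose details the slides do not give. It is a deterministic variant that tries every single-weight change, keeps any change that lowers the maximum utilization, and stops when nothing improves. The topology, demand, capacities, and the small weight range are assumptions made for the example.

```python
import heapq
from collections import defaultdict

def link_loads(nodes, weights, traffic):
    """Route every demand on shortest paths with even splitting; return link loads."""
    loads = defaultdict(float)
    for dst in nodes:
        # Dijkstra over reversed links: cost from each router to dst.
        dist = {n: float("inf") for n in nodes}
        dist[dst] = 0
        heap = [(0, dst)]
        while heap:
            d, v = heapq.heappop(heap)
            if d > dist[v]:
                continue
            for (a, b), w in weights.items():
                if b == v and d + w < dist[a]:
                    dist[a] = d + w
                    heapq.heappush(heap, (d + w, a))
        arriving = defaultdict(float)
        for (s, t), vol in traffic.items():
            if t == dst:
                arriving[s] += vol
        # Farthest routers first, so upstream traffic arrives before splitting.
        for u in sorted(nodes, key=lambda n: dist[n], reverse=True):
            if u == dst or arriving[u] == 0 or dist[u] == float("inf"):
                continue
            hops = [(a, b) for (a, b), w in weights.items()
                    if a == u and dist[b] + w == dist[u]]
            for link in hops:
                loads[link] += arriving[u] / len(hops)
                arriving[link[1]] += arriving[u] / len(hops)
    return loads

def max_utilization(nodes, weights, traffic, capacity):
    loads = link_loads(nodes, weights, traffic)
    return max(loads[link] / capacity[link] for link in capacity)

def local_search(nodes, weights, traffic, capacity, max_weight=8):
    """Repeatedly accept single-weight changes that lower the objective."""
    best = dict(weights)
    best_cost = max_utilization(nodes, best, traffic, capacity)
    improved = True
    while improved:
        improved = False
        for link in sorted(best):
            for w in range(1, max_weight + 1):
                if w == best[link]:
                    continue
                candidate = dict(best)
                candidate[link] = w          # change exactly one weight
                cost = max_utilization(nodes, candidate, traffic, capacity)
                if cost < best_cost:
                    best, best_cost, improved = candidate, cost, True
    return best, best_cost

# Hypothetical network: two disjoint A->D paths, all traffic initially on one.
nodes = ["A", "B", "C", "D"]
start = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 3, ("C", "D"): 3}
traffic = {("A", "D"): 120.0}
capacity = {link: 100.0 for link in start}

start_cost = max_utilization(nodes, start, traffic, capacity)
best, best_cost = local_search(nodes, start, traffic, capacity)
print(start_cost, best, best_cost)
```

In this example a single weight change makes the two paths equal in cost, so the traffic splits evenly and the maximum utilization drops from 1.2 to 0.6.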
18Making the Search Efficient
- Avoid repeating the same weight setting
- Keep track of past values of the weight setting
- or keep a small signature (e.g., a hash) of past values
- Do not evaluate a weight setting if signatures match
- Avoid computing the shortest paths from scratch
- Explore weight settings that change just one weight
- Apply fast incremental shortest-path algorithms
- Limit the number of unique values of link weights
- Do not explore all $2^{16}$ possible values for each weight
- Stop early, before exploring the whole search space
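Two of these ideas are easy to sketch: remembering a short signature of each weight setting already evaluated, and generating neighbors that differ in only one weight drawn from a small set of allowed values. The function names, the use of SHA-256 as the hash, and the allowed values are assumptions. Incremental shortest-path computation is not shown.

```python
import hashlib

def signature(weights):
    """Short fingerprint of a weight setting, independent of dictionary order."""
    text = ";".join(f"{u}>{v}={w}" for (u, v), w in sorted(weights.items()))
    return hashlib.sha256(text.encode()).hexdigest()[:16]

def single_weight_neighbors(weights, allowed_values):
    """Yield settings that differ from `weights` in exactly one link weight."""
    for link in sorted(weights):
        for value in allowed_values:
            if value != weights[link]:
                neighbor = dict(weights)
                neighbor[link] = value
                yield neighbor

def unseen_neighbors(weights, allowed_values, seen):
    """Skip neighbors whose signature is already in `seen`; record the rest."""
    for neighbor in single_weight_neighbors(weights, allowed_values):
        sig = signature(neighbor)
        if sig not in seen:
            seen.add(sig)
            yield neighbor

# Hypothetical two-link setting with three allowed weight values.
weights = {("A", "B"): 1, ("B", "C"): 1}
allowed = [1, 2, 4]
seen = {signature(weights)}

first_pass = list(unseen_neighbors(weights, allowed, seen))
second_pass = list(unseen_neighbors(weights, allowed, seen))
print(len(first_pass), len(second_pass))
```

The second pass yields nothing because every neighbor's signature was recorded during the first pass, so no setting would be evaluated twice.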
19Incorporating Operational Realities
- Minimize number of changes to the network
- Changing just 1 or 2 link weights is often enough
- Tolerate failure of network equipment
- Weight settings usually remain good after failure
- or can be fixed by changing one or two weights
- Limit dependence on measurement accuracy
- Good weights remain good, despite random noise
- Limit frequency of changes to the weights
- Joint optimization for day and night traffic matrices
20Application to AT&T's Backbone Network
- Performance of the optimized weights
- Search finds a good solution within a few minutes
- Much better than weights based on link capacity or physical distance
- Competitive with multi-commodity flow solution
- How AT&T changes the link weights
- Maintenance done every night from midnight to 6am
- Predict effects of removing link(s) from the network
- Reoptimize the link weights to avoid congestion
- Configure new weights before disabling equipment
21Example from My Visit to AT&T's Operations Center
- Amtrak repairing/moving part of the train track
- Need to move some of the fiber optic cables
- Or, heightened risk of the cables being cut
- Amtrak notifies us of the time the work will be done
- AT&T engineers model the effects
- Determine which IP links go over the affected fiber
- Pretend the network no longer has these links
- Evaluate the new shortest paths and traffic flow
- Identify whether link loads will be too high
22Example, Continued
- If load will be too high
- Reoptimize the weights on the remaining links
- Schedule the time for the new weights to be configured
- Roll back to the old weight setting after Amtrak is done
- Same process applied to other cases
- Assessing the network's risk to possible failures
- Planning for maintenance of existing equipment
- Adapting the link weights to installation of new links
- Adapting the link weights in response to traffic shifts
23Conclusions on Traffic Engineering
- IP networks do not adapt on their own
- Routers compute shortest paths based on static weights
- Service providers need to adapt the weights
- Due to failures, congestion, or planned maintenance
- Leads to an interesting optimization problem
- Optimize link weights based on topology and traffic
- Optimization problem is computationally difficult
- Forces the use of efficient local-search techniques
- Results of the local search are pretty good
- Near-optimal solutions that minimize disruptions
24Extensions
- Robust link-weight assignments
- Link/node failures
- Range of traffic matrices
- More complex routing models
- Destinations reachable via multiple egress points
- Interdomain routing policies
- Interaction between ISPs
- Inter-ISP negotiation for joint optimization
- Grappling with scalability and trust issues
25Tomography: Inferring the Traffic Matrix
26Computing the Traffic Matrix $M_{i,j}$
- Hard to measure the traffic matrix
- IP networks transmit data as individual packets
- Routers do not keep traffic statistics, except link utilization on (say) a five-minute time scale
- Need to infer the traffic matrix $M_{i,j}$ from
- Current topology $G(R,L)$
- Current routing $P_{i,j,l}$
- Current link load $u_l$
- Link capacity $c_l$
27Inference: Network Tomography
From link counts to the traffic matrix
[Figure: sources and destinations connected through links with observed loads of 3 Mbps, 5 Mbps, 4 Mbps, and 4 Mbps]
28Tomography: Formalizing the Problem
- Ingress-egress pairs
- $p$ is an ingress-egress pair of nodes $(i,j)$
- $x_p$ is the (unknown) traffic volume for this pair, $M_{i,j}$
- Routing
- $P_{l,p}$ is the proportion of $p$'s traffic that traverses $l$
- Links in the network
- $l$ is a unidirectional edge
- $u_l$ is the observed traffic volume on this link
- Relationship: $u = Px$ (work backwards to get $x$)
29Tomography: One Observation Not Enough
- Linear system of n nodes is underdetermined
- Number of links $e$ is around $O(n)$
- Number of ingress-egress pairs $c$ is $O(n^2)$
- Dimension of solution sub-space at least $c - e$
- Multiple observations are needed
- $k$ independent observations (over time)
- Stochastic model with Poisson i.i.d. counts
- Maximum likelihood estimation to infer matrix
- Doesn't work all that well in practice
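A tiny example makes the ambiguity concrete. In a hypothetical three-router chain A, B, C with two links and three ingress-egress pairs, two different traffic matrices produce exactly the same link loads, so one set of link counts cannot tell them apart. The topology and the traffic volumes are made up for illustration.

```python
def link_loads(routing, x):
    """u = P x, where routing[l][p] is the share of pair p's traffic on link l."""
    return {link: sum(share * x[pair] for pair, share in row.items())
            for link, row in routing.items()}

# Chain A -> B -> C: the A-to-C traffic crosses both links.
routing = {
    "A-B": {("A", "B"): 1.0, ("A", "C"): 1.0},
    "B-C": {("A", "C"): 1.0, ("B", "C"): 1.0},
}

# Two different candidate traffic matrices (3 unknowns, 2 equations).
x_first = {("A", "B"): 3.0, ("A", "C"): 2.0, ("B", "C"): 1.0}
x_second = {("A", "B"): 4.0, ("A", "C"): 1.0, ("B", "C"): 2.0}

u_first = link_loads(routing, x_first)
u_second = link_loads(routing, x_second)
print(u_first, u_second)
```

Here $c = 3$ pairs and $e = 2$ links, so the solutions form a family with at least one degree of freedom.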
30Approach Used at AT&T: Tomo-gravity
- Gravitational assumption
- Ingress point $a$ has traffic $v_i^a$
- Egress point $b$ has traffic $v_e^b$
- Pair $(a,b)$ has traffic proportional to $v_i^a \cdot v_e^b$
[Figure: gravity-model example with ingress and egress traffic volumes]
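The gravity assumption can be sketched in a few lines. The slide only states proportionality. Dividing by the total traffic, so that each ingress point's estimates add up to its measured volume, is an assumption of this sketch, as are the point names and volumes.

```python
def gravity_estimate(ingress, egress):
    """Estimate traffic for each (a, b) as ingress[a] * egress[b] / total."""
    # Assumption: total ingress equals total egress, so either sum can normalize.
    total = sum(egress.values())
    return {(a, b): ingress[a] * egress[b] / total
            for a in ingress for b in egress}

# Hypothetical measured volumes entering and leaving the backbone.
ingress = {"A": 30.0, "B": 10.0}
egress = {"C": 20.0, "D": 20.0}

estimate = gravity_estimate(ingress, egress)
print(estimate)
```

The estimate uses only the edge volumes, so nothing in it refers to the loads on links inside the network.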
31Approach Used at AT&T: Tomo-gravity
- Problem with gravity model
- Gravity model ignores the load on the interior links
- Gravity assumption isn't always 100% correct
- Resulting traffic matrix might not satisfy the link loads
- Combining the two techniques
- Gravity: find a traffic matrix using the gravity model
- Tomography: find the family of traffic matrices consistent with all link load statistics
- Tomo-gravity: find the tomography solution that is closest to the output of the gravity model
- Works extremely well (and fast) in practice
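The "closest consistent solution" idea can be illustrated with a generic projection method. The sketch below uses Kaczmarz sweeps, a standard iterative technique that starts from the gravity estimate and converges to the nearest point (in plain Euclidean distance) satisfying $u = Px$ when the system is consistent. This is a stand-in chosen for brevity, not the formulation used at AT&T: it applies no weighting and does not enforce non-negative traffic. The chain topology and all numbers are made up.

```python
def closest_consistent(routing, loads, prior, sweeps=500):
    """Move `prior` to a nearby x with P x = u using repeated row projections."""
    x = dict(prior)
    for _ in range(sweeps):
        for link, row in routing.items():
            # How far the current estimate is from matching this link's load.
            residual = loads[link] - sum(share * x[pair] for pair, share in row.items())
            norm = sum(share * share for share in row.values())
            # Project onto the set of x that satisfy this one link equation.
            for pair, share in row.items():
                x[pair] += residual * share / norm
    return x

# Hypothetical chain A -> B -> C with observed link loads.
routing = {
    "A-B": {("A", "B"): 1.0, ("A", "C"): 1.0},
    "B-C": {("A", "C"): 1.0, ("B", "C"): 1.0},
}
loads = {"A-B": 5.0, "B-C": 3.0}

# Made-up gravity-style prior that does not match the link loads.
prior = {("A", "B"): 2.0, ("A", "C"): 2.0, ("B", "C"): 2.0}

x = closest_consistent(routing, loads, prior)
print(x)
```

The prior implies loads of 4 on both links; the adjusted estimate reproduces the observed loads of 5 and 3 while staying as close to the prior as the link equations allow.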
32Conclusions
- Managing IP networks is challenging
- Routers don't adapt on their own to congestion
- Routers don't reveal much information about traffic
- Measurement provides a network-wide view
- Topology
- Traffic matrix
- Optimization enables the network to adapt
- Inferring the traffic matrix from the link loads
- Optimizing the link weights based on the traffic matrix
33New Research Direction: Design for Manage-ability
- Two main parts of network management
- Control: optimization
- Measurement: tomography
- Two research approaches
- Bottom up: do the best with what you have
- Top down: design systems that are easier to manage
- Design for manage-ability
- If you are both the professor and the student, you create exam questions that are easy to answer.
34Example: Changing the Path Computation
- Routers split traffic over multiple paths
- More traffic on shorter paths, less on longer ones
- In proportion to the exponential of path cost
- Exciting result
- Can achieve optimal distribution of the traffic
- With polynomial-time algorithm for setting the weights
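One way to read the splitting rule is that each path receives a share that decays exponentially with its cost. The sketch below assumes a share proportional to exp(-(cost - shortest cost)), which favors shorter paths as the slide describes. The exact form, the function name, and the path costs are assumptions, and this is not claimed to be the published algorithm.

```python
import math

def exponential_split(path_costs):
    """Split traffic over paths in proportion to exp(-(cost - shortest cost))."""
    shortest = min(path_costs.values())
    scores = {path: math.exp(-(cost - shortest)) for path, cost in path_costs.items()}
    total = sum(scores.values())
    return {path: score / total for path, score in scores.items()}

# Hypothetical costs (sums of link weights) of three A-to-D paths.
path_costs = {"A-B-D": 2, "A-C-D": 3, "A-E-F-D": 5}
shares = exponential_split(path_costs)
print(shares)
```

Unlike even splitting over shortest paths only, every path carries some traffic here, with the share falling off quickly as the path gets longer.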
35New Research Direction: Logically-Central Control
- Traditional division of labor
- Routers: real-time, distributed protocols
- Management system: offline, centralized algorithms
- Example: routing protocols and traffic engineering
- Routing: routers react automatically to link failures
- TE: management system sets the link weights
- The case for separating routing from routers
- Better decisions with network-wide visibility
- Routers only collect measurements and forward packets
36Example: Routing Control Platform (RCP)
- Logically-centralized server
- Collects measurement data from the network
- Pushes forwarding tables into the routers
- Benefits
- Network-wide policies
- Flexible, easy to customize
- Fewer nodes to upgrade
- Feasibility
- A high-end PC can compute routes for a large ISP
- Simple replication to survive failures
37References
- Traffic engineering using traditional protocols
- http://www.cs.princeton.edu/~jrex/papers/ieeecomm02.pdf
- http://www.cs.princeton.edu/~jrex/papers/opthand04.pdf
- http://www.cs.princeton.edu/~jrex/papers/ton-whatif.pdf
- Tomo-gravity to infer the traffic matrix
- http://www.cs.utexas.edu/~yzhang/papers/mmi-ton05.pdf
- http://www.cs.utexas.edu/~yzhang/papers/tomogravity-sigm03.pdf
- http://www.cs.princeton.edu/~jrex/papers/sfi.pdf
38References
- Design for manage-ability
- http://www.cs.princeton.edu/~jrex/papers/pefti.pdf
- http://www.cs.princeton.edu/~jrex/papers/optimizability.pdf
- http://www.cs.princeton.edu/~jrex/papers/tie-long.pdf
- Routing Control Platform
- http://www.cs.princeton.edu/~jrex/papers/rcp.pdf
- http://www.cs.princeton.edu/~jrex/papers/ccr05-4d.pdf
- http://www.cs.princeton.edu/~jrex/papers/rcp-nsdi.pdf
- http://www.research.att.com/~kobus/docs/irscp.inm.pdf