Title: Yaping Zhu
1. Minimizing Wide-Area Performance Disruptions in Inter-Domain Routing
- Yaping Zhu
- yapingz_at_cs.princeton.edu
- Advisor: Prof. Jennifer Rexford
- Princeton University
2. Minimize Performance Disruptions
- Network changes affect user experience
- Equipment failures
- Routing changes
- Network congestion
- Network operators have to react and fix problems
- Fix equipment failure
- Change route selection
- Change server selection
3. Diagnosis Framework: Enterprise Network
[Figure: diagnosis loop inside an enterprise network - measure network changes, diagnose, fix equipment/config, etc. - with full visibility and full control]
4. Challenges to Minimize Wide-Area Disruptions
- The Internet is composed of many networks
  - ISP (Internet Service Provider) provides connectivity
  - CDN (Content Distribution Network) provides services
- Each network has limited visibility and control
[Figure: small ISPs, a large ISP, a client, and a CDN]
5. ISP's Challenge: Provide Good Transit for Packets
- Limited visibility
  - Small ISP: lacks visibility into the problem
- Limited control
  - Large ISP: lacks direct control to fix congestion
[Figure: small ISPs, a large ISP, a client, and a CDN]
6. CDN's Challenge: Maximize Performance for Services
- Limited visibility
  - CDN can't figure out the exact root cause
- Limited control
  - CDN lacks direct control to fix the problem
[Figure: small ISPs, a large ISP, a client, and a CDN]
7. Summary of Challenges of Wide-Area Diagnosis
- Measure a large volume and diverse kinds of data
- Diagnosis today is ad hoc
  - Takes a long time to get back to customers
  - Does not scale to a large number of events
Our Goal: Build Systems for Wide-Area Diagnosis
- Formalize and automate the diagnosis process
- Analyze a large volume of measurement data
8. Techniques and Tools for Wide-Area Diagnosis
Tool | Problem Statement | Results
Route Oracle | Track route changes scalably for ISPs | Deployed at AT&T; IMC'09, PER'10
NetDiag | Diagnose wide-area latency increases for CDNs | Deployed at Google; in submission
9. Rethink Routing Protocol Design
- Many performance problems caused by routing
  - Route selection not based on performance
  - 42.2% of the large latency increases in a large CDN correlated with inter-domain routing changes
  - No support for multi-path routing
Our Goal: Routing Protocol for Better Performance
- Fast convergence to reduce disruptions
- Route selection based on performance
- Scalable multi-path to avoid disruptions
- Less complexity for fewer errors
10. Thesis Outline
Chapter | Problem Statement | Results
Route Oracle | Track route changes scalably for ISPs | Deployed at AT&T; IMC'09, PER'10
NetDiag | Diagnose wide-area latency increases for CDNs | Deployed at Google; in submission
Next-hop BGP | Routing protocol designed for better performance | HotNets'10; in submission to CoNEXT
11. Route Oracle: Where Have All the Packets Gone?
- Work with Jennifer Rexford
- Aman Shaikh and Subhabrata Sen (AT&T Research)
12. Route Oracle: Where Have All the Packets Gone?
[Figure: AT&T IP network - an IP packet enters at an ingress router, leaves at an egress router, and follows an AS path toward the destination IP address]
- Inputs
  - Destination IP address
  - When? Time
  - Where? Ingress router
- Outputs
  - Where does the packet leave the network? Egress router
  - What's the route to the destination? AS path
13. Application: Service-Level Performance Management
- Troubleshoot a CDN throughput drop
- Case provided by the AT&T ICDS (Intelligent Content Distribution Service) project
[Figure: Atlanta users reach an AT&T CDN server in Atlanta via a router in Atlanta; traffic either leaves AT&T in Atlanta or leaves AT&T in Washington DC via Sprint]
14. Background: IP Prefix and Prefix Nesting
- IP prefix: IP address / prefix length
  - E.g., 12.0.0.0/8 stands for 12.0.0.0 - 12.255.255.255
- Suppose the routing table has routes for prefixes
  - 12.0.0.0/8: 12.0.0.0 - 12.255.255.255
  - 12.0.0.0/16: 12.0.0.0 - 12.0.255.255
  - 12.0.0.0 - 12.0.255.255 is covered by both the /8 and the /16 prefix
- Prefix nesting: IPs covered by multiple prefixes (see the sketch below)
  - 24.2% of IP addresses are covered by more than one prefix
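A minimal sketch (not from the slides) of how prefix nesting can be checked with Python's standard ipaddress module; the example prefixes mirror the 12.0.0.0/8 and 12.0.0.0/16 entries above:

```python
import ipaddress

# Hypothetical routing-table prefixes, mirroring the example above.
table = [ipaddress.ip_network(p) for p in ("12.0.0.0/8", "12.0.0.0/16")]

def covering_prefixes(ip_str):
    """Return every table prefix that covers the given IP address."""
    ip = ipaddress.ip_address(ip_str)
    return [net for net in table if ip in net]

print(covering_prefixes("12.0.1.1"))  # nested: covered by both /8 and /16
print(covering_prefixes("12.1.0.1"))  # covered only by the /8
```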
15. Background: Longest Prefix Match (LPM)
- BGP update format
  - Keyed by IP prefix
  - Carries egress router, AS path
- Longest prefix match (LPM)
  - Routers use LPM to forward IP packets
  - LPM changes as routes are announced and withdrawn
  - 13.0% of BGP updates cause LPM changes
Challenge: determine the route for an IP address -> find the LPM for the IP address -> track LPM changes for the IP address (see the sketch below)
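As a concrete illustration (an assumption of this writeup, not the Route Oracle implementation), longest prefix match amounts to picking the most specific covering prefix:

```python
import ipaddress

def longest_prefix_match(ip_str, prefixes):
    """Pick the most specific (longest) covering prefix, as a router's forwarding table would."""
    ip = ipaddress.ip_address(ip_str)
    matches = [p for p in prefixes if ip in p]
    return max(matches, key=lambda p: p.prefixlen) if matches else None

table = [ipaddress.ip_network(p) for p in ("12.0.0.0/8", "12.0.0.0/16", "12.0.0.0/24")]
print(longest_prefix_match("12.0.0.7", table))  # 12.0.0.0/24
print(longest_prefix_match("12.0.9.9", table))  # 12.0.0.0/16
```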
16. Challenge: Scale of the BGP Data
- Data collection: BGP Monitor
  - Has a BGP session with each router
  - Receives incremental updates of best routes
- Data scale
  - Dozens of routers (one per city)
  - Each router has many prefixes (300K)
  - Each router receives lots of updates (millions per day)
[Figure: BGP routers send their best routes through a software router to a centralized server]
17. Background: BGP is an Incremental Protocol
- Incremental protocol
  - Routes that have not changed are not updated
- How to log routes for an incremental protocol?
  - Daily routing table dump
  - Incremental updates every 15 minutes
[Figure: BGP routers send best routes through a software router to a centralized server, as a daily table dump plus 15-minute updates]
18. Route Oracle: Interfaces and Challenges
- Challenges
  - Track the longest prefix match
  - Scale of the BGP data
  - Need to answer queries
    - At scale, for many IP addresses
    - In real time, for network operations
- Interface
  - Inputs: destination IP address, ingress router, time
  - Route Oracle processes the BGP routing data
  - Outputs: egress router, AS path
19. Strawman Solution: Track LPM Changes by Forwarding Table
- How to implement
  - Run routing software to update a forwarding table
  - The forwarding table answers queries based on LPM
- Answer a query for one IP address
  - Suppose n prefixes in the routing table at t1, and m updates from t1 to t2
  - Time complexity: O(n + m)
  - Space complexity: O(P), where P is the number of prefixes covering the queried IP address
20. Strawman Solution: Track LPM Changes by Forwarding Table
- Answer queries for k IP addresses
  - Keep all prefixes in the forwarding table
  - Space complexity: O(n)
  - Time complexity, major steps:
    - Initialize n routes: O(n log(n) + kn)
    - Process m updates: O(m log(n) + km)
    - In sum: O((n + m)(log(n) + k))
- Goal: reduce query processing time
  - Trade more space for less time: pre-processing
  - Storing pre-processed results for all 2^32 IPs does not scale
  - Need to track LPM scalably
21. Track LPM Scalably: Address Range
- Prefix set
  - The collection of all matching prefixes for a given IP address
- Address range
  - Contiguous addresses that have the same prefix set
  - E.g., with 12.0.0.0/8 and 12.0.0.0/16 in the routing table:
    - 12.0.0.0 - 12.0.255.255 has prefix set {/8, /16}
    - 12.1.0.0 - 12.255.255.255 has prefix set {/8}
- Benefits of address ranges (see the sketch below)
  - Track LPM scalably
  - No dependency between different address ranges
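One way to compute such address ranges (an illustrative sketch that simply splits at every prefix boundary; the deployed system uses the tree-based structure described on the next slide):

```python
import ipaddress

def address_ranges(prefix_strs):
    """Split the covered address space into ranges that share the same prefix set."""
    nets = [ipaddress.ip_network(p) for p in prefix_strs]
    # Every prefix start and end+1 is a potential boundary between ranges.
    cuts = sorted({int(n.network_address) for n in nets} |
                  {int(n.broadcast_address) + 1 for n in nets})
    ranges = []
    for lo, hi in zip(cuts, cuts[1:]):
        pset = [n for n in nets
                if int(n.network_address) <= lo and hi - 1 <= int(n.broadcast_address)]
        if pset:
            ranges.append((ipaddress.ip_address(lo), ipaddress.ip_address(hi - 1), pset))
    return ranges

for lo, hi, pset in address_ranges(["12.0.0.0/8", "12.0.0.0/16"]):
    print(lo, "-", hi, [str(p) for p in pset])
# 12.0.0.0 - 12.0.255.255 ['12.0.0.0/8', '12.0.0.0/16']
# 12.1.0.0 - 12.255.255.255 ['12.0.0.0/8']
```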
22. Track LPM by Address Range: Data Structure and Algorithm
- Tree-based data structure: each node stands for an address range
- Real-time algorithm for incoming updates
[Figure: routing table with prefixes 12.0.0.0/8, 12.0.0.0/16, 12.0.0.0/24 and their BGP routes, mapped to address ranges 12.0.0.0-12.0.0.255 (prefix set /8, /16, /24), 12.0.1.0-12.0.255.255 (/8, /16), and 12.1.0.0-12.255.255.255 (/8)]
23. Track LPM by Address Range: Complexity
- Pre-processing (for n initial routes in the routing table and m updates)
  - Time complexity: O((n + m) log(n))
  - Space complexity: O(n + m)
- Query processing for k queries
  - Time complexity: O((n + m) + k)
  - Parallelization using c processors: O(((n + m) + k) / c)

 | Strawman approach | Route Oracle
Space complexity | O(n) | O(n + m)
Pre-processing time | - | O((n + m) log(n))
Query time | O((n + m)(log(n) + k)) | O((n + m) + k)
Query parallelization | No | Yes
24. Route Oracle: System Implementation
- BGP routing data: daily table dump, 15-minute updates
- Precomputation
  - Daily snapshot of routes by address ranges
  - Incremental route updates for address ranges
- Query processing
  - Query inputs: destination IP, ingress router, time
  - Output for each query: egress router, AS path
25. Query Processing Optimizations
- Optimize for multiple queries
  - Amortize the cost of reading address range records across multiple queried IP addresses
- Parallelization (see the sketch below)
  - Observation: address range records can be processed independently
  - Parallelization on a multi-core machine
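A rough sketch of these two optimizations (the record format and chunking are assumptions for illustration, not the deployed implementation): each worker scans its chunk of address-range records once, answering all queried IPs in that single pass.

```python
from multiprocessing import Pool

# Assumed record format: (range_start, range_end, egress_router, as_path),
# with range bounds as integers, as produced by the precomputation step.

def answer_queries(chunk, queried_ips):
    """Scan one chunk of address-range records once, answering every queried IP
    that falls inside a record (amortizes the read across all queries)."""
    answers = {}
    for start, end, egress, as_path in chunk:
        for ip in queried_ips:
            if start <= ip <= end:
                answers[ip] = (egress, as_path)
    return answers

def parallel_query(chunks, queried_ips, workers=4):
    """Address ranges are independent, so chunks can be processed in parallel."""
    with Pool(workers) as pool:
        partial = pool.starmap(answer_queries, [(c, queried_ips) for c in chunks])
    merged = {}
    for d in partial:
        merged.update(d)
    return merged
```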
26. Performance Evaluation: Pre-processing
- Experiment on an SMP server
  - Two quad-core Xeon X5460 processors
  - Each CPU: 3.16 GHz and 6 MB cache
  - 16 GB of RAM
- Experiment design
  - BGP updates received over fixed time intervals
  - Compute the pre-processing time for each batch of updates
- Can we keep up? Pre-processing time:
  - 5 minutes of updates: 2 seconds
  - 20 minutes of updates: 5 seconds
27. Performance Evaluation: Query Processing
- Query for one IP (duration: 1 day)
  - Route Oracle: 3-3.5 secs; strawman approach: minutes
- Queries for many IPs: scalability (duration: 1 hour)
28. Performance Evaluation: Query Parallelization
29. Conclusion
Challenges | Contributions
1. Prefix nesting, LPM changes | Introduce address ranges; track LPM changes scalably for many IPs
2. Scale of BGP data | Tree-based data structure; real-time algorithm for incoming updates
3. Answer queries at scale and in real time | Pre-processing (more space for less time); amortize the processing across multiple queries; parallelize query processing
30. NetDiag: Diagnosing Wide-Area Latency Changes for CDNs
- Work with Jennifer Rexford
- Benjamin Helsley, Aspi Siganporia, and Sridhar Srinivasan (Google Inc.)
31. Background: CDN Architecture
- Life of a client request
  - Front-end (FE) server selection
  - Latency map
  - Load balancing (LB)
[Figure: a client's request enters the CDN network at an ingress router, is served by a front-end server (FE), and the response leaves via an egress router along an AS path]
32. Challenges
- Many factors contribute to latency increases
  - Internal factors
  - External factors
- Separate cause from effect
  - E.g., FE changes lead to ingress/egress changes
- The scale of a large CDN
  - Hundreds of millions of users, grouped by ISP/geo
  - Clients served at multiple FEs
  - Clients traverse multiple ingress/egress routers
33. Contributions
- Classification
  - Separating cause from effect
  - Identify thresholds for classification
- Metrics: analyze over sets of servers and routers
  - Metrics for each potential cause
  - Metrics for an individual router or server
- Characterization
  - Events of latency increases in Google's CDN (06/2010)
34. Background: Client Performance Data
[Figure: performance data is collected for the client's request path - client, ingress router, front-end server (FE) in the CDN network, egress router, AS path]
- Performance data format: IP prefix, FE, requests per day (RPD), round-trip time (RTT)
35. Background: BGP Routing and Netflow Traffic
- Netflow traffic (at edge routers): every 15 minutes, by prefix
  - Incoming traffic: ingress router, FE, bytes-in
  - Outgoing traffic: egress router, FE, bytes-out
- BGP routing (at edge routers): every 15 minutes, by prefix
  - Egress router and AS path
36. Background: Joint Data Set
- Granularity
  - Daily
  - By IP prefix
- Format (see the sketch below)
  - FE, requests per day (RPD), round-trip time (RTT)
  - List of (ingress router, bytes-in)
  - List of (egress router, AS path, bytes-out)
[Figure: BGP routing data, Netflow traffic data, and performance data are combined into the joint data set]
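For concreteness, one joined record per prefix and day might look like the following (field names and types are illustrative assumptions, not Google's actual schema):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class JointRecord:
    """One day of joined data for one client prefix (illustrative field names)."""
    prefix: str                                # e.g. "203.0.113.0/24"
    fe_requests: List[Tuple[str, int, float]]  # (front-end, requests per day, RTT in ms)
    ingress: List[Tuple[str, int]]             # (ingress router, bytes-in)
    egress: List[Tuple[str, str, int]]         # (egress router, AS path, bytes-out)
```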
37. Classification of Latency Increases
[Figure: classification decision tree - performance data is grouped by region to identify events; each event is split into FE changes vs. FE latency increases; FE changes are further classified as latency map change vs. load balancing (using the latency map and FE capacity and demand); FE latency increases are correlated with routing changes (ingress router vs. egress router / AS path, using BGP routing and Netflow traffic)]
38. Case Study: Flash Crowd Leads Some Requests to a Distant Front-End Server
- Identify the event: RTT doubled for an ISP in Malaysia
- Diagnose: follow the decision tree
  - FE change vs. FE latency increase: 97.9% attributed to FE changes
  - Latency map change vs. load balancing: 32.3% of the FE change attributed to load balancing
  - RPD (requests per day) jumped: RPD2/RPD1 = 2.5
39. Classification: FE Server and Latency Metrics
[Figure: the classification decision tree from slide 37]
40. FE Change vs. FE Latency Increase
- RTT is weighted by requests across FEs
- Break down the RTT change by two factors
  - FE change
    - Clients switch from one FE to another (with higher RTT)
  - FE latency change
    - Clients use the same FE, but the latency to that FE increases
41. FE Change vs. FE Latency Change: Breakdown
- FE change
- FE latency change
- Important properties (see the sketch below)
  - Analysis over a set of FEs
  - The two fractions sum up to 1
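A sketch of one such decomposition (the exact weighting used by NetDiag may differ; this only illustrates the idea): the two terms add up to the total change in request-weighted RTT, and normalizing by that total gives two fractions that sum to 1.

```python
def rtt_breakdown(day1, day2):
    """day1/day2 map FE -> (request fraction, RTT in ms) for the two days.
    Returns the 'FE change' and 'FE latency change' terms of the RTT change."""
    fes = set(day1) | set(day2)
    fe_change = 0.0   # clients moving their requests to a different FE
    fe_latency = 0.0  # latency changing on the FE the clients already used
    for fe in fes:
        w1, rtt1 = day1.get(fe, (0.0, 0.0))
        w2, rtt2 = day2.get(fe, (0.0, 0.0))
        fe_change += (w2 - w1) * rtt2
        fe_latency += w1 * (rtt2 - rtt1)
    return fe_change, fe_latency

# Example: half the requests move to a farther FE and its latency also rises.
day1 = {"fe-sin": (1.0, 50.0)}
day2 = {"fe-sin": (0.5, 60.0), "fe-syd": (0.5, 120.0)}
print(rtt_breakdown(day1, day2))  # (30.0, 10.0); total RTT change = 40 ms
```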
42. FE Changes: Latency Map vs. Load Balancing
[Figure: the classification decision tree from slide 37]
43. FE Changes: Latency Map vs. Load Balancing
- Classify FE changes by two metrics
  - Fraction of traffic shifted by the latency map
  - Fraction of traffic shifted by load balancing
44. Latency Map: Closest FE Server
- Calculate the latency map
  - Latency map format: (prefix, closest FE)
- Aggregate by groups of clients
  - List of (FEi, ri), where ri is the fraction of requests directed to FEi by the latency map
- Define the latency map metric (see the sketch after slide 45)
45. Load Balancing: Avoiding Busy Servers
- FE request distribution change
  - Fraction of requests shifted by the load balancer
  - Sum only the positive part: target request load > actual load
- Metric: more traffic load-balanced on day 2 (see the sketch below)
46. FE Latency Increase: Routing Changes
- Correlate with routing changes
  - Fraction of traffic that shifted ingress router
  - Fraction of traffic that shifted (egress router, AS path)
[Figure: the FE latency increase branch of the decision tree from slide 37]
47. Routing Changes: Ingress, Egress, AS Path
- Identify the FE with the largest impact
- Calculate the fraction of traffic that shifted routes (see the sketch below)
  - Ingress router
    - f1j, f2j: fraction of traffic entering ingress j on days 1 and 2
  - Egress router and AS path
    - g1k, g2k: fraction of traffic leaving (egress router, AS path) k on days 1 and 2
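One plausible way to turn the f and g fractions into a single "traffic shifted" number (an assumption for illustration, not the exact NetDiag formula) is half the L1 distance between the two distributions:

```python
def traffic_shifted(f1, f2):
    """Half the L1 distance between two traffic distributions: the fraction of
    traffic that changed route. Keys are ingress routers or (egress, AS path) pairs."""
    keys = set(f1) | set(f2)
    return 0.5 * sum(abs(f1.get(k, 0.0) - f2.get(k, 0.0)) for k in keys)

# Example: half of the FE's traffic moves from one ingress router to another.
ingress_day1 = {"ingress-nyc": 0.8, "ingress-chi": 0.2}
ingress_day2 = {"ingress-nyc": 0.3, "ingress-chi": 0.7}
print(round(traffic_shifted(ingress_day1, ingress_day2), 3))  # 0.5
```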
48. Identify Significant Performance Disruptions
[Figure: the classification decision tree from slide 37, starting from grouping the performance data by region and identifying events]
49. Identify Significant Performance Disruptions
- Focus on large events
  - Large increases: > 100 msec, or the latency doubles
  - Many clients: an entire region (country/ISP)
  - Sustained period: an entire day
- Characterize latency changes
  - Calculate daily latency changes by region

Event Category | Percentage
Latency increases by more than 100 msec | 1%
Latency more than doubles | 0.45%
50. Latency Characterization for Google's CDN
- Apply the classification to one month of data (06/2010)

Category | Events
FE latency increase | 73.9%
  - Ingress router | 10.3%
  - (Egress router, AS path) | 14.5%
  - Both | 17.4%
  - Unknown | 31.5%
FE server change | 34.7%
  - Latency map | 14.2%
  - Load balancing | 2.9%
  - Both | 9.3%
  - Unknown | 8.4%
Total | 100.0%
51. Conclusion and Future Work
- Conclusion
  - A method for automatic classification of latency increases
  - Tool deployed at Google since 08/2010
- Future work
  - More accurate diagnosis on smaller timescales
  - Incorporate active measurement data
52. Putting BGP on the Right Path: Better Performance via Next-Hop Routing
- Work with Michael Schapira, Jennifer Rexford
- Princeton University
53. Motivation: Rethink BGP Protocol Design
- Many performance problems caused by routing
  - Slow convergence during path exploration
  - Path selection based on AS path length, not performance
  - Selecting a single path, rather than multiple paths
  - Vulnerability to attacks that forge the AS-PATH
Many performance problems stem from routing decisions based on AS path length, not performance
54. Next-Hop Routing for Better Performance
- Control plane: path-based routing -> next-hop routing
  - Fast convergence through less path exploration
  - Scalable multipath routing without exposing all paths
- Data plane: performance and security
  - Path selection based on performance
  - Reduced attack surface: no gain from lying about the AS-PATH
55. Today's BGP: Path-Based Routing
[Figure: destination d announces "I'm available" to ASes 1 and 2; AS 2's policy is "don't export 2d to 3"; AS 3 prefers 32d over 31d but announces "3: I'm using 1d"]
56. Background: BGP Decision Process
- Import policy
- Decision process
  - Prefer higher local preference
  - Prefer shorter AS path length
  - etc.
- Export policy
[Figure: receive route updates from neighbors -> choose a single best route (ranking) -> send route updates to neighbors (export policy)]
57. Next-Hop Routing Rules
- Rule 1: use next-hop rankings
[Figure: ASes 1-5 and destination d; AS 5's path ranking 541d > 53d > 542d collapses to the next-hop ranking 4 > 3]
58. Next-Hop Routing Rules
- Rule 1: use next-hop rankings
- Rule 2: prioritize the current route
  - To minimize path exploration
[Figure: ASes 1-3 and destination d; with next hops 2 and 3 tied, "break ties in favor of the lower AS number" is replaced by "prioritize the current route"]
59. Next-Hop Routing Rules
- Rule 1: use next-hop rankings
- Rule 2: prioritize the current route (see the sketch below)
- Rule 3: consistently export
  - If a route P is exportable to a neighbor AS i, then so must be all routes that are more highly ranked than P
  - To avoid disconnecting upstream nodes
[Figure: ASes 1-4 and destination d; AS 3 ranks next hop 1 over 2 - exporting 32d but not 31d to 4 violates the rule, while exporting 31d to 4 is consistent]
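The following is a small, illustrative sketch (not the paper's specification) of a decision process that applies Rules 1 and 2; the route fields and the ranking map are assumptions made for the example.

```python
def choose_route(candidate_routes, current_next_hop, nexthop_rank):
    """Next-hop routing decision: rank routes only by their next-hop AS (Rule 1)
    and, on ties, stick with the currently used next hop (Rule 2) to limit
    path exploration."""
    def preference(route):
        nh = route["next_hop_as"]
        return (nexthop_rank.get(nh, 0),              # Rule 1: next-hop ranking
                1 if nh == current_next_hop else 0)   # Rule 2: prefer the current route
    return max(candidate_routes, key=preference, default=None)

# Example: as on slide 57, AS 5 ranks next hop 4 over next hop 3.
routes = [{"next_hop_as": 3, "as_path": [3, "d"]},
          {"next_hop_as": 4, "as_path": [4, 1, "d"]}]
print(choose_route(routes, current_next_hop=3, nexthop_rank={4: 2, 3: 1}))
```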
60. Next-Hop Routing for Better Performance
- Control plane
- Fast convergence
- Scalable multipath routing
- Data plane
- Performance-driven routing
- Reduced attack surface
61. Simulation Setup
- C-BGP simulator; Cyclops AS-level topology
  - Jan 1st, 2010: 34.0k ASes, 4.7k non-stubs
- Protocols
  - BGP, Prefer Recent Route (PRR), next-hop routing
- Metrics
  - Number of updates, routing changes, forwarding changes
- Events
  - Prefix up, link failure, link recovery
- Methodology
  - 500 experiments
  - Vantage points: all non-stubs, plus 5k randomly chosen stubs
62. Fast Convergence: Updates
- X-axis: number of updates after a link failure
- Y-axis: fraction of non-stubs with more than x updates
63. Fast Convergence: Routing Changes
- X-axis: number of routing changes after a link failure
- Y-axis: fraction of non-stubs with more than x changes
64. Next-Hop Routing for Better Performance
- Control plane
- Fast convergence
- Scalable multipath routing
- Data plane
- Performance-driven routing
- Reduced attack surface
65. Multipath with Today's BGP: Not Scalable
[Figure: ASes 1-8 and destination d; AS 5 announces "I'm using 1 and 2", AS 6 announces "I'm using 3 and 4", so the AS behind them must announce "I'm using 5-1, 5-2, 6-3 and 6-4" - announcements grow with the number of paths]
66. Making Multipath Routing Scalable
- Benefits: availability, failure recovery, load balancing
[Figure: the same topology with next-hop routing - announcements list only ASes rather than full paths: "I'm using 1, 2", "I'm using 3, 4", "I'm using 1, 2, 3, 4, 5, 6"]
67. Next-Hop Routing for Better Performance
- Control plane
- Fast convergence
- Scalable multipath routing
- Data plane
- Performance-driven routing
- Reduced attack surface
68. Performance-Driven Routing
- Next-hop routing can lead to longer paths
  - Evaluated across events: prefix up, link failure/recovery
  - 68.7-89.9% of ASes have the same path length
  - Most other ASes experience one extra hop
- Decisions based on measurements of path quality
  - Performance metrics: throughput, latency, or loss
  - Adjust the ranking of next-hop ASes
  - Split traffic over multiple next-hop ASes
69. Monitoring Path Performance: Multi-Homed Stub
- Apply existing techniques
  - Intelligent route control supported by routers
  - Collect performance measurements
- A stub AS sees traffic in both directions: forward and reverse
[Figure: a multi-homed stub connected to Provider A and Provider B]
70. Monitoring Path Performance: Service Provider
- Monitor end-to-end performance for clients
  - Collect logs at servers, e.g., round-trip time
- Explore alternate routes: route injection
  - Announce a more-specific prefix
  - Direct a small portion of traffic onto the alternate path
  - Active probing on alternate paths
71. Monitoring Path Performance: ISPs
- Challenges
  - Most traffic does not start or end in the ISP's network
  - Asymmetric routing
- Focus on single-homed customers
  - Why single-homed? See both directions of the traffic
- How to collect passive flow measurements selectively?
  - Hash-based sampling (see the sketch below)
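A minimal sketch of hash-based flow sampling (the hash function, key normalization, and sampling rate are illustrative assumptions): keep a flow whenever the hash of its flow key falls below a threshold, so the same flow is selected consistently wherever it is observed.

```python
import hashlib

def sample_flow(five_tuple, sample_fraction=0.01):
    """Keep the flow if the hash of its key falls below the sampling threshold.
    Normalizing the key (here, ordering the two endpoints) makes both directions
    of a flow hash to the same value, so they are sampled together."""
    src, dst, sport, dport, proto = five_tuple
    key = "|".join(map(str, sorted([(src, sport), (dst, dport)]) + [proto])).encode()
    h = int.from_bytes(hashlib.sha1(key).digest()[:8], "big")
    return h < sample_fraction * 2**64

# Example flow: (src IP, dst IP, src port, dst port, protocol)
print(sample_flow(("198.51.100.7", "203.0.113.9", 51234, 443, "tcp")))
```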
72. Next-Hop Routing for Better Performance
- Control plane
- Fast convergence
- Scalable multipath routing
- Data plane
- Performance-driven routing
- Reduced attack surface
73. Security
- Reduced attack surface
  - Attack: announce a shorter path to attract traffic
  - Next-hop routing: the AS path is not used, so it cannot be forged
- Incentive compatible
  - Definition: an AS cannot get a better next hop by deviating from the protocol (e.g., announcing a bogus route, or reporting inconsistent information to neighbors)
  - Theorem [1]: ASes do not have incentives to violate the next-hop routing protocol
- End-to-end security mechanisms
  - Do not rely on BGP for data-plane security
  - Use encryption, authentication, etc.
[1] J. Feigenbaum, V. Ramachandran, and M. Schapira, "Incentive-compatible interdomain routing," in Proc. ACM Electronic Commerce, pp. 130-139, 2006.
74. Conclusion
- Next-hop routing for better performance
  - Control plane: fast convergence, scalable multipath
  - Data plane: performance-driven routing, reduced attack surface
- Future work
  - Remove the AS path attribute entirely
  - Stability and efficiency of performance-driven routing
75. Conclusion
Chapter | Contributions | Results
Route Oracle | Analysis of prefix nesting and LPM changes; track LPM scalably by address range; system implementation with optimizations | Deployed at AT&T; IMC'09, PER'10
NetDiag | Classification of the causes of latency increases; metrics to analyze sets of servers and routers; latency characterization for Google's CDN | Deployed at Google; in submission
Next-hop BGP | Proposal of a BGP variant based on next-hop routing; evaluation of convergence; scalable multi-path and performance-driven routing | HotNets'10; in submission to CoNEXT