Title: Tomography-based Overlay Network Monitoring and its Applications
 1Tomography-based Overlay Network Monitoring and 
its Applications
Yan Chen
- Joint work with David Bindel, Brian Chavez, 
 Hanhee Song, and Randy H. Katz
- UC Berkeley
2Motivation
- Applications of end-to-end distance monitoring 
- Overlay routing/location 
- Peer-to-peer systems 
- VPN management/provisioning 
- Service redirection/placement 
- Cache-infrastructure configuration 
- Requirements for E2E monitoring system 
- Scalable and efficient: small amount of probing traffic
- Accurate: capture congestion/failures
- Incrementally deployable 
- Easy to use
3Existing Work
- Static estimation 
- Global Network Positioning (GNP) 
- Dynamic monitoring 
- Loss rates: RON (n^2 measurements)
- Latency: IDMaps, Dynamic Distance Maps, Isobar
- Latency similarity under normal conditions doesn't imply similar losses!
- Network tomography 
- Focusing on inferring the characteristics of 
 physical links rather than E2E paths
- Limited measurements -> under-constrained system, unidentifiable links
4Problem Formulation
- Given n end hosts on an overlay network and O(n^2) paths, how do we select a minimal subset of paths to monitor so that the loss rates/latency of all other paths can be inferred?
- Key idea: select a basis set of k paths that completely describes all O(n^2) paths (k << O(n^2))
- Select and monitor k linearly independent paths to compute the loss rates of the basis set
- Infer the loss rates of all other paths
- Applicable to any additive metric, such as latency
5Path Matrix and Path Space
- Path loss rate p, link loss rate l
- s links in total; each path is described by a path vector v
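The equations for this slide did not survive extraction. The following is a hedged reconstruction of the usual formulation that the later slides (G, x, rank(G)) appear to rely on. It assumes link losses are independent; the symbols x, b, and r are introduced here for illustration and are not taken from the slide text.

```latex
% v_j = 1 if the path traverses link j, and 0 otherwise, so v \in \{0,1\}^s
1 - p = \prod_{j=1}^{s} (1 - l_j)^{v_j}
\quad\Longrightarrow\quad
\log(1 - p) = \sum_{j=1}^{s} v_j \log(1 - l_j) = v^{T} x,
\qquad x_j = \log(1 - l_j)

% Stacking the r = O(n^2) path vectors as the rows of G \in \{0,1\}^{r \times s}:
G x = b, \qquad b_i = \log(1 - p_i)
```

Taking logarithms turns the multiplicative loss model into an additive one, which is why the same machinery applies to latency.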
6Sample Path Matrix
- x1 - x2 unknown => cannot compute x1, x2 individually
- The set of such unmeasurable vectors forms the null space of G
- To separate identifiable vs. unidentifiable components: x = xG + xN
- All E2E paths are in the path space, i.e., G xN = 0
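A minimal numerical sketch of the decomposition x = xG + xN. The 3-path, 3-link matrix G and the link loss rates below are hypothetical, chosen so that only the sum x1 + x2 is ever observed, as in the first bullet of this slide.

```python
import numpy as np

# Hypothetical example: paths 1 and 3 always cross links 1 and 2 together,
# so the measurements only ever see x1 + x2, never x1 - x2.
G = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

# Assumed link values x_j = log(1 - l_j) for loss rates of 1%, 5%, and 2%.
x = np.log(1.0 - np.array([0.01, 0.05, 0.02]))

rank = np.linalg.matrix_rank(G)      # dimension of the path space

# Rows of Vt beyond the rank span the null space of G.
_, _, Vt = np.linalg.svd(G)
null_basis = Vt[rank:]               # here: proportional to [1, -1, 0]

# Orthogonal decomposition x = xG + xN.
P_row = np.linalg.pinv(G) @ G        # projector onto the row (path) space
xG = P_row @ x
xN = x - xG

print("rank(G) =", rank)
print("G @ xN  =", G @ xN)           # numerically zero
print("G @ xG  =", G @ xG)           # same as G @ x
```

Because G xN = 0, every path value G x can be computed from xG alone, even though x1 and x2 cannot be recovered separately.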
7Intuition through Topology Virtualization
- Virtual links: minimal path segments whose loss rates can be uniquely identified
- Can fully describe all paths
- xG takes a form similar to the virtual links
[Figure: example topologies with Rank(G) = 1, 2, and 3. Real links (solid) and the overlay paths (dotted) going through them are mapped by virtualization to virtual links.]
8Algorithms
- Select k = rank(G) linearly independent paths to monitor
- Use a rank-revealing decomposition, e.g., QR with column pivoting
- Leverage the sparse matrix: time O(rk^2) and memory O(k^2)
- E.g., 10 minutes for n = 350 (r = 61075) and k = 2958
- Compute the loss rates of other paths
- Time O(k^2) and memory O(k^2)
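The two steps above can be sketched on a toy example. This is a small dense illustration, not the sparse implementation the slide refers to: `select_basis_paths` is a hypothetical helper that performs greedy column-pivoted Gram-Schmidt on the path vectors (the same pivoting rule as QR with column pivoting), and the topology and link loss rates are assumed.

```python
import numpy as np

def select_basis_paths(G, tol=1e-10):
    """Return indices of linearly independent rows (paths) of G,
    chosen by greedy pivoting on the largest residual norm."""
    R = G.astype(float).copy()          # residual of each path vector
    chosen = []
    for _ in range(min(G.shape)):
        norms = np.linalg.norm(R, axis=1)
        i = int(np.argmax(norms))       # pivot: path with largest residual
        if norms[i] <= tol:
            break                       # all remaining paths are dependent
        q = R[i] / norms[i]
        chosen.append(i)
        R -= np.outer(R @ q, q)         # remove the q component everywhere
    return chosen

# Hypothetical overlay: host A reaches a hub over two links in series
# (columns 0 and 1); hosts B, C, D use one link each (columns 2, 3, 4).
G = np.array([
    [1, 1, 1, 0, 0],   # A-B
    [1, 1, 0, 1, 0],   # A-C
    [1, 1, 0, 0, 1],   # A-D
    [0, 0, 1, 1, 0],   # B-C
    [0, 0, 1, 0, 1],   # B-D
    [0, 0, 0, 1, 1],   # C-D
], dtype=float)

link_loss = np.array([0.01, 0.02, 0.00, 0.05, 0.10])   # assumed values
x_true = np.log(1.0 - link_loss)

basis = select_basis_paths(G)           # the k paths to monitor
b_measured = G[basis] @ x_true          # stands in for real measurements

# Minimum-norm solution of the monitored system, then infer every path.
x_G = np.linalg.pinv(G[basis]) @ b_measured
inferred_loss = 1.0 - np.exp(G @ x_G)
true_loss = 1.0 - np.exp(G @ x_true)

print("monitored paths:", sorted(basis), "of", G.shape[0])
print("max inference error:", np.max(np.abs(inferred_loss - true_loss)))
```

In this toy case 4 of the 6 paths are monitored, and the remaining two are inferred exactly even though the two serial links of host A cannot be told apart.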
9How much measurement is saved?
- k << O(n^2)?
- For a power-law Internet topology with M nodes and N end hosts
- There are O(M) links and N > M/2 (with proof)
- If n = O(N), the overlay network has O(n) IP links, so k = O(n)
10When a Small Portion of End Hosts on Overlay
- Internet has a moderate hierarchical structure [TGJ02]
- If a pure hierarchical structure (tree): k = O(n)
- If no hierarchy at all (worst case, clique): k = O(n^2)
- Internet should fall in between
For reasonably large n (e.g., 100), k = O(n log n)
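As a quick arithmetic check of the savings, the sketch below computes the fraction of paths that must be monitored for the two (n, k) pairs quoted elsewhere in these slides. Whether paths are counted as directed or undirected is inferred from the quoted totals (61,075 and 2,550); `all_paths` is a hypothetical helper.

```python
def all_paths(n, directed):
    """Number of end-to-end paths among n hosts."""
    return n * (n - 1) if directed else n * (n - 1) // 2

# (n, k, directed) as quoted on the slides.
cases = {
    "Algorithms slide, n=350": (350, 2958, False),   # r = 61,075 paths
    "PlanetLab, n=51": (51, 872, True),              # 51 x 50 = 2,550 paths
}

fractions = {}
for name, (n, k, directed) in cases.items():
    r = all_paths(n, directed)
    fractions[name] = k / r
    print(f"{name}: monitor {k} of {r} paths ({100 * k / r:.1f}%)")
```

The monitored fraction is roughly 5% for n = 350 and roughly 34% for n = 51, consistent with the saving growing as n increases.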
11Practical Issues
- Tolerance of topology measurement errors
- We care about path loss rates rather than those of any interior links
- Poor router alias resolution shows multiple links in place of one => assign similar loss rates to all of those links
- Unidentifiable routers => ignore them, as in virtualization
- Topology changes
- Adding/removing/changing one path incurs O(k^2) time
- Topology is relatively stable on the order of a day => incremental detection
12Evaluation
- Simulation 
- Topology 
- BRITE: Barabasi-Albert, Waxman, hierarchical; 1K - 20K nodes
- Real router topology from Mercator: 284K nodes
- Fraction of end hosts on the overlay: 1 - 10%
- Loss rate distribution (90% of links are good)
- Good link: 0-1% loss rate; bad link: 5-10% loss rate
- Good link: 0-1% loss rate; bad link: 1-100% loss rate
- Loss model
- Bernoulli: independent drop of packets
- Gilbert: bursty drop of packets
- Path loss rate simulated via transmission of 10K pkts
- Metric: path loss rate estimation accuracy
- Absolute/relative errors 
- Lossy path inference
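The two loss models named above can be sketched as follows. This is an illustrative version, not the simulator used for the evaluation: the 5% loss rate, the 0.35 probability of staying in the bad state, and all function names are assumptions. The Gilbert chain is parameterized so that its long-run loss rate matches the Bernoulli one, which isolates the effect of burstiness.

```python
import random

def bernoulli_losses(n_pkts, loss_rate, rng):
    """Each packet is dropped independently with probability loss_rate."""
    return [rng.random() < loss_rate for _ in range(n_pkts)]

def gilbert_losses(n_pkts, loss_rate, p_stay_bad, rng):
    """Two-state Gilbert chain: every packet sent in the bad state is dropped.

    The good->bad probability is derived so that the stationary
    probability of the bad state equals loss_rate (requires loss_rate < 1).
    """
    p_bad_to_good = 1.0 - p_stay_bad
    p_good_to_bad = loss_rate * p_bad_to_good / (1.0 - loss_rate)
    bad = rng.random() < loss_rate       # start from the stationary state
    drops = []
    for _ in range(n_pkts):
        drops.append(bad)
        if bad:
            bad = rng.random() < p_stay_bad
        else:
            bad = rng.random() < p_good_to_bad
    return drops

def mean_burst(drops):
    """Average length of runs of consecutive dropped packets."""
    bursts, run = [], 0
    for dropped in drops:
        if dropped:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return sum(bursts) / len(bursts) if bursts else 0.0

rng = random.Random(0)
N_PKTS = 10_000                          # 10K packets per path, as on the slide
bern = bernoulli_losses(N_PKTS, 0.05, rng)
gilb = gilbert_losses(N_PKTS, 0.05, 0.35, rng)

for name, drops in (("Bernoulli", bern), ("Gilbert", gilb)):
    print(f"{name}: loss rate {sum(drops) / N_PKTS:.4f}, "
          f"mean burst {mean_burst(drops):.2f}")
```

Both traces have a similar overall loss rate, but the Gilbert trace concentrates its drops in longer bursts.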
13Experiments on Planet Lab
- 51 hosts, each from different organizations 
- 51 x 50 = 2,550 paths
- Simultaneous loss rate measurement 
- 300 trials, 300 msec each 
- In each trial, send a 40-byte UDP pkt to every 
 other host
- Simultaneous topology measurement 
- Traceroute 
- Experiments: 6/24 - 6/27
- 100 experiments in peak hours
14Tomography-based Overlay Monitoring Results
- Loss rate distribution 
- Accuracy 
- On average k = 872 out of 2550
- Absolute error |p - p'|, where p' is the inferred loss rate
- Average 0.0027 for all paths, 0.0058 for lossy paths
- Relative error
15Absolute and Relative Errors
- For each experiment, take the 95th percentile of the absolute and relative errors over the estimates for all 2,550 paths
16Lossy Path Inference Accuracy
- 90 out of 100 runs have coverage over 85% and a false positive rate below 10%
- Many errors are caused by boundary effects at the 5% threshold
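A small sketch of how the two metrics might be computed. The definitions are assumptions, since the slide does not spell them out: coverage is taken as the fraction of truly lossy paths that are inferred as lossy, and the false positive rate as the fraction of inferred-lossy paths that are not actually lossy. The loss rates are made up.

```python
LOSSY_THRESHOLD = 0.05   # the 5% threshold mentioned on the slide

def lossy_path_accuracy(measured, inferred, threshold=LOSSY_THRESHOLD):
    """Return (coverage, false_positive_rate) under the assumed definitions."""
    real = {i for i, p in enumerate(measured) if p > threshold}
    guessed = {i for i, p in enumerate(inferred) if p > threshold}
    coverage = len(real & guessed) / len(real) if real else 1.0
    false_pos = len(guessed - real) / len(guessed) if guessed else 0.0
    return coverage, false_pos

# Hypothetical per-path loss rates. Path 3 sits right at the boundary:
# measured 4.9% (not lossy) but inferred 5.2% (lossy).
measured = [0.00, 0.08, 0.12, 0.049, 0.20, 0.01]
inferred = [0.00, 0.07, 0.10, 0.052, 0.04, 0.01]

coverage, false_pos = lossy_path_accuracy(measured, inferred)
print(f"coverage = {coverage:.2f}, false positive rate = {false_pos:.2f}")
```

Path 3 counts as a false positive although its estimate is off by only 0.3 percentage points, which illustrates the boundary effect noted above.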
17Topology Measurement Error Tolerance
- Out of 13 sets of pair-wise traceroute  
- On average 248 out of 2550 paths have no or 
 incomplete routing information
- No router aliases resolved 
- Conclusion: robust against topology measurement errors
18Performance Improvement with Overlay 
- With single-node relay 
- Loss rate improvement 
- Among 10,980 lossy paths 
- 5,705 paths (52.0%) have loss rate reduced by 0.05 or more
- 3,084 paths (28.1%) change from lossy to non-lossy
- Throughput improvement 
- Estimated with 
- 60,320 paths (24%) with non-zero loss rate, throughput computable
- Among them, 32,939 (54.6%) paths have throughput improved, 13,734 (22.8%) paths have throughput doubled or more
- Implications: use overlay paths to bypass congestion or failures
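To make the single-node relay idea concrete, the sketch below picks the relay that minimizes the end-to-end loss rate. It assumes losses on the two overlay hops are independent; the host names, loss rates, and function names are hypothetical, and the throughput estimate is not modeled because its formula is missing from the slide text.

```python
def relay_loss(loss_src_relay, loss_relay_dst):
    """Loss rate of a two-hop overlay path, assuming independent losses."""
    return 1.0 - (1.0 - loss_src_relay) * (1.0 - loss_relay_dst)

def best_single_relay(loss, src, dst):
    """loss[a][b] is the loss rate of the direct path a -> b.

    Returns (relay, loss_rate); relay is None if the direct path is best.
    """
    best = (None, loss[src][dst])
    for relay in loss:
        if relay in (src, dst):
            continue
        via = relay_loss(loss[src][relay], loss[relay][dst])
        if via < best[1]:
            best = (relay, via)
    return best

# Hypothetical loss rates among three overlay hosts.
loss = {
    "A": {"B": 0.10, "C": 0.01},
    "B": {"A": 0.10, "C": 0.02},
    "C": {"A": 0.01, "B": 0.02},
}

relay, via = best_single_relay(loss, "A", "B")
print(f"direct A->B loss {loss['A']['B']:.3f}, via {relay}: {via:.4f}")
```

Here the direct path loses 10% of packets while the relayed path loses about 3%, so this path would count both as "reduced by 0.05 or more" and as changing from lossy to non-lossy under a 5% threshold.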
19Adaptive Overlay Streaming Media
[Figure: sites Stanford, UC San Diego, UC Berkeley, and HP Labs, with one link marked X]
- Implemented with Winamp client and SHOUTcast 
 server
- Congestion introduced with a Packet Shaper 
- Skip-free playback: server buffering and rewinding
- Total adaptation time < 4 seconds
20Adaptive Streaming Media Architecture 
21Conclusions
- A tomography-based overlay network monitoring system
- Given n end hosts, characterize O(n^2) paths with a basis set of O(n log n) paths
- Selectively monitor O(n log n) paths to compute the loss rates of the basis set, then infer the loss rates of all other paths
- Both simulation and real Internet experiments are promising
- Built adaptive overlay streaming media system on 
 top of monitoring services
- Bypass congestion/failures for smooth playback 
 within seconds
22Backup Slides