1
An Algebraic Approach to Practical and Scalable
Overlay Network Monitoring
Yan Chen (Northwestern University)
David Bindel, Hanhee Song, and Randy H. Katz
(University of California at Berkeley)
ACM SIGCOMM 2004
2
Motivation
  • Infrastructure ossification led to the rise of
    overlay and P2P applications
  • Such applications are flexible in paths and
    targets, and thus can benefit from E2E distance
    monitoring
    • Overlay routing/location
    • VPN management/provisioning
    • Service redirection/placement
  • Requirements for an E2E monitoring system
    • Scalable and efficient: small amount of probing
      traffic
    • Accurate: capture congestion/failures
    • Adaptive: handle nodes joining/leaving and
      topology changes
    • Robust: tolerate measurement errors
    • Balanced measurement load

3
Related Work
  • General metrics: RON (O(n²) measurements)
  • Latency estimation
    • Link-level measurement: min set cover (Ozmutlu et
      al.); a similar approach gives bounds for other
      metrics (Tang and McKinley)
    • Clustering-based: IDMaps, Internet Isobar, etc.
    • Coordinate-based: GNP, Virtual Landmarks,
      Vivaldi, etc.
  • Network tomography
    • Focuses on inferring the characteristics of
      physical links rather than E2E paths
    • Limited measurements ⇒ under-constrained system,
      unidentifiable links

4
Problem Formulation
  • Given an overlay of n end hosts and O(n²) paths,
    how do we select a minimal subset of paths to
    monitor so that the loss rates/latencies of all
    other paths can be inferred?
  • Assumptions
    • The topology is measurable
    • We can only measure E2E paths, not individual
      links

5
Outline
  • An algebraic approach framework
  • Algorithms for a fixed set of overlay nodes
  • Scalability analysis
  • Adaptive dynamic algorithms
  • Measurement load balancing
  • Handling topology measurement errors
  • Simulations and Internet experiments

6
Our Approach
  • Select a basis set of k paths that fully describes
    all O(n²) paths (k << O(n²))
  • Monitor the loss rates of the k basis paths, and
    infer the loss rates of all other paths
  • Applicable to any additive metric, e.g., latency

7
Modeling of Path Space
[Figure: example topology with end hosts A, B, C, D
connected by links 1, 2, 3]
  • Path loss rate p, link loss rate l
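The log transform that turns loss rates into an additive metric can be sketched as follows (the link loss rates and the path-link incidence below are illustrative values of ours, not from the slide):

```python
import numpy as np

# Illustrative link loss rates for three links (made-up values).
l = np.array([0.01, 0.05, 0.02])
# Additive link metric: x_j = log(1 / (1 - l_j))
x = np.log(1.0 / (1.0 - l))

# Path matrix G: one row per E2E path, entry 1 if the path uses the link.
# Assumed paths: one over links {1, 2}, one over links {1, 3}.
G = np.array([[1, 1, 0],
              [1, 0, 1]], dtype=float)

b = G @ x                    # additive path metrics b_i = log(1/(1 - p_i))
p = 1.0 - np.exp(-b)         # back to path loss rates

# Consistency check: p_i = 1 - prod over the path of (1 - l_j)
assert np.allclose(p[0], 1 - (1 - l[0]) * (1 - l[1]))
assert np.allclose(p[1], 1 - (1 - l[0]) * (1 - l[2]))
```

Because the metric is additive, every E2E measurement becomes one linear equation over the link variables, which is what makes the algebraic machinery below possible.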

8
Putting All Paths Together
In total, r = O(n²) paths and s links, with s << r


9
Sample Path Matrix
  • x1 − x2 is unknown ⇒ cannot compute x1 and x2
    individually
  • To separate identifiable vs. unidentifiable
    components: x = xG + xN
  • All E2E paths (rows of G) are orthogonal to xN,
    i.e., G·xN = 0
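A tiny numerical sketch of this decomposition (toy two-link system with made-up values): the minimum-norm least-squares solution of the path equations is exactly the identifiable component xG.

```python
import numpy as np

# One measured path covering both links: only x1 + x2 is observable.
G = np.array([[1.0, 1.0]])
x_true = np.array([0.3, 0.1])          # made-up "true" link metrics
b = G @ x_true                          # the one path measurement

# Minimum-norm least squares returns the row-space component x_G.
x_G, *_ = np.linalg.lstsq(G, b, rcond=None)
x_N = x_true - x_G                      # unidentifiable null-space part

assert np.allclose(x_G, [0.2, 0.2])     # identifiable component
assert np.allclose(G @ x_N, 0)          # invisible to every E2E path
assert np.allclose(G @ x_G, b)          # x_G alone explains the measurement
```

The individual values x1 = 0.3 and x2 = 0.1 can never be recovered, but that does not matter: xG suffices to reproduce every path metric.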

10
Intuition through Topology Virtualization
Virtual links: minimal path segments whose loss
rates can be uniquely identified
  • They can fully describe all paths
  • xG is composed of virtual links

[Figure: virtualization — real links (solid) with all
of the overlay paths (dotted) traversing them, mapped
to virtual links]
11
Algorithms
  • Select k = rank(G) linearly independent paths to
    monitor (one-time cost)
    • Use QR decomposition
    • Leverage sparsity of the matrix: O(rk²) time and
      O(k²) memory
    • E.g., 79 seconds for n = 300 (r = 44,850) and
      k = 2,541
  • Compute the loss rates of the other paths
    (continuously)
    • O(k²) time and O(k²) memory
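The two steps can be sketched in a few lines of Python; a naive rank check stands in for the paper's incremental sparse QR, and the function names are ours:

```python
import numpy as np

def select_basis_paths(G, tol=1e-10):
    """One-time step: greedily keep rows of G that are linearly
    independent of those already kept (a naive stand-in for the
    incremental QR decomposition)."""
    kept, R = [], np.empty((0, G.shape[1]))
    for i, row in enumerate(G):
        cand = np.vstack([R, row])
        if np.linalg.matrix_rank(cand, tol=tol) > R.shape[0]:
            R, kept = cand, kept + [i]
    return kept

def infer_all_paths(G, basis_rows, b_monitored):
    """Continuous step: from the k monitored paths, recover x_G
    (minimum-norm solution) and hence the metrics of every path."""
    x_G, *_ = np.linalg.lstsq(G[basis_rows], b_monitored, rcond=None)
    return G @ x_G

# Toy example: 4 paths over 3 links; path 3 is a linear combination
# of the first three, so only k = rank(G) = 3 paths need monitoring.
G = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
basis = select_basis_paths(G)           # first three rows suffice
x_true = np.array([0.1, 0.2, 0.3])
b_all = infer_all_paths(G, basis, G[basis] @ x_true)
assert np.allclose(b_all, G @ x_true)   # path 3 inferred, not measured
```

The sketch recomputes the rank from scratch per row, which is far slower than the O(rk²) incremental factorization the slide describes, but the selected set and the inferred metrics are the same.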




12
Outline
  • An algebraic approach framework
  • Algorithms for a fixed set of overlay nodes
  • Scalability analysis
  • Adaptive dynamic algorithms
  • Measurement load balancing
  • Handling topology measurement errors
  • Simulations and Internet experiments

13
How Many Measurements Are Saved?
  • Is k << O(n²)?
  • For a power-law Internet topology:
    • When the majority of end hosts are on the
      overlay: k = O(n) (with proof)
    • When a small portion of end hosts are on the
      overlay: for reasonably large n (e.g., 100),
      k = O(n·log n)
  • Intuition
    • If the Internet were a pure hierarchy (a tree):
      k = O(n)
    • If the Internet had no hierarchy at all (worst
      case, a clique): k = O(n²)
    • The Internet has a moderate hierarchical
      structure [TGJ02]
14
Linear Regression Tests of the Hypothesis
  • BRITE router-level topologies
    • Barabási-Albert, Waxman, and hierarchical models
  • Mercator real topology
  • Most fit best with O(n), except the hierarchical
    topologies, which fit best with O(n·log n)

15
Outline
  • An algebraic approach framework
  • Algorithms for a fixed set of overlay nodes
  • Scalability analysis
  • Adaptive dynamic algorithms
  • Measurement load balancing
  • Handling topology measurement errors
  • Simulations and Internet experiments

16
Topology Changes
  • Basic building block: add/remove one path
    • Incremental changes: O(k²) time (vs. O(n²k²) for
      a full re-scan)
    • Add path: check linear dependency against the
      old basis set
    • Delete path: hard when the path is in the
      monitored basis; intuitively, a two-step process
  • Add/remove end hosts; routing changes
    • Routing is relatively stable, on the order of a
      day ⇒ incremental detection
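Adding a path reduces to a linear-dependency check against the current basis Ḡ. A least-squares sketch of that check (names are ours; the paper's O(k²) version would update an R factor incrementally instead of solving from scratch):

```python
import numpy as np

def path_is_redundant(G_bar, new_path, tol=1e-8):
    """True if new_path lies in the row space of the basis G_bar:
    its loss rate is then inferable and it need not be monitored."""
    coeffs, *_ = np.linalg.lstsq(G_bar.T, new_path, rcond=None)
    return np.linalg.norm(G_bar.T @ coeffs - new_path) < tol

# Basis of two monitored paths over four links.
G_bar = np.array([[1.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0]])

# Sum of the two basis rows: no new measurement needed.
assert path_is_redundant(G_bar, np.array([1.0, 1.0, 1.0, 1.0]))
# Crosses the links differently: carries new information.
assert not path_is_redundant(G_bar, np.array([1.0, 0.0, 1.0, 0.0]))
```

If the new path is independent it joins the basis and k grows by one; if it is redundant, its metric comes for free from the existing measurements.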

17
Topology Change Example
18
Other Practical Issues
  • Measurement load balancing
    • Randomly reorder the paths in G before scanning
      them for basis selection
    • Has no effect on the loss rate estimation
      accuracy
  • Tolerance of topology measurement errors
    • We care about path loss rates more than about
      any interior link
    • Router aliases
      ⇒ let it be: similar loss rates are assigned to
      the duplicated links
    • Paths (segments) without topology info
      ⇒ add virtual links to bypass them
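A small sketch of why the random reordering balances probing load. Basis selection is crudely modeled here as "take the first k paths scanned", and n, k, and the seed are arbitrary illustration values of ours:

```python
import numpy as np

n, k = 8, 20
paths = [(i, j) for i in range(n) for j in range(n) if i != j]

def host_load(selected):
    load = np.zeros(n)
    for i, j in selected:       # both endpoints of a monitored path
        load[i] += 1            # send probes...
        load[j] += 1            # ...or answer them
    return load

skewed = host_load(paths[:k])                 # fixed scan order
rng = np.random.default_rng(42)
order = rng.permutation(len(paths))           # randomized scan order
balanced = host_load([paths[t] for t in order[:k]])

# The fixed order concentrates load on low-numbered hosts (max 9 vs.
# min 2 here); the total load is unchanged, so shuffling only spreads
# which hosts carry it.
assert skewed.max() == 9 and skewed.min() == 2
assert skewed.sum() == balanced.sum() == 2 * k
```

Since any maximal set of independent paths has the same size k = rank(G), which paths end up monitored changes under the shuffle, but the inferred loss rates do not.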

19
Outline
  • An algebraic approach framework
  • Algorithms for a fixed set of overlay nodes
  • Scalability analysis
  • Adaptive dynamic algorithms
  • Measurement load balancing
  • Handling topology measurement errors
  • Simulations and Internet experiments

20
Evaluation
Area                 Domain         # of hosts
US (40)              .edu               33
                     .org                3
                     .net                2
                     .gov                1
                     .us                 1
International (11)   Europe (6)
                       France            1
                       Sweden            1
                       Denmark           1
                       Germany           1
                       UK                2
                     Asia (2)
                       Taiwan            1
                       Hong Kong         1
                     Canada              2
                     Australia           1
  • Extensive simulations
    • See paper
  • Experiments on PlanetLab
    • 51 hosts, each from a different organization
    • 51 × 50 = 2,550 paths
    • Simultaneous loss rate measurement
      • 300 trials, 300 msec each
      • In each trial, send a 40-byte UDP packet to
        every other host
    • Topology measurement via traceroute
    • 100 experiments during North American peak hours

21
PlanetLab Experiment Results
  • Loss rate distribution: see table below
  • On average, k = 872 out of 2,550 paths
  • Metrics
    • Absolute error |p̂ − p|
      • Average 0.0027 for all paths, 0.0058 for lossy
        paths
    • Relative error [BDPT02]
      • Average 1.1 for all paths, 1.7 for lossy paths

Loss rate distribution (% of paths):
  [0, 0.05): 95.9%    lossy paths [0.05, 1.0]: 4.1%
  Within the lossy paths:
    [0.05, 0.1): 15.2%   [0.1, 0.3): 31.0%
    [0.3, 0.5): 23.9%    [0.5, 1.0): 4.3%    1.0: 25.6%
22
More Experiment Results
  • Running time
    • Setup (path selection): 0.75 seconds
    • Update (for all 2,550 paths): 0.16 seconds
  • More results on topology change adaptation: see
    paper
  • Robustness
    • Out of 14 sets of pair-wise traceroutes, on
      average 245 of the 2,550 paths have no or
      incomplete routing information
    • No router aliases were resolved
    • Conclusion: robust against topology measurement
      errors

23
Results for Measurement Load Balancing
  • Simulation on an overlay of 300 end hosts;
    average load 8.5
  • With balancing: Gaussian-like load distribution
  • Without balancing: heavily skewed, with the
    maximum almost 20 times the average

24
Conclusions
  • A tomography-based overlay network monitoring
    system
    • Given n end hosts, characterize O(n²) paths with
      a basis set of O(n·log n) paths
    • Selectively monitor the basis set for loss
      rates, then infer the loss rates of all other
      paths
  • Adaptive to topology changes
  • Balanced measurement load
  • Tolerant of topology measurement errors
  • Both simulations and PlanetLab experiments show
    promising results
  • Built an adaptive overlay streaming media system
    on top of it

25
Backup Slides
26
Other Practical Issues
  • Tolerance of topology measurement errors
    • We care about path loss rates more than about
      any interior link
    • Poor router alias resolution
      ⇒ assign similar loss rates to the same links
    • Unidentifiable routers
      ⇒ add virtual links to bypass them
  • Measurement load balancing on end hosts
    • Randomly order the paths for the basis-selection
      scan

27
Modeling of Path Space
[Figure: example topology with end hosts A, B, C, D
connected by links 1, 2, 3]
  • Path loss rate p, link loss rate l

Putting all r = O(n²) paths together: s links in total
28
Sample Path Matrix
  • x1 − x2 is unknown ⇒ cannot compute x1 and x2
    individually
  • The set of vectors invisible to all paths
    (Gx = 0) forms the null space
  • To separate identifiable vs. unidentifiable
    components: x = xG + xN
  • All E2E paths are in the path space, i.e.,
    G·xN = 0

29
Intuition through Topology Virtualization
  • Virtual links
    • Minimal path segments whose loss rates can be
      uniquely identified
    • Can fully describe all paths
  • xG is composed of virtual links

All E2E paths are in the path space, i.e., G·xN = 0
30
Algorithms

  • Select k = rank(G) linearly independent paths to
    monitor
    • Use a rank-revealing decomposition
    • Leverage sparsity of the matrix: O(rk²) time and
      O(k²) memory
    • E.g., 10 minutes for n = 350 (r = 61,075) and
      k = 2,958
  • Compute the loss rates of the other paths
    • O(k²) time and O(k²) memory

31
Practical Issues
  • Tolerance of topology measurement errors
    • We care about path loss rates more than about
      any interior link
    • Poor router alias resolution
      ⇒ assign similar loss rates to the same links
    • Unidentifiable routers
      ⇒ add virtual links to bypass them
  • Measurement load balancing on end hosts
    • Randomly order the paths for the basis-selection
      scan
  • Topology changes
    • Efficient algorithms for incremental updates of
      the basis when adding/removing end hosts or
      under routing changes
32
More Experiment Results
  • Measurement load balancing
    • Load values of each node placed in 10 equally
      spaced bins
  • Running time
    • Setup (path selection): 0.75 seconds
    • Update (for all 2,550 paths): 0.16 seconds
  • More results on topology change adaptation: see
    paper

[Figures: load distribution with and without load
balancing]
33
Work in Progress
  • Provide it as a continuous service on PlanetLab
  • Network diagnostics
    • Which links or path segments are down?
  • Iterative methods for better speed and scalability

34
Evaluation
  • Simulation
    • Topology
      • BRITE: Barabási-Albert, Waxman, and
        hierarchical models, 1K - 20K nodes
      • Real topology from Mercator: 284K nodes
    • Fraction of end hosts on the overlay: 1 - 10%
    • Loss rate distribution (90% of links are good)
      • Good link: 0 - 1% loss rate; bad link: 5 - 10%
        loss rate
      • Good link: 0 - 1% loss rate; bad link:
        1 - 100% loss rate
    • Loss model
      • Bernoulli: independent packet drops
      • Gilbert: bursty packet drops
    • Path loss rates simulated via transmission of
      10K packets
  • Experiments on PlanetLab
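The two loss models are simple to simulate. A minimal sketch of the bursty Gilbert model (the transition probabilities are illustrative choices of ours, not the parameters used in the paper's simulations):

```python
import numpy as np

def gilbert_losses(num_pkts, p_gb=0.05, p_bg=0.5, seed=1):
    """Two-state Gilbert model: drop packets while in the Bad state.
    p_gb = P(Good -> Bad), p_bg = P(Bad -> Good); illustrative values."""
    rng = np.random.default_rng(seed)
    lost = np.zeros(num_pkts, dtype=bool)
    bad = False
    for t in range(num_pkts):
        bad = rng.random() < ((1 - p_bg) if bad else p_gb)
        lost[t] = bad
    return lost

# Path loss rate estimated from 10K packets, as in the slides.
lost = gilbert_losses(10_000)
loss_rate = lost.mean()
# Long-run loss probability is p_gb / (p_gb + p_bg) ≈ 0.09, but
# losses arrive in bursts, unlike the independent Bernoulli model.
```

Setting p_bg = 1 − p_gb would collapse the two states into the memoryless Bernoulli case, which is the other loss model listed above.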

35
Evaluation
  • Extensive simulations
  • Experiments on PlanetLab
    • 51 hosts, each from a different organization
    • 51 × 50 = 2,550 paths
    • On average, k = 872
  • Results highlights
    • Average real loss rate: 0.023
    • Absolute error: mean 0.0027; 90% < 0.014
    • Relative error: mean 1.1; 90% < 2.0
    • On average, 248 of the 2,550 paths have no or
      incomplete routing information
    • No router aliases were resolved

(Host distribution: same table as on slide 20.)
36
Sensitivity Test of Sending Frequency
  • Sharp jump in the number of lossy paths when the
    sending rate exceeds 12.8 Mbps

37
PlanetLab Experiment Results
  • Loss rate distribution: see table below
  • Metrics
    • Absolute error |p̂ − p|
      • Average 0.0027 for all paths, 0.0058 for lossy
        paths
    • Relative error [BDPT02]
    • Lossy path inference coverage and false positive
      ratio
  • On average, k = 872 out of 2,550

Loss rate distribution (% of paths):
  [0, 0.05): 95.9%    lossy paths [0.05, 1.0]: 4.1%
  Within the lossy paths:
    [0.05, 0.1): 15.2%   [0.1, 0.3): 31.0%
    [0.3, 0.5): 23.9%    [0.5, 1.0): 4.3%    1.0: 25.6%
38
Accuracy Results for One Experiment
  • 95% of absolute errors < 0.0014
  • 95% of relative errors < 2.1

39
Accuracy Results for All Experiments
  • For each experiment, take its 95th-percentile
    absolute and relative errors
  • Most have absolute error < 0.0135 and relative
    error < 2.0

40
Lossy Path Inference Accuracy
  • 90 out of 100 runs have coverage over 85% and
    false positives below 10%
  • Many errors are caused by boundary effects around
    the 5% lossy-path threshold

41
Performance Improvement with Overlay
  • With single-node relay
  • Loss rate improvement
    • Among 10,980 lossy paths, 5,705 (52.0%) have
      their loss rate reduced by 0.05 or more
    • 3,084 paths (28.1%) change from lossy to
      non-lossy
  • Throughput improvement
    • Estimated with a TCP throughput model
    • For the 60,320 paths (24%) with non-zero loss
      rate, throughput is computable
    • Among them, 32,939 paths (54.6%) have improved
      throughput; 13,734 paths (22.8%) have throughput
      doubled or more
  • Implication: use overlay paths to bypass
    congestion or failures
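The slide's exact throughput formula is elided; assuming a standard Mathis-style TCP model (throughput ≈ (MSS/RTT)·√(3/2)/√p, our assumption, not necessarily the paper's exact formula), the arithmetic behind "doubled or more" is easy to see: a 4× loss reduction doubles the estimated throughput.

```python
import math

def tcp_throughput_bps(mss_bytes, rtt_s, loss):
    # Mathis et al. steady-state TCP model, in bytes per second.
    return (mss_bytes / rtt_s) * math.sqrt(1.5) / math.sqrt(loss)

direct = tcp_throughput_bps(1460, 0.1, 0.04)   # lossy direct path
relayed = tcp_throughput_bps(1460, 0.1, 0.01)  # relay cuts loss 4x
assert abs(relayed / direct - 2.0) < 1e-9      # 4x less loss -> 2x throughput
```

Because throughput scales as 1/√p, even modest loss-rate improvements from relaying translate into large throughput gains.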

42
Adaptive Overlay Streaming Media
[Figure: adaptive streaming demo across Stanford,
UC San Diego, UC Berkeley, and HP Labs; X marks a
failed/congested path]
  • Implemented with a Winamp client and a SHOUTcast
    server
  • Congestion introduced with a Packet Shaper
  • Skip-free playback: server buffering and rewinding
  • Total adaptation time < 4 seconds

43
Adaptive Streaming Media Architecture
44
Conclusions
  • A tomography-based overlay network monitoring
    system
    • Given n end hosts, characterize O(n²) paths with
      a basis set of O(n·log n) paths
    • Selectively monitor the O(n·log n) basis paths
      to compute their loss rates, then infer the loss
      rates of all other paths
  • Both simulations and real Internet experiments are
    promising
  • Built an adaptive overlay streaming media system
    on top of the monitoring service
    • Bypasses congestion/failures for smooth playback
      within seconds