Felix Project Inferential Topology Discovery: From Delay Data to Network Graph

1
Felix Project: Inferential Topology
Discovery: From Delay Data to Network Graph
  • Mark W. Garrett
  • 14 February 2001
  • J. Baron, D. Shallcross
  • C. Huitema, J. DesMarais, B. Siegell, P. Seymour,
    F. Chung

DARPA ITO Intrusion Detection Program
An SAIC Company
2
The Felix Project: Goals
  • Evaluate network status independently from the
    usual network management protocols and data.
  • E.g., no use of routing protocols, ping,
    traceroute, ICMP, SNMP, etc.
  • Measure the network by sending sparse probe packets
    among a set of monitors. Collect delay and loss
    data.
  • From these data, discover the network topology and
    evaluate the performance of all links in the
    network.
  • A small new field of research, called Inferential
    Topology Discovery, is developing (Kurose,
    Towsley, Paxson, McCanne, Cáceres, Duffield, et
    al.)
  • This talk presents a particular method based on
    modeling correlation across the observations.
  • This talk presents a particular method based on
    modeling correlation across the observations.

3
Network Monitoring: Felix Data Analysis Approach
[Figure: data analysis flow. Measurement system → raw data → identify links → intermediate results (common component matrix / path component matrix) → create graph → graph specification (nodes and links) → network graph → add geographic information → network map. Additional output: network element and link performance.]
4
Network Discovery: Terminology for Network
Topology and Monitoring
  • For $m$ monitors, there are $n_p = m(m-1)$ paths.
  • The number of links ranges from order $m$ (star)
    to order $m^2$ (full mesh).
  • Links are unidirectional.
  • So a line in the graph usually represents two
    links.

5
Network Discovery: Reduced Graph Concept
  • Define the reduced graph as the subgraph of the
    network that is discoverable.
  • Excludes links not traversed by monitor packets
  • Combines equivalent edges, i.e. edges traversed
    by exactly the same set of paths.
  • Non-series equivalent edges can occur when
    reducing a real graph, but they are very rare.
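A minimal sketch of the reduced-graph construction, assuming each monitor path is given as a list of directed links (the `reduced_graph` function and the data layout are illustrative, not the project's code). Links are grouped by the exact set of paths that traverse them: untraversed links never appear, and links with identical path sets are merged into one equivalence class.

```python
from collections import defaultdict


def reduced_graph(paths):
    """Group directed links into reduced-graph edges.

    paths: dict mapping a path name to the list of directed links it traverses.
    Returns a dict mapping each frozenset of path names to the links traversed
    by exactly that set of paths. Links no path traverses never appear, so they
    are excluded automatically; links with the same path set are merged.
    """
    link_to_paths = defaultdict(set)
    for name, links in paths.items():
        for link in links:
            link_to_paths[link].add(name)

    edges = defaultdict(list)
    for link, names in link_to_paths.items():
        edges[frozenset(names)].append(link)
    return dict(edges)


# Two hypothetical monitor paths that share one link between routers R1 and R2.
paths = {
    "a": [("M1", "R1"), ("R1", "R2"), ("R2", "M3")],
    "b": [("M2", "R1"), ("R1", "R2"), ("R2", "M4")],
}
for path_set, links in reduced_graph(paths).items():
    print(sorted(path_set), links)
```

The two links used only by path a land in the same class; here they are series-equivalent, so the reduced graph shows them as a single edge.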

6
Network Discovery: Example of Complete Network
and Reduced Graph
[Figure: complete network, 3150 nodes, WAN-MAN-LAN design; reduced graph for 100 monitors, 187 nodes, 698 (unidirectional) links.]
The reduced graph tends to include more of the backbone
and less of the edge.
7
Network Discovery: Reduced Graph, Non-series
Equivalent Edges
  • Here is an (artificially) symmetrical graph with
    equivalent edges.
  • We have seen non-series equivalent edges only
    once in reducing 100 randomly generated graphs.

8
Network Discovery: Reduced Graph Related to Paths
  • The reduced graph determined by n = 2 monitors is a
    successive approximation to the network.

9
Network Discovery: Reduced Graph Related to Paths
  • The reduced graph determined by n = 3 monitors is
    a successive approximation to the network.

10
Network Discovery: Reduced Graph Related to Paths
  • The reduced graph determined by n = 4 monitors is
    a successive approximation to the network.

11
Network Discovery: Reduced Graph Related to Paths
  • The reduced graph determined by n = 5 monitors is
    a successive approximation to the network.

12
Network Discovery: Reduced Graph Related to Paths
  • The reduced graph determined by n = 6 monitors is
    a successive approximation to the network.

13
Network Discovery: Reduced Graph Related to Paths
  • The reduced graph determined by n = 7 monitors is
    a successive approximation to the network.

14
Network Discovery: Reduced Graph Related to Paths
  • The reduced graph determined by n = 8 monitors is
    a successive approximation to the network.

15
Network Discovery: Reduced Graph Related to Paths
  • The reduced graph determined by n = 9 monitors is
    a successive approximation to the network.

16
Network Discovery: Reduced Graph Related to Paths
  • The reduced graph determined by n = 10 monitors
    is a successive approximation to the network.
    Etc.

17
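A Relationship Between Observable Path Metric,
Topology, and Link Performance
  • The delay along a path is the sum of the delays
    on each of its links:
  • $D_P = X \cdot d_L$, where $D_P$ is the vector of
    path delays, $d_L$ the vector of link delays, and
    $X$ the path-link incidence matrix.
  • $X$ identifies the topology (in terms of links on
    paths) and is always rank deficient.
  • To illustrate, consider adding a constant delay
    to each link into a particular (non-monitor) node
    and subtracting it from each outgoing link. Every
    path through that node uses one incoming and one
    outgoing link, so no path delay changes, and the
    link delays cannot be uniquely recovered.
  • A variation on this general relationship can be
    formulated for each performance metric: packet
    loss, link load, throughput, congestion
    probability.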
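The rank-deficiency argument can be checked numerically on a toy network. This sketch assumes three monitors attached to a single interior router R (a hypothetical topology with made-up link delays); `path_link_matrix`, `path_delays`, and `rank` are illustrative helpers. Shifting delay onto the links into R and off the links out of R leaves every path delay unchanged, and the exact-arithmetic rank of $X$ comes out one short of the number of links.

```python
from fractions import Fraction


def path_link_matrix(paths, links):
    """X[p][l] = 1 if link l lies on path p, else 0."""
    return [[1 if link in path else 0 for link in links] for path in paths]


def path_delays(X, d):
    """D_P = X . d_L: each path delay is the sum of its link delays."""
    return [sum(x * dl for x, dl in zip(row, d)) for row in X]


def rank(matrix):
    """Matrix rank by Gauss-Jordan elimination over exact rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


# Three monitors attached to one interior router R; m(m-1) = 6 paths.
monitors = ["M1", "M2", "M3"]
links = [(m, "R") for m in monitors] + [("R", m) for m in monitors]
paths = [[(s, "R"), ("R", d)] for s in monitors for d in monitors if s != d]
X = path_link_matrix(paths, links)

d = [1, 2, 3, 4, 5, 6]  # hypothetical link delays, same order as `links`
shifted = [v + 10 for v in d[:3]] + [v - 10 for v in d[3:]]  # +10 in, -10 out of R
print(path_delays(X, d) == path_delays(X, shifted))  # True
print(rank(X), "of", len(links))                     # 5 of 6
```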

18
Felix Data Measurements: Routing Changes Apparent
in Data
Data courtesy of Advanced Network Solutions
19
Felix Data Measurements: Routing Changes Apparent
in Data
Data courtesy of Advanced Network Solutions
20
Felix Data Measurements: Routing Changes Apparent
in Data
Data courtesy of Advanced Network Solutions
21
Felix Topology Discovery: Correlation Method
Concept
22
Felix Correlation Method: Identifying Links by
Correlation of Paths
23
Felix Correlation Method: Abstracting Congestion
Event Sequence From Data
  • Open problem: how exactly to get from a delay
    measurement on a real network to a series of
    thresholded congestion events.
  • Several approaches:
  • Average delay in a fixed-length sliding window
  • Cross-correlation function (pair-wise between
    paths only, but promising)
  • A congestion decision based on a combination of
    delay and loss in the window is probably the most
    robust method, but needs some empirical experience
    to create a useful methodology.
  • We assume a solution to this step and solve the
    next part.
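A minimal sketch of the first approach listed above (average delay in a fixed-length sliding window), assuming a hypothetical window length and a hypothetical delay threshold. It is only an illustration of one option; the slide leaves the choice of method open.

```python
def sliding_mean(samples, window):
    """Mean delay over each run of `window` consecutive samples."""
    if window <= 0 or len(samples) < window:
        raise ValueError("need 0 < window <= len(samples)")
    total = sum(samples[:window])
    means = [total / window]
    for i in range(window, len(samples)):
        total += samples[i] - samples[i - window]  # slide the window by one
        means.append(total / window)
    return means


def congestion_events(samples, window, threshold):
    """1 where the windowed mean delay exceeds `threshold`, else 0."""
    return [1 if m > threshold else 0 for m in sliding_mean(samples, window)]


delays_ms = [10, 10, 10, 50, 50, 50, 10, 10]  # made-up one-way delays
print(congestion_events(delays_ms, window=3, threshold=30))  # [0, 0, 1, 1, 1, 0]
```

A loss-aware variant would feed a combined delay-and-loss score into the same thresholding step.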

24
Felix Correlation Method: Network Model Assumptions
  • Node processing delay is negligible, so paths
    sharing nodes (but not links) do not show
    correlation. Queueing delay is associated with
    the link.
  • Network links congest independently.
  • Congestion is modeled as fixed-length
    discrete-time events.
  • The congestion rate is fixed for each link, but can
    vary over a range for the set of links in the
    network.
  • Routes are stable.
  • Monitor packets are exchanged frequently enough
    that congestion events will be recorded
    consistently across all paths crossing a given
    link.
  • Note that this does not require every event to be
    noticed, and real congestion events do occur over
    a wide range of time scales.
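These assumptions translate directly into a small simulator, in the spirit of the simulations reported later in the talk. This is a sketch, not the project's simulator; `simulate` and the toy link rates are hypothetical. Each link congests independently in each fixed-length time step with its own fixed probability, routes never change, and a path is congested in a step whenever any of its links is.

```python
import random


def simulate(paths, link_rates, steps, seed=0):
    """Simulate path congestion under the model assumptions.

    paths: dict path name -> set of links on that path (stable routes).
    link_rates: dict link -> probability the link congests in one time step;
    links congest independently of each other and of past steps.
    A path is congested in a step iff at least one of its links is.
    Returns dict path name -> list of 0/1 congestion indicators.
    """
    rng = random.Random(seed)
    series = {name: [] for name in paths}
    for _ in range(steps):
        congested = {link for link, p in link_rates.items() if rng.random() < p}
        for name, links in paths.items():
            series[name].append(1 if links & congested else 0)
    return series


paths = {"a": {"l1", "s"}, "b": {"l2", "s"}}
rates = {"l1": 0.05, "l2": 0.05, "s": 0.10}
series = simulate(paths, rates, steps=10_000)
both = sum(x and y for x, y in zip(series["a"], series["b"])) / 10_000
print(f"P(a and b congested) ~ {both:.3f}")
```

Joint congestion of a and b is dominated by the shared link s; the unshared links contribute only when both congest in the same step.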

25
Felix Correlation Method: Observations and Triggers
  • An Observation is a measurement of congestion
    (however defined) on a path between two monitors.
  • A Trigger is a hypothetical cause of congestion,
    such as a link, or a group of links, in the
    network.
  • Method of solution:

Based on joint observations across all paths,
define a model that discriminates statistically
between the true triggers, which represent links
in the network, and the apparent (or false)
triggers that are due to combinations of true
links congesting simultaneously. Then reduce the
triggers down to single links.
26
Felix Correlation Method: Observations and Triggers
[Figure: illustration of observations, triggers, paths, and links. Observation a: path M1 → M3; observation b: path M2 → M4; trigger a: all links on path a; trigger ab: links in common between paths a and b.]
  • Definitions and Notation
  • An observation event occurs at time t when a
    specified set of paths are congested and another
    specified set of paths are not congested.
  • For example, the observation variable whose
    subscript lists paths a, b, d, k as congested and
    paths c, g as not congested is the event that
    paths a, b, d, k are congested and paths c, g are
    not congested at time t. Paths not included in the
    subscript are "don't care" for this observation
    variable.
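A minimal sketch of observation variables, assuming per-path 0/1 congestion indicators such as those a thresholding step would produce; `observation_holds` and `observation_probability` are illustrative names. An observation is represented by a set of paths that must be congested (`busy`) and a set that must not be (`quiet`); paths in neither set are "don't care".

```python
def observation_holds(congested_now, busy, quiet):
    """True if every path in `busy` is congested and no path in `quiet` is;
    paths in neither set are "don't care"."""
    return busy <= congested_now and not (quiet & congested_now)


def observation_probability(series, busy, quiet):
    """Fraction of time steps at which the observation holds.

    series: dict path name -> list of 0/1 congestion indicators, equal lengths.
    """
    steps = len(next(iter(series.values())))
    hits = 0
    for t in range(steps):
        congested_now = {p for p, s in series.items() if s[t]}
        if observation_holds(congested_now, busy, quiet):
            hits += 1
    return hits / steps


series = {"a": [1, 1, 0, 1], "b": [1, 0, 0, 1], "c": [0, 0, 1, 1]}
# Paths a and b congested, path c not congested; any other path: don't care.
print(observation_probability(series, {"a", "b"}, {"c"}))  # 0.25
```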

27
Felix Correlation Method: Observations and Triggers
  • A trigger event occurs at time t when at least
    one link congests that belongs to every included
    path and to none of the excluded paths, as
    specified.
  • For example, the trigger whose subscript lists
    paths a, b, d, k as included and c, g as excluded
    is the event that some link congests that is
    shared by paths a, b, d, k and is not on path c or
    path g.
  • We refer to paths in the specification as
    included or excluded.
  • If all paths are included or excluded, the
    trigger is fully specified.
  • Observation and trigger probabilities are defined
    following these examples.
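A trigger event can be evaluated the same way, given which links congested at time t. In this sketch (hypothetical names, not the project's code) a trigger is an (included, excluded) pair of path sets and `paths` maps each path to its set of links; the trigger fires when some congested link is shared by all included paths and lies on none of the excluded ones.

```python
def trigger_fires(congested_links, paths, included, excluded):
    """True if some congested link lies on every included path and on no
    excluded path. paths: dict path name -> set of links on that path."""
    for link in congested_links:
        on_all_included = all(link in paths[p] for p in included)
        on_no_excluded = not any(link in paths[p] for p in excluded)
        if on_all_included and on_no_excluded:
            return True
    return False


paths = {"a": {"l1", "s"}, "b": {"l2", "s"}}
# Shared link s congests: the "a and b" trigger fires, "a but not b" does not.
print(trigger_fires({"s"}, paths, {"a", "b"}, set()))  # True
print(trigger_fires({"s"}, paths, {"a"}, {"b"}))       # False
```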

28
Felix Correlation Method: Relationship Between
Observations and Triggers
  • Now we can relate the observation and trigger
    probabilities in several interesting ways, e.g.,
    following Ratnasamy and McCanne.
  • This set says, considering only two paths, that if
    we see congestion on both paths, it is caused
    either by a link the two paths share, or by two
    unshared links, one on each path, congesting
    together.
  • Similarly, if we see congestion on only one path,
    it must be due to a link that is on that path
    and not on the other.
  • Note that this forces us to write out explicitly
    the combinations of triggers that can cause an
    observation (not very scalable).
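Under the independence assumption listed earlier (links congest independently), the two-path relations can be written out explicitly. The notation here is illustrative and may differ from the original slide: $O_{ab}$ means both paths are congested, $O_{a\bar b}$ means a is congested and b is not, and $T_{ab}$, $T_{a\bar b}$, $T_{\bar a b}$ are the triggers for links shared by a and b, links only on a, and links only on b.

```latex
\begin{aligned}
P(O_{ab}) &= P(T_{ab}) + \bigl(1 - P(T_{ab})\bigr)\,P(T_{a\bar b})\,P(T_{\bar a b}) \\
P(O_{a\bar b}) &= \bigl(1 - P(T_{ab})\bigr)\,P(T_{a\bar b})\,\bigl(1 - P(T_{\bar a b})\bigr)
\end{aligned}
```

The second line is the single-path case: path b being quiet rules out the shared and b-only triggers, so the a-only trigger must be active.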

29
Felix Correlation Method: Relationship Between
Observations and Triggers
  • Another interesting and useful relationship is
    the following.
  • It says that we observe no congestion on a
    set of paths only when none of the triggers that
    are on those paths are active.
  • We say a path in the trigger specification
    contradicts the observation when that path is
    uncongested in the observation but included in
    the trigger. (It is easy to write down these
    combinations.)
  • Including observations with multiple paths
    makes this model more powerful than an earlier
    method ($D_P = X \cdot d_L$) that relied on a
    rank-deficient matrix.
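One way to write this relationship, again assuming independent link congestion: let $S$ be the set of paths observed uncongested, $O_{\bar S}$ (illustrative notation) that observation, and $I(T)$ the set of paths included in trigger $T$. The product runs over the fully specified triggers on a level; the triggers in it are exactly those that contradict the observation.

```latex
P(O_{\bar S}) \;=\; \prod_{T \,:\, I(T) \cap S \neq \emptyset} \bigl(1 - P(T)\bigr)
```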

30
Felix Correlation Method: Organization of Triggers
  • The tree contains all potential triggers, i.e., all
    possible combinations of paths that can specify a
    link or group of links.
  • The triggers on a level partition the set of
    (potential) links in the graph.
  • The tree grows exponentially as we add paths, but
    the number of true triggers is bounded by the
    number of links in the network.

31
Felix Correlation Method: Some More Useful Stuff
From the Model
  • Observation of congestion on a path means some
    link on that path is congesting (single-path
    observation and trigger).
  • Something must be happening, so the sum over all
    possible observations with n paths specified
    equals unity.
  • Child triggers are related to their parent.
  • No congestion observed anywhere means all
    triggers are quiet. (The product of all inverse
    triggers on any level is constant.)
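Written out under the same independence assumption, these relations read as follows (illustrative notation: $T^{+}$ and $T^{-}$ are the two children of $T$ that include or exclude the next path, and $O_{\bar N}$ is the observation of no congestion on any path):

```latex
\begin{aligned}
&P(O_a) = P(T_a) \\
&\sum_{\text{all } 2^n \text{ fully specified observations } O} P(O) = 1 \\
&1 - P(T) = \bigl(1 - P(T^{+})\bigr)\bigl(1 - P(T^{-})\bigr) \\
&\prod_{T \in \text{level } n} \bigl(1 - P(T)\bigr) = P(O_{\bar N}) \quad \text{for every level } n
\end{aligned}
```

The last line counts only links traversed by at least one monitor path (the reduced graph). Each level includes the trigger with all of its specified paths excluded, so the triggers on every level cover the same links.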

32
Felix Correlation Method: Solving for Trigger
Probabilities, 3-Path Example
  • Observation of no congestion on 3, 2, or 1 paths
    implies no activity on any trigger that includes
    one of the named paths.
  • Triangular form: each equation yields one new
    trigger probability.

33
Felix Correlation Method: Generalization of
Solution to Any Number of Paths
  • Count various things:
  • n: number of paths in the trigger's level in the
    tree diagram
  • k: number of paths in the observation (varying
    from n down to 1)
  • j: number of paths excluded in the trigger
    (varying from 0 to n-1)
  • Divide the master equation by each specific
    equation to find one trigger probability.

34
Felix Correlation Method: Generalization of
Solution to Any Number of Paths
  • For n paths there are $2^n - 1$ equations and
    $2^n - 1$ triggers.
  • The master equation has all possible triggers,
    i.e., any active trigger contradicts the
    observation of no congestion anywhere.
  • For class 1 triggers ($0 \le j < k$):
  • The j paths excluded in the trigger cannot cover
    all k paths in the observation, so at least one
    path included in the trigger contradicts the
    observation.
  • All such triggers then occur in both the master
    and specific equations, and cancel out in the
    division.
  • For class 2 triggers ($j = k$):
  • The j paths excluded in the trigger can cover the
    k paths in the observation, but in only one
    combination. Call this the target trigger. All
    other triggers contradict the observation and
    cancel out.
  • There is one equation in which each such target
    trigger survives the division.

35
Felix Correlation Method: Generalization of
Solution to Any Number of Paths
  • For class 3 triggers ($k < j \le n-1$):
  • There are such triggers.
  • No class 3 triggers exist in the first two
    stages ($k = n$ and $k = n-1$).
  • All class 3 triggers are computed at previous
    stages, when they appear as class 2 triggers.
  • For example, consider the case $k = 8 < j = 9$. In
    the previous stage, when we had $k = 9$, the class 2
    triggers with $j = 9$ were solved.
  • Each quotient equation is left with one unknown
    trigger.

36
Felix Correlation Method: Generalization of
Solution to Any Number of Paths
  • General form of the solution, for trigger
    probabilities with paths excluded (first case)
    and with no paths excluded (second case),
  • where:
  • E is the set of excluded paths in the trigger
  • I is the set of included paths in the trigger
  • N is the set of all paths
  • w is the set of class-3 trigger probabilities in
    the master equation, but not in the specific
    equation
  • u is the set of all trigger probabilities with at
    least one path excluded.
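The triangular system can be sketched in Python under the model's independence assumptions. This is a minimal illustration, not the project's code: `quiet_prob` (the probability that a given set of paths is entirely uncongested) is a hypothetical callback, and instead of dividing equations stage by stage, the sketch solves the same master/specific quotients in one step by Möbius inversion (inclusion-exclusion over subsets of paths, in the log domain). Dividing the master equation by the specific equation for the paths outside a set A leaves the product of $(1 - P(T))$ over triggers whose included paths all lie in A.

```python
from itertools import combinations
from math import exp, log


def nonempty_subsets(items):
    """All nonempty subsets of `items`, as frozensets, smallest first."""
    items = sorted(items)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def solve_triggers(quiet_prob, path_names):
    """Recover fully specified trigger probabilities from quiet observations.

    quiet_prob(S): probability that no path in the nonempty frozenset S is
    congested (assumed > 0). Returns dict I -> P(T_I), where T_I is the trigger
    whose links lie on exactly the paths in I.
    """
    N = frozenset(path_names)
    log_master = log(quiet_prob(N))  # every trigger contradicts "all quiet"

    def log_quotient(A):
        # master / specific(N - A) = product of (1 - P(T_J)) over J subset of A
        rest = N - A
        return log_master - (log(quiet_prob(rest)) if rest else 0.0)

    result = {}
    for I in nonempty_subsets(N):
        # Moebius inversion over the subsets of I isolates log(1 - P(T_I)).
        total = sum((-1) ** (len(I) - len(A)) * log_quotient(A)
                    for A in nonempty_subsets(I))
        result[I] = 1.0 - exp(total)
    return result


# Usage with exact probabilities from a toy network (hypothetical rates).
link_rates = {"l1": 0.1, "l2": 0.2, "s": 0.3, "l3": 0.05}
paths = {"a": {"l1", "s"}, "b": {"l2", "s"}, "c": {"l3"}}


def exact_quiet(S):
    prob = 1.0
    for link in set().union(*(paths[p] for p in S)):
        prob *= 1.0 - link_rates[link]
    return prob


for I, p in solve_triggers(exact_quiet, paths).items():
    print(sorted(I), round(max(p, 0.0), 6))  # clip tiny negative round-off
```

True triggers recover the link congestion rates (for example, the trigger for paths a and b returns the rate of the shared link s), while false triggers come out at zero. With measured data the false triggers are only approximately zero, and enumerating all $2^n - 1$ triggers is what the pruning step avoids.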

37
Felix Correlation Method: Pruning the Tree Reduces
Computational Complexity
  • Returning to the tree of trigger probabilities:
  • For triggers that specify actual links in the
    network, the trigger probability is the
    (aggregate) congestion rate on that set of links.
  • False triggers (for which no link exists) have
    probability approximately zero.
  • (True) triggers on the last level identify single
    links and their associated paths (reduced graph).
  • A child trigger's link set is a subset of its
    parent's, so its probability cannot exceed the
    parent's. Therefore, a trigger with probability
    zero can be pruned out along with all of its
    descendants.
  • The number of triggers to compute is bounded by
    (paths × links).
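A minimal sketch of the pruned, level-by-level expansion, assuming a hypothetical `trigger_prob(included, excluded)` callback (in practice estimated from measurement data; here computed exactly from a toy network) and a hypothetical threshold `eps` standing in for "approximately zero". Level n specifies the first n paths; each surviving trigger spawns two children (next path included or excluded), and children at or below `eps` are dropped together with their whole subtree.

```python
def pruned_tree(path_order, trigger_prob, eps=1e-6):
    """Expand the trigger tree level by level, pruning near-zero triggers.

    Returns the surviving last-level triggers with at least one included path,
    as (included, excluded) pairs, plus the number of triggers evaluated.
    """
    frontier = [(frozenset(), frozenset())]
    evaluated = 0
    for p in path_order:
        next_level = []
        for inc, exc in frontier:
            for child in ((inc | {p}, exc), (inc, exc | {p})):
                evaluated += 1
                if trigger_prob(*child) > eps:  # prune child and its subtree
                    next_level.append(child)
        frontier = next_level
    survivors = [(inc, exc) for inc, exc in frontier if inc]
    return survivors, evaluated


# Toy network (hypothetical rates) used to compute exact trigger probabilities.
link_rates = {"l1": 0.1, "l2": 0.2, "s": 0.3, "l3": 0.05}
paths = {"a": {"l1", "s"}, "b": {"l2", "s"}, "c": {"l3"}}


def exact_trigger(inc, exc):
    quiet = 1.0
    for link, rate in link_rates.items():
        if all(link in paths[p] for p in inc) and not any(link in paths[p] for p in exc):
            quiet *= 1.0 - rate
    return 1.0 - quiet


survivors, evaluated = pruned_tree(["a", "b", "c"], exact_trigger)
for inc, exc in survivors:
    print("included", sorted(inc), "excluded", sorted(exc))
```

The four survivors correspond to the four reduced-graph edges (l1, l2, s, l3). In this tiny example pruning saves little, but on larger networks the number of evaluated triggers tracks the number of true triggers per level rather than the full exponential tree.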

Let's see some results.
38
Felix Correlation Method: Results
18 monitors, 23 nodes, 95 (unidirectional) links
39
Felix Correlation Method: Results
19 monitors, 27 nodes, 114 (unidirectional) links
40
Felix Correlation Method: Results
20 monitors, 29 nodes, 121 (unidirectional) links
41
Felix Correlation Method: Results
50 monitors, 61 nodes, 269 (unidirectional) links
  • Run with a link congestion rate of 1% (best
    efficiency)
  • Approximately 12 hours to compute

42
Felix Correlation Method: Algorithm Complexity
  • The complexity of the correlation algorithm is more
    than (paths × links), because the computation of
    each trigger grows with the number of paths,
  • but it is polynomial, $O(LPN + L^2 P)$, for L links,
    P paths, and N simulated time intervals.
  • However, the overall run-time is apparently
    exponential, because it takes more data to
    discriminate the true and false triggers as the
    network gets larger.

43
Felix Correlation Method: Algorithm Complexity
  • Running time of simulation and correlation code
    as a function of network size (number of links)
  • Exponential increase if the quality of the result
    is held constant.
  • Link congestion rate: 10% (constant).

44
Felix Correlation Method: Results With Variable
Link Congestion
  • A constant link congestion rate is an artificial
    constraint.
  • The algorithm works well with links congesting in a
    range, e.g., we tried 1% to 5%, 1% to 10%, 1% to 15%,
    etc.
  • The effect is to spread the distribution of true
    trigger probabilities.
  • Longer convergence time
  • Probably all of the simplifying assumptions in
    the model can be relaxed at the cost of increased
    convergence time.
  • The correlation algorithm ran fastest with 1% link
    congestion.
  • Probably an artifact of the implementation

45
Felix Correlation Method: Statistical
Discrimination Problem
  • The nice scaling property of the algorithm depends
    on being able to discriminate true from false
    triggers.
  • False-trigger probabilities are approximately zero,
    but at the edge of the solvable parameter space both
    populations become noisier, due to:
  • Too little data (from simulation or measurement)
  • Too much variability in link loss rates
  • Too much dependence between link congestion
    events, etc.
  • Need to set a threshold, group triggers, and
    evaluate the goodness of the resulting topology.

46
Felix Project: General Discussion
  • We can make use of the multicast idea (MINC
    project) to reduce load on the network: each
    source multicasts packets to all receivers.
  • This will improve the coincidence of measurements
    in time across all paths.

47
Felix Topology / Performance Inference: Applicability
  • Does not replace traditional autodiscovery
    methods (SNMP)
  • May augment autodiscovery in difficult
    environments:
  • Military network under physical attack
  • Military or commercial network under cyber-attack
  • Network with buggy software (e.g., routing
    implementation)
  • Multiple protocol layers, not all included in
    autodiscovery
  • Protocols too old or too new for the
    autodiscovery technology
  • Good for observing networks not under your
    control:
  • Commercial context: an ISP tries to locate a fault
    between networks
  • Military context: map out a foreign network
  • Future networks will probably be more chaotic
  • Track changing topology and performance with
    minimal extra load

48
Felix Project: Further Work
  • Augment algorithms to work in a more fully
    realistic environment:
  • Non-discrete-time congestion events with ragged
    edges
  • Less stable routing (this is hard)
  • Dependence in link congestion (cross traffic
    routed through the network)
  • More volatile delay and loss patterns (most
    significant issue)
  • Wider range of congestion rates; more erratic
    time dependence
  • A variation using a delay metric (instead of
    probability of congestion) is possible.
  • The result would be bounds on the mean, variance
    (and higher moments) of the delay distribution on
    each link.
  • The procedure is analogous (but not identical) to
    the present algorithm.
  • Progressive version of the algorithm to update an
    existing topology estimate based on continuous
    data
  • More experience with real data

49
Felix Correlation Method Summary: Three Stages in
Topology Discovery
Future Work
  • Reduced graph concept: limitation of
    observability
  • Decomposition of topology/performance inference
    into separable problems
  • Allows optimization and variation of algorithms
    at each stage
  • Correlation method
  • Uses the entire time series of data for each path
  • Takes advantage of joint statistics across all
    paths

50
Felix Project
  • Extra Slides

51
Topology Discovery and Performance Assessment:
6 Methods
  • Matrix method
  • Evaluates goodness of topology, solves for link
    delay or loss
  • Tree-growing method
  • Composes topology as a tree, solves for link
    delays, goodness of fit.
  • Spike-tail method
  • Uses delay distributions to solve for link loads
    given topology.
  • Correlation method
  • Uses time-dependent delay data to find common
    path components.
  • Matroid method
  • Graph theoretic method - complements correlation
    method by solving from path-component list to
    topology
  • Distance-Realization method
  • Graph theoretic method - finds topologies rooted
    at each monitor and merges for complete system
    topology

52
Time Series Example A → G
53
Time Series Example G → A
54
Heavy-tailed Distribution of Packet Delay
55
Clock Drift Correction
  • Algorithm
  • Compute lower envelope of time series in both
    directions.
  • Shift lower envelopes so centered around zero.
  • Compute average of envelopes (one flipped).
  • Add/subtract average from original time series
    data.
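The four steps above can be sketched as follows, under stated assumptions: both directions are sampled at the same instants, the "lower envelope" is taken as a centered sliding-window minimum with a hypothetical `half_width`, and relative clock drift adds to one direction's measured delay while subtracting from the other's. Function names and the toy data are illustrative, not the project's code.

```python
def lower_envelope(series, half_width):
    """Sliding minimum over a centered window of 2*half_width + 1 samples."""
    n = len(series)
    return [min(series[max(0, i - half_width):i + half_width + 1]) for i in range(n)]


def center(values):
    """Shift values so their mean is zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def correct_drift(forward, reverse, half_width):
    """Remove relative clock drift from paired one-way delay series.

    forward, reverse: delays A->B and B->A sampled at the same instants.
    Drift adds +theta(t) to one direction and -theta(t) to the other, so the
    average of the centered forward envelope and the flipped centered reverse
    envelope estimates theta(t) minus its mean.
    """
    ef = center(lower_envelope(forward, half_width))
    er = center(lower_envelope(reverse, half_width))
    drift = [(f - r) / 2 for f, r in zip(ef, er)]  # average, reverse flipped
    return ([x - d for x, d in zip(forward, drift)],
            [x + d for x, d in zip(reverse, drift)])


# Toy data: true delay 10 ms each way, clock drifting by 1 ms per sample.
fwd = [10 + t for t in range(10)]
rev = [10 - t for t in range(10)]
new_fwd, new_rev = correct_drift(fwd, rev, half_width=0)
print(new_fwd[:3], new_rev[:3])  # [14.5, 14.5, 14.5] [5.5, 5.5, 5.5]
```

The corrected series are flat, but a constant offset of 4.5 ms (the mean drift over the record) remains in each direction: centering the envelopes removes the time-varying part of the clock error, not a fixed offset.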

56
Clock Drift Problem in One-way Delay Measurements
57
Time series data - adjusted delay from buzzard to
brooklyn
58
Time series data - adjusted delay from brooklyn
to buzzard
59
Felix Matroid Method: Summary
  • Partial solution: goes with the Correlation Method
  • Input here is an unordered path-component list
  • 3 stages with increasing level of assumptions
  • Clouds: an incomplete solution is still useful when
    uncertainty is geographically localized.
    Internet graphs usually have no clouds.
  • Split nodes in solution: we can surely fix this
    problem.
  • Monitor placement changes the discovered graph, and
    also changes the discoverable reduced graph
  • Two examples: used Georgia Tech code to generate
    realistic-looking Internet topologies

60
Felix Matroid Method: Example of Reconstructed
Network Graph
3150 nodes, WAN-MAN-LAN design
61
Felix Matroid Method: Example of Reconstructed
Network Graph
100 monitors, 187 nodes, 698 (unidirectional) links
62
Felix Matroid Method: Example of Reconstructed
Network Graph
74 split nodes, 2 clouds with 3 links each