Demystifying the router-level topology

Slides: 32
Provided by: lakhinabye
Learn more at: https://www.cs.bu.edu
Transcript and Presenter's Notes

1
Demystifying the router-level topology
John Byers
Department of Computer Science and Topology Modeling Group, Boston University
CS: Mark Crovella, Anukool Lakhina, Ibrahim Matta
Physics: Paul Krapivsky, Sid Redner
Statistics: David Chiu, Eric Kolaczyk
2
Mystery and mystification
  • The Internet topology is shrouded in mystery.
  • Rapid, decentralized growth
  • Emergent behavior
  • Can treat it like a found object
  • Need to approach the Internet scientifically
  • Complexity and massive scale
  • Give me a break!

3
Why are we still mystified?
  • Many levels of abstraction
  • AS-level topology vs.
  • Topology seen by IP or traceroute vs.
  • Physical topology (switching elements)
  • Obfuscation (intentional or otherwise) by ISPs.
  • Measurement tools are primitive.
  • Technical challenges are significant.
  • Refutation of theories not part of our culture
    (?)

4
Who is mystified?
  • I am mystified.
  • Most networking researchers are mystified, or are
    under the illusion that they are not mystified.
  • People in other communities are definitely
    mystified.
  • Network operators are presumably not mystified.

5
The Demystification Manifesto
(apologies to Varghese-Estan for blatantly ripping off part of
their HotNets-II paper title)
  • Clear up technical confusion.
  • Articulate strengths and weaknesses of tools
  • Refute broken theories
  • Continue to conduct measurements, build more
    informed models, and validate them.
  • Declare success when:
  • No more tall tales about scale-free graphs in the
    router-level topology.
  • Or ?

6
Outline
  • Demystification Manifesto
  • Case Study: Demystifying traceroute. [Lakhina,
    Byers, Crovella, Xie, Infocom '03]
  • Next steps on demystification agenda

7
Internet mapping efforts
  • Goal: Discover the Internet router-level
    topology
  • Vertices represent routers.
  • Edges connect routers that are one IP hop apart.

8
Fundamental limitations of traceroute
  • The IP path is not the router-level path.
  • Many-to-many relationships
  • One router may have many interfaces
  • A collection of switching gear may appear to be a
    single IP address
  • MPLS label switching, ATM, GigaPoPs
  • Missing data and noisy data are the norm.

9
Most recent traceroute studies
  • k sources: Few active sources, strategically
    located.
  • m destinations: Many passive destinations,
    globally dispersed.
  • Union of many traceroute paths.
  • (k,m)-traceroute study

[Figure: traceroute paths from the sources to the destinations.]
10
Heavy tails in topology measurements
A surprising finding [FFF99]: Let $d$ be a
given node degree. Let $f_d$ be the frequency of
degree-$d$ vertices in a graph. Power-law
relationship: $f_d \propto d^{c}$, for some constant $c < 0$.
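
A power-law relationship of this kind appears as a straight line when degree frequency is plotted against degree on log-log axes. The sketch below is one minimal way to check that; the function names and the toy degree sequence are hypothetical and not from the talk. A straight-line fit is only a rough diagnostic, not evidence of a power law on its own.

```python
import math
from collections import Counter

def degree_frequencies(degrees):
    """Return {d: f_d}, the number of vertices having each degree d."""
    return dict(Counter(degrees))

def loglog_slope(freq):
    """Least-squares slope of log(f_d) against log(d), using degrees d >= 1."""
    pts = [(math.log(d), math.log(f)) for d, f in freq.items() if d >= 1 and f > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx

# Hypothetical degree sequence constructed so that f_d = 64 / d^2 exactly.
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
freq = degree_frequencies(degrees)
slope = loglog_slope(freq)  # estimate of the exponent c
```

For this constructed sequence the fitted slope is -2, matching the exponent used to build it.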
11
Hmmm..
  • We will argue that the evidence for power laws is
    at best insufficient.
  • Insufficient does not mean noisy or incomplete.
    (which these datasets certainly are!)
  • For us, insufficient means that measurements are
    statistically biased.
  • We will show that (k,m)-traceroute studies
    likely exhibit significant sampling bias.

12
A thought experiment
  • Idea: Simulate topology measurements on a random
    graph.
  • Generate a sparse Erdős-Rényi random graph,
    G(V,E). Each edge is present independently with
    probability p. Assign weights w(e) = 1 + ε,
    where ε is drawn from a small interval.
  • Pick k unique source nodes, uniformly at random
  • Pick m unique destination nodes, uniformly at
    random
  • Simulate traceroute from k sources to m
    destinations, i.e. learn shortest paths between k
    sources and m destinations.
  • Let G' be the union of shortest paths.
  • Ask: How does G' compare with G?
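
The thought experiment above can be sketched in a few dozen lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' simulator: the perturbation range [-1/n, 1/n], the function names, and the small parameters (n = 400, p = 0.02, k = 3, m = 100) are all assumptions chosen so the example runs quickly.

```python
import heapq
import random

def er_graph(n, p, rng):
    """Erdos-Renyi graph: each edge is present independently with probability p.
    Weights are 1 + eps, eps uniform in [-1/n, 1/n] (assumed range, breaks ties)."""
    adj = {v: {} for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                w = 1.0 + rng.uniform(-1.0 / n, 1.0 / n)
                adj[u][v] = w
                adj[v][u] = w
    return adj

def shortest_path_tree(adj, src):
    """Dijkstra from src; returns parent pointers for every reachable node."""
    dist = {src: 0.0}
    parent = {src: None}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return parent

def traceroute_sample(adj, sources, dests):
    """Edge set of the measured graph: union of source-to-destination shortest paths."""
    edges = set()
    for s in sources:
        parent = shortest_path_tree(adj, s)
        for t in dests:
            if t not in parent:
                continue  # destination unreachable from this source
            v = t
            while parent[v] is not None:
                u = parent[v]
                edges.add((min(u, v), max(u, v)))
                v = u
    return edges

rng = random.Random(0)
n, p, k, m = 400, 0.02, 3, 100          # hypothetical, much smaller than the talk's
G = er_graph(n, p, rng)
picked = rng.sample(range(n), k + m)    # unique sources and destinations
sources, dests = picked[:k], picked[k:]
G_prime = traceroute_sample(G, sources, dests)
```

Comparing the degree distribution induced by `G_prime` with that of `G` is then the question the slide poses.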

13
[Figure: log(Pr[X > x]) vs. log(Degree) for the underlying random graph, G, and the measured graph, G'.]
Underlying graph: N = 100000, p = 0.00015. Measured graph: k = 3, m = 1000.
G' is a biased sample of G that looks heavy-tailed.
Are heavy tails a measurement artifact?
14
Understanding Bias
An intuitive explanation: When traces are
run from few sources to lots of destinations,
some portions of the underlying graph are explored
more than others. We now investigate the causes
behind bias.
15
Are nodes sampled unevenly?
  • Conjecture: Shortest path routing favors higher
    degree nodes ⇒ nodes sampled unevenly
  • Validation: Examine true degrees of nodes in the
    measured graph, G'. Expect true degrees of nodes
    in G' to be higher than degrees of nodes in G, on
    average.
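
The validation step amounts to comparing two averages: the mean true degree over all nodes of the underlying graph and the mean true degree over only those nodes that appear in the measured graph. A minimal sketch follows; the toy graph and the measured edge set are hypothetical, chosen by hand for illustration.

```python
def mean_true_degree(adj, nodes):
    """Average degree in the underlying graph over the given nodes."""
    nodes = list(nodes)
    return sum(len(adj[v]) for v in nodes) / len(nodes)

# Hypothetical underlying graph: hub 0 with spokes 1-4, plus a tail 4-5-6.
G = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0, 5}, 5: {4, 6}, 6: {5}}

# Hypothetical measured edges: shortest paths from source 1 to destinations 3 and 5.
measured_edges = {(0, 1), (0, 3), (0, 4), (4, 5)}
sampled_nodes = {v for edge in measured_edges for v in edge}

all_mean = mean_true_degree(G, G)                  # over every node of G
sampled_mean = mean_true_degree(G, sampled_nodes)  # over nodes seen in the measurement
```

Here the sampled nodes have mean true degree 2.0 against 12/7 for the whole graph, which is the direction the conjecture predicts.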

16
Are edges sampled unevenly?
  • Conjecture: Edges selected incident to a node in
    G' are not proportional to its true degree.
  • Validation: For each node in G', plot true degree
    vs. measured degree. If unbiased, the ratio of true
    to measured degree should be constant, with points
    clustered around a y = cx line (c < 1).
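
The points for such a plot can be produced by pairing each sampled node's true degree with its measured degree. The sketch below does that on a hypothetical toy graph; the names are illustrative only.

```python
def degree_pairs(true_adj, measured_edges):
    """Map node -> (true degree, measured degree, measured/true ratio)."""
    measured = {}
    for u, v in measured_edges:
        measured[u] = measured.get(u, 0) + 1
        measured[v] = measured.get(v, 0) + 1
    return {v: (len(true_adj[v]), d, d / len(true_adj[v]))
            for v, d in measured.items()}

# Hypothetical underlying graph: hub 0 with spokes 1-4, plus a tail 4-5-6.
G = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0, 5}, 5: {4, 6}, 6: {5}}
# Hypothetical measured edges (paths from source 1 to destinations 3 and 5).
measured_edges = {(0, 1), (0, 3), (0, 4), (4, 5)}

pairs = degree_pairs(G, measured_edges)
ratios = sorted(r for _, _, r in pairs.values())
```

In this toy case the ratios range from 0.5 to 1.0 rather than sitting at one constant, so the points would not fall on a single line through the origin.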

17
Why Analyzing Bias
  • Question: Given some vertex in G' that is h hops
    from the source, what fraction of its true edges
    are contained in G'?
  • Message: As h increases, the number of edges
    discovered falls off sharply.

[Figure: fraction of node edges discovered vs. distance from source, for 100, 600, and 1000 destinations.]
We can prove exponential fall-off analytically,
in a simplified model.
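
The quantity on the vertical axis can be computed empirically by grouping sampled nodes by hop distance from the source and averaging the fraction of each node's true edges that were measured. The following is a minimal sketch on a hypothetical toy graph, which is far too small to show the exponential trend itself.

```python
from collections import defaultdict, deque

def hop_distances(adj, src):
    """Breadth-first hop counts from src in the underlying graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def discovered_fraction_by_hop(true_adj, measured_edges, src):
    """Map hop distance h -> mean fraction of true edges seen at sampled nodes."""
    measured = defaultdict(int)
    for u, v in measured_edges:
        measured[u] += 1
        measured[v] += 1
    dist = hop_distances(true_adj, src)
    buckets = defaultdict(list)
    for v, d in measured.items():
        buckets[dist[v]].append(d / len(true_adj[v]))
    return {h: sum(fr) / len(fr) for h, fr in sorted(buckets.items())}

# Hypothetical underlying graph and measured edges (source is node 1).
G = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0, 5}, 5: {4, 6}, 6: {5}}
measured_edges = {(0, 1), (0, 3), (0, 4), (4, 5)}
fractions = discovered_fraction_by_hop(G, measured_edges, src=1)
```

Running the same function on a large simulated graph, once per destination count, would give one curve per setting.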
18
What does this suggest?
Summary: Edges are sampled unevenly by
(k,m)-traceroute methods. Edges close to the
source are sampled more often than edges further
away.
Intuitive Picture: The neighborhood near sources is
well explored, but visibility of edges declines
sharply with hop distance from sources.
19
Inferring Bias
  • Goal
  • Given a measured G', does it appear to be biased?
  • Why this is difficult
  • Don't have the underlying graph.
  • Don't have formal criteria for checking bias.
  • General Approach: Examine statistical properties
    as a function of distance from nearest source.
  • Unbiased sample ⇒ No change
  • Change ⇒ Bias

20
Detecting Bias
Examine Pr[D = d | H = h], the conditional probability
that a node has degree d, given that it is at
distance h from the source.
[Figure: log(Pr[X > x]) vs. log(Degree) for the underlying graph and for G' degrees at H = 2 and H = 3.]
Two observations:
1. Highest-degree nodes are near the source.
2. The degree distribution of nodes near the source
differs from that of nodes far away.
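
The conditional distribution can be estimated directly from a measured graph, since both the degrees and the hop distances are taken from the measurement itself. A minimal sketch, with hypothetical names and toy data, using a multi-source breadth-first search for distance to the nearest source:

```python
from collections import Counter, defaultdict, deque

def conditional_degree_dist(measured_edges, sources):
    """Map h -> {d: estimated Pr[D = d | H = h]} within the measured graph."""
    adj = defaultdict(set)
    for u, v in measured_edges:
        adj[u].add(v)
        adj[v].add(u)
    # Multi-source BFS: dist[v] is the hop count to the nearest source.
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    by_hop = defaultdict(Counter)
    for v, h in dist.items():
        by_hop[h][len(adj[v])] += 1
    return {h: {d: c / sum(cnt.values()) for d, c in cnt.items()}
            for h, cnt in sorted(by_hop.items())}

# Hypothetical measured edges with a single source at node 1.
measured_edges = {(0, 1), (0, 3), (0, 4), (4, 5)}
dist_by_hop = conditional_degree_dist(measured_edges, sources=[1])
```

If the sample were unbiased, the per-hop distributions would look alike; a visible change with h is what the slide treats as a sign of bias.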
21
A Statistical Test for C1
C1: Are the highest-degree nodes near the
source? If so, this is consistent with bias.
$H_0^{C1}$: The 1% highest-degree nodes occur at random
with respect to distance to the nearest source.
  • Cut the vertex set in half, N (near) and F (far), by
    distance from the nearest source.
  • Let ν be the (0.01)|V| highest-degree vertices.
  • κ = fraction of ν that lies in N
  • Can bound the likelihood that κ deviates from 1/2 using
    Chernoff bounds

Reject the hypothesis with confidence $1-\alpha$ if [rejection condition missing from the transcript]
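
One way such a test could be carried out is sketched below. The exact rejection rule from the slide is not available here, so this sketch substitutes the standard multiplicative Chernoff bound, Pr[X ≥ (1+δ)μ] ≤ (e^δ / (1+δ)^(1+δ))^μ, and assumes that under the null hypothesis each high-degree node falls in N independently with probability 1/2. The function name, the use of a count rather than a fraction, and the synthetic data are all hypothetical.

```python
import math

def c1_test(degree, dist, alpha=0.05, top_fraction=0.01):
    """Return (count in N, number of top nodes, Chernoff bound, reject?)."""
    nodes = sorted(dist, key=lambda v: dist[v])
    near = set(nodes[: len(nodes) // 2])          # N: the closer half
    nu = max(1, round(top_fraction * len(nodes)))
    top = sorted(nodes, key=lambda v: degree[v], reverse=True)[:nu]
    in_near = sum(1 for v in top if v in near)
    mu = nu / 2                                   # expected count in N under H0
    delta = in_near / mu - 1
    if delta <= 0:
        return in_near, nu, 1.0, False            # no excess in N, nothing to reject
    bound = (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
    return in_near, nu, bound, bound <= alpha

# Synthetic data: 2000 nodes in 10 distance bands; 19 of the 20
# highest-degree nodes are placed in the nearest bands, 1 in the farthest.
n = 2000
dist = {v: v // 200 for v in range(n)}
degree = {v: 2 for v in range(n)}
for v in list(range(19)) + [1990]:
    degree[v] = 50

in_near, nu, bound, reject = c1_test(degree, dist)
```

With only 20 high-degree nodes the Chernoff bound is loose, so this version rejects only when the concentration in N is very strong; an exact binomial tail would be less conservative.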
22
A Statistical Test for C2
C2: Is the degree distribution of nodes near the
source different from that of nodes further away? If
so, this is consistent with bias.
$H_0^{C2}$: The Chi-Square Test succeeds on the degree
distributions of nodes near the source and far from the
source.
Partition vertices across the median distance into N
(near) and F (far). Compare the degree distributions
of nodes in N and F using the Chi-Square Test:

$$\chi^2 = \sum_{i=1}^{l} \frac{(O_i - E_i)^2}{E_i}$$

where O and E are observed and expected degree
frequencies and l is the number of histogram bins. Reject
the hypothesis with confidence $1-\alpha$ if [rejection condition missing from the transcript]
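
A minimal sketch of the chi-square comparison follows. How the expected frequencies were formed is not stated in the transcript; this version assumes the far-node histogram, rescaled to the number of near nodes, serves as the expected counts. The critical value 5.991 is the standard 95% quantile of the chi-square distribution with 2 degrees of freedom (l - 1 for l = 3 bins). Names, bins, and data are hypothetical.

```python
def histogram(degrees, bin_edges):
    """Counts per half-open bin [bin_edges[i], bin_edges[i+1])."""
    counts = [0] * (len(bin_edges) - 1)
    for d in degrees:
        for i in range(len(counts)):
            if bin_edges[i] <= d < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts

def chi_square_near_far(near_degrees, far_degrees, bin_edges):
    """Chi-square statistic with near nodes as observed, far nodes as expected.
    Assumes every degree falls in some bin and every far bin is non-empty."""
    observed = histogram(near_degrees, bin_edges)
    far = histogram(far_degrees, bin_edges)
    scale = len(near_degrees) / len(far_degrees)
    expected = [c * scale for c in far]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical degree samples: near nodes skew toward higher degree.
near_degrees = [1] * 10 + [2] * 20 + [5] * 20
far_degrees = [1] * 30 + [2] * 15 + [5] * 5
bin_edges = [1, 2, 3, 100]                 # l = 3 bins

stat = chi_square_near_far(near_degrees, far_degrees, bin_edges)
CRITICAL_95_DF2 = 5.991                    # chi-square quantile, alpha = 0.05, 2 d.o.f.
reject = stat > CRITICAL_95_DF2
```

For these samples the statistic is 60, far above the critical value, so the hypothesis that near and far nodes share a degree distribution would be rejected.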
23
Our Definition of Bias
  • Bias (Definition) Failure of a sampled graph
    to meet statistical tests for randomness
    associated with C1 and C2.
  • Disclaimers: Tests are not conclusive. Tests
    are binary and don't tell us how biased datasets
    are.
  • But a dataset that fails both tests is a poor
    choice for making generalizations about the underlying
    graph.

24
Introducing datasets
Dataset Name   Date   Nodes     Links     Srcs   Dsts
Pansiot-Grad   1995   3,888     4,857     12     1270
Mercator       1999   228,263   320,149   1      NA
Skitter        2000   7,202     11,575    8      1277
[Figure: log(Pr[X > x]) vs. log(Degree) for Pansiot-Grad, Mercator, and Skitter.]
25
Testing C1
$H_0^{C1}$: The 1% highest-degree nodes occur at random
with respect to distance to the source.
Pansiot-Grad: 93% of the highest-degree nodes are in N
Mercator: 90% of the highest-degree nodes are in N
Skitter: 84% of the highest-degree nodes are in N
26
Testing C2
$H_0^{C2}$
27
Some possible explanations
  • Degree distribution is uniform, but sampling is
    biased.
  • Degree distribution is non-uniform, and nodes
    further from the source really do have
    below-average degree.
  • Others?

28
Final Remarks on traceroute
  • Using (k,m)-traceroute methods for mapping is a
    bias-prone method.
  • Rocketfuel [SMW02] or similar methods may avoid
    some pitfalls of (k,m)-traceroute studies.
  • Can we remove bias in a statistically sound way?
  • An open question: Can we sample the degree of a
    router at random?

29
Outline
  • Demystification Manifesto
  • Case Study: Demystifying traceroute.
  • Next steps on demystification agenda

30
Demystification Agenda
0. Adopt a no-hype, no-nonsense mindset.
1. Clear up technical misunderstandings and pitfalls
associated with router-level measurements.
Revisit incorrect conclusions, flawed methods,
and broken theories.
2. Attempt to educate or re-educate as broad a
community as possible.
Arguably a focus for tonight's discussion (?)
31
Demystification Agenda
3. Understanding the router-level topology alone
is insufficient.
Much more insight from studying the annotated
graph.
From my IPAM 02 talk