Correlations in E2E Network Metrics: Impact on Large Scale Network Monitoring

2006 Hewlett-Packard Development Company, L.P.

Provided by: tanjeet

1
Correlations in E2E Network Metrics: Impact on
Large Scale Network Monitoring
Praveen Yalagandula, Sung-Ju Lee, Puneet Sharma,
Sujata Banerjee
  • HP Labs, Palo Alto
  • http://networking.hpl.hp.com

2
Motivation
  • Large scale E2E network monitoring
      • Application management, flow control, fault
        diagnosis, etc.
  • A key question: at what granularity should we
    measure?
      • Coarse-grained: lower cost but higher inaccuracy
      • Fine-grained: lower inaccuracy but higher cost
  • Observation: heterogeneity in measurement costs
      • PING < TRACEROUTE < PATHRATE
  • Our investigation
      • Are different E2E network metrics correlated?
      • Can we leverage such dependencies (if any) to
        lower monitoring cost while maintaining high
        accuracy?

3
Our Approach
  • We consider two correlations in the current work
      • Changes in hop and latency ⇒ changes in route
      • Changes in route ⇒ changes in capacity
  • We use data from the S3 deployment on PlanetLab
      • 2 years of data
      • E2E measurements: Traceroute and Pathrate
        (capacity)
      • On thousands of paths
  • Perform cost vs. accuracy analysis for two cases
      • Base: only higher-cost measurements are performed
      • Strategy
          • Perform lower-cost measurements
          • If a change is detected, perform higher-cost
            measurements

4
State-of-the-art
  • Correlations assumed by previous systems
      • GNP, Vivaldi, and other coordinate-based systems
          • Correlation in latencies across paths
      • NetQuest
          • Correlation between hop changes and route changes
      • CoDeeN
          • Correlation between route changes and capacity
  • Our work
      • Quantify the correlation
      • Perform accuracy vs. cost tradeoff analysis

5
Outline
  • Motivation: quantify and leverage metric
    correlations
  • S3: Scalable Sensing Service
      • Deployment on PlanetLab
  • Correlations
      • Changes in hop and latency ⇒ changes in route
      • Changes in route ⇒ changes in capacity
  • Cost-accuracy tradeoff analysis
  • Summary and future work

6
S3 Architecture
  • Sensor pods
      • Collection of sensors
      • Measure system state from a node's view
  • Backplane
      • Programmable fabric
      • Connects pods and aggregates measured system
        state
  • Inference engines
      • Infer O(n²) E2E path information by measuring a
        few paths
      • Schedule measurements on pods
      • Aggregate data on the backplane
  • Applications

7
Sensor Pod
[Diagram of a sensor pod. Labels: Configuration Data, SNMP Agent,
Repository, Secure Web Interface, API (query, control, and
notification), Controller, Load, Memory, Capacity, Lossrate,
Bandwidth, Latency]

8
S3 Deployment on PlanetLab
  • Running since January 2006
  • All-pair network metrics
      • Latency: inferred by Netvigator
      • Lossrate: measured using the Tulip lossrate tool
      • Available bandwidth: measured using Spruce and
        PathChirp
      • Capacity: measured using Pathrate
  • Stats: 14 GB of raw data every day, 1 GB compressed

9
Two correlations quantified
  • Changes in hop and latency ⇒ changes in route
    (HL ⇒ R)?
      • PING can be used to measure both hops and latency
          • Original TTL - remaining TTL value = number of hops
      • A change in the number of hops always means a
        change in the route
      • But does a change in the route ⇒ a change in the
        number of hops?
          • Obviously not, but how often does this happen,
            and how does it affect monitoring accuracy?
  • Changes in route ⇒ changes in capacity (R ⇒ C)?
      • Capacity can change when the route is not changed
          • Cap limits
              • Especially in PlanetLab
              • Becoming common in other networks, e.g., cable
                networks
          • Same route, but a link is upgraded, or a
            link-level change is not visible in the IP route
      • Question
          • How often does this happen, and how does it
            affect monitoring accuracy?
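
The hop-count rule on this slide (original TTL minus remaining TTL) can be sketched as follows. The slides do not say how the original TTL is known, so this sketch assumes the sender used one of a few common initial values (64, 128, 255) and picks the smallest one that is not below the observed TTL. The function name and the list of initial TTLs are illustrative assumptions.

```python
# Assumed set of common initial TTL values; not taken from the slides.
COMMON_INITIAL_TTLS = (64, 128, 255)


def hops_from_ttl(remaining_ttl, initial_ttls=COMMON_INITIAL_TTLS):
    """Estimate the hop count as (original TTL - remaining TTL).

    The original TTL is guessed as the smallest assumed initial value
    that is greater than or equal to the remaining TTL.
    """
    if remaining_ttl <= 0:
        raise ValueError("remaining TTL must be positive")
    for initial in sorted(initial_ttls):
        if remaining_ttl <= initial:
            return initial - remaining_ttl
    raise ValueError("remaining TTL exceeds every assumed initial TTL")


if __name__ == "__main__":
    print(hops_from_ttl(52))   # guessed original TTL 64
    print(hops_from_ttl(115))  # guessed original TTL 128
```

The guess breaks down for paths long enough that the remaining TTL drops below the next smaller initial value; for detecting a change in hop count on a fixed path, only the difference between consecutive samples matters.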

10
S3 Dataset
  • HL ⇒ R
      • Use Traceroute measurements
      • Performed at each node to 20 landmark nodes
          • Landmark nodes (20) chosen across the globe
      • Performed once every 30 minutes
  • R ⇒ C
      • Use Traceroute and Pathrate measurements
      • Each node performs Pathrate to all other nodes
          • In a round-robin fashion
          • Takes about a day (avg.) to complete a round of
            measurements
      • We use Pathrate measurements iff 0 < COV < 1

11
Defining metric changes
  • Route changes (R)
      • R = 1 if the current route does not match the
        previous sample
      • Else R = 0
      • Sometimes routers do not respond in the Traceroute
        output
          • We ignore those hops during the above route
            change detection
  • Latency changes (L)
      • L = 1 if the current latency differs from the
        previous sample by p% or more
      • Else L = 0
      • We use p = 5% for this analysis
  • Hop changes (H)
      • H = 1 if the current number of hops does not match
        the previous sample
      • H = 0 otherwise
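
A minimal sketch of the three change flags defined on this slide. It assumes routes are lists of hop addresses in which a non-responding router is recorded as "*", and it reads "ignore those hops" as skipping any position where either sample has no reply. Both the marker and that reading are assumptions, as are the function names.

```python
def route_changed(prev_route, curr_route, no_reply="*"):
    """R = 1 if the current route does not match the previous sample.

    Positions where either sample has no router reply are skipped.
    A difference in route length is treated as a route change.
    """
    if len(prev_route) != len(curr_route):
        return 1
    for prev_hop, curr_hop in zip(prev_route, curr_route):
        if no_reply in (prev_hop, curr_hop):
            continue  # ignore hops where a router did not respond
        if prev_hop != curr_hop:
            return 1
    return 0


def latency_changed(prev_ms, curr_ms, p=5.0):
    """L = 1 if latency differs from the previous sample by p percent or more."""
    if prev_ms <= 0:
        raise ValueError("previous latency must be positive")
    return int(abs(curr_ms - prev_ms) * 100.0 >= p * prev_ms)


def hops_changed(prev_hops, curr_hops):
    """H = 1 if the number of hops differs from the previous sample."""
    return int(prev_hops != curr_hops)


if __name__ == "__main__":
    print(route_changed(["a", "*", "c"], ["a", "b", "c"]))  # unanswered hop skipped
    print(latency_changed(100.0, 106.0))                    # 6% change, p = 5%
    print(hops_changed(12, 13))
```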

12
Case counts
  • Averaged across all paths
  • H: change in hops; L: change in latency; R:
    change in route

13
Case counts
Measurements where the route changed but neither hops
nor latency changed ⇒ if we use changes in hops
and/or latency to detect route changes, we will
miss these
  • Averaged across all paths
  • H: change in hops; L: change in latency; R:
    change in route

14
Case counts
Overall, these two numbers are small ⇒ changes
in hop and latency can be a good indicator of
changes in route
  • Averaged across all paths
  • H: change in hops; L: change in latency; R:
    change in route
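
The case counting on these slides can be sketched for a single path as follows. The input is a sequence of (H, L, R) flags, one per measurement. The sample data is made up for illustration and is not from the S3 dataset, and the two summary ratios are one possible way to read the table, not necessarily the numbers the slides highlight.

```python
from collections import Counter


def case_counts(samples):
    """Count how often each (H, L, R) combination occurs on one path."""
    return Counter(samples)


def summarize(counts):
    """Return the fraction of missed route changes and of false triggers.

    missed:        route changed, but neither hops nor latency changed
    false_trigger: hops or latency changed, but the route did not
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no samples")
    missed = counts[(0, 0, 1)]
    false_triggers = sum(
        n for (h, l, r), n in counts.items() if (h or l) and not r
    )
    return {"missed": missed / total, "false_trigger": false_triggers / total}


if __name__ == "__main__":
    # Illustrative flags only.
    samples = [(0, 0, 0)] * 6 + [(1, 1, 1)] * 2 + [(0, 1, 0), (0, 0, 1)]
    print(summarize(case_counts(samples)))
```

Averaging across paths, as the slides do, would mean computing these fractions per path and then taking their mean.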

15
Cost-Accuracy Tradeoff
  • What if we perform only PING, and then perform
    Traceroute only when a hop or latency change is
    observed?
      • Reduces cost: PING is relatively inexpensive
      • Increases inaccuracy: might miss some route
        changes
  • Base method: Traceroutes every T seconds
  • Strategy
      • Perform Traceroutes every s·T seconds
          • We refer to s as the sampling factor
      • Perform PING every t seconds when a Traceroute is
        not performed
      • Further, perform a Traceroute if a change in
        hop/latency is observed
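
A rough simulation of the strategy above, under several assumptions that the slides do not state: one measurement slot per base period T, with PING run in every slot that has no scheduled Traceroute; unit costs of 1 for PING and 10 for Traceroute; and inaccuracy measured as the number of slots in which the last route seen by a Traceroute differs from the true route. Setting s = 1 reproduces the base method.

```python
def simulate_strategy(slots, s, ping_cost=1.0, traceroute_cost=10.0, p=5.0):
    """Simulate PING-triggered Traceroutes over a sequence of slots.

    slots: list of (route, latency_ms) pairs, one per base period T,
           where route is a tuple of hop addresses (the ground truth).
    s:     sampling factor; a Traceroute is scheduled every s slots.
    Returns (total_cost, stale_slots).
    """
    if s < 1:
        raise ValueError("sampling factor s must be >= 1")
    cost = 0.0
    stale_slots = 0
    known_route = None
    last_hops = last_latency = None
    for i, (route, latency) in enumerate(slots):
        hops = len(route)
        run_traceroute = (i % s == 0)
        if not run_traceroute:
            # Cheap probe: PING yields hop count and latency.
            cost += ping_cost
            hop_change = hops != last_hops
            latency_change = (
                abs(latency - last_latency) * 100.0 >= p * last_latency
            )
            run_traceroute = hop_change or latency_change
        if run_traceroute:
            cost += traceroute_cost
            known_route = route
        if known_route != route:
            stale_slots += 1  # a route change went unnoticed in this slot
        last_hops, last_latency = hops, latency
    return cost, stale_slots


if __name__ == "__main__":
    a = ("a", "b", "c")
    b = ("a", "d", "c")        # same hop count as a
    c = ("a", "d", "e", "c")   # one more hop
    slots = [(a, 50.0), (a, 50.0), (b, 51.0), (b, 51.0), (c, 60.0), (c, 60.0)]
    print(simulate_strategy(slots, s=1))  # base method
    print(simulate_strategy(slots, s=6))  # strategy with sampling factor 6
```

In the example, the change from route `a` to route `b` keeps the hop count and moves latency by only 2%, so PING does not trigger a Traceroute and the route stays stale until the hop change two slots later.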

16
Cost-Accuracy Tradeoff

17
Defining capacity changes for a path
  • Pathrate gives an estimate of capacity (with some
    error)
  • Link-mapping based change detection
      • Map the result of a Pathrate measurement to one of
        several known link types
      • C = 1 if the current link type is different from the
        previous link type
  • Percent-change
      • C = 1 if the current value differs from the previous
        value by p% or more
      • We use p = 10% for our analysis
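
The two capacity-change definitions above can be sketched as follows. The slides do not list the known link types, so the table below is an assumed set of nominal link rates, and mapping an estimate to the nearest type on a log scale is likewise an assumption made for illustration.

```python
import math

# Assumed nominal link rates in Mbps; the slides do not give the list used.
ASSUMED_LINK_TYPES_MBPS = {
    "T1": 1.544,
    "Ethernet": 10.0,
    "T3": 44.736,
    "FastEthernet": 100.0,
    "OC-3": 155.52,
    "OC-12": 622.08,
    "GigE": 1000.0,
}


def map_to_link_type(capacity_mbps, link_types=ASSUMED_LINK_TYPES_MBPS):
    """Map a capacity estimate to the closest link type (log-scale distance)."""
    if capacity_mbps <= 0:
        raise ValueError("capacity must be positive")
    return min(
        link_types,
        key=lambda name: abs(math.log(capacity_mbps / link_types[name])),
    )


def changed_by_link_mapping(prev_mbps, curr_mbps):
    """C = 1 if the mapped link type differs from the previous sample's."""
    return int(map_to_link_type(prev_mbps) != map_to_link_type(curr_mbps))


def changed_by_percent(prev_mbps, curr_mbps, p=10.0):
    """C = 1 if the estimate differs from the previous one by p percent or more."""
    if prev_mbps <= 0:
        raise ValueError("previous capacity must be positive")
    return int(abs(curr_mbps - prev_mbps) * 100.0 >= p * prev_mbps)


if __name__ == "__main__":
    # An 11.5% drop: flagged by percent-change, but still the same link type.
    print(changed_by_link_mapping(96.0, 85.0), changed_by_percent(96.0, 85.0))
```

The example shows why the two definitions can disagree: link mapping absorbs estimation noise around a nominal rate, while percent-change reacts to it.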

18
Case counts
  • Averaged across all paths
  • C: change in capacity; R: change in route

R and C take the same value in only 63% and 58% of cases ⇒
modest positive correlation

19
Cost-Accuracy Tradeoff
  • Link-Mapping

20
Cost-Accuracy Tradeoff
  • Percent-Change

21
Conclusions and Next Steps
  • Methodology for correlation quantification
      • Case counting
      • Cost-accuracy tradeoff analysis
  • Hop and latency changes ⇒ route changes
  • Route changes ⇒ capacity changes
  • Promising results in both cases
      • Low-cost measurements can be used to trigger
        high-cost measurements
  • Further steps
      • Other correlations: capacity and available
        bandwidth correlation
      • Application-level inaccuracy, i.e., impact on E2E
        apps

22
Ongoing work
  • http://networking.hpl.hp.com/s-cube
  • Email: s-cube@hpl.hp.com