Correlations in E2E Network Metrics: Impact on Large Scale Network Monitoring

© 2006 Hewlett-Packard Development Company, L.P.

1
Correlations in E2E Network Metrics: Impact on Large Scale Network Monitoring
Praveen Yalagandula, Sung-Ju Lee, Puneet Sharma, Sujata Banerjee

HP Labs, Palo Alto
http://networking.hpl.hp.com

2
Motivation

Large-scale E2E network monitoring
Used for application management, flow control, fault diagnosis, etc.
A key question: at what granularity should we measure?
Coarse-grained: lower cost, but lower accuracy
Fine-grained: higher accuracy, but higher cost
Observation: measurement costs are heterogeneous
PING < TRACEROUTE < PATHRATE
Our investigation:
Are different E2E network metrics correlated?
Can we leverage such dependencies (if any) to lower monitoring cost while maintaining high accuracy?

3
Our Approach

We consider two correlations in the current work:
Changes in hop and latency → changes in route
Changes in route → changes in capacity
We use data from the S3 deployment on PlanetLab:
2 years of data
E2E measurements: Traceroute and Pathrate (capacity)
On thousands of paths
We perform a cost vs. accuracy analysis for two cases:
Base: only the higher-cost measurements are performed
Strategy:
Perform the lower-cost measurements
If a change is detected, perform the higher-cost measurements
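As a rough illustration of the Base vs. Strategy comparison above, the sketch below replays a trace in which both the cheap and the expensive metric were recorded at every step, and tallies cost and accuracy for each case. The helper name `replay_strategy`, the per-measurement costs, and the accuracy definition (fraction of steps where the last expensive result still matches the true value) are assumptions for illustration, not the definitions used in the study.

```python
from typing import Hashable, Sequence, Tuple


def replay_strategy(
    cheap: Sequence[Hashable],
    expensive: Sequence[Hashable],
    cheap_cost: float,
    expensive_cost: float,
) -> Tuple[float, float, float]:
    """Compare Base vs. Strategy on a fully sampled trace (hypothetical helper).

    cheap[i] and expensive[i] are the lower- and higher-cost metric values at step i.
    Returns (strategy_cost, base_cost, strategy_accuracy).
    """
    if len(cheap) != len(expensive) or not cheap:
        raise ValueError("traces must be non-empty and equally long")

    # Base: run the expensive measurement at every step; always accurate.
    base_cost = expensive_cost * len(expensive)

    # Strategy: bootstrap with one expensive measurement, then run the cheap
    # one at every step and re-run the expensive one only when the cheap one changes.
    known = expensive[0]
    cost = cheap_cost + expensive_cost
    correct = 1
    for i in range(1, len(cheap)):
        cost += cheap_cost
        if cheap[i] != cheap[i - 1]:
            known = expensive[i]
            cost += expensive_cost
        if known == expensive[i]:
            correct += 1
    return cost, base_cost, correct / len(expensive)


# The route changes at step 3 without a hop-count change, so the Strategy misses it.
print(replay_strategy([12, 12, 13, 13], ["r1", "r1", "r2", "r3"], 1, 10))  # (24, 40, 0.75)
```

The cheap metric here plays the role of hops/latency (or route) and the expensive one the role of route (or capacity), so the same replay applies to either correlation.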

4
State-of-the-art

Correlations assumed by previous systems:
GNP, Vivaldi, and other coordinate-based systems: correlation in latencies across paths
NetQuest: correlation between hop changes and route changes
CoDeen: correlation between route changes and capacity
Our work:
Quantify the correlations
Perform an accuracy vs. cost tradeoff analysis

5
Outline

Motivation: quantify and leverage metric correlations
S3: Scalable Sensing Service
Deployment on PlanetLab
Correlations
Changes in hop and latency → changes in route
Changes in route → changes in capacity
Cost-accuracy tradeoff analysis
Summary and future work

6
S3 Architecture

Sensor pods:
Collection of sensors
Measure system state from a node's view
Backplane:
Programmable fabric
Connects pods and aggregates measured system state
Inference engines:
Infer O(n²) E2E path info by measuring a few paths
Schedule measurements on pods
Aggregate data on the backplane
Applications

7
Sensor Pod
Figure: sensor pod diagram. Labeled components: Controller, Repository, Configuration Data, SNMP Agent, Secure Web Interface, API (query, control, and notification), Load, Memory, Capacity, Loss rate, Bandwidth, Latency

8
S3 Deployment on PlanetLab

Running since January 2006
All-pair network metrics:
Latency: inferred by Netvigator
Loss rate: measured using the Tulip loss-rate tool
Available bandwidth: measured using Spruce and PathChirp
Capacity: measured using Pathrate
Stats: 14 GB of raw data every day (1 GB compressed)

9
Two correlations quantified

Changes in hop and latency → changes in route (HL → R)?
PING can be used to measure both hops and latency
Original TTL - remaining TTL value = number of hops
A change in the number of hops always means a change in the route
But does a change in the route imply a change in the number of hops?
Obviously not, but how often does this happen, and how does it affect monitoring accuracy?
Changes in route → changes in capacity (R → C)?
Capacity can change even when the route has not changed:
Cap limits: especially in PlanetLab, and becoming common in other networks, e.g., cable networks
Same route, but a link is upgraded, or a link-level change is not visible in the IP route
Question:
How often does this happen, and how does it affect monitoring accuracy?
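The TTL arithmetic on this slide can be sketched as follows. Because the original TTL is not carried in the reply, this sketch assumes the remote host started from one of the common default initial TTLs (32, 64, 128, 255) and picks the smallest default that is at least the observed value; that heuristic and the names `infer_hops` and `hop_change_from_ttl` are illustrative assumptions, not part of S3.

```python
# Common default initial TTL values (assumption: the remote host uses one of them).
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def infer_hops(remaining_ttl: int) -> int:
    """Number of hops = original TTL - remaining TTL (original TTL is guessed)."""
    if not 1 <= remaining_ttl <= 255:
        raise ValueError("remaining TTL must be in 1..255")
    original_ttl = next(t for t in COMMON_INITIAL_TTLS if t >= remaining_ttl)
    return original_ttl - remaining_ttl


def hop_change_from_ttl(prev_remaining_ttl: int, cur_remaining_ttl: int) -> bool:
    """True if two PING replies imply different hop counts (H = 1)."""
    return infer_hops(prev_remaining_ttl) != infer_hops(cur_remaining_ttl)


print(infer_hops(52))                 # 64 - 52 = 12
print(hop_change_from_ttl(52, 51))    # 12 vs. 13 hops -> True
```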

10
S3 Dataset

HL → R:
Uses Traceroute measurements
Performed from each node to 20 landmark nodes chosen across the globe
Performed once every 30 minutes
R → C:
Uses Traceroute and Pathrate measurements
Each node performs Pathrate to all other nodes in a round-robin fashion
Completing a round of measurements takes about a day on average
We use a Pathrate measurement iff 0 < COV < 1
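A minimal sketch of the R → C data handling described above: each node cycles through all other nodes in round-robin order, and a Pathrate result is kept only if 0 < COV < 1. The record type `PathrateResult`, its `cov` field, and the helper names are hypothetical; the sketch simply assumes each capacity estimate comes with a COV value attached.

```python
from dataclasses import dataclass
from itertools import cycle, islice
from typing import Iterator, List, Sequence


@dataclass
class PathrateResult:
    src: str
    dst: str
    capacity_mbps: float
    cov: float  # hypothetical field: the COV value attached to this estimate


def round_robin_targets(me: str, nodes: Sequence[str]) -> Iterator[str]:
    """Endless round-robin order over all nodes other than `me`."""
    others = [n for n in nodes if n != me]
    return cycle(others)


def usable(results: Sequence[PathrateResult]) -> List[PathrateResult]:
    """Keep only measurements with 0 < COV < 1, as on the slide."""
    return [r for r in results if 0 < r.cov < 1]


nodes = ["a", "b", "c", "d"]
print(list(islice(round_robin_targets("a", nodes), 5)))  # ['b', 'c', 'd', 'b', 'c']
```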

11
Defining metric changes

Route changes (R):
R = 1 if the current route does not match the previous sample
Else R = 0
Sometimes routers do not respond in the output
We ignore those hops during the route-change detection above
Latency changes (L):
L = 1 if the current latency differs from the previous sample by p or more
Else L = 0
We use p = 5 for this analysis
Hop changes (H):
H = 1 if the current number of hops does not match the previous sample
H = 0 otherwise
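The three indicators above can be written as small functions over consecutive samples. This is a sketch under stated assumptions: a traceroute is represented as a list of router addresses with `None` for hops that did not respond, routes of different lengths count as changed, and p is treated as a percentage of the previous latency sample; the slides do not spell out these details, and the function names are hypothetical.

```python
from typing import Optional, Sequence

Hop = Optional[str]  # router address, or None if the hop did not respond ("*")


def route_change(prev: Sequence[Hop], cur: Sequence[Hop]) -> int:
    """R = 1 if the current route differs from the previous sample, else 0.

    Non-responding hops (None on either side) are skipped.
    Assumption: routes of different lengths always count as changed.
    """
    if len(prev) != len(cur):
        return 1
    for a, b in zip(prev, cur):
        if a is not None and b is not None and a != b:
            return 1
    return 0


def latency_change(prev_ms: float, cur_ms: float, p: float = 5.0) -> int:
    """L = 1 if latency moved by p percent or more (assumed relative threshold)."""
    return int(abs(cur_ms - prev_ms) * 100 >= p * prev_ms)


def hop_count_change(prev_hops: int, cur_hops: int) -> int:
    """H = 1 if the number of hops differs from the previous sample."""
    return int(prev_hops != cur_hops)


print(route_change(["10.0.0.1", None, "10.0.2.1"],
                   ["10.0.0.1", "10.0.1.9", "10.0.2.1"]))  # 0: the unknown hop is ignored
print(latency_change(100.0, 105.0))  # 1
```

Multiplying both sides by 100 instead of computing `p / 100 * prev_ms` keeps the threshold comparison exact for round inputs such as a 5 ms change on a 100 ms baseline.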

12
Case counts

Averaged across all paths
H: change in hops; L: change in latency; R: change in route

13
Case counts
Measurements where the route changed but neither hops nor latency changed → if we use changes in hops and/or latency to detect route changes, we will miss these
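One way to build this kind of table from per-sample (H, L, R) indicators is sketched below. It assumes "averaged across all paths" means the mean count per path for each of the eight (H, L, R) combinations; the helper names are hypothetical. The (H, L, R) = (0, 0, 1) entry is the case highlighted here: route changes that a hop/latency trigger would miss.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence, Tuple

Case = Tuple[int, int, int]  # (H, L, R), each 0 or 1
ALL_CASES: List[Case] = list(product((0, 1), repeat=3))


def average_case_counts(per_path: Sequence[Sequence[Case]]) -> Dict[Case, float]:
    """Mean number of samples per path falling into each (H, L, R) case."""
    if not per_path:
        raise ValueError("need at least one path")
    totals = Counter()
    for samples in per_path:
        totals.update(samples)
    return {case: totals[case] / len(per_path) for case in ALL_CASES}


def missed_route_change_fraction(avg: Dict[Case, float]) -> float:
    """Share of route changes accompanied by neither a hop nor a latency change."""
    route_changes = sum(v for case, v in avg.items() if case[2] == 1)
    return avg[(0, 0, 1)] / route_changes if route_changes else 0.0


paths = [
    [(0, 0, 0), (1, 1, 1), (0, 0, 1), (0, 0, 0)],
    [(0, 1, 1), (0, 0, 0)],
]
avg = average_case_counts(paths)
print(avg[(0, 0, 1)], missed_route_change_fraction(avg))
```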

14
Case counts
Overall, these two numbers are small → changes in hops and latency can be a good indicator of changes in route

15
Cost-Accuracy Tradeoff

What if we perform only PING, and perform a Traceroute only when a hop or latency change is observed?
Reduces cost: PING is relatively inexpensive
Reduces accuracy: might miss some route changes
Base method: Traceroute every T seconds
Strategy:
Perform Traceroutes every s·T seconds
We refer to s as the sampling factor
Perform a PING every t seconds when a Traceroute is not performed
Further, perform a Traceroute if a change in hops/latency is observed