Title: Predicting and Bypassing End-to-End Internet Service Degradation
1 Predicting and Bypassing End-to-End Internet Service Degradation
- Anat Bremler-Barr, Edith Cohen, Haim Kaplan, Yishay Mansour
- Tel-Aviv University and AT&T Labs
- Talk: Omer Ben-Shalom, Tel-Aviv University
2 Outline
- Degradation
- deviation from normal (minimum) RTT.
- Predicting Degradation
- Different Predictors
- Performance Evaluation
- Precision/recall methodology
- Suggested Application: Gateway selection
3 Motivating Application
[Figure: network diagram showing AS 41, AS 123, AS 56, AS 12 and two peering links]
- Gateway selection (intelligent routing device)
4 Data and Measurements: Sources
- Base: measurements from 4 different locations (ASes), simulating 4 gateways
- California (CA): AT&T, ACIRI
- New Jersey (NJ): AT&T, Princeton
5 Data and Measurements: Destinations
- Obtaining a representative set of web servers
- Weights (derived from proxy log)
6 Data and Measurements: RTT
- Data: weekly RTT (SYN) measurements, end to end (path + server)
- Hourly measurements: 35,124 servers
- Once-a-minute measurements of a weighted sample: 100 servers
7 Degradation: Definition
- Deviation from minimum recorded RTT (propagation delay)
- Discrete degradation levels 1-6.
8 Objective: Avoiding degradation
- Attempt to reroute through a different gateway
- Two conditions have to hold:
- Need to be able to predict the failure from a gateway
- Need to have a substitute gateway (low correlation between gateways)
- Blackout: consecutive degradation through one gateway
9 Blackout durations
- The longer the duration, the easier it is to predict.
- The majority of blackouts are short: 1-3 consecutive points
- However, a considerable fraction occurs in longer durations.
[Figure: long-duration blackout]
10 Gateways Correlation
- Gateways are correlated, but often the correlation is not too strong
11 Gateways Correlation
- Longer blackouts are more likely to be shared
- failure closer to the server
- The majority of 2-gateway blackouts involved same-coast pairs
12 Building predictors
- For a given degradation level l.
- Prediction per IP.
- Input: previous RTT measurements for the IP address.
- Output: probability of a failure
- Predict failure if the probability exceeds a threshold
13 Precision/Recall Methodology
[Figure: predicted degraded vs. actual degraded]
14 Precision-recall curve
- Sweep the threshold over [0,1] to obtain a precision-recall curve.
- In other words, let P(t) be the predicted failure probability at time t
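The threshold sweep can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, and returning precision 1.0 when nothing is predicted is a convention assumed here.

```python
from typing import List, Sequence, Tuple


def precision_recall(pred: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Precision = correct predictions / all predictions; recall = correct predictions / all failures."""
    true_pos = sum(1 for p, a in zip(pred, actual) if p and a)
    n_pred = sum(1 for p in pred if p)
    n_actual = sum(1 for a in actual if a)
    # Assumed convention: an empty set of predictions (or failures) scores 1.0.
    precision = true_pos / n_pred if n_pred else 1.0
    recall = true_pos / n_actual if n_actual else 1.0
    return precision, recall


def pr_curve(probs: Sequence[float], actual: Sequence[bool],
             thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """For each threshold, predict a failure where the probability exceeds it."""
    curve = []
    for th in thresholds:
        pred = [p > th for p in probs]
        precision, recall = precision_recall(pred, actual)
        curve.append((th, precision, recall))
    return curve
```

Raising the threshold typically trades recall for precision, which is what the curve makes visible.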
15 What is important for prediction?
- Recency principle
- The more recent RTTs are more important.
- Quantity principle
- The more measurements, the higher the accuracy.
16 Recency Principle: Importance
- Test case: single-measurement predictor
- predict according to a measurement from x minutes ago.
- observe the change in the quality of the prediction.
- 15% difference between using the measurement from the last minute and the one from 15 minutes ago
17 Quantity Principle: Importance
- Test case: Fixed-Window-Count (FWC)
- the prediction is the fraction of failures in the W most recent measurements
- With more measurements we can achieve better precision at high recall
[Figure legend: FWC 1, FWC 5, FWC 10, FWC 50]
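A minimal sketch of the Fixed-Window-Count predictor described on this slide. The class name is hypothetical, and the behaviour before the window has filled (divide by the number of measurements seen so far) is an assumption, since the slide does not say.

```python
from collections import deque


class FixedWindowCount:
    """Predicts the fraction of failures among the W most recent measurements."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque with maxlen drops the oldest measurement automatically.
        self._recent = deque(maxlen=window)

    def update(self, failed: bool) -> None:
        self._recent.append(1 if failed else 0)

    def predict(self) -> float:
        # Assumption: with no history, predict 0.0.
        if not self._recent:
            return 0.0
        return sum(self._recent) / len(self._recent)
```

With W = 1 this reduces to the single-measurement predictor; larger W trades recency for quantity.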
18 Our predictors
- Exponential Decay
- Polynomial Decay
- Model-based predictors
- VW-Cover: Variable Window Cover algorithm
- HMM: Hidden Markov Model
19 Exponential-decay predictors
- The weight of each measurement decreases exponentially with its age, by a fixed decay factor.
- For consecutive measurements: [equation]
- Binary variable f_t represents a failure at time t.
- In general: [equation]
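One common way to realise an exponentially decaying weight for evenly spaced measurements is a running update. The sketch below assumes that form; the slide's own equations are not available, so the exact update rule, the class name, and the initial estimate are assumptions. The decay values 0.95 and 0.99 are the ones named elsewhere in the talk.

```python
class ExpDecayPredictor:
    """Running failure estimate in which older measurements count exponentially less."""

    def __init__(self, decay: float, initial: float = 0.0) -> None:
        if not 0.0 < decay < 1.0:
            raise ValueError("decay must be strictly between 0 and 1")
        self.decay = decay
        self.estimate = initial  # assumed starting value

    def update(self, failed: bool) -> float:
        f_t = 1.0 if failed else 0.0
        # Assumed update rule: the old estimate is scaled by the decay factor,
        # and the newest measurement gets weight (1 - decay).
        self.estimate = self.decay * self.estimate + (1.0 - self.decay) * f_t
        return self.estimate
```

Only one number per destination has to be stored, which is part of why this predictor is simple to implement.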
20 Polynomial-decay predictors
- Exact computation requires maintaining the complete history.
- We approximated it.
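To see why the exact computation needs the full history, here is a rough sketch of an exact polynomial-decay estimate. The weight form age^(-alpha), the normalisation, and the name `poly_decay_estimate` are all assumptions made for illustration; the slide does not give the formula, and the approximation the authors used is not shown here.

```python
from typing import Sequence


def poly_decay_estimate(history: Sequence[int], alpha: float = 1.0) -> float:
    """Weighted failure fraction; a measurement of age k gets weight k**(-alpha).

    history holds 0/1 failure indicators, oldest first; the newest has age 1.
    """
    n = len(history)
    if n == 0:
        return 0.0
    # Every weight changes as measurements age, so all of them are recomputed.
    weights = [(n - i) ** -alpha for i in range(n)]
    return sum(w * f for w, f in zip(weights, history)) / sum(weights)
```

Unlike the exponential case, the weights cannot be folded into a single running value, so each prediction touches every stored measurement.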
21 The VW-Cover predictor
- Consists of a list of pairs
- (a1, b1), (a2, b2), ..., (an, bn)
- Predict a failure if there exists i such that there are at least bi failures among the previous ai measurements
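The prediction rule above translates almost directly into code. A minimal sketch, with a hypothetical function name and a history represented as 0/1 failure indicators, oldest first:

```python
from typing import Sequence, Tuple


def vw_cover_predict(history: Sequence[int], pairs: Sequence[Tuple[int, int]]) -> bool:
    """Predict a failure if some pair (a, b) sees at least b failures in the last a measurements."""
    for a, b in pairs:
        window = history[-a:]  # if fewer than a measurements exist, use all of them
        if sum(window) >= b:
            return True
    return False
```

For example, with history `[0, 1, 0, 0, 1, 1]` the pair `(2, 2)` fires because both of the last two measurements were failures.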
22 VW-Cover predictor: Building
- Build the predictor greedily to cover the failures.
- Use a learning set of measurements
- Pick (a1, b1) to be the pair which maximizes precision
- Pick (ai, bi) to be the pair which maximizes precision among uncovered failures
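The greedy construction might look like the sketch below. This is one reading of the slide, not the authors' algorithm: the candidate set (all pairs with b <= a <= max_window), the tie-break (more covered failures wins), the stopping rule, and every name are assumptions.

```python
from typing import List, Sequence, Tuple


def fires(outcomes: Sequence[int], t: int, a: int, b: int) -> bool:
    """True if at least b failures occur among the a measurements before time t."""
    return sum(outcomes[max(0, t - a):t]) >= b


def build_vw_cover(outcomes: Sequence[int], max_window: int,
                   max_pairs: int) -> List[Tuple[int, int]]:
    """Greedily pick pairs by precision, measured only on time points not yet covered."""
    candidates = [(a, b) for a in range(1, max_window + 1) for b in range(1, a + 1)]
    uncovered = set(range(1, len(outcomes)))  # time points no chosen pair fires on
    cover: List[Tuple[int, int]] = []
    while len(cover) < max_pairs:
        best = None  # (score, pair, fired time points)
        for a, b in candidates:
            fired = {t for t in uncovered if fires(outcomes, t, a, b)}
            hits = sum(outcomes[t] for t in fired)
            if hits == 0:
                continue  # this pair covers no remaining failure
            score = (hits / len(fired), hits)  # precision first, then coverage
            if best is None or score > best[0]:
                best = (score, (a, b), fired)
        if best is None:
            break
        cover.append(best[1])
        uncovered -= best[2]
    return cover
```

The returned list can be used directly as the list of (a, b) pairs of a VW-Cover predictor. Failures that follow a run of good measurements are never covered, since no window before them contains a failure.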
23 Hidden Markov Model
- Finite set of states S (we use 3 states)
- Output probabilities a_s(0), a_s(1)
- Transition function: determines the probability distribution of the next state.
- The probability of a failure at time t: sum over s in S of p_s(t) * a_s(1)
- where p_s(t) is the probability of being in state s at time t; p_s(t) is updated according to the output at time t-1.
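The state-probability update can be sketched with the standard forward (filtering) step of an HMM. The class and parameter names are hypothetical, the model parameters are assumed to be given (training is not shown), and the example numbers in the usage note are made up.

```python
from typing import List, Sequence


class HmmFailurePredictor:
    """Tracks p_s(t) and predicts failure as the sum over states of p_s(t) * a_s(1)."""

    def __init__(self, transition: Sequence[Sequence[float]],
                 emit_fail: Sequence[float], initial: Sequence[float]) -> None:
        self.transition = transition  # transition[s][s2] = P(next state s2 | state s)
        self.emit_fail = emit_fail    # emit_fail[s] = a_s(1); a_s(0) = 1 - a_s(1)
        self.belief: List[float] = list(initial)  # p_s(t)

    def failure_probability(self) -> float:
        return sum(p * a for p, a in zip(self.belief, self.emit_fail))

    def update(self, failed: bool) -> None:
        """Condition on the latest output, then advance one step through the transitions."""
        like = [a if failed else 1.0 - a for a in self.emit_fail]
        joint = [p * l for p, l in zip(self.belief, like)]
        total = sum(joint)
        # If the output was impossible under the model, keep the prior belief.
        posterior = [j / total for j in joint] if total > 0 else list(self.belief)
        n = len(posterior)
        self.belief = [sum(posterior[s] * self.transition[s][s2] for s in range(n))
                       for s2 in range(n)]
```

A three-state instance, as on the slide, would pass a 3x3 `transition` matrix and three-element `emit_fail` and `initial` vectors, for example states loosely standing for good, unstable, and bad periods.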
24 Experimental Evaluation
25 Predictor Performance: Level 3
[Figure legend: FWC 10, FWC 50, ExpDecay 0.99, ExpDecay 0.95, VW-Cover, HMM]
- At recall 0.5, precision is close to 0.9
26 Predictor Performance: Level 6
[Figure legend: FWC 10, FWC 50, ExpDecay 0.99, ExpDecay 0.95, VW-Cover, HMM]
- Degradations of level 6 are harder to predict
- At recall 0.5, precision is 0.4
27 Predictor Performance: Conclusion
- The best predictors at levels 3 and 6 are VW-Cover and HMM
- But they only slightly outperform ExpDecay 0.95, which is considerably simpler to implement
28 Gateway Selection
[Figure panels: Level 6, Level 3]
29 Gateway Selection: Conclusion
- Active gateway selection resulted in a 50% reduction in the degradation rate with respect to the best single gateway.
- Static gateway selection can avoid at most 25% of degradations.
- Again, ExpDecay 0.95 only slightly underperforms the best predictor (VW-Cover).
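Active gateway selection driven by a predictor could be sketched as below. The policy shown (stay on the current gateway unless its predicted failure probability exceeds a threshold, then move to the gateway with the lowest prediction) is an assumption about how the predictions are used; the function name and gateway labels are hypothetical.

```python
from typing import Mapping


def select_gateway(fail_prob: Mapping[str, float], current: str, threshold: float) -> str:
    """Return the gateway to use for one destination, given per-gateway failure predictions."""
    if fail_prob[current] <= threshold:
        return current  # no failure predicted through the current gateway
    # A failure is predicted: reroute through the gateway least likely to fail.
    return min(fail_prob, key=lambda g: fail_prob[g])
```

For instance, `select_gateway({"CA-1": 0.7, "NJ-1": 0.2, "NJ-2": 0.4}, "CA-1", 0.5)` returns `"NJ-1"`. The reroute only helps when the substitute gateway's failures are weakly correlated with the current one's.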
30 Performance of gateway selection as a function of recency
31 Correlation between coasts
- Gateway selection on a same-coast pair resulted in only a 10% reduction.
- Choose independent gateways
32 Controlling prediction overhead
- Types of measurements
- Active measurements
- initiate probes (SYN, ping, HTTP request).
- Scalability problem.
- Passive measurements
- collected on regular traffic
- Controlling the prediction overhead
- Use less-recent measurements
- Active measurements only to a small set of destinations, which covers the majority of traffic.
- Cluster destinations: the measurements of one destination can be used to predict another.
33 Questions?
- natali_at_cs.tau.ac.il
- edith_at_research.att.com
- haimk_at_cs.tau.ac.il
- mansour_at_cs.tau.ac.il