Title: End-to-End Detection of Shared Bottlenecks
1End-to-End Detection of Shared Bottlenecks
- Sridhar Machiraju and Weidong Cui
- Sahara Winter Retreat 2003
2Problem Statement
- Given 2 end-to-end flows f1 and f2, do they share
a bottleneck (a congested link, i.e., a link with
packet drops)?
- (OR)
- Given 2 routes R1 and R2 on the Internet, do they
share a bottleneck link?
3Why is this hard?
- No information from the network
- Only information available: delay and drops
- Lots of noise: delay from intermediate links and
drops on other links
- Bottlenecks may change over time
4Why solve this problem?
- Overlays
  - RON: decide whether rerouting flows bypasses
    congestion points
  - RON: does such rerouting affect existing flows?
    Which ones?
  - Cooperative overlays: an overlay does not want to
    share a bottleneck with a friendly overlay
  - OverQoS: useful to cluster together overlay
    links based on shared bottlenecks
5Why solve this problem (cont.)?
- Other applications
  - Massive backups of data from different servers:
    do them in parallel?
  - Content distribution: is the use of multipath
    going to improve performance?
  - Kazaa: parallel downloads from peers
  - Multihomed ASs can evaluate the orthogonality
    in terms other than fault-tolerance
6Related Work
- Past work was done only with Y or inverted-Y
topologies, using Poisson probes, packet pairs, and
inter-arrival times.
(Figure: topology diagram with labels "Senders" and "Receivers")
7Goals
- Provide a general solution for the double-Y topology
- Work with multiple bottlenecks and provide an
indicator of shared congestion
- Be able to use active probe flows and also
passively observed (TCP) flows
- Complexity issues for clustering flows
8Motivation of Our Techniques
- Droptail queues + TCP: queues exhibit bursty
loss periods followed by periods of no losses
- Queues build up until bursty losses occur, then decrease
in size before increasing again
- This motivates correlating periods of
drops and delays (delays are proportional to queue sizes)
- But...
9Synchronization Lag
10Overview of Our Techniques
- We propose 2 techniques
- Probability Distribution (PD) technique
- Cross-Correlation (CC) technique
- PD is based on finding the peak of the discrete
probability distribution of the minimum time between
a drop of one flow and a drop of the other
- CC is based on finding the maximum
cross-correlation over various assumed synch. lags
11PD Technique
- For each dropped packet of a flow, plot the PD of the
minimum of the time differences between its
sending time and the sending times of the dropped
packets of the other flow
- If there is a shared bottleneck, we expect (ideally) a 1 at
d2 - d1
- Not all flows may see drops during the
same burst, so use a threshold < 1 for the peak
- We may see more than 1 drop in a burst: cluster
drops into bursts and use the time differences
between the starts of bursts
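The PD steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the 50 ms burst gap, the 10 ms histogram bin, and the use of the nearest burst in absolute time are all assumptions made for the example.

```python
from collections import Counter

def burst_starts(drop_times, gap=0.05):
    """Cluster drop send-times (seconds) into bursts and return each burst's start.
    `gap` is an assumed threshold: a drop within `gap` of the previous drop
    is treated as part of the same burst."""
    starts = []
    prev = None
    for t in sorted(drop_times):
        if prev is None or t - prev > gap:
            starts.append(t)
        prev = t
    return starts

def pd_peak(drops_a, drops_b, bin_width=0.01, gap=0.05):
    """For each burst of flow A, take the signed time difference to the nearest
    burst of flow B, build a discrete distribution of those differences, and
    return (peak probability, time difference at the peak)."""
    a = burst_starts(drops_a, gap)
    b = burst_starts(drops_b, gap)
    if not a or not b:
        return 0.0, None
    bins = Counter()
    for ta in a:
        nearest = min(b, key=lambda tb: abs(tb - ta))
        bins[round((nearest - ta) / bin_width)] += 1
    best_bin, count = bins.most_common(1)[0]
    return count / len(a), best_bin * bin_width

# Hypothetical send times (seconds) of dropped packets in two flows.
peak, lag = pd_peak([1.0, 1.01, 5.0, 9.0, 9.02], [1.06, 5.06, 5.07, 9.06])
```

Under these assumptions, a peak above the chosen threshold would be read as evidence of a shared bottleneck, and the position of the peak as an estimate of d2 - d1.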
12PD technique (contd.)
- Robustness issues: the synch. lag must be smaller
than the time difference between consecutive
drops of a flow
13Cross-Correlation (CC) Technique
- Key ideas
  - Two back-to-back packets from two different
    flows will experience similar packet drop/delay
    at the bottleneck
  - If we can generate two sequences of
    back-to-back packets from two different flows,
    then we can calculate the cross-correlation
    coefficient of their losses or delays to measure their
    similarity.
  - If the cross-correlation coefficient is greater
    than some threshold, then the two flows share a
    bottleneck.
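As a rough illustration of the key ideas above, the cross-correlation coefficient can be computed as the Pearson correlation of two aligned per-probe sequences. The slides do not give the exact formula, so this is an assumed reading; the function names and the handling of constant sequences are choices made for the example. The default threshold of 0.1 is the value the experiments use.

```python
import math

def cc_coefficient(x, y):
    """Pearson correlation coefficient of two equal-length sequences,
    e.g. loss indicators (1 = probe lost, 0 = received) or per-probe delays."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("sequences must be non-empty and of equal length")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumed convention: a constant sequence gives no evidence
    return cov / math.sqrt(vx * vy)

def share_bottleneck(x, y, threshold=0.1):
    """Declare a shared bottleneck when the coefficient exceeds the threshold."""
    return cc_coefficient(x, y) > threshold

# Hypothetical loss sequences of two probe flows.
losses_1 = [0, 1, 1, 0, 0, 1, 0, 0]
losses_2 = [0, 1, 1, 0, 0, 1, 0, 0]
decision = share_bottleneck(losses_1, losses_2)
```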
14Questions about the CC Technique
- How to generate two sequences of back-to-back
packets?
  - UDP probes with a constant interval T
  - average interval < T/2
  - Shift the sequence to overcome the synch. lag
- How long should the two sequences be to get a
significant result?
  - When the CC coefficient becomes relatively stable
  - But no less than a minimum period of time
- What should the threshold be?
  - Use 0.1 in the experiments
  - Why 0.1?
15Overcome the Synchronization Problem
- Find the max cross-correlation by shifting one of
the two sequences within some range
- The value of the optimal shift is an estimate
of the synchronization lag.
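The shift search can be sketched as below. This is a minimal example under assumptions: the coefficient is taken to be the Pearson correlation, shifts are counted in probe samples, and the names `pearson` and `best_shift` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def best_shift(x, y, max_shift):
    """Try every shift s in [-max_shift, max_shift], comparing x[i] with y[i + s]
    on the overlapping part, and return (max coefficient, best shift)."""
    best = (-1.0, 0)
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            xs, ys = x[:len(x) - s], y[s:]
        else:
            xs, ys = x[-s:], y[:len(y) + s]
        m = min(len(xs), len(ys))
        if m < 2:
            continue  # too little overlap to correlate
        c = pearson(xs[:m], ys[:m])
        if c > best[0]:
            best = (c, s)
    return best

# Hypothetical loss sequence, and the same sequence delayed by 3 samples.
x = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
y = [0, 0, 0] + x[:-3]
coeff, shift = best_shift(x, y, max_shift=5)
```

With probes sent at a fixed interval, a best shift of s samples corresponds to an estimated lag of s times that interval.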
16Wide-Area Experiments
- Challenges
  - Access to hosts distributed globally?
  - How to verify our experimental results?
- Solutions
  - PlanetLab (http://www.planet-lab.org)
  - Set up an overlay network with double-Y topology
  - Application-level routers monitor losses and
    delays
17Topology with Shared Bottleneck (I)
(Figure: overlay topology; nodes: Vancouver, Bologna, Seattle, Wisc, Atlanta, Sydney)
18Topology without Shared Bottleneck (II)
(Figure: overlay topology; nodes: Vancouver, Bologna, Seattle, Wisc, Atlanta, Sydney)
19Experimental Setup
- Active Probing
  - 40 bytes per packet
  - Every 10 ms
- Log packet arrival times on every node
  - Loss information can also be obtained from these
    logs
- Traces from 10 mins to 60 mins
- Threshold 0.1 for the PD and CC techniques
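One way to turn such logs into the loss and delay sequences the techniques need is sketched below. The slides do not describe the log format, so the per-probe sequence numbers, the dictionary layout, and the function name are assumptions.

```python
def loss_and_delay(sent, received):
    """sent: {seq: send_time}, received: {seq: arrival_time}, times in seconds.
    Returns (losses, delays) in sequence order: losses[i] is 1 if probe i never
    arrived, and delays[i] is the one-way delay or None for a lost probe."""
    losses, delays = [], []
    for seq in sorted(sent):
        if seq in received:
            losses.append(0)
            delays.append(received[seq] - sent[seq])
        else:
            losses.append(1)
            delays.append(None)
    return losses, delays

# Hypothetical logs: probes sent every 10 ms, probe 2 never arrives.
sent_log = {i: i * 0.01 for i in range(5)}
recv_log = {0: 0.05, 1: 0.06, 3: 0.09, 4: 0.09}
losses, delays = loss_and_delay(sent_log, recv_log)
```

If sender and receiver clocks are not synchronized, these delays include a clock offset. A constant offset does not change a correlation coefficient, but clock drift would.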
20Overall Results
Exp          Packet Drops        PD Technique          Loss CC Technique    Delay CC Technique
             Shared  Non-shared  Peak Value  Est. Lag  CC Coeff.  Est. Lag  CC Coeff.  Est. Lag
1 (20 mins)  3       2096        < 0.1       -         < 0.1      -         < 0.1      -
2 (10 mins)  6772    165         0.21        60 ms     0.22       50 ms     0.12       50 ms
3 (10 mins)  2070    32          0.45        100 ms    0.81       80 ms     < 0.1      -
4 (10 mins)  81      2252        < 0.1       -         0.38       -1.17 s   0.99       -1.17 s
5 (30 mins)  0       5565        < 0.1       -         < 0.1      -         < 0.1      -
6 (60 mins)  10272   1127        < 0.1       -         0.23       6 s       < 0.1      -
7 (10 mins)  1592    57          < 0.1       -         0.75       -1.15     < 0.1      -
8 (10 mins)  1895    112         0.11        180 ms    0.55       300 ms    < 0.1      -
Failed Cases
21Why does the Delay CC Technique fail?
- Delay spikes at the non-shared part.
22Why does the PD Technique fail?
- Large synchronization lag
- Few drops at the bottleneck
23Open Issues
- Parameter Selection
  - What should the thresholds be?
- Active vs. Passive Probing
  - Active probing wastes network resources
  - Passive probing cannot control the size/rate of
    the probing sequences.
- Multiple Bottlenecks
  - Our techniques are not limited to the case of a
    single bottleneck.
  - But more quantitative evaluation is needed
- Probability of sharing a bottleneck
  - How often should we generate probing sequences to
    detect if two flows share a bottleneck?
  - Can we give a probability rather than a 0-1
    decision?
24Conclusions
- Problem
  - Detect if 2 end-to-end flows share a bottleneck
- Challenge
  - Synchronization lag in double-Y topology
- Techniques
  - The Probability Distribution Technique
  - The Loss/Delay Cross-Correlation Technique
- Experimental Results
  - The Loss CC technique succeeds in all
    experiments
  - The Delay CC technique fails in some experiments
    due to delay spikes at the non-shared part
  - The PD technique fails in some experiments due to
    large synch. lag and few losses at the
    bottleneck