Title: Server-based Characterization and Inference of Internet Performance
1. Server-based Characterization and Inference of Internet Performance
- Venkat Padmanabhan
- Lili Qiu
- Helen Wang
- Microsoft Research
- UCLA/IPAM Workshop
- March 2002
2. Outline
- Overview
- Server-based characterization of performance
- Server-based inference of performance
- Passive Network Tomography
- Summary and future work
3. Overview
- Goals
- characterize end-to-end performance
- infer characteristics of interior links
- Approach: server-based monitoring
- passive monitoring → relatively inexpensive
- enables large-scale measurements
- diversity of network paths
4. [Figure: a Web server sends DATA packets to many clients and receives their ACKs]
5. Research Questions
- Server-based characterization of end-to-end performance
- correlation with topological metrics
- spatial locality
- temporal stability
- Server-based inference of internal link characteristics
- identification of lossy links
6. Related Work
- Server-based passive measurement
- 1996 Olympics Web server study (Berkeley, 1997, 1998)
- characterization of TCP properties (Allman 2000)
- Active measurement
- NPD (Paxson 1997)
- stationarity of Internet path properties (Zhang et al. 2001)
7. Experiment Setting
- Packet sniffer at microsoft.com
- 550 MHz Pentium III
- sits on spanning port of Cisco Catalyst 6509
- packet drop rate < 0.3%
- traces up to 2 hours long, 20-125 million packets, 50-950K clients
- Traceroute source
- sits on a separate Microsoft network, but all external hops are shared
- infrequent and in the background
8. Topological Metrics and Loss Rate
Topological distance is a poor predictor of packet loss rate. All links are not equal → need to identify the lossy links.
9. Spatial Locality
- Do clients in the same cluster see similar loss rates?
- Loss rate is quantized into buckets
- 0-0.5%, 0.5-2%, 2-5%, 5-10%, 10-20%, >20%
- suggested by Zhang et al. (IMW 2002)
- Focus on lossy clusters
- average loss rate > 5%
Spatial locality → there may be a shared cause for packet loss
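The quantization above can be sketched as a simple lookup; the bucket edges are taken from the slide (in percent), and the function name is this sketch's own.

```python
# Loss-rate bucket edges from the slide, in percent; the last bucket is ">20%".
BUCKET_EDGES = [0.5, 2, 5, 10, 20]

def loss_bucket(loss_pct):
    """Map a loss rate (in percent) to its bucket index, 0..5."""
    for k, edge in enumerate(BUCKET_EDGES):
        if loss_pct <= edge:
            return k
    return len(BUCKET_EDGES)
```

For example, a cluster averaging 7% loss falls in the 5-10% bucket (index 3), and anything above 20% lands in the final open-ended bucket.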
10. Temporal Stability
- Loss rate again quantized into buckets
- Metric of interest: stability period (i.e., time until transition into a new bucket)
- Median stability period: 10 minutes
- Consistent with previous findings based on active measurements
11. Putting it all together
- All links are not equal → need to identify the lossy links
- Spatial locality of packet loss rate → lossy links may well be shared
- Temporal stability → worthwhile to try and identify the lossy links
12. Passive Network Tomography
- Goal: determine characteristics of internal network links using end-to-end, passive measurements
- We focus on the link loss rate metric
- primary goal: identifying lossy links
- Why is this interesting?
- locating trouble spots in the network
- keeping tabs on your ISP
- server placement and server selection
13. [Figure: a Web server wonders "Why is it so slow?" while an AOL client complains "Darn, it's slow!"; the path between them crosses multiple ISPs: AT&T, Sprint, C&W, Earthlink, UUNET, Qwest]
14. Related Work
- MINC (Caceres et al. 1999)
- multicast-based active probing
- Striped unicast (Duffield et al. 2001)
- unicast-based active probing
- Passive measurement (Coates et al. 2002)
- look for back-to-back packets
- Shared bottleneck detection
- Padmanabhan 1999, Rubenstein et al. 2000, Katabi et al. 2001
15. Active Network Tomography
[Figure: source S sends striped unicast probes and multicast probes toward receivers A and B]
16. Problem Formulation
[Figure: tree rooted at the server, with links l1-l8 leading down to clients that observe end-to-end loss rates p1-p5]
- Collapse linear chains into virtual links
- (1 - l1)(1 - l2)(1 - l4) = (1 - p1)
- (1 - l1)(1 - l2)(1 - l5) = (1 - p2)
- ...
- (1 - l1)(1 - l3)(1 - l8) = (1 - p5)
- Under-constrained system of equations
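To make the product form concrete, here is a small sketch using the slide's 8-link tree with hypothetical per-link loss rates; it evaluates the right-hand side of each path equation.

```python
# Hypothetical loss rates for the slide's links l1-l8 (this sketch's numbers).
link_loss = {1: 0.01, 2: 0.00, 3: 0.05, 4: 0.00,
             5: 0.02, 6: 0.00, 7: 0.00, 8: 0.10}

# Links traversed by each server-to-client path, per the slide's tree.
paths = {
    "p1": [1, 2, 4],
    "p2": [1, 2, 5],
    "p3": [1, 3, 6],
    "p4": [1, 3, 7],
    "p5": [1, 3, 8],
}

def path_loss(link_ids):
    """End-to-end loss rate: 1 minus the product of per-link success rates."""
    success = 1.0
    for i in link_ids:
        success *= 1.0 - link_loss[i]
    return 1.0 - success

observed = {name: path_loss(ids) for name, ids in paths.items()}
```

Five observed path loss rates constrain eight unknown link loss rates, which is why the system is under-constrained: many link-loss assignments reproduce the same observations.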
17. (1) Random Sampling
- Randomly sample the solution space
- Repeat this several times
- Draw conclusions based on overall statistics
- How to do random sampling?
- determine loss rate bound for each link using best downstream client
- iterate over all links
- pick loss rate at random within bounds
- update bounds for other links
- Problem: little tolerance for estimation error
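A minimal sketch of one sampling pass, assuming the slide's 8-link tree and hypothetical best-client success rates; the paper's exact bound bookkeeping may differ.

```python
import random

# parent[i] = the link immediately above link i (0 = the server root).
parent = {1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3}
# Hypothetical end-to-end success rate of the *best* client whose path
# crosses each link: the tightest passive bound on that link's loss rate.
best_success = {1: 0.99, 2: 0.99, 3: 0.94, 4: 0.99,
                5: 0.97, 6: 0.94, 7: 0.94, 8: 0.85}

def sample_once(rng):
    """Draw one random link-loss assignment consistent with the bounds."""
    loss = {0: 0.0}
    above = {0: 1.0}  # product of (1 - loss) over links above each link
    for i in sorted(parent):            # top-down: parents come first
        p = parent[i]
        above[i] = above[p] * (1.0 - loss[p])
        # link i cannot lose more than its best client's whole path does,
        # once the loss already assigned upstream is divided out
        bound = max(0.0, 1.0 - best_success[i] / above[i])
        loss[i] = rng.uniform(0.0, bound)
    del loss[0]
    return loss

# Repeat many times; flag links whose sampled loss rate is consistently high.
rng = random.Random(42)
samples = [sample_once(rng) for _ in range(500)]
avg = {i: sum(s[i] for s in samples) / len(samples) for i in parent}
lossy = sorted(i for i, r in avg.items() if r > 0.02)
```

With these (made-up) observations, links 3 and 8 come out as the likely lossy ones; the slide's caveat applies, though, since any error in the end-to-end estimates shifts the bounds directly.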
18. (2) Linear Optimization
- Goals
- Parsimonious explanation
- Robust to estimation error
- Li = log(1/(1-li)), Pj = log(1/(1-pj))
- minimize Σ Li + Σ |Sj|
- L1 + L2 + L4 + S1 = P1
- L1 + L2 + L5 + S2 = P2
- ...
- L1 + L3 + L8 + S5 = P5
- Li ≥ 0
- Can be turned into a linear program
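A sketch of how this LP could be assembled, assuming the slide's five paths and hypothetical measured path loss rates. The |Sj| terms are handled with the standard split into nonnegative parts, and scipy's generic `linprog` stands in for whatever solver the authors used.

```python
import math
from scipy.optimize import linprog  # assumption of this sketch; any LP solver works

# Links on each server-to-client path (the slide's tree) and hypothetical
# measured end-to-end loss rates.
paths = {"p1": [1, 2, 4], "p2": [1, 2, 5], "p3": [1, 3, 6],
         "p4": [1, 3, 7], "p5": [1, 3, 8]}
measured = {"p1": 0.01, "p2": 0.03, "p3": 0.06, "p4": 0.06, "p5": 0.15}

n_links, n_paths = 8, len(paths)
w = 1.0  # relative weight on the slack (error) terms

# Variables: [L1..L8, S1+..S5+, S1-..S5-], all >= 0, with Sj = Sj+ - Sj-.
c = [1.0] * n_links + [w] * (2 * n_paths)   # minimize sum(L) + w*sum(|S|)
A_eq, b_eq = [], []
for j, (name, link_ids) in enumerate(paths.items()):
    row = [0.0] * len(c)
    for i in link_ids:
        row[i - 1] = 1.0                    # Li terms on this path
    row[n_links + j] = 1.0                  # Sj+
    row[n_links + n_paths + j] = -1.0       # Sj-
    A_eq.append(row)
    b_eq.append(math.log(1.0 / (1.0 - measured[name])))  # Pj

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * len(c))
est_loss = [1.0 - math.exp(-L) for L in res.x[:n_links]]  # back to loss rates
```

The log transform is what makes this linear: products of per-link success rates become sums of the Li, and the slack Sj absorbs measurement error instead of forcing it into the link estimates.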
19. (3) Bayesian Inference
- Basics
- D = observed data
- sj = packets successfully sent to client j
- fj = packets that client j fails to receive
- T = unknown model parameters
- li = packet loss rate of link i
- Goal: determine the posterior P(T|D)
- inference is based on loss events, not loss rates
- Bayes' theorem
- P(T|D) = P(D|T)P(T) / ∫ P(D|T)P(T) dT
- hard to compute since T is multidimensional
[Figure: tree rooted at the server, with links l1-l8 leading down to clients; client j observes (sj, fj)]
20. Gibbs Sampling
- Markov Chain Monte Carlo (MCMC)
- construct a Markov chain whose stationary distribution is P(T|D)
- Gibbs sampling defines the transition kernel
- start with an arbitrary initial assignment of li
- consider each link i in turn
- compute P(li|D) assuming lj is fixed for j ≠ i
- draw a sample from P(li|D) and update li
- after a burn-in period, we obtain samples from the posterior P(T|D)
21. Gibbs Sampling Algorithm
- 1) Initialize link loss rates arbitrarily
- 2) For j = 1 ... burnIn: for each link i, compute P(li | D, l_{-i}), where li is the loss rate of link i and l_{-i} = {lj : j ≠ i}
- 3) For j = 1 ... realSamples: for each link i, compute P(li | D, l_{-i})
- Use all the samples obtained at step 3 to approximate P(T|D)
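The two loops above can be sketched as follows. The conditional P(li | D, l_{-i}) has no convenient closed form here, so this sketch evaluates it on a discrete grid of candidate loss rates ("griddy" Gibbs), an illustrative shortcut rather than the authors' exact sampler; the topology and (sj, fj) counts are hypothetical.

```python
import math
import random

# Links on the path to each client (the slide's tree), with hypothetical
# (s_j, f_j) = (packets received, packets lost) counts per client.
paths = {"c1": [1, 2, 4], "c2": [1, 2, 5], "c3": [1, 3, 6],
         "c4": [1, 3, 7], "c5": [1, 3, 8]}
data = {"c1": (990, 10), "c2": (970, 30), "c3": (940, 60),
        "c4": (940, 60), "c5": (850, 150)}

GRID = [g / 100.0 for g in range(0, 51)]  # candidate loss rates 0..50%

def log_likelihood(loss):
    """log P(D | loss rates) under independent per-link packet loss."""
    ll = 0.0
    for client, link_ids in paths.items():
        q = 1.0                              # path success probability
        for i in link_ids:
            q *= 1.0 - loss[i]
        s, f = data[client]
        ll += s * math.log(max(q, 1e-12)) + f * math.log(max(1.0 - q, 1e-12))
    return ll

def gibbs(n_iter=200, burn_in=100, rng=random.Random(7)):
    loss = {i: 0.0 for i in range(1, 9)}     # arbitrary initialization
    samples = []
    for it in range(n_iter):
        for i in loss:                       # resample each link in turn
            weights = []
            for g in GRID:                   # P(li | D, l_{-i}) on the grid
                loss[i] = g
                weights.append(log_likelihood(loss))
            m = max(weights)
            probs = [math.exp(v - m) for v in weights]
            loss[i] = rng.choices(GRID, weights=probs)[0]
        if it >= burn_in:                    # keep only post-burn-in samples
            samples.append(dict(loss))
    return samples

samples = gibbs()
post_mean = {i: sum(s[i] for s in samples) / len(samples) for i in range(1, 9)}
```

Because the inference is driven by loss events (the raw s and f counts) rather than point estimates of loss rates, clients with more traffic pull the posterior harder, which matches the slide's remark.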
22. Experimental Evaluation
- Simulation experiments
- Internet traffic traces
23. Simulation Experiments
- Advantage: no uncertainty about link loss rates
- Methodology
- Topologies used
- randomly generated: 20-3000 nodes, max degree 5-50
- real topology obtained by tracing paths to microsoft.com clients
- Randomly generated packet loss events at each link
- a fraction f of the links are good, and the rest are bad
- LM1: good links 0-1%, bad links 5-10%
- LM2: good links 0-1%, bad links 1-100%
- Goodness metrics
- Coverage: correctly inferred lossy links
- False positives: incorrectly inferred lossy links
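The loss models and goodness metrics above can be sketched as follows (a toy version; the link count and the fraction f are hypothetical parameters of this sketch):

```python
import random

def assign_loss_rates(n_links, f_good, model, rng):
    """LM1/LM2 from the slide: good links lose 0-1%; bad links lose
    5-10% under LM1 or 1-100% under LM2."""
    rates = []
    for _ in range(n_links):
        if rng.random() < f_good:
            rates.append(rng.uniform(0.00, 0.01))
        elif model == "LM1":
            rates.append(rng.uniform(0.05, 0.10))
        else:  # LM2
            rates.append(rng.uniform(0.01, 1.00))
    return rates

def goodness(true_lossy, inferred_lossy):
    """Coverage: fraction of truly lossy links that were inferred.
    False positives: fraction of inferred links that are not lossy."""
    true_lossy, inferred_lossy = set(true_lossy), set(inferred_lossy)
    coverage = len(true_lossy & inferred_lossy) / max(len(true_lossy), 1)
    false_pos = len(inferred_lossy - true_lossy) / max(len(inferred_lossy), 1)
    return coverage, false_pos

rates = assign_loss_rates(1000, f_good=0.95, model="LM1",
                          rng=random.Random(0))
```

Simulating against known loss rates is what makes these metrics computable at all: with real traces there is no ground truth, which is exactly the validation challenge the later slides address.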
24.-26. Simulation Results
[Figures: coverage and false positives across topologies and loss models]
- High confidence in top few inferences
27. Trade-off

Technique         Coverage  False positives  Computation
Random sampling   High      High             Low
LP                Medium    Low              Medium
Gibbs sampling    High      Low              High
28. Internet Traffic Traces
- Challenge: validation
- Divide client traces into two: a tomography set and a validation set
- Tomography data set → loss inference
- Validation set → check if clients downstream of the inferred lossy links experience high loss
- Results
- false positive rate is between 5% and 30%
- likely candidates for lossy links:
- links crossing an inter-AS boundary
- links having a large delay (e.g., transcontinental links)
- links that terminate at clients
- example lossy links:
- San Francisco (AT&T) → Indonesia (Indo.net)
- Sprint → PacBell in California
- Moscow → Tyumen, Siberia (Sovam Teleport)
29. Summary
- Poor correlation between topological metrics and performance
- Significant spatial locality and temporal stability
- Passive network tomography is feasible
- Trade-off between computational cost and accuracy
- Future directions
- real-time inference
- selective active probing
- Acknowledgements
- MSR: Dimitris Achlioptas, Christian Borgs, Jennifer Chayes, David Heckerman, Chris Meek, David Wilson
- Infrastructure: Rob Emanuel, Scott Hogan
- http://www.research.microsoft.com/padmanab