Title: Characterizing and Predicting TCP Throughput on the Wide Area Network
1 Characterizing and Predicting TCP Throughput on the Wide Area Network
- Dong Lu, Yi Qiao,
- Peter Dinda, Fabian Bustamante
- Department of Computer Science
- Northwestern University
- http://plab.cs.northwestern.edu
2 Overview
- Algorithm for predicting the TCP throughput as a function of flow size
- Minimal active probing
- Dynamic probe rate adjustment
- Explaining flow size / throughput correlation
- Explaining why simple active probing fails
- Large scale empirical study
3 Outline
- Why TCP throughput prediction?
- Particulars of study
- Flow size / TCP throughput correlation
- Issues with simple benchmarking
- DualPats algorithm
- Stability and dynamic rate adjustment
4 Goal
- A library call
- BW = PredictTransfer(src, dst, numbytes)
- Expected Time = numbytes / BW
- Ideally, we want a confidence interval
- (BWLow, BWHigh) = PredictTransfer(src, dst, numbytes, p)
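A minimal sketch of how a caller might use such a library call. The name `predict_transfer` mirrors the slide's `PredictTransfer`; the stub body, the fixed bandwidth value, and the `expected_time` helper are hypothetical and only illustrate the arithmetic Expected Time = numbytes / BW.

```python
def predict_transfer(src: str, dst: str, numbytes: int) -> float:
    """Hypothetical stand-in for the library call.

    Returns predicted throughput in bytes per second. A real
    implementation would consult probe history for the (src, dst)
    path; this stub returns a fixed, made-up value.
    """
    return 2.5e6  # assumed 2.5 MB/s, for illustration only


def expected_time(src: str, dst: str, numbytes: int) -> float:
    """Expected transfer time in seconds: numbytes / BW."""
    bw = predict_transfer(src, dst, numbytes)
    return numbytes / bw


# A 10 MB transfer over the stubbed path.
print(expected_time("hostA", "hostB", 10_000_000))
```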
5 Available Bandwidth
- Maximum rate a path can offer a flow without slowing other flows
- pathchar, cprobe, nettimer, delphi, IGI, pathchirp, pathload
- Available bandwidth can differ significantly from TCP throughput
- Not real time, takes at least tens of seconds to run
6 Simple TCP Benchmarking
- Benchmark paths with a single small probe
- BW = ProbeSize / Time
- Widely used: Network Weather Service (NWS) and others (Remos benchmarking collector)
- Not accurate for large transfers on the current high speed Internet
- Numerous papers show this and attempt to fix it
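The estimate BW = ProbeSize / Time can be illustrated with a small sketch. The path model used here (a fixed startup cost plus a constant steady rate) and all of its numbers are assumptions chosen for illustration, not measurements from the study; they show how one small probe can understate the throughput a large transfer would see.

```python
def benchmark_bw(probe_size: float, elapsed: float) -> float:
    """Simple benchmarking estimate: BW = ProbeSize / Time."""
    return probe_size / elapsed


def simulated_transfer_time(size: float,
                            overhead_s: float = 0.2,
                            steady_bw: float = 10e6) -> float:
    """Hypothetical path: fixed startup cost plus time at a steady rate."""
    return overhead_s + size / steady_bw


small = 64 * 1024           # 64 KB probe
large = 100 * 1024 * 1024   # 100 MB transfer

bw_probe = benchmark_bw(small, simulated_transfer_time(small))
bw_large = benchmark_bw(large, simulated_transfer_time(large))

print(f"estimate from small probe: {bw_probe / 1e6:.2f} MB/s")
print(f"seen by large transfer:    {bw_large / 1e6:.2f} MB/s")
```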
7 Fixing Simple TCP Benchmarking
- Logs [Sundharshan]: correlate real transfer measurements with benchmarking measurements
- Recent transfers needed
- Similar size transfers needed
- Measurements at application chosen times
- CDF-matching [Swany]: correlate CDF of real transfer measurements with CDF of benchmarking measurements
- Recent transfers still needed
- Measurements at application chosen times
8 Analysis of TCP
- Extensive research on TCP throughput modeling in networking community
- Really intended to build better TCPs
- Difficult to use models online because of hard-to-measure parameters
- Future loss rate and RTT
- Note: we measure goodput
9 Our Measurement Study
- PlanetLab and additional machines
- Located all over the world
- Measurements of throughput
- Wide open socket buffers (1-3 MB)
- Simple ttcp-like client/server
- scp
- GridFTP
- Four separate sets of measurements
10 Distribution Set
- For analysis of TCP throughput stability and distributions
- 60 randomly chosen paths among PlanetLab machines
- 1.6 million transfers (client/server)
- 100 KB, 200 KB, 400 KB, 10 MB flows
- 3000 consecutive transfers per path and flow size
11 Correlation Set
- For studying correlation between throughput and flow size, initial testing of algorithm
- 60 randomly chosen paths among PlanetLab machines
- 2.4 million transfers, 270 thousand runs, client/server
- 100 KB, 200 KB, 400 KB, 10 MB flows
- Run: sweep flow size for path
12 Verification Set
- Test algorithm
- 30 randomly chosen paths among PlanetLab machines and others
- 4800 transfers, 300 runs, scp and GridFTP
- 5 KB to 1 GB flows
- Run: sweep flow size for path
13 Online Evaluation Set
- Test online algorithm
- 50 randomly chosen paths among PlanetLab machines and others
- 14000 transfers, scp and GridFTP
- 40 MB or 160 MB file, randomly chosen size
- 10 days
14 Strong Correlation Between TCP Throughput and Flow Size
[Figure: Correlation and Verification Sets]
15 Why Does The Correlation Exist?
- Slow start and user effects [Zhang]
- Extant flows
- Non-negligible startup overheads
- Control messages in scp and GridFTP
- Residual slow start effect
- SACK results in slow convergence to equilibrium
16 Why Simple Benchmarking Fails
- Need more than one probe to capture correlation
- Probes are too small
17 Our Approach
- Two consecutive probes, both larger than the noise region
18 Our Approach
- Two consecutive probes are integrated into a single probe
- 400 KB, 800 KB in single 800 KB probe
[Figure: probe timeline marked 0, T1, T2, with spans labeled "Probe one" and "Probe two"]
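One way to obtain both timings from a single transfer is to timestamp the moment each cumulative byte count is reached. The sketch below is an illustration under assumptions: `timed_probe` and its parameters are hypothetical names, and an in-memory buffer stands in for a network connection (for a real probe, the stream could be a connected socket wrapped with `socket.makefile("rb")`).

```python
import io
import time
from typing import BinaryIO, Callable, List


def timed_probe(stream: BinaryIO,
                checkpoints: List[int],
                clock: Callable[[], float] = time.perf_counter,
                chunk: int = 64 * 1024) -> List[float]:
    """Read from `stream` and return the elapsed time at which each
    cumulative byte count in `checkpoints` (strictly ascending) is reached."""
    start = clock()
    received = 0
    times: List[float] = []
    pending = list(checkpoints)
    while pending:
        # Never read past the next checkpoint, so its timestamp is exact.
        want = min(chunk, pending[0] - received)
        data = stream.read(want)
        if not data:
            raise EOFError("stream ended before the probe completed")
        received += len(data)
        if received >= pending[0]:
            times.append(clock() - start)
            pending.pop(0)
    return times


# Stand-in for a network connection: an in-memory 800 KB payload.
payload = io.BytesIO(bytes(800 * 1024))
t1, t2 = timed_probe(payload, [400 * 1024, 800 * 1024])
print(f"T1 = {t1:.6f} s, T2 = {t2:.6f} s")
```

Timing both checkpoints inside one connection means the second measurement does not pay connection setup a second time, which is the point of merging the two probes.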
19 Our Approach
[Figure: transfer time versus flow size]
- Solve for A and B
- Predict throughput for some other transfer
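The equation itself does not survive in this transcript. A minimal sketch, assuming transfer time is linear in flow size, T = A + B * S, with A a fixed startup cost in seconds and B the time per byte (which symbol plays which role is an assumption). Two (size, time) points determine A and B, and the throughput for any other size S is then S / (A + B * S). The function names and the example timings are hypothetical.

```python
from typing import Tuple


def solve_model(s1: float, t1: float, s2: float, t2: float) -> Tuple[float, float]:
    """Fit T = A + B * S through two (size, time) points; returns (A, B)."""
    if s1 == s2:
        raise ValueError("the two probe sizes must differ")
    b = (t2 - t1) / (s2 - s1)   # seconds per byte
    a = t1 - b * s1             # size-independent overhead in seconds
    return a, b


def predict_throughput(a: float, b: float, size: float) -> float:
    """Predicted throughput (bytes/s) for a transfer of `size` bytes."""
    t = a + b * size
    if t <= 0:
        raise ValueError("model predicts a non-positive transfer time")
    return size / t


# Made-up timings: 400 KB reached after 0.5 s, 800 KB after 0.75 s.
a, b = solve_model(400e3, 0.5, 800e3, 0.75)
for size in (400e3, 40e6, 160e6):
    print(f"{size:>12.0f} bytes -> {predict_throughput(a, b, size) / 1e6:.3f} MB/s")
```

Under this assumed model the predicted throughput rises with flow size and approaches 1 / B for very large transfers, which matches the correlation between flow size and throughput described earlier.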
20 Model Fit is Excellent
- Low and normally distributed relative errors at all flow sizes
[Figure: Correlation Set]
21 Stability
- How long does the TCP throughput function remain stable?
- How frequently should we probe the path?
- What's the distribution of throughput around the function (i.e., the error)?
22 Throughput is Stable For Long Periods
[Figure: Increasing Max/Min Throughput in Interval; Correlation Set]
23 Throughput Is Normally Distributed In An Interval
[Figure: Distribution Set]
24 Online DualPats Algorithm
- Fetch probe sequence for destination
- Start probing process if no data exists
- Project probe sequence ahead
- 20 point moving average over values with current sampling interval
- Apply model using projected data
- Return result
- Confidence interval computed using normality assumptions
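The steps above can be sketched as follows. This is an illustration under several assumptions, not the authors' implementation: the class and method names are hypothetical, the model is taken to be a linear transfer time T = A + B * S fitted to the two probe timings, the moving average is applied to the raw probe timings, and the interval is derived from the spread of per-probe predictions treated as normally distributed.

```python
from collections import deque
from statistics import NormalDist, mean, stdev
from typing import Deque, Tuple


class OnlinePredictor:
    """Hypothetical per-destination predictor state."""

    def __init__(self, s1: float, s2: float, window: int = 20):
        self.s1, self.s2 = s1, s2
        # Keep only the most recent `window` probes (20 point moving average).
        self.probes: Deque[Tuple[float, float]] = deque(maxlen=window)

    def add_probe(self, t1: float, t2: float) -> None:
        self.probes.append((t1, t2))

    def _throughput(self, t1: float, t2: float, size: float) -> float:
        # Assumed model: T = A + B * S fitted through the two probe points.
        b = (t2 - t1) / (self.s2 - self.s1)
        a = t1 - b * self.s1
        return size / (a + b * size)

    def predict(self, size: float, p: float = 0.95) -> Tuple[float, float, float]:
        """Return (BW, BWLow, BWHigh) for a transfer of `size` bytes."""
        if len(self.probes) < 2:
            raise RuntimeError("not enough data; start the probing process")
        # Project ahead with a moving average of the recent probe timings.
        t1 = mean(x for x, _ in self.probes)
        t2 = mean(y for _, y in self.probes)
        bw = self._throughput(t1, t2, size)
        # Normality assumption: interval from the spread of per-probe predictions.
        per_probe = [self._throughput(x, y, size) for x, y in self.probes]
        z = NormalDist().inv_cdf(0.5 + p / 2)
        half = z * stdev(per_probe)
        return bw, bw - half, bw + half


pred = OnlinePredictor(400e3, 800e3)
for t1, t2 in [(0.50, 0.75), (0.52, 0.78), (0.49, 0.74)]:  # made-up timings
    pred.add_probe(t1, t2)
print(pred.predict(40e6, p=0.95))
```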
25 Dynamic Sampling Rate
- Adjust sampling interval to correspond to the path's stable intervals
- Limit rate (20 to 1200 seconds)
- Additive increase / additive decrease based on difference between last two probes
- < 5% => increase interval
- > 15% => decrease interval
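A minimal sketch of this adjustment rule. The 5% and 15% thresholds and the 20 to 1200 second limits come from the slide; the step size, the function name, and the choice to measure the difference relative to the earlier probe are assumptions.

```python
def next_interval(interval: float,
                  prev_bw: float,
                  last_bw: float,
                  step: float = 20.0,        # assumed additive step, in seconds
                  lo: float = 20.0,
                  hi: float = 1200.0,
                  inc_below: float = 0.05,
                  dec_above: float = 0.15) -> float:
    """Return the next sampling interval in seconds."""
    if prev_bw <= 0:
        raise ValueError("previous probe throughput must be positive")
    change = abs(last_bw - prev_bw) / prev_bw
    if change < inc_below:
        interval += step    # path looks stable: probe less often
    elif change > dec_above:
        interval -= step    # path is changing: probe more often
    # Clamp to the allowed range.
    return max(lo, min(hi, interval))


print(next_interval(100.0, prev_bw=1.00e6, last_bw=1.02e6))  # small change
print(next_interval(100.0, prev_bw=1.00e6, last_bw=1.30e6))  # large change
```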
26 Finding Sufficiently Large Probe Size
- Default values: 400 KB / 800 KB
- Upper bound
- Additive increase until prediction errors are less than threshold, all with same sign.
27 Evaluation
- Slight conservative bias
- >90% of predictions have <35% error
[Figure: P[mean error < X] versus relative error (x-axis ticks at -0.4, 0, 0.4) for mean relative error and mean abs(relative error); Online Evaluation Set]
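The error metrics named on this slide can be computed as in the sketch below. The sign convention (predicted minus measured, divided by measured) is an assumption; under it, a negative mean relative error corresponds to a conservative prediction, meaning throughput is under-predicted. The sample numbers are synthetic and are not results from the study.

```python
from statistics import mean
from typing import Dict, List, Sequence


def relative_errors(predicted: Sequence[float],
                    measured: Sequence[float]) -> List[float]:
    """Signed relative error of each prediction."""
    return [(p - m) / m for p, m in zip(predicted, measured)]


def summarize(predicted: Sequence[float],
              measured: Sequence[float],
              bound: float = 0.35) -> Dict[str, float]:
    """Mean relative error, mean abs(relative error), and the fraction
    of predictions whose abs(relative error) is below `bound`."""
    errs = relative_errors(predicted, measured)
    return {
        "mean_rel_err": mean(errs),
        "mean_abs_rel_err": mean(abs(e) for e in errs),
        "frac_within_bound": sum(abs(e) < bound for e in errs) / len(errs),
    }


# Synthetic throughputs in MB/s, for illustration only.
predicted = [9.0, 10.0, 11.0, 5.0]
measured = [10.0, 10.0, 10.0, 10.0]
print(summarize(predicted, measured))
```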
28 Conclusions
- Algorithm for predicting the TCP throughput as a function of flow size
- Minimal active probing
- Dynamic probe rate adjustment
- Explaining flow size / throughput correlation
- Explaining why simple active probing fails
- Large scale empirical study
29 For More Info
- Prescience Lab
- http://plab.cs.northwestern.edu
- Aqua Lab
- http://aqualab.cs.northwestern.edu
- D. Lu, Y. Qiao, P. Dinda, and F. Bustamante, Modeling and Taming Parallel TCP on the Wide Area Network, IPDPS 2005.
- Y. Qiao, J. Skicewicz, and P. Dinda, An Empirical Study of the Multiscale Predictability of Network Traffic, HPDC 2004.