Title: Long-Range Dependence in a Changing Internet Traffic Mix
1. Long-Range Dependence in a Changing Internet Traffic Mix
Statistical and Applied Mathematical Sciences Institute (SAMSI)
- J. S. Marron, Department of Statistics and Operations Research, UNC-Chapel Hill
- Cheolwoo Park, SAMSI
- Félix Hernández-Campos and Don Smith, Department of Computer Science, UNC-Chapel Hill
- David Rolls, Department of Mathematics and Statistics, UNC-Wilmington
2. Measurements
- Capture TCP/IP packet headers on the Gigabit Ethernet link (inbound from the Internet)
[Diagram: the Internet connected to UNC (35,000 Internet users) by a 1 Gbps Ethernet link, with a monitor (tcpdump) attached to the link]
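The analyses on the later slides work on packet-count and byte-count time series derived from the captured headers. A minimal sketch of that aggregation step is given here, assuming each header has already been reduced to a (timestamp in seconds, packet length in bytes) pair. The record layout, the function name `bin_counts`, and the bin width are illustrative assumptions, not details taken from the study.

```python
def bin_counts(records, bin_width, n_bins):
    """Aggregate (timestamp, length) records into per-bin packet and byte counts.

    records   : iterable of (timestamp_seconds, length_bytes), hypothetical layout
    bin_width : width of each time bin in seconds (assumed, not from the slides)
    n_bins    : number of bins to produce, starting at time 0
    """
    packets = [0] * n_bins
    nbytes = [0] * n_bins
    for ts, length in records:
        idx = int(ts / bin_width)
        if 0 <= idx < n_bins:  # ignore records outside the window
            packets[idx] += 1
            nbytes[idx] += length
    return packets, nbytes


# Four made-up packets spread over three one-second bins.
example = [(0.10, 40), (0.70, 1500), (1.20, 576), (2.95, 40)]
packet_series, byte_series = bin_counts(example, bin_width=1.0, n_bins=3)
print(packet_series, byte_series)
```

Keeping packet counts and byte counts as separate series matters here, because the later slides report different behavior for the two.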
3. Summary data
- Two-hour traces, 2nd week in April of 2002 and 2003
- 5:00 AM, 10:00 AM, 3:00 PM, 9:30 PM on each of 7 days
- 28 traces (56 hours) per year
- 2002 Traces
  - 5 billion packets
  - 1.6 terabytes of network traffic
  - 95% TCP packets
  - 5% UDP packets
  - 93% TCP bytes
  - 7% UDP bytes
  - 10% maximum 2-hour mean link utilization
  - 0.01-0.16% of packets dropped by monitor
- 2003 Traces
  - 10 billion packets
  - 2.9 terabytes of network traffic
  - 75% TCP packets
  - 25% UDP packets
  - 86% TCP bytes
  - 14% UDP bytes
  - 18% maximum 2-hour mean link utilization
  - 0% of packets dropped by monitor
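As a rough cross-check of the figures above, the totals can be turned into an average packet size and an average link utilization over the 56 hours of traces per year. This sketch assumes 1 terabyte = 10^12 bytes, a link capacity of exactly 1 Gbps, and that the rounded totals on the slide are accurate enough for a back-of-the-envelope calculation. The function names are illustrative.

```python
LINK_BPS = 1e9    # 1 Gbps link (from the Measurements slide)
TRACE_HOURS = 56  # 28 two-hour traces per year


def mean_packet_size(total_bytes, total_packets):
    """Average bytes per packet over all traces of one year."""
    return total_bytes / total_packets


def mean_utilization(total_bytes, hours=TRACE_HOURS, link_bps=LINK_BPS):
    """Average fraction of link capacity used over the traced hours."""
    bits_per_second = total_bytes * 8 / (hours * 3600)
    return bits_per_second / link_bps


# (packets, bytes) per year, assuming 1 TB = 1e12 bytes
summary = {2002: (5e9, 1.6e12), 2003: (10e9, 2.9e12)}
for year, (packets, nbytes) in summary.items():
    size = mean_packet_size(nbytes, packets)
    util = mean_utilization(nbytes)
    print(f"{year}: {size:.0f} bytes/packet, {util:.1%} mean utilization")
```

Under these assumptions the averages come out at roughly 6% utilization for 2002 and 12% for 2003. Both sit below the reported maximum 2-hour means of 10% and 18%, as expected for an average over all traces.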
4. Hurst parameter (H) estimates and confidence intervals
- H estimated with wavelet analysis tools (the logscale diagrams of D. Veitch)
- H estimates for 2003 packet counts were significantly lower than for 2002 (not true for byte counts).
- Several traces had H > 1 or very wide confidence intervals.
- H estimates were independent of time of day or day of week (both packets and bytes) in both years.
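To make the idea of a logscale diagram concrete, here is a minimal sketch of the general technique. It computes the log2 of the mean squared wavelet detail coefficient at each octave j, fits a line over a range of octaves, and converts the slope to H via H = (slope + 1) / 2, the usual relation for a long-range dependent process. This is not the Veitch tool: it uses a plain Haar wavelet and an unweighted least-squares fit, and it produces no confidence intervals. All function names are illustrative.

```python
import math
import random


def haar_logscale(series, max_octave):
    """Return [(j, log2 of mean squared Haar detail coefficient)] for j = 1..max_octave."""
    approx = list(series)
    points = []
    for j in range(1, max_octave + 1):
        pairs = len(approx) // 2
        if pairs < 2:
            break  # too few coefficients left to estimate an energy
        root2 = math.sqrt(2)
        details = [(approx[2 * k] - approx[2 * k + 1]) / root2 for k in range(pairs)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / root2 for k in range(pairs)]
        energy = sum(d * d for d in details) / pairs
        if energy > 0:  # log2 is undefined for a zero energy
            points.append((j, math.log2(energy)))
    return points


def hurst_from_logscale(points, j1, j2):
    """Unweighted least-squares slope over octaves j1..j2, mapped to H = (slope + 1) / 2."""
    sel = [(j, y) for j, y in points if j1 <= j <= j2]
    mean_j = sum(j for j, _ in sel) / len(sel)
    mean_y = sum(y for _, y in sel) / len(sel)
    sxy = sum((j - mean_j) * (y - mean_y) for j, y in sel)
    sxx = sum((j - mean_j) ** 2 for j, _ in sel)
    return (sxy / sxx + 1) / 2


# Sanity check on synthetic white noise, which has no long-range dependence.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
h_noise = hurst_from_logscale(haar_logscale(noise, 8), 1, 8)
print(f"H estimate for white noise: {h_noise:.2f}")
```

White noise has a flat logscale diagram, so the estimate should land near 0.5. A steeper positive slope over the fitted octaves gives a larger H, and a slope above 1 gives H > 1.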
5. H not related to link utilization or active TCP connections
6. Extreme examples of H > 1 or wide confidence intervals
7. Dependent SiZer analysis of wide CI example
- Test for statistically significant differences from an FGN (fractional Gaussian noise) process with parameters estimated from the data, H = 0.8
- Top: local linear smoothing of the data with different window widths
- Bottom: statistical inference on trends of the smoothed curve at each window width
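The smoothing step in the top panel can be sketched as a kernel-weighted local linear fit, evaluated at several window widths. The sketch below covers only that smoothing step, using a Gaussian kernel as an assumption. It does not implement the SiZer significance test, and in particular not the dependent version that compares against an FGN null. The function name and test data are illustrative.

```python
import math


def local_linear(xs, ys, x0, h):
    """Local linear fit at x0 with a Gaussian kernel of window width h.

    Returns (level, slope): the smoothed value and the estimated trend at x0.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        u = x - x0
        w = math.exp(-0.5 * (u / h) ** 2)  # kernel weight
        s0 += w
        s1 += w * u
        s2 += w * u * u
        t0 += w * y
        t1 += w * u * y
    denom = s0 * s2 - s1 * s1
    level = (s2 * t0 - s1 * t1) / denom
    slope = (s0 * t1 - s1 * t0) / denom
    return level, slope


# A family of smooths at one location, one per window width.
xs = [i / 10 for i in range(101)]
ys = [2 * x + 1 for x in xs]  # exactly linear test data
family = {h: local_linear(xs, ys, 5.0, h) for h in (0.5, 1.0, 2.0)}
for h, (level, slope) in family.items():
    print(f"h={h}: level={level:.3f}, slope={slope:.3f}")
```

A local linear fit reproduces exactly linear data at every window width, which makes the test data above a convenient check. On real traces the slope estimates differ across widths, and SiZer asks at which widths and locations those slopes are statistically significant.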
8. Dependent SiZer analysis of H > 1 example
- Analysis shows both non-linear trends and greater variability than an FGN process at many time scales
9. Logscale diagram of typical 2002 and 2003 traces
- Protocol-dependent analysis suggested by the increase in UDP
- Filtered traces to create new TCP-only and UDP-only traces
- TCP is the dominant influence in all cases except 2003 packet counts, where UDP dominates.
- A sharp increase at middle scales shapes the H estimate (less slope, so lower H).
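The protocol filtering described above can be sketched as a split of the packet records into per-protocol sub-traces, each of which can then be analyzed on its own. The record layout (timestamp, length, protocol label), the function names, and the sample packets are all illustrative assumptions.

```python
from collections import defaultdict


def split_by_protocol(records):
    """Split (timestamp, length, proto) records into one sub-trace per protocol."""
    traces = defaultdict(list)
    for ts, length, proto in records:
        traces[proto].append((ts, length))
    return dict(traces)


def shares(traces):
    """Return {proto: (packet_share, byte_share)} across all sub-traces."""
    total_packets = sum(len(t) for t in traces.values())
    total_bytes = sum(length for t in traces.values() for _, length in t)
    return {
        proto: (len(t) / total_packets, sum(length for _, length in t) / total_bytes)
        for proto, t in traces.items()
    }


# Made-up records: large TCP packets mixed with small UDP packets.
records = [(0.0, 1500, "tcp"), (0.1, 60, "udp"), (0.2, 1500, "tcp"), (0.3, 60, "udp")]
traces = split_by_protocol(records)
for proto, (pkt_share, byte_share) in shares(traces).items():
    print(f"{proto}: {pkt_share:.0%} of packets, {byte_share:.1%} of bytes")
```

The made-up data illustrates why packet counts and byte counts can tell different stories: a protocol that sends many small packets has a much larger share of packets than of bytes.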
10. Same conclusion for all traces. Why?
11. The Blubster effect (2003's hot new peer-to-peer file sharing application)
- Recall that UDP packets increased to 25% of 2003 packets (but only 14% of bytes).
- Analysis of UDP packets found 70% came from one application (Blubster) in 2003 that was negligible in 2002.
- A second filtering produced Blubster-only and Rest (TCP + other UDP) traces.
- Blubster alone dominated the H estimate for packets, not bytes.
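The two percentages above combine into Blubster's share of all 2003 packets. The short calculation below assumes that both figures refer to packet counts and that the rounded slide values are close enough for an estimate.

```python
udp_packet_share = 0.25       # UDP share of all 2003 packets
blubster_share_of_udp = 0.70  # Blubster share of UDP packets

# Blubster's share of all packets is the product of the two shares.
blubster_packet_share = udp_packet_share * blubster_share_of_udp
print(f"Blubster share of all 2003 packets: {blubster_packet_share:.1%}")
```

The product is about 17.5%, which agrees with the figure of roughly 18% of packets quoted for a single application in the results summary.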
12. Why? Blubster's packet traffic is periodic
- SiZer analysis of the Blubster trace, looking for structure beyond white noise
- Found high-frequency variability with periods in the 1-5 second range (caused by update and search queries among peers)
- These correspond to the time scales in the logscale diagram where UDP dominates the wavelet coefficients.
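To relate a period in seconds to a position on the logscale diagram, note that a wavelet coefficient at octave j spans about 2^j bins of the underlying count series. The sketch below lists the octaves whose time scale falls in the 1-5 second range. The 10 ms bin width is purely an assumption for illustration, since the slides do not state the bin width used, and the function names are hypothetical.

```python
def octave_time_scale(j, bin_width):
    """Approximate time span in seconds covered by one coefficient at octave j."""
    return bin_width * 2 ** j


def octaves_in_range(bin_width, lo, hi, max_octave=20):
    """Octaves whose time scale lies within [lo, hi] seconds."""
    return [
        j for j in range(1, max_octave + 1)
        if lo <= octave_time_scale(j, bin_width) <= hi
    ]


BIN_WIDTH = 0.01  # assumed 10 ms bins; not stated in the slides
middle = octaves_in_range(BIN_WIDTH, 1.0, 5.0)
for j in middle:
    print(f"octave {j}: {octave_time_scale(j, BIN_WIDTH):.2f} s")
```

With a different bin width the same periods shift to other octaves, so the "middle scales" on the diagram only make sense relative to the bin width actually used.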
13. Results summary
- We presented results from a study of traffic on the UNC Internet link in two years, 2002 and 2003.
- A single application generating about 18% of packets and < 10% of bytes in the traces can strongly influence the H estimate (in this case, because of periodic behavior).
- A significant number of traces produced H estimates > 1 or wide confidence intervals.
- Dependent SiZer is an effective tool for augmenting wavelet analysis and understanding structure in Internet data.
- H was not related to time of day, day of week, link utilization, or number of active TCP connections.