Title: Origins of Long Range Dependence: Myths and Legends
1 Origins of Long Range Dependence: Myths and Legends
- Aleksandar Kuzmanovic
- 01/08/2001
2 Outline
- Definitions
- Why is LRD important?
- Heavy tails
- Producing self-similar traffic
- Physical interpretation in LAN and WAN networks
- Different hypotheses from around 10 papers
3 On the Self-Similar Nature of Ethernet Traffic,
W. Willinger, 1994
4 Definitions
- Long range dependent process
- if its autocorrelation function is nonsummable
- Self-similar process
- scaling behavior of finite dimensional
distributions
- $X \stackrel{d}{=} m^{1-H} X^{(m)}$ (equality in distribution)
- Second order self-similar process
- aggregated processes possess the same
non-degenerate AC functions as the original
process
- $X$ and $m^{1-H} X^{(m)}$ have the same AC function
- Self-similar processes have hyperbolically
decaying autocorrelation functions
- LRD can be characterized by a single parameter H
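The scaling relation in the definitions above implies that the variance of the aggregated series X^(m) decays like m^(2H-2). A minimal Python sketch of the resulting variance-time estimate of H is given here; the function names, the block sizes, and the white-noise test series are illustrative assumptions and do not come from the slides.

```python
import numpy as np

def aggregate(x, m):
    """Block means X^(m): average x over non-overlapping blocks of size m."""
    n = (len(x) // m) * m
    return x[:n].reshape(-1, m).mean(axis=1)

def hurst_variance_time(x, block_sizes):
    """Estimate H from the slope of log Var(X^(m)) versus log m.

    For a second-order self-similar series Var(X^(m)) ~ m^(2H - 2),
    so a fitted slope beta gives H = 1 + beta / 2.
    """
    x = np.asarray(x, dtype=float)
    variances = [aggregate(x, m).var() for m in block_sizes]
    slope, _ = np.polyfit(np.log(block_sizes), np.log(variances), 1)
    return 1.0 + slope / 2.0

# White noise has no long range dependence, so the estimate should be near 0.5.
rng = np.random.default_rng(0)
white_noise = rng.standard_normal(2**16)
h_est = hurst_variance_time(white_noise, [1, 2, 4, 8, 16, 32, 64, 128])
print(f"estimated H for white noise: {h_est:.2f}")
```

An estimate clearly above 0.5 on a measured trace would point towards LRD, although the variance-time method is only a rough diagnostic.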
5 Heavy tails (Noah effect)
- Heavy-tailed distributions
- LLCD (log-log complementary distribution) plot
- Pareto: a typical example
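As an illustration of the LLCD idea, the sketch below draws Pareto samples and estimates the tail index alpha as minus the slope of the log-log complementary distribution over the largest samples. It assumes the usual definition of a heavy tail, P[X > x] ~ x^(-alpha); the function name, the 10% tail cut-off, and the chosen alpha are hypothetical choices made for this example only.

```python
import numpy as np

def llcd_tail_index(samples, tail_fraction=0.1):
    """Estimate alpha as minus the LLCD slope over the largest samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    # Empirical complementary distribution P[X > x_i] at each sorted sample
    ccdf = 1.0 - np.arange(1, n + 1) / n
    # Keep the upper tail and drop the last point, where ccdf is exactly 0
    start = int(n * (1.0 - tail_fraction))
    log_x = np.log10(x[start:-1])
    log_ccdf = np.log10(ccdf[start:-1])
    slope, _ = np.polyfit(log_x, log_ccdf, 1)
    return -slope

rng = np.random.default_rng(1)
true_alpha = 1.2
# Generator.pareto draws Lomax samples; adding 1 gives a classical Pareto with minimum 1
sizes = rng.pareto(true_alpha, size=100_000) + 1.0
alpha_hat = llcd_tail_index(sizes)
print(f"true alpha = {true_alpha}, LLCD estimate = {alpha_hat:.2f}")
```

On real traces the result depends on where the tail is assumed to start, so the cut-off should be varied before trusting any single value.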
6 Producing Self-Similar Traffic
- 1. Multiplexing ON/OFF sources that have a fixed
rate in ON periods and ON/OFF period lengths that
are heavy tailed.
- Aggregate traffic is fBm with [...]
- 2. M/G/∞ queue model
- implies that multiplexing constant-rate
connections with Poisson connection arrivals and
a heavy-tailed distribution for connection
lifetimes would result in self-similar traffic
- 3. Inter-arrival packet times are i.i.d. Pareto
with [...]
- if we then consider the corresponding count process
(the number of arrivals in consecutive
intervals), we have pseudo self-similar traffic
(Paxson, Floyd) (or even self-similar (L.
Lipsky)?)
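The first construction can be sketched in a few lines of Python: each source alternates between ON periods (sending at a fixed unit rate) and OFF periods, both with Pareto-tailed lengths, and the sources are summed slot by slot. This is a toy simulation under assumed parameters; the slotted time axis, the function names, and the number of sources are hypothetical, and the tail indices are simply taken as plausible values between 1 and 2.

```python
import numpy as np

def pareto_period(rng, alpha):
    """One period length in whole slots (>= 1) with a Pareto tail of index alpha."""
    return int(np.ceil(rng.pareto(alpha) + 1.0))

def on_off_source(rng, n_slots, alpha_on, alpha_off):
    """0/1 activity of one source: alternating heavy-tailed ON and OFF periods."""
    activity = np.zeros(n_slots, dtype=int)
    t = 0
    state = int(rng.integers(0, 2))  # random initial state
    while t < n_slots:
        length = pareto_period(rng, alpha_on if state == 1 else alpha_off)
        if state == 1:
            activity[t:t + length] = 1  # fixed unit rate while ON
        t += length
        state = 1 - state
    return activity

def aggregate_traffic(n_sources, n_slots, alpha_on, alpha_off, seed=0):
    """Sum of independent ON/OFF sources, one value per time slot."""
    rng = np.random.default_rng(seed)
    total = np.zeros(n_slots, dtype=int)
    for _ in range(n_sources):
        total += on_off_source(rng, n_slots, alpha_on, alpha_off)
    return total

traffic = aggregate_traffic(n_sources=20, n_slots=10_000, alpha_on=1.7, alpha_off=1.2)
print(traffic[:20])
```

Aggregating `traffic` over larger and larger blocks and watching how slowly the variance decays is one way to check whether the multiplexed stream looks long range dependent.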
7 Questions we want to answer
- What physical activity causes LRD?
- What is the role of protocols (TCP and MAC layer
protocols)?
- What is the role of limited resources (i.e.
bandwidth)?
- What model fits best to each of the assumptions?
- What is the largest time-scale over which the
correlation is present?
- Self-similarity vs. pseudo self-similarity and
relevance
8 Statistical Analysis of Ethernet LAN Traffic at
the Source Level, W. Willinger, 1997, I
9 Statistical Analysis of Ethernet LAN Traffic at
the Source Level, W. Willinger, 1997, II
- Model 1 (heavy tailed ON/OFF activity at the
source level) is widely accepted
- Result proven theoretically
- Noah effect (heavy-tailed periods)
- ON periods: alpha = 1.7
- OFF periods: alpha = 1.2
- TCP traffic measured most of the time...
- Higher load: H increases
- WAN measurements do not fit into this model
- connections typically do not stay long
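As a worked example, assume the limit result usually quoted for Model 1, in which the Hurst parameter of the aggregate is set by the heavier of the two tails. The formula is not shown on the slide and is stated here as an assumption; under it, the measured ON and OFF indices above would give:

```latex
H = \frac{3 - \alpha_{\min}}{2}, \qquad
\alpha_{\min} = \min(\alpha_{\mathrm{ON}}, \alpha_{\mathrm{OFF}}) = \min(1.7,\ 1.2) = 1.2
\quad\Longrightarrow\quad
H = \frac{3 - 1.2}{2} = 0.9
```

The same relation shows why only tail indices between 1 and 2 matter here: they map to H between 0.5 and 1.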
10 Wide Area Traffic: The Failure of Poisson
Modeling, V. Paxson, S. Floyd, 1995
- Summary of ways to produce LRD traffic
- WAN (TCP) traffic for TELNET and FTP applications
- TELNET connection arrivals appear to be Poisson,
but packet arrivals are not
- Single TELNET connection is LRD
- Model 3: Inter-arrival times are i.i.d. Pareto
- Aggregate is also LRD, but there is no analytical
proof
- FTP traffic is also LRD, yet none of the models fit
because of limited resources.
- Aggregated traffic is not fBm (single H is not
enough)
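Model 3 can be illustrated with a short Python sketch: draw i.i.d. Pareto inter-arrival times, accumulate them into arrival times, and count arrivals per fixed-width interval. The shape parameter, the bin width, and the function name below are assumptions chosen for the example; the slides do not preserve the values used in the paper.

```python
import numpy as np

def pareto_count_process(n_arrivals, alpha, bin_width, seed=0):
    """Arrivals per bin when inter-arrival times are i.i.d. Pareto(alpha)."""
    rng = np.random.default_rng(seed)
    # Classical Pareto gaps with minimum 1 (Generator.pareto returns Lomax samples)
    gaps = rng.pareto(alpha, size=n_arrivals) + 1.0
    arrival_times = np.cumsum(gaps)
    # Map each arrival to the index of the interval it falls into
    bin_index = (arrival_times // bin_width).astype(int)
    n_bins = int(arrival_times[-1] // bin_width) + 1
    return np.bincount(bin_index, minlength=n_bins)

counts = pareto_count_process(n_arrivals=50_000, alpha=1.5, bin_width=10.0)
print(counts[:20])
```

Long runs of empty or nearly empty bins, caused by the occasional very large gap, are what distinguish this count process from a Poisson one at the same mean rate.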
11 Explaining WWW Traffic Self-Similarity, M.
Crovella, 1995
- WWW traffic is self-similar
- but only when load is high (i.e. in busiest
hours)
- Authors force model 1 (ON/OFF model)
- The distributions of the following are heavy-tailed:
- transfer times (alpha = 1.21)
- user requests for documents (alpha = 1.06)
- document sizes available in the Web (alpha =
1.05)
- user think times (alpha = 1.5)
- H increases as the load increases (same as in LAN)
12 On the Relationship between File Sizes, Transport
Protocols, and Self-Similar Network Traffic, M. Crovella, 1996
- Model 1: The success of this simple model is
surprising given that it ignores non-linearities
arising in real networks
- Hypothesis
- Heavy tailed file size distributions together
with TCP are responsible for LRD
- if UDP is used, there is little or no LRD
- Explanation
- In some sense, the effect of the unaccounted for
nonlinearity is reflected back as a stretching in
time effect, thus conforming to the model's
original suppositions
- Other interesting stuff: mix of Pareto and exp.
background traffic
13 On the Propagation of LRD in the Internet, A.
Veres, 2000, I
- Not about roots, but about propagation of
self-similarity by TCP
- $A(t) = C - B(t)$
- TCP is a linear system beyond a characteristic
time scale
- if it adapts well to a background traffic, it
itself becomes self-similar
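The propagation argument can be checked numerically in its idealized form: if the TCP rate is exactly the capacity minus the background traffic, the two series share the same autocorrelation function, so any LRD in the background is inherited. In the sketch below the capacity value and the smoothed-noise background are stand-ins invented for illustration; they are not data from the paper.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.dot(centered, centered)
    return np.array([np.dot(centered[:len(x) - k], centered[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(2)
capacity = 100.0  # hypothetical bottleneck capacity C
# Stand-in background traffic B(t): a smoothed random series, for illustration only
background = 40.0 + np.convolve(rng.standard_normal(5_000), np.ones(50) / 50, mode="same")
tcp_rate = capacity - background  # idealized perfect adaptation of the TCP flow

acf_b = autocorrelation(background, 100)
acf_a = autocorrelation(tcp_rate, 100)
same_structure = np.allclose(acf_a, acf_b)
print("same autocorrelation:", same_structure)
```

Real TCP only tracks the available bandwidth beyond some characteristic time scale, so the identity should be expected to hold only at those coarser scales.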
14 On the Propagation of LRD in the Internet, A.
Veres, 2000, II
- Experimental proof
- NY-Budapest file transfer: source is not LRD,
yet traffic is LRD (H = 0.76)
- Max time scale 8 min
- Also, if there are a number of on-off TCP
connections, they can spread LRD
- W. Willinger obviously does not like this paper
- This is a fraud and has no relevance for LRD
observed on link level...
- Protocols have no impact on LRD, they just have
to send the data generated by applications...
15 TCP Congestion Control and Heavy-Tails, M.
Crovella, 2000, I
- Switch to Model 3 (Heavy-tailed inter-packet
arrivals)
- Although heavy-tailed flow lengths are commonly
associated with heavy-tailed file sizes, there is
no strong correlation between file sizes and
transmission times
- It has been shown that TCP can show heavy-tailed
inter-arrival times under some conditions
- Because most of the connections are short-lived (!),
only slow start and exp. back-off were considered
16 TCP Congestion Control and Heavy-Tails, M.
Crovella, 2000, II
- Simple Markov chain model for exp. backoff and
slow start with pr. of loss parameter
- State probability with different loss rates
- For alpha to be between 1 and 2, p has to be
between 1/8 and 1/4
- ...but for different model
- p increases => H increases
17 TCP Congestion Control and Heavy-Tails, M.
Crovella, 2000, III
- Pathological TCP connections: 15 packets
- Analytical model not that good (bounds are
loose)
- For this set-up, correlation up to 1000 sec
- For larger file sizes, up to 200-300 sec
- Under certain conditions, heavy tailed
transmission times can occur even in the absence
of any variability in file sizes
- Future work: consider the variability in
round-trip time estimation
18 On the Autocorrelation Structure of TCP Traffic,
Don Towsley, 2000, I
- Answer to previous two papers
- TCP can create self-similarity but over finite
range of time scales - pseudo self similarity
- but everything in nature is finite (thus
pseudo)
- Also criticize pathological model of previous
paper, but they themselves use pathological model
of different kind (always packets model)
- Separate Markovian models for congestion
avoidance (CA) and timeout (TO)
- Simulated these two models with different loss
probability parameters
19 On the Autocorrelation Structure of TCP Traffic,
Don Towsley, 2000, II
- Range of time scales observed from the simulation
($2^6$ RTT (2.5 to 10)) => $2^9$ RTT
- Explanation on why aggregate is self-similar
- independent bottlenecks (at the edge)
- aggregate of independent pseudo-self-similar
flows should be self-similar itself
20 On the Autocorrelation Structure of TCP Traffic,
Don Towsley, 2000, III
- About the Veres paper
- compute loss probability (0.08 to 0.14)
- TO model predicts H = 0.69-0.72 (really measured
0.74)
- Time scale goes up to $2^6$ RTO (also near measured
value)
- Experiments (file transfers)
- North-South America
- Measurements: p = 0.13, H = 0.77, ts = ($2^7$ to
$2^8$) RTT
- TO model: p = 0.12, H = 0.72, ts = ($2^7$ to
$2^9$) RTT
- East - West Coast
- Measurements: p = 0.018, H = 0.86, ts = $2^6$ RTT
- CA model: p = 0.018, H = 0.75, ts = $2^4$ RTT
- One should be careful when attributing the origin
of traffic characteristics to a specific cause
21 Protocols Can Make Traffic Appear Self-Similar,
Jon Peha, 1997, I
- How basic retransmission mechanism can cause
self-similarity
- No model, only experimental investigation
- Simple single queue (bottleneck) model
- Input traffic is Poisson; retransmissions are
bursty
- As time-scale gets larger, burstiness from
original Poisson traffic decreases, but
burstiness from retransmissions stays the same!
- It is unlikely that the retransmission
mechanism causes truly self-similar traffic;
pseudo self-similarity is more likely
22 Protocols Can Make Traffic Appear Self-Similar,
Jon Peha, 1997, II
23 Protocols Can Make Traffic Appear Self-Similar,
Jon Peha, 1997, III
- Cut-off time scales observed
- 150Mbps link rate, 500-bit packets, RTT = 60 msec
- TS = 5 minutes
- 10Mbps Ethernet, No. of retransmissions = 5, To = 125
- TS in range of minutes
- For larger To, it is possible to reach time
scales measured at Bellcore
- I have computed the cut-off time-scale for the Veres
paper
- 128 Kbps, Tout = 10 RTT = 2 sec, TS = 8 min
- If this effect is found to be as strong in more
complex models, this could be a significant cause
24 The Second-order Characteristics of TCP,
J.Y.Boudec, 1996, I
- Pseudo self similarity (TS = 20-30 sec)
- Minimum bottleneck bandwidth 34Mbps (?)
- Two main reasons (both heavy-tailed)
- Burst length arrivals
- Round trip time
- Real network measurements
- Figure - missing
25 The Second-order Characteristics of TCP,
J.Y.Boudec, 1996, II
- Even for 34Mbps link and utilization of 25%, the
arrival bursts are eliminated and the inter-packet
times are dependent on the round trip
times
- The aggregate of TCP connections has the same H
as a single TCP connection
- It seems likely that the heavy tailed
distributions observed in Willinger's work were a
result of, among other things, the heavy tailed
distribution of the round trip time
26 More on RTTs
- Why are round trip times heavy-tailed?
- Because of TCP congestion control?
- Because of retransmissions?
- Because of variety of destinations?
- It can be heavy-tailed even without any
congestion protocol or different destinations!
- Measurement and Analysis of LRD Behavior of
Internet Packet Delay, M. Borella, Infocom 97
- Constant UDP transmissions - LRD response
- Is cross-traffic heavy-tailed?
- Or multiple bottlenecks assumption?
- Simple example (not through bandwidth adaptation,
but through RTT adaptation)
27 Summary
- Heavy-tailed parameters
- File sizes
- Connection life-times
- Inter-arrival packet times
- Document sizes available in the web
- User think times
- TELNET packet arrivals
- Round trip times
- Pseudo self-similarity
- it should be clear that the range of time scales
covered is far beyond dominant time scales, and
as far as packet loss is concerned, this is
relevant
28 Conclusions
- One should be careful when attributing the origin
of traffic characteristics to a specific cause
- There is more than one physical activity causing
LRD
- The influence of protocols (TCP) is more than relevant
- Time scales covered are relevant in all of the
generation, time-stretching and propagation
hypotheses
- Model 3 (inter-arrival times i.i.d. Pareto) plus
heavy-tailed file sizes (introducing congestion)
is promising
- Analytical proof for aggregate is missing
(simulation proof reported in 3 papers)
- Round-trip times hypothesis might be promising -
supports Veres's idea in a slightly different way