Title: Internet Traffic Characterization
1Internet Traffic Characterization
2What is covered in this talk
- Why characterize Internet traffic ?
- Measurement and analysis methodologies.
- Measurement studies.
- Variation of Internet traffic (time of day, day of week effects).
- Packet level characteristics (packet sizes).
- Flow level characteristics (flow sizes, flow durations).
- File size distributions.
- Distribution by application.
- Distribution by protocol.
3What is not covered
- Everything that will be covered in future presentations!
- Delay and loss measurements
- TCP related measurements (TCP flavors etc)
- Self similarity of Internet traffic
- Flow measurements
- Peer to peer traffic measurements
4Goals of this research..
- Observe Internet traffic characteristics.
- Develop reasonable models to understand these characteristics.
- Failure of traditional mathematical modeling techniques (e.g. queueing theory).
- Earlier models deal with issues which are non-critical from the practitioner's point of view.
- Attempt to close the void between theory and practice.
5Why Characterize Internet Traffic ?
- Provisioning network resources (capacity, buffer, etc.)
- How should the network be provisioned to satisfy certain constraints?
- Constraints may differ with the type of traffic.
- E.g. buffer provisioning
- Current tools (e.g. SNMP) may not be sufficient
- Analyzing network performance
- TCP performance
- Routing performance
6Why Characterize Internet Traffic ?
- Obtain characteristic workloads for use in simulations
- Typical packet sizes
- Typical flow durations
- Most commonly used TCP flavors
- Important for ISPs to formulate policy decisions (Service Level Agreements)
- Developing techniques to detect network anomalies, e.g. Denial of Service attacks.
- Verify rule-of-thumb design guidelines.
7Measurement Methodologies
- Objectives of a monitor
- Collection of detailed traffic statistics from heterogeneous network links.
- Non-interference with the measured network (non-intrusiveness).
- Obtaining a global view of the monitored network from a reasonable number of monitoring points.
- Types of monitor
- Active monitors
- Passive monitors
8IPMON (Sprint)
- Passive monitor for the Sprint backbone network.
- Capable of monitoring links of capacities ranging from OC-3 to OC-48.
- Uses an optical splitter on the monitored link.
- Records packet traces including IP and TCP/UDP headers and a timestamp.
- Trace sanitizer.
- Analysis component
- Flow statistics (start and end time of flows, flow sizes)
- Protocol (TCP, UDP) and application (web, email, streaming) split of traffic.
9IPMON
10Other Projects
- OC3MON (MCI): passive monitor designed for OC3 links (155 Mbps).
- NetScope (AT&T): a set of tools for traffic engineering in IP backbone networks.
- Network Analysis Infrastructure (NAI): performance of vBNS (very high speed Backbone Network Service) and Abilene networks.
- Some routers have built-in monitoring capabilities.
- NetFlow (Cisco routers).
- Commercial tools
- Niksun's NetDetector and NikScouts ATM Probes.
11Measurement Studies
- Wide Area Internet Traffic Patterns and Characteristics. Thompson, Miller, Wilder, MCI Telecommunications, 1997.
- One of the first studies of commercial backbone traffic.
- Used the OC3MON traffic monitor described earlier, at two locations on MCI's commercial backbone.
- Characterizes traffic on timescales of 24 hours and 7 days in terms of traffic volume, flow volume, flow duration, packet sizes, and traffic composition (by protocol and application).
- Two links monitored: domestic and international.
12MCI Study Daily and weekly effects
- Traffic volume shows a clear diurnal pattern, with traffic tripling from 0600 through 1200 noon EDT.
- Traffic decreases by about 25% during the weekend.
- The two directions of the monitored link are not symmetric.
13MCI Study Asymmetry in packet sizes
- Packet sizes are different in the two directions,
and are roughly inversely proportional to each
other.
14MCI Study Packet size distributions
- Packet size distributions are trimodal.
- 40-44 bytes: TCP ACKs, control segments etc.
- 552 or 576 bytes: the default MSS when MTU Discovery is not used is 512 or 536 bytes.
- 1500 bytes: MTU for Ethernet.
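The 552 and 576 byte modes follow from header arithmetic: a full segment is the MSS plus the IP and TCP headers. A minimal sketch, assuming 20-byte IPv4 and 20-byte TCP headers with no options; the function and constant names are illustrative and do not come from the MCI study.

```python
IP_HEADER = 20   # bytes, IPv4 header without options (assumption)
TCP_HEADER = 20  # bytes, TCP header without options (assumption)


def packet_size(mss):
    """IP packet size for a TCP segment carrying `mss` payload bytes."""
    return mss + IP_HEADER + TCP_HEADER


def mode_of(size):
    """Map an observed packet size to one of the three modes listed above."""
    if 40 <= size <= 44:
        return "ack/control"
    if size in (552, 576):
        return "default-mss"
    if size == 1500:
        return "ethernet-mtu"
    return "other"


print(packet_size(0))     # bare ACK with no payload
print(packet_size(512))   # default MSS of 512
print(packet_size(536))   # default MSS of 536
print(packet_size(1460))  # largest payload that fits a 1500-byte Ethernet MTU
```

Header options (for example TCP timestamps) would shift these sizes slightly, which is one reason the smallest mode is a 40-44 byte range rather than a single value.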
15MCI Study International Link Traffic
- International link traffic shows similar time of day, day of week effects.
- Packet sizes in the two directions are asymmetric: larger packets in the U.S. to U.K. direction.
16MCI Study Protocol and Application Mix
- Protocol composition
- TCP dominates (95% of bytes, 90% of packets, 75% of flows)
- UDP second (5% of bytes, 10% of packets, 20% of flows)
- ICMP most of the remaining.
- Application composition
- Web (75% of bytes, 70% of packets, 75% of flows)
- Other (may also be web-related)
- DNS (1% of bytes, 3% of packets, 18% of flows)
- SMTP (5% of bytes, 5% of packets, 2% of flows)
- FTP (5% of bytes, 3% of packets, <1% of flows)
- NNTP (2% of bytes, <1% of packets, <1% of flows)
- Telnet (<1% of bytes, 1% of packets, <1% of flows)
17Measurement Studies
- Trends in Wide Area IP Traffic Patterns. McCreary, Claffy, CAIDA, 2000.
- Data collected by the NAI project from May 1999 through March 2000 at the NASA Ames Internet Exchange.
- Analysis of packet size distributions, protocol/application mix etc.
- Shows increasing trends in traffic from new (at that time) applications, e.g. streaming media, online games, peer to peer (Napster).
- No change in the overall trend in the TCP/UDP traffic ratio as compared to the analyses at MCI and CAIDA in 1998.
18CAIDA Study Packet Size Distributions
- Packet size distributions show same trimodal
trend as previous results.
19CAIDA Study Protocol and Application Mix
- Protocol mix
- TCP and UDP are still the most popular protocols, and in roughly the same proportions.
- Application mix (TCP)
- Web is still the most popular application
- New applications like peer to peer file sharing (Napster) now appear in the list (Napster at 5th position).
- Application mix (UDP)
- Streaming media (RealAudio) now comprises a substantial portion of total UDP traffic.
- Online games (Half Life, EverQuest, Unreal, Quake 3) also have a substantial share.
20CAIDA Study Long Term Trends
- The protocol mix of the traffic (TCP and UDP) does not change significantly over time.
- Decline in the contribution of FTP to the overall traffic mix.
- Possibly due to shift from active to passive mode FTP, because of an increase in packet filtering firewalls.
- Alternate protocols for file transfer.
- Decline in the fraction of RealAudio traffic.
- RealAudio traffic has remained fairly constant, while other traffic has increased.
- Decline in the fraction of game traffic
21CAIDA Study Long Term Trends
- Significant increase in peer to peer traffic
(Napster)
22CAIDA Study Short Term Trends
- Email traffic increased significantly in November
and early December, decreasing after December
holidays.
23CAIDA Study Short Term Trends
- Online gaming shows day of week effects, with
traffic nearly doubling over weekend periods.
24Measurement Studies
- Longitudinal study of Internet traffic from 1998-2001. Fomenkov, Keys, Moore, Claffy, CAIDA, 2001.
- Unique long term view of Internet traffic.
- Multiple observation sites (20)
- Four metrics of measured traffic
- Number of bytes.
- Number of packets.
- Number of flows.
- Number of source-destination pairs (port number
and protocol fields ignored). This measures the
number of Internet hosts communicating via the
monitored link.
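The four metrics can be computed from a packet trace in a few lines. This is a rough sketch only: the `Packet` record and its field names are hypothetical, and a flow is taken to be a distinct 5-tuple, which is an assumption (real flow meters also apply timeouts).

```python
from collections import namedtuple

# Hypothetical packet record; the field names are assumptions for illustration.
Packet = namedtuple("Packet", "src dst sport dport proto length")


def link_metrics(packets):
    """Return (bytes, packets, flows, source-destination pairs) for a trace."""
    total_bytes = sum(p.length for p in packets)
    # Flow = distinct 5-tuple (assumption; no idle timeout is modeled).
    flows = {(p.src, p.dst, p.sport, p.dport, p.proto) for p in packets}
    # Port numbers and protocol are ignored for the host-pair metric.
    pairs = {(p.src, p.dst) for p in packets}
    return total_bytes, len(packets), len(flows), len(pairs)


# Made-up four-packet trace.
trace = [
    Packet("10.0.0.1", "10.0.0.2", 1025, 80, "tcp", 40),
    Packet("10.0.0.1", "10.0.0.2", 1025, 80, "tcp", 1500),
    Packet("10.0.0.1", "10.0.0.2", 1026, 80, "tcp", 552),
    Packet("10.0.0.3", "10.0.0.2", 5000, 53, "udp", 76),
]
print(link_metrics(trace))  # (2168, 4, 3, 2)
```

The example shows why the last metric differs from the flow count: two flows between the same pair of hosts collapse into one source-destination pair.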
25Longitudinal Study
- Bit and packet rates show diverse behavior
- Some sites show sustained growth, some are constant and some fluctuate between growth and reduction.
- No clear diurnal pattern in the measured traffic!
- No consistent long term growth: refutes the notion that Internet traffic is universally and rapidly increasing.
- Usage patterns
- Traffic composition varies significantly from site to site.
- WWW traffic reached its maximum between late 1999 and early 2000.
- Has been constant or decreased since.
- This could be due to the onset of noticeable amounts of P2P traffic.
26Longitudinal Study Application Mix
27Measurement Studies
- Packet Level Traffic Measurements from the Sprint IP Backbone. Fraleigh, Moon, Lyles, et al., Sprint Labs, 2003.
- Most recent (2001-2002) study of traffic on a commercial backbone link.
- Analyses the impact of new applications (distributed file sharing, streaming media)
- New results for end-to-end loss and delay performance of TCP connections.
- Measurements of network delays in the backbone and U.S. transcontinental links.
- Methodology: uses the IPMON architecture described earlier.
28SPRINT Study Traffic Load
- Traffic load in bytes
- SNMP is not able to capture the burstiness of the traffic at smaller timescales.
- Most backbone links are utilized under 50%. Less than 10% of the backbone links experience utilization higher than 50% in any 5 min interval.
- Noticeable peaks in traffic load are observed due to DoS attacks.
- Traffic in a bidirectional link is asymmetric.
- Many applications are inherently asymmetric.
- Hot potato routing.
29SPRINT Study
- SNMP is not able to capture the burstiness of the
traffic at smaller timescales.
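A toy illustration of why coarse averages hide bursts. The trace is synthetic (all numbers are made up), and the 300-second window is only an assumption standing in for a 5-minute polling interval.

```python
def average_rate(byte_counts, window):
    """Average per-second byte counts over consecutive windows of `window` seconds."""
    rates = []
    for i in range(0, len(byte_counts), window):
        chunk = byte_counts[i:i + window]
        rates.append(sum(chunk) / len(chunk))
    return rates


# Synthetic 300-second trace: a steady 100 bytes/s with a
# 10-second burst of 5000 bytes/s in the middle.
per_second = [100] * 300
for t in range(150, 160):
    per_second[t] = 5000

five_min = average_rate(per_second, 300)
print(max(per_second))  # peak seen at 1-second granularity
print(five_min)         # the single 5-minute average
```

The 5-minute average comes out at roughly 263 bytes/s, far below the 5000 bytes/s peak, so a monitor that only sees the average cannot distinguish this trace from a smooth one at the same mean rate.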
30SPRINT Study Application Mix
- Application mix varies from link to link.
- In most cases, web represents more than 40% of total traffic (as seen in previous studies).
- However, on some links, the web contributes less than 20%, while P2P accounts for 80%.
- Streaming applications are a stable component of the traffic.
31SPRINT Study - Flows
- The number of flows and the traffic load are not necessarily correlated, i.e. a large number of flows does not always mean a large traffic load.
32Measurement Studies Flow level
- Understanding Internet Traffic Streams: Dragonflies and Tortoises. Brownlee, Claffy, CAIDA.
- Results of flow level measurements from two links: OC3 link (Auckland) and OC12 link (UCSD)
- Uses an extension of NeTraMet to monitor stream lifetimes.
- Previous classifications of flows were on the basis of size (packets or bytes)
- Elephants (large transfers)
- Mice (short transfers)
- Proposes alternate classification of TCP flows on the basis of their lifetime.
- Tortoises (long lasting transfers)
- Dragonflies (short duration transfers)
- Here flows are defined as sets of packets
traveling in either direction between a pair of
end-points.
33Dragonflies and Tortoises
- Percentages of streams and bytes.
- Long Running (LR) streams (>15 min) account for about 1% of the streams.
- Very short streams (<2 sec) account for 40-70% of streams, showing a diurnal pattern of variation.
- At the UCSD site, 50% of all bytes were in LR streams, while this fraction was 5% for Auckland. Most of these streams are non-web traffic.
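A lifetime-based classification can be sketched as follows, using the two thresholds quoted on this slide (15 minutes and 2 seconds). The packet tuple layout and the label names are assumptions for illustration. This is not how NeTraMet is implemented, and idle timeouts are not modeled.

```python
LONG_RUNNING_SECS = 15 * 60  # LR threshold quoted on the slide (>15 min)
VERY_SHORT_SECS = 2          # very short threshold quoted on the slide (<2 sec)


def stream_key(src, dst):
    """Direction-independent key: packets in both directions map to one stream."""
    return tuple(sorted((src, dst)))


def classify(packets):
    """packets: iterable of (timestamp_secs, src_endpoint, dst_endpoint).

    Returns {stream_key: label}, where the label depends only on the
    time between the first and last packet seen for that stream.
    """
    first, last = {}, {}
    for ts, src, dst in packets:
        key = stream_key(src, dst)
        first[key] = min(ts, first.get(key, ts))
        last[key] = max(ts, last.get(key, ts))
    labels = {}
    for key in first:
        lifetime = last[key] - first[key]
        if lifetime > LONG_RUNNING_SECS:
            labels[key] = "long_running"
        elif lifetime < VERY_SHORT_SECS:
            labels[key] = "very_short"
        else:
            labels[key] = "short"
    return labels


# Made-up packets; endpoints are (host, port) pairs.
packets = [
    (0.0, ("a", 1025), ("b", 80)),
    (0.5, ("b", 80), ("a", 1025)),     # reverse direction, same stream
    (0.0, ("c", 2000), ("d", 22)),
    (1200.0, ("d", 22), ("c", 2000)),  # 20 minutes later
    (10.0, ("e", 3000), ("f", 25)),
    (40.0, ("e", 3000), ("f", 25)),
]
labels = classify(packets)
for key, label in labels.items():
    print(key, label)
```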
34Short Streams Streams lasting less than 15 mins
- Lifetime distributions
- 45% of streams have lifetimes less than 2 sec.
- Distributions do not change rapidly over time.
35Short Streams Streams lasting less than 15 mins
- Byte size distributions
- Short stream size distributions for UDP, non-web TCP and web TCP are considerably different.
- Distributions are stable over long periods of time.
36Tortoises Streams lasting more than 15 mins
- Bit rates
- Longer duration LR streams are low-rate (interactive) or high-rate (multimedia) with approximately equal frequency.
- Medium duration LR streams tend to be high-rate (file transfers).
- UDP streams run at constant bit rates, but these rates may change in response to the application's state (online games).
37Tortoises Streams lasting more than 15 mins
- LR stream lifetimes
- LR stream lifetimes seem to follow a power law
distribution.
38Measurement Studies Flow level
- Internet Stream Size Distributions. Brownlee, Claffy, CAIDA, 2002.
- Measurements of
- Per minute distributions of stream sizes in bytes for a period of one hour.
- Two different types of traffic considered: web traffic, and non-web TCP traffic.
- Web streams
- 87% under 1 kB, 8% between 1 and 10 kB, 4.8% between 10 and 100 kB.
- Non-web streams
- 89% under 1 kB, 7% between 1 and 10 kB, 1.5% between 10 and 100 kB.
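Percentages of this kind come from bucketing stream sizes by decade. A small sketch, assuming 1 kB = 1000 bytes and using made-up sizes; the function name and bucket labels are illustrative only.

```python
def size_bucket_shares(sizes_bytes):
    """Percentage of streams falling in each of the size ranges used above."""
    if not sizes_bytes:
        raise ValueError("need at least one stream size")
    edges = [
        (0, 1_000, "<1 kB"),
        (1_000, 10_000, "1-10 kB"),
        (10_000, 100_000, "10-100 kB"),
    ]
    counts = {label: 0 for _, _, label in edges}
    counts[">=100 kB"] = 0
    for size in sizes_bytes:
        for low, high, label in edges:
            if low <= size < high:
                counts[label] += 1
                break
        else:
            # No bucket matched: the stream is 100 kB or larger.
            counts[">=100 kB"] += 1
    total = len(sizes_bytes)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Made-up stream sizes in bytes.
sizes = [200, 500, 900, 4000, 50_000, 250_000, 300, 700, 100, 800]
shares = size_bucket_shares(sizes)
print(shares)
```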
39Internet Stream Size Distributions
40File Size Distributions
- The Structural Cause of File Size Distributions. Downey, 2001.
- A new model for the operations that create new files.
- Files appear because of common operations.
- Copying.
- Translating and filtering.
- Editing.
- Using this, the distribution of file sizes can be predicted to be lognormal.
- Start with a single file of size s.
- Select a file size s at random from the current distribution.
- Create a new file with size fs and add it to the distribution (f is a factor chosen from some other distribution).
- Hence the size of the nth file is $s_n = s \cdot f_1 \cdot f_2 \cdot f_3 \cdots f_m$
- $\log(s_n) = \log(s) + \log(f_1) + \log(f_2) + \cdots + \log(f_m)$
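The multiplicative construction can be simulated directly. Taking logs turns the product into a sum, and sums of many independent terms tend toward a normal distribution, which is the intuition behind the lognormal prediction. The sketch below is only an illustration: the slide does not say which distribution the factor f is drawn from, so a lognormal factor with parameters `mu=0` and `sigma` is assumed here, and all names and default values are hypothetical.

```python
import math
import random
import statistics


def simulate_file_sizes(n_files, initial_size=1024.0, sigma=1.0, seed=0):
    """Grow a population of file sizes by repeated 'copy and scale' steps."""
    rng = random.Random(seed)
    sizes = [initial_size]                       # start with a single file
    while len(sizes) < n_files:
        parent = rng.choice(sizes)               # pick an existing file at random
        factor = rng.lognormvariate(0.0, sigma)  # f: assumed lognormal factor
        sizes.append(parent * factor)            # new file of size f * s
    return sizes


sizes = simulate_file_sizes(5000)
logs = [math.log(s) for s in sizes]
# Summary statistics of log(size); a histogram of `logs` would be the
# natural next step for checking how close the shape is to a normal curve.
print(statistics.mean(logs), statistics.stdev(logs))
```

Fixing the seed makes the run repeatable. Changing `sigma` or swapping in a different factor distribution is the obvious experiment to try.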
41File Size Distributions
- File sizes on web servers
- Studies by Arlitt and Williamson claim file sizes match the Pareto model.
- This may not be true!
- Some of the analyzed data sets better fit the lognormal model.
- Traces of downloaded files.
- Fits a hybrid model: lognormal distribution with a Pareto tail.
- Two-mode lognormal model is also a good match.
- Summary: the distribution of file sizes is NOT heavy tailed!
- Implications for self-similarity of Internet traffic
- Most explanations assume that the distribution of file sizes is long-tailed.
- Need to revise explanations of self-similarity.
42Non-commercial networks
- Some results from the Abilene network over one week.
- Application mix
- Web traffic is much lower as compared to commercial backbone networks.
- Email traffic is higher.
- Measurement traffic amounts to 5% of all traffic!
- Protocol mix
- TCP is still the most dominant (90% of bytes).
- UDP accounts for 5%.
- ICMP around 4%.
- Numbers similar to those on commercial backbone links.
43Future Directions
- Self-similarity: the need to verify assumptions.
- Downey questioned the assumptions about file size distributions.
- Inter-arrival time distributions.
- Transfer length distributions.
- Burst size distributions.
- Dependence of traffic characteristics on TCP algorithms.
- Measurement based forecasting of DoS attacks and flash crowds.
- Real time monitoring of critical parameters. Use this characterization to automatically make decisions.
- Provisioning.
- Routing etc.
44Future Directions
- Characterization of P2P traffic.
- Previous measurement studies on P2P systems focused on node behavior, topology etc.
- Need to better characterize the traffic generated by P2P applications.