Title: Network Performance Measurement and Analysis
1Network Performance Measurement and Analysis
- Outline
- Measurement
- Tools and Techniques
- Workload generation
- Analysis
- Basic statistics
- Queuing models
- Simulation
2Measurement and Analysis Overview
- The size, complexity, and diversity of the Internet make it very difficult to understand cause-effect relationships
- Measurement is necessary for understanding current system behavior and how new systems will behave
- How, when, where, what do we measure?
- Measurement is meaningless without careful analysis
- Analysis of data gathered from networks is quite different from work done in other disciplines
- Measurement/analysis enables models to be built which can be used to effectively develop and evaluate new techniques
- Statistical models
- Queuing models
- Simulation models
3Determining What to Measure
- Before any measurements can take place one must determine what to measure
- There are many commonly used network performance characteristics
- Latency
- Throughput
- Response time
- Arrival rate
- Utilization
- Bandwidth
- Loss
- Routing
- Reliability
4Measurement Introduction
- Internet measurement is done to either analyze/characterize network phenomena or to test new tools, protocols, systems, etc.
- Measuring Internet performance is easier said than done
- What does performance mean?
- Workload (what and where you're measuring) selection is critical
- Reproducibility is often essential
- Many tools have been developed to measure/monitor general characteristics of network performance
- traceroute and ping are two of the most popular
- These are examples of active measurement tools
- Passive tools are the other major category
- Representative and reproducible workload generation will be a focus
5Active Measurement Tools
- Send probe packet(s) into the network and measure a response
- Ping: RTT and loss
- Zing: one-way Poisson probes
- Traceroute: path and RTT
- Nettimer (Lai): latest bottleneck bandwidth tool, uses the packet pair method
- Pathchar: per-hop bandwidth, latency, loss measurement
- Pchar, clink: open-source reimplementations of pathchar
- Problem: measurement timescales vary widely
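Packet pair relation: $T_{n+1} - T_n = \max(S/BW,\ T_1 - T_0)$
[Figure: packet pair timing diagram, with intervals labeled Size/BW, $T_1 - T_0$, and $T_{n+1} - T_n$]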
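The packet pair idea can be sketched in a few lines: if two packets of size S queue back to back at the bottleneck, they arrive S/BW apart, so the bottleneck bandwidth is roughly S divided by the arrival spacing. The following Python is only an illustrative sketch; the function names and the sample numbers are made up for this example and are not taken from Nettimer.

```python
def arrival_spacing(size_bytes, bandwidth_bps, send_spacing_s):
    """Forward model: spacing at the receiver is max(S/BW, T1 - T0)."""
    return max(size_bytes * 8 / bandwidth_bps, send_spacing_s)


def packet_pair_bandwidth(size_bytes, t_first_s, t_second_s):
    """Estimate bottleneck bandwidth (bits/s) from one packet pair.

    Assumes the pair was queued back to back at the bottleneck, so the
    arrival spacing equals S/BW. Cross traffic breaks this assumption.
    """
    spacing = t_second_s - t_first_s
    if spacing <= 0:
        raise ValueError("second packet must arrive after the first")
    return size_bytes * 8 / spacing


# Hypothetical numbers: 1500-byte packets sent 0.1 ms apart
# through a 10 Mb/s bottleneck.
spacing = arrival_spacing(1500, 10e6, 0.0001)
estimate = packet_pair_bandwidth(1500, 0.0, spacing)
print(f"spacing = {spacing * 1e3:.2f} ms, estimate = {estimate / 1e6:.1f} Mb/s")
```

When the packets are sent further apart than S/BW, the spacing is preserved and the estimate only gives a lower bound, which is one reason real tools filter many pairs rather than trusting one.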
6Passive Measurement Tools
- Passive tools: capture data as it passes by
- Logging at application level
- Packet capture applications (tcpdump) use a packet capture filter (bpf, libpcap)
- Requires access to the wire
- Can have many problems (adds, deletes, reordering)
- Flow-based measurement tools
- SNMP tools
- Routing looking glass sites
- Problems
- LOTS of data!
- Privacy issues
- Getting packet scopes into the backbone of the network
7Workload Generation
- Local and/or wide area experiments often require representative and reproducible workloads
- How do we select a workload?
- Currently HTTP makes up the majority of Internet traffic
- Trace-based workloads
- Capture traces and replay them
- Black-box method
- Synthetic workloads
- Abstraction of actual operation
- May not capture all aspects of workload
- Analytic workloads
- Attempt to model workload precisely
- Very difficult
8SURGE Web Workload Generator
- Scalable URL Reference Generator
- Analytic workload generator
- Based on 12 empirically derived distributions of Web browsing behavior
- Explicit, parameterized models
- Captures heavy-tailed (highly variable) properties of Web workloads
- Widely used
- SURGE components
- Statistical distribution generator
- Hypertext Transfer Protocol (HTTP) request generator
9Workload characteristics captured in SURGE
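[Figure: timeline of user requests showing a base file (BF) with embedded files (EF1, EF2), an OFF time, a single file (SF), another OFF time, then BF and EF1]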
Characteristic      | Component        | Model            | System Impact
File Size           | Base file - body | Lognormal        | File System
                    | Base file - tail | Pareto           |
                    | Embedded file    | Lognormal        |
                    | Single file 1    | Lognormal        |
                    | Single file 2    | Lognormal        |
Request Size        | Body             | Lognormal        | Network
                    | Tail             | Pareto           |
Document Popularity |                  | Zipf             | Caches, buffers
Temporal Locality   |                  | Lognormal        | Caches, buffers
OFF Times           |                  | Pareto           |
Embedded References |                  | Pareto           | ON Times
Session Lengths     |                  | Inverse Gaussian | Connection times
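To make the table concrete, here is a small Python sketch that draws file sizes from a lognormal body with a Pareto tail and picks documents by Zipf popularity. It is not SURGE code: every parameter value (`body_mu`, `tail_alpha`, `tail_prob`, and so on) is a placeholder chosen for illustration, and the body/tail split is simplified to a random mixture.

```python
import random


def sample_file_size(rng, body_mu=9.0, body_sigma=1.0,
                     tail_alpha=1.1, tail_min=100_000, tail_prob=0.07):
    """Hybrid size model: lognormal body with a Pareto (heavy) tail.

    All parameter values are illustrative placeholders.
    """
    if rng.random() < tail_prob:
        # paretovariate returns values >= 1, so scale by the tail minimum.
        return tail_min * rng.paretovariate(tail_alpha)
    return rng.lognormvariate(body_mu, body_sigma)


def zipf_weights(n_documents):
    """Zipf popularity: the document of rank r gets weight 1/r."""
    return [1.0 / rank for rank in range(1, n_documents + 1)]


rng = random.Random(42)  # fixed seed keeps the workload reproducible
sizes = [sample_file_size(rng) for _ in range(10_000)]
requests = rng.choices(range(1, 101), weights=zipf_weights(100), k=10_000)

mean_size = sum(sizes) / len(sizes)
median_size = sorted(sizes)[len(sizes) // 2]
print(f"mean size {mean_size:.0f} bytes, median {median_size:.0f} bytes")
print("share of requests to the rank-1 document:",
      requests.count(1) / len(requests))
```

The gap between mean and median size is the heavy tail at work: a few very large files dominate the average, which is why the tail is modeled separately from the body.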
10SURGE Architecture
[Figure: SURGE architecture. Several SURGE client systems, each running multiple ON/OFF threads, connect over a LAN to a Web server system]
11SURGE and SPECWeb96 exercise servers very differently
[Figure: two panels, labeled SURGE and SPECWeb96]
12Analyzing Measured Data
- Analyzing measured data in networks is typically done using statistical methods
- Selecting appropriate analysis method(s) is critical
- Averaging
- Dispersion (variability)
- Correlations
- Regression analysis
- Distributional analysis
- Frequency analysis
- Principal-component analysis
- Cluster analysis
- Each form of analysis has strengths and weaknesses
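As a small illustration of why the choice of method matters, the Python sketch below computes an average, two dispersion measures, and a correlation on a made-up set of RTT samples containing one outlier. The data and variable names are hypothetical; only the standard-library `statistics` and `math` modules are used.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up measurements: round-trip times (ms) and link utilization.
rtt_ms = [20.1, 19.8, 20.5, 21.0, 19.9, 20.3, 250.0, 20.2]
load = [0.30, 0.28, 0.35, 0.40, 0.29, 0.33, 0.95, 0.31]

mean_rtt = statistics.mean(rtt_ms)      # averaging
median_rtt = statistics.median(rtt_ms)  # robust to the outlier
stdev_rtt = statistics.stdev(rtt_ms)    # dispersion
cov_rtt = stdev_rtt / mean_rtt          # coefficient of variation
r = pearson(load, rtt_ms)               # correlation

print(f"mean {mean_rtt:.1f} ms, median {median_rtt:.2f} ms")
print(f"stdev {stdev_rtt:.1f} ms, CoV {cov_rtt:.2f}, r = {r:.2f}")
```

A single 250 ms sample pulls the mean far above the median, so reporting only the average would misdescribe the typical RTT.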
13Self-Similar Nature of Network Traffic
- W. Leland, M. Taqqu, W. Willinger, D. Wilson, "On the Self-Similar Nature of Ethernet Traffic," IEEE/ACM TON, 1994.
- Baker Award winner
- V. Paxson, S. Floyd, "Wide-Area Traffic: The Failure of Poisson Modeling," IEEE/ACM TON, 1995.
- M. Crovella, A. Bestavros, "Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes," IEEE/ACM TON, 1997.
14Queuing Models
- One of the key modeling techniques for computer systems in general
- Vast literature on queuing theory
- Nicely suited for network analysis
- Prof. Mary Vernon is our local expert
- Generally, queuing systems deal with a situation where jobs (of which there are many) wait in line for a resource (of which there are few)
- Queuing theory can enable us to determine response time
- Examples?
15Queuing Models, cont'd.
- Example: packets arriving at a router. How can we determine how long it takes for packets to be forwarded by the router?
- Characteristics necessary to specify a queuing system
- Arrival process
- Service time distribution
- Number of servers
- System capacity (number of buffers)
- Population size
- Service discipline
- Kendall notation: A/S/m/B/K/SD
- Response time = waiting time + service time
- For stability, mean arrival rate must be less than mean service rate
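For the router example, one common starting point is the M/M/1 queue (in Kendall notation: Poisson arrivals, exponential service times, one server, with unlimited buffers and population and first-come-first-served service). The slides do not say which model to use, so M/M/1 is an assumption here, and the rates below are made up. For M/M/1 the mean response time is 1/(service rate - arrival rate).

```python
def mm1_metrics(arrival_rate, service_rate):
    """Mean-value results for an M/M/1 queue (rates in jobs per second)."""
    if arrival_rate >= service_rate:
        # Stability condition: arrivals must be slower than service.
        raise ValueError("unstable: arrival rate must be below service rate")
    service_time = 1.0 / service_rate
    response_time = 1.0 / (service_rate - arrival_rate)
    return {
        "utilization": arrival_rate / service_rate,
        "service_time": service_time,
        # Response time = waiting time + service time.
        "waiting_time": response_time - service_time,
        "response_time": response_time,
    }


# Hypothetical router: forwards 10,000 packets/s, receives 8,000 packets/s.
metrics = mm1_metrics(arrival_rate=8_000, service_rate=10_000)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

At 80% utilization the packet spends four times as long waiting as being served, which shows how quickly delay grows as the arrival rate approaches the service rate.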
16Little's Law
- One of the most basic theorems in queuing theory (1961)
- Mean number of jobs in system = arrival rate × mean response time
- Treats a system as a black box
- Applies whenever number of jobs entering the system equals number of jobs leaving the system
- No jobs created or lost inside system
- Can be extended to include systems with finite buffers
- Example: Average forwarding time in a router is 100 microseconds, I/O rate for packets is 100k. What is the mean number of packets buffered in the router?
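The example can be worked directly from Little's Law. The short Python sketch below assumes that "100k" means 100,000 packets per second, which the slide does not state explicitly; the function name is made up for illustration.

```python
def mean_jobs_in_system(arrival_rate_per_s, mean_response_time_us):
    """Little's Law: N = arrival rate x mean response time."""
    return arrival_rate_per_s * mean_response_time_us / 1_000_000


# Assumed reading of the example: 100,000 packets/s, 100 microseconds each.
packets_buffered = mean_jobs_in_system(100_000, 100)
print(packets_buffered)  # 10.0
```

Under that assumption the router holds 10 packets on average, and the result needs no knowledge of the arrival or service distributions.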
17Simulation Models
- Simulation is one of the most common/important methods of analysis/modeling
- Typically an abstraction of the system under consideration
- Can provide significant insight into a system's behavior
- Network simulation is difficult because of the different layers of operation and the complexity at each layer
- Simulation options: build your own, use someone else's
- Canonical network simulator is ns, developed at LBL
- www.isi.edu/nsnam/ns
- ssf-net is a new, routing-enabled simulator
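As a taste of the "build your own" option, here is a minimal Python sketch of a single-server FIFO queue simulation with exponential interarrival and service times. It is a toy abstraction, unrelated to ns or ssf-net, and the rates, job count, and seed are arbitrary choices for illustration.

```python
import random


def simulate_single_queue(arrival_rate, service_rate, n_jobs, seed=1):
    """Simulate a FIFO single-server queue; return the mean response time."""
    rng = random.Random(seed)
    arrival = 0.0         # arrival time of the current job
    server_free_at = 0.0  # time the server finishes the previous job
    total_response = 0.0
    for _ in range(n_jobs):
        arrival += rng.expovariate(arrival_rate)
        # A job starts when it has arrived and the server is idle.
        start = max(arrival, server_free_at)
        server_free_at = start + rng.expovariate(service_rate)
        total_response += server_free_at - arrival
    return total_response / n_jobs


mean_response = simulate_single_queue(arrival_rate=0.5, service_rate=1.0,
                                      n_jobs=100_000)
print(f"simulated mean response time: {mean_response:.2f}")
```

For these rates the M/M/1 queuing model predicts a mean response time of 1/(1.0 - 0.5) = 2, so the simulated value can be checked against the analytic one, a useful sanity test before simulating systems that have no closed-form answer.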