Title: A Signal Analysis of Network Traffic Anomalies
1 A Signal Analysis of Network Traffic Anomalies
- Paul Barford
- with Jeffery Kline, David Plonka, Amos Ron
- University of Wisconsin Madison
- Summer, 2002
2 Motivation
- Traffic anomalies are a fact of life in computer networks
  - Outages, attacks, etc.
- Anomaly detection and identification is challenging
  - Operators typically monitor by eye using SNMP or IP flows
  - Obviously, this does not scale!
  - Simple thresholding is ineffective
  - Some anomalies are obvious, others are not
- Characteristics of anomalous behavior in IP traffic are not well understood
  - Do the same types of anomalies have the same characteristics?
  - Can characteristics be effectively used in detection systems?
3 Introduction
- Objective: Improve our understanding of network traffic anomalies
- Approach: Wavelet analysis of a data set that includes IP flow data, SNMP data and a catalog of observed anomalies
- Method: Integrated Measurement Analysis Platform for Internet Traffic (IMAPIT)
- Results: We demonstrate how anomalies can be exposed using wavelets and develop a new method for exposing short-lived events
4 Related Work
- Network traffic characterization
  - E.g., Caceres89, Leland93, Paxson97, Zhang01
    - Focus on typical behavior
  - Abry98 use wavelets to analyze long-range dependent (LRD) traffic
- Fault and anomaly detection techniques
  - E.g., Feather93, Brutlag00
    - Focus on thresholds and time series models
  - E.g., Paxson99
    - Rule-based tool for intrusion detection
  - E.g., Moore01
    - Backscatter technique can be used to identify DoS attacks
  - E.g., Huang01
    - Wavelet-based approach to detecting network performance problems
5 Simple Network Management Protocol
- SNMP is the standard protocol for monitoring/managing networked systems
- SNMP defines a set of MIB (management information base) data exported from routers
  - RFC 2863
- We sample the High Capacity interface counters using MRTG (Multi-Router Traffic Grapher) at 5-minute intervals
  - Archive byte and packet traffic in each direction
  - 64-bit counters on each of 15 WAN links
  - SNMP count precision is yet to be determined
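Polling at 5-minute intervals yields raw counter readings rather than rates. A minimal sketch of converting two successive 64-bit counter samples into an average rate follows. The helper name `counter_rate` and the sample readings are hypothetical (they are not part of MRTG or of any SNMP library), and the wrap handling assumes the counter wrapped at most once between polls.

```python
COUNTER64_MODULUS = 2 ** 64  # a 64-bit SNMP counter wraps back to zero at this value


def counter_rate(prev_value, curr_value, interval_seconds):
    """Average per-second rate between two polls of a 64-bit counter.

    Assumes the counter wrapped at most once between the two polls.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Modular subtraction gives the right delta even across a single wrap.
    delta = (curr_value - prev_value) % COUNTER64_MODULUS
    return delta / interval_seconds


# Hypothetical readings of an octet counter taken 5 minutes (300 s) apart.
previous_octets = 1_000_000_000
current_octets = 1_375_000_000
bytes_per_second = counter_rate(previous_octets, current_octets, 300)
bits_per_second = bytes_per_second * 8
```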
6 IP Flows
- An IP flow is defined as a unidirectional series of packets between a source/destination IP/port pair over a period of time
  - Exported by Lightweight Flow Accounting Protocol (LFAP) enabled routers (Cisco's NetFlow, Juniper's cflowd flow export)
- We use FlowScan (Plonka00) to collect and post-process IP flow data collected at 5-minute intervals
  - Combines flow collection engine, database, visualization tool
  - Provides a near real-time visualization of network traffic
  - Breaks down traffic into well-known service or application
Flow record fields: SRC_IP/Port, DST_IP/Port, Pkts, Bytes, Start/End Time, TCP Flags, IP Prot
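As a rough illustration of the post-processing step, the sketch below groups flow records into 5-minute bins per service and sums flows, packets and bytes. The `FlowRecord` type mirrors the field list above, but the type itself, the small port-to-service table and the sample records are all hypothetical; none of this is FlowScan's actual code or data model.

```python
from collections import defaultdict
from typing import NamedTuple

BIN_SECONDS = 300  # 5-minute collection interval

# Small illustrative port-to-service map (an assumption, not FlowScan's table).
WELL_KNOWN_PORTS = {25: "smtp", 53: "dns", 80: "http", 443: "https"}


class FlowRecord(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    packets: int
    byte_count: int
    start_time: int  # seconds since epoch
    end_time: int
    tcp_flags: int
    ip_protocol: int


def classify(flow):
    """Label a flow by whichever endpoint uses a well-known port."""
    for port in (flow.dst_port, flow.src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "other"


def bin_flows(flows):
    """Sum flows, packets and bytes per (5-minute bin start, service)."""
    totals = defaultdict(lambda: {"flows": 0, "packets": 0, "bytes": 0})
    for flow in flows:
        bin_start = flow.start_time - flow.start_time % BIN_SECONDS
        entry = totals[(bin_start, classify(flow))]
        entry["flows"] += 1
        entry["packets"] += flow.packets
        entry["bytes"] += flow.byte_count
    return dict(totals)


# Hypothetical records: two web flows in one bin, one DNS flow in the next.
sample_flows = [
    FlowRecord("10.0.0.1", 40000, "192.0.2.10", 80, 12, 9000, 1000, 1010, 0x1B, 6),
    FlowRecord("10.0.0.2", 40001, "192.0.2.10", 80, 8, 4000, 1100, 1105, 0x1B, 6),
    FlowRecord("10.0.0.3", 53124, "192.0.2.53", 53, 1, 80, 1250, 1250, 0, 17),
]
binned = bin_flows(sample_flows)
```

Each key of `binned` is a `(bin_start, service)` pair, so one time series per service can be read off by fixing the service and sorting by bin start.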
8 Our Approach to Data Gathering
- Consider anomalies in IP flow and SNMP data
  - Collected at UW border router (Juniper M10)
  - Archive of 6 months' worth of data (packets, bytes, flows)
  - Includes catalog of anomalies (after-the-fact analysis)
- Group observed anomalies into four categories
  - Network anomalies (41)
    - Steep drop-offs in service followed by quick return to normal behavior
  - Flash crowd anomalies (4)
    - Steep increase in service followed by slow return to normal behavior
  - Attack anomalies (46)
    - Steep increase in flows in one direction followed by quick return to normal behavior
  - Measurement anomalies (18)
    - Short-lived anomalies which are not network anomalies or attacks
9 Our Approach to Analysis
- Wavelets provide a means for describing time series data that considers both frequency and time
  - Particularly useful for characterizing data with sharp spikes and discontinuities
  - More robust than Fourier analysis, which only shows what frequencies exist in a signal
  - Tricky to determine which wavelets provide best resolution of signals in data
- We use tools developed at UW which together make up IMAPIT
  - FlowScan software
  - The IDR Framenet software
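The contrast between wavelets and Fourier analysis can be illustrated with a toy example. The sketch below uses the Haar wavelet purely for simplicity; it is an assumption for illustration and not the wavelet system used by the authors. Two flat signals with a spike at different positions have identical Fourier magnitudes, while the largest finest-level Haar detail coefficient points at where the spike occurred. All function names are hypothetical.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitude of each discrete Fourier coefficient (naive O(n^2) DFT)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


def haar_details(signal):
    """Finest-level Haar detail coefficients; entry i covers samples 2i and 2i+1."""
    return [
        (signal[2 * i] - signal[2 * i + 1]) / math.sqrt(2)
        for i in range(len(signal) // 2)
    ]


def with_spike(position, length=64, height=10.0):
    """A flat signal with a single spike at the given sample."""
    signal = [0.0] * length
    signal[position] = height
    return signal


def spike_location(signal):
    """First sample of the pair with the largest Haar detail coefficient."""
    details = [abs(d) for d in haar_details(signal)]
    return 2 * details.index(max(details))


early, late = with_spike(10), with_spike(50)

# Fourier magnitudes do not reveal where the spike occurred...
same_spectrum = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(dft_magnitudes(early), dft_magnitudes(late))
)
# ...while the Haar details localize it in time.
early_location = spike_location(early)
late_location = spike_location(late)
```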
10 Our Wavelet System
- After evaluating different candidates we selected a wavelet system called Pseudo Splines(4,1) Type 2
  - A framelet system developed by Daubechies et al. 00
  - Very good frequency localization properties
- Three output signals are extracted from input
  - Low Frequency (L): synthesis of all wavelet coefficients from level 9 and up
  - Mid Frequency (M): synthesis of wavelet coefficients 6, 7, 8
  - High Frequency (H): synthesis of wavelet coefficients 1 to 5
  - Thresholding (set to zero all coefficients whose absolute value is below a threshold) is used on these coefficients
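The three-band split can be sketched with a much simpler wavelet. The code below substitutes the Haar system for the Pseudo Splines(4,1) framelets, so it shows only the level grouping (1 to 5, 6 to 8, 9 and up), not the authors' filters. With Haar, the approximation at a level is the mean over blocks of 2^level samples, so each band can be computed directly as a difference of block means. The function names and the sample signal are hypothetical.

```python
def block_means(signal, level):
    """Haar approximation: replace each block of 2**level samples by its mean."""
    size = 2 ** level
    if len(signal) % size:
        raise ValueError("signal length must be a multiple of 2**level")
    out = []
    for start in range(0, len(signal), size):
        block = signal[start:start + size]
        out.extend([sum(block) / size] * size)
    return out


def split_bands(signal, high_max_level=5, mid_max_level=8):
    """Split a signal into (L, M, H) parts that sum back to the input.

    H holds detail levels 1..high_max_level, M holds the levels above that
    up to mid_max_level, and L holds everything coarser (level 9 and up
    with the default arguments).
    """
    approx_high = block_means(signal, high_max_level)
    approx_mid = block_means(signal, mid_max_level)
    high = [x - a for x, a in zip(signal, approx_high)]
    mid = [a - b for a, b in zip(approx_high, approx_mid)]
    low = approx_mid
    return low, mid, high


def hard_threshold(coefficients, threshold):
    """Set to zero every coefficient whose absolute value is below the threshold."""
    return [c if abs(c) >= threshold else 0.0 for c in coefficients]


# Hypothetical input: 512 five-minute samples with a slow ramp and one spike.
samples = [100.0 + 0.1 * i for i in range(512)]
samples[300] += 50.0
low, mid, high = split_bands(samples)
```

In this Haar stand-in with 5-minute samples, a level-5 block spans 160 minutes and a level-8 block about 21 hours, so the spike at sample 300 shows up in the H part while the ramp stays mostly in the L part.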
11 Ambient IP Flow Traffic
12 Ambient SNMP Traffic
13 Byte Traffic for Flash Crowd
14 Average Packet Size for Flash Crowd
15 Flow Traffic During DoS Attacks
16 Byte Traffic During Measurement Anomalies
17 Anomaly Detection via Deviation Score
- We develop an automated means for identifying short-lived anomalies based on variability in H and M signals
  - Compute local variability (using specified window) of H and M parts of signal
  - Combine local variability of H and M signals (using a weighted sum) and normalize by total variability to get deviation score V
  - Apply threshold to V then measure peaks
- Our analysis shows that V peaks over 2.0 indicate short-lived anomalies with high confidence
  - We threshold at V = 1.25 and set window size to 3 hours
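The steps above can be sketched as follows. This is one possible reading of the slide, not the authors' implementation: local variability is taken to be the variance in a centred moving window, the weights are assumed equal, and "total variability" is assumed to be the same weighted sum applied to the variance of the whole H and M signals. The function names and the synthetic signals are hypothetical. A 3-hour window corresponds to 36 samples at 5-minute intervals.

```python
import math
from statistics import pvariance


def local_variance(signal, window):
    """Variance of the samples in a window centred on each position."""
    half = window // 2
    return [
        pvariance(signal[max(0, i - half):i + half + 1])
        for i in range(len(signal))
    ]


def deviation_score(high, mid, window=36, weight_high=0.5, weight_mid=0.5):
    """Weighted local variability of H and M, normalized by total variability."""
    total = weight_high * pvariance(high) + weight_mid * pvariance(mid)
    if total == 0:
        return [0.0] * len(high)  # a constant signal has no deviation
    local_high = local_variance(high, window)
    local_mid = local_variance(mid, window)
    return [
        (weight_high * lh + weight_mid * lm) / total
        for lh, lm in zip(local_high, local_mid)
    ]


def flag_anomalies(scores, threshold=1.25):
    """Indices whose deviation score exceeds the threshold."""
    return [i for i, v in enumerate(scores) if v > threshold]


# Hypothetical one-day H and M signals (288 five-minute samples) with a
# short burst of high-frequency energy in samples 140..149.
high = [0.1 * (-1) ** i for i in range(288)]
for i in range(140, 150):
    high[i] = 5.0 * (-1) ** i
mid = [0.1 * math.sin(2 * math.pi * i / 48) for i in range(288)]

scores = deviation_score(high, mid)
flagged = flag_anomalies(scores)
```

Because the score is a ratio of local to overall variability, the same thresholds can in principle be reused across signals with very different magnitudes, such as byte counts and flow counts.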
18 Deviation Score for Three Anomalies
19 Deviation Score for Network Outage
20 Anomalies in Aggregate Signals
21 Hidden Anomalies in Low Frequency
22 Deviation Score Evaluation
- How effective is deviation score at detecting anomalies?
  - Compare versus set of 39 anomalies
    - Set is unlikely to be complete so we don't treat false positives
  - Compare versus Holt-Winters Forecasting
    - Sophisticated time series technique
    - Requires some configuration
  - Holt-Winters reported many more positives and sometimes oscillated between values
Total Candidate Anomalies | Candidates Detected by Deviation Score | Candidates Detected by Holt-Winters
39 | 38 | 37
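For readers unfamiliar with the baseline, the sketch below shows a simplified additive Holt-Winters detector: it forecasts each sample from a smoothed level, trend and seasonal term, and flags samples that fall outside a band around the forecast. It is an illustration only, not the configuration used in the comparison above. The smoothing constants, `band_width` and the `min_deviation` floor are assumed values, and the function name and test series are hypothetical.

```python
def holt_winters_flags(series, period, alpha=0.2, beta=0.1, gamma=0.1,
                       band_width=3.0, min_deviation=1.0):
    """Return indices whose value falls outside the forecast band.

    Additive Holt-Winters with a per-season smoothed absolute deviation.
    The first `period` samples only initialise the model.
    """
    if len(series) < 2 * period:
        raise ValueError("need at least two full periods of data")
    level = sum(series[:period]) / period
    trend = 0.0
    seasonal = [value - level for value in series[:period]]
    deviation = [0.0] * period
    flagged = []
    for t in range(period, len(series)):
        slot = t % period
        # One-step-ahead forecast made from the state before seeing series[t].
        forecast = level + trend + seasonal[slot]
        error = series[t] - forecast
        if abs(error) > band_width * max(deviation[slot], min_deviation):
            flagged.append(t)
        # Standard additive Holt-Winters updates.
        previous_level = level
        level = alpha * (series[t] - seasonal[slot]) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[slot] = gamma * (series[t] - level) + (1 - gamma) * seasonal[slot]
        deviation[slot] = gamma * abs(error) + (1 - gamma) * deviation[slot]
    return flagged


# Hypothetical series: a repeating 12-sample pattern with one spike at index 40.
pattern = [10, 12, 15, 20, 26, 30, 31, 28, 22, 16, 12, 10]
series = [float(v) for v in pattern * 6]
series[40] += 50.0
flagged = holt_winters_flags(series, period=len(pattern))
```

In this sketch the spike also pulls the smoothed level and trend upward, so samples right after it can be flagged as well, which is one way a forecasting detector can report more positives than a single event warrants.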
23 Conclusion and Next Steps
- We present an evaluation of signal characteristics of network traffic anomalies
  - Using IP flow and SNMP data collected at UW border router
  - 106 anomalies have been grouped into four categories
  - IMAPIT developed to apply wavelet analysis to data
  - Deviation score developed to automate anomaly detection
- Results
  - Characteristics of anomalies exposed using different filters and data
  - Deviation score is effective detection method
- Future
  - Development of anomaly classification methods
  - Application of results in (distributed) detection systems