Title: Sensitivity of PCA for Traffic Anomaly Detection
1. Sensitivity of PCA for Traffic Anomaly Detection
- Haakon Ringberg
- Augustin Soule
- Jennifer Rexford
- Christophe Diot
- Princeton University
- Thomson
2. Outline
- Background and motivation
- Traffic anomaly detection
- PCA and subspace approach
- Problems with methodology
- Future directions
- Conclusion
3. A network in the Internet
4. Network anomalies
We want to be able to detect these anomalies!
5. Network anomaly detectors
- Monitor health of network
- Real-time reporting of anomalies
6. Principal Components Analysis (PCA): Benefits
- Finds correlations across multiple links
- Network-wide analysis
- Lakhina SIGCOMM04
- Demonstrated ability to detect a wide variety of anomalies
- Lakhina IMC04
- Subspace methodology
- We use the same software
7. Principal Component Analysis
Coordinate transformation method
[Figure: original data, axes X1 and X2]
8. PCA on OD flows
- Each principal axis points in the direction of maximum (remaining) energy
- Ordered by the amount of energy they capture
- Eigenflow: the set of OD flows mapped onto a principal axis; a common trend
- Ordered from most common to least common trend
- An OD flow is a weighted sum of eigenflows
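As a minimal sketch of the decomposition on this slide, the following Python runs PCA on a small synthetic traffic matrix through an SVD. The matrix layout (time bins by OD flows), the synthetic daily trend, and the helper name `pca_eigenflows` are assumptions made for illustration; this is not the software used in the study.

```python
import numpy as np

def pca_eigenflows(X):
    """PCA of a (time bins x OD flows) matrix via SVD.

    Returns U (eigenflows, one per column), s (singular values, largest
    first) and V (principal axes, one per column).
    """
    Xc = X - X.mean(axis=0)            # center each OD flow
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U, s, Vt.T

# Synthetic data (assumption): one shared daily trend plus small noise.
rng = np.random.default_rng(0)
t = np.arange(288)
daily = np.sin(2 * np.pi * t / 144)
weights = rng.uniform(1.0, 5.0, size=6)
X = np.outer(daily, weights) + 0.1 * rng.standard_normal((288, 6))

U, s, V = pca_eigenflows(X)

# Fraction of total energy captured by each principal axis (descending).
energy = s**2 / np.sum(s**2)

# An OD flow is a weighted sum of eigenflows: rebuild centered flow 0.
flow0 = (U * s) @ V[0, :]
```

Here `energy` is ordered from the most to the least common trend, and `flow0` reproduces the centered first OD flow from the eigenflows alone.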
9. Pictorial overview of subspace methodology
- Training: separate normal and anomalous traffic patterns
- Detection: find spikes
- Identification: find the original spatial location that caused the spike (e.g. router, flow)
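The training and detection steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the synthetic traffic generator `make_traffic`, the choice `topk=2`, and the empirical-quantile threshold (a simple stand-in for the statistical threshold used in the subspace literature) are all hypothetical. Identification is not shown here.

```python
import numpy as np

def train(X, topk):
    """Training: learn the normal subspace from the top `topk` principal axes."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:topk].T           # columns span the normal subspace

def spe(X, mean, P):
    """Squared norm of each state vector's projection on the anomalous subspace."""
    Xc = X - mean
    residual = Xc - Xc @ P @ P.T
    return np.sum(residual**2, axis=1)

def make_traffic(rng, n_bins=288, noise=0.1):
    """Hypothetical traffic: two periodic trends over 8 flows plus noise."""
    t = np.arange(n_bins)
    X = np.zeros((n_bins, 8))
    X[:, :4] += np.sin(2 * np.pi * t / 144)[:, None]
    X[:, 4:] += np.cos(2 * np.pi * t / 144)[:, None]
    return X + noise * rng.standard_normal(X.shape)

rng = np.random.default_rng(1)
X_train = make_traffic(rng)
X_test = make_traffic(rng)
X_test[100, 3] += 3.0                  # injected anomaly (assumption)

mean, P = train(X_train, topk=2)
threshold = np.quantile(spe(X_train, mean, P), 0.999)   # stand-in threshold

# Detection: time bins whose residual energy spikes above the threshold.
scores = spe(X_test, mean, P)
alarms = np.flatnonzero(scores > threshold)
```

The injected anomaly at time bin 100 shows up as the largest entry of `scores`; other bins may occasionally cross the stand-in threshold as false alarms.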
10. Overview of issues with subspace methodology
- Defining normalcy can be challenging
- Tunable knobs
- Contamination
- PCA's coordinate remapping makes it difficult to identify the original location of an anomaly
11. Data used
- Géant and Abilene networks
- IP flow traces
- 21/11 through 28/11 2005
- Anomalies were manually verified
12. Outline
- Background and motivation
- Problems with approach
- Sensitivity to its parameters
- Contamination of normalcy
- Identifying the location of detected anomalies
- Future directions
- Conclusion
13. Sensitivity to topk
- PCA separates normal from anomalous traffic patterns
- Works because the top PCs capture a large fraction of the variance
- Most of the variance is due to periodic trends
14. Sensitivity to topk
- Where is the line drawn between normal and anomalous?
- What is too anomalous?
15. Sensitivity to topk
Very sensitive to the number of principal components included!
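One way to explore this sensitivity is to sweep topk and count alarms at a fixed confidence level. The sketch below is an assumption-laden illustration: the synthetic matrix `X`, the injected spike, and the helper names are hypothetical, and the threshold is my reconstruction of the Q-statistic (Jackson and Mudholkar) that the subspace literature commonly uses, so treat its exact form with care.

```python
import numpy as np
from statistics import NormalDist

def q_threshold(residual_eigvals, alpha=0.001):
    """Q-statistic threshold on the squared prediction error (reconstruction)."""
    lam = np.asarray(residual_eigvals, dtype=float)
    phi1, phi2, phi3 = lam.sum(), (lam**2).sum(), (lam**3).sum()
    h0 = 1.0 - 2.0 * phi1 * phi3 / (3.0 * phi2**2)
    c = NormalDist().inv_cdf(1.0 - alpha)      # standard normal percentile
    term = (c * np.sqrt(2.0 * phi2 * h0**2) / phi1
            + 1.0
            + phi2 * h0 * (h0 - 1.0) / phi1**2)
    return phi1 * term ** (1.0 / h0)

def alarms_for_topk(X, topk, alpha=0.001):
    """Time bins flagged when the first `topk` axes define normal traffic."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)          # variance captured per axis
    P = Vt[:topk].T
    residual = Xc - Xc @ P @ P.T
    spe = np.sum(residual**2, axis=1)
    return np.flatnonzero(spe > q_threshold(eigvals[topk:], alpha))

# Synthetic traffic (assumption): two periodic trends, noise, one spike.
rng = np.random.default_rng(2)
t = np.arange(288)
X = np.zeros((288, 8))
X[:, :4] += np.sin(2 * np.pi * t / 144)[:, None]
X[:, 4:] += np.cos(2 * np.pi * t / 144)[:, None]
X += 0.1 * rng.standard_normal(X.shape)
X[100, 3] += 3.0

# Number of alarms raised for each choice of topk.
alarm_counts = {k: len(alarms_for_topk(X, k)) for k in range(1, 7)}
```

Comparing the entries of `alarm_counts` across k gives a quick feel for how much the detector's output depends on this single parameter.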
16. Sensitivity to topk
- Sensitivity wouldn't be an issue if we could tune the topk parameter
- We've tried many different methods
- - 3σ deviation heuristic
- - Cattell's Scree Test
- - Humphrey-Ilgen
- - Kaiser's Criterion
- None are reliable
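For concreteness, two of the listed heuristics can be sketched as below. Both are written from my understanding of the general techniques, not from the authors' code: the 3σ heuristic walks the principal axes in order and stops at the first projection containing a deviation of more than three standard deviations from its mean, and Kaiser's criterion keeps components whose correlation-matrix eigenvalue exceeds 1. The function names and the synthetic data are hypothetical; the Scree test and Humphrey-Ilgen are not shown.

```python
import numpy as np

def topk_three_sigma(X, n_sigma=3.0):
    """Number of leading axes whose projections stay within n_sigma of their mean."""
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    for i in range(U.shape[1]):
        u = U[:, i]                    # projection on axis i (unit norm)
        if np.any(np.abs(u - u.mean()) > n_sigma * u.std()):
            return i                   # axis i and all later ones are anomalous
    return U.shape[1]

def topk_kaiser(X):
    """Kaiser's criterion: count correlation-matrix eigenvalues above 1."""
    corr = np.corrcoef(X, rowvar=False)
    return int(np.sum(np.linalg.eigvalsh(corr) > 1.0))

# Synthetic traffic (assumption): two periodic trends, noise, one spike.
rng = np.random.default_rng(4)
t = np.arange(288)
X = np.zeros((288, 8))
X[:, :4] += np.sin(2 * np.pi * t / 144)[:, None]
X[:, 4:] += np.cos(2 * np.pi * t / 144)[:, None]
X += 0.1 * rng.standard_normal(X.shape)
X[50, 0] += 3.0

k_sigma = topk_three_sigma(X)
k_kaiser = topk_kaiser(X)
```

On clean synthetic data like this the two heuristics can agree; the slide's point is that on real traces none of them proved reliable.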
17. Contamination of normalcy
- What happens to large anomalies?
- They become part of the normalcy
- Therefore they are included among top PCs
- Pollutes definition of normal
- In our study, the outage to the left affected 75/77 links
- Only detected on a handful!
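The contamination effect can be reproduced on toy data. In the sketch below, a large outage hitting 9 of 10 hypothetical links is present in the data PCA is trained on; the link count, outage size, and `topk = 2` are assumptions for illustration and do not correspond to the study's traces.

```python
import numpy as np

rng = np.random.default_rng(3)
n_bins, n_links = 288, 10
t = np.arange(n_bins)

# Normal traffic (assumption): one daily trend on every link plus noise.
X = np.outer(np.sin(2 * np.pi * t / 144), np.ones(n_links))
X += 0.1 * rng.standard_normal((n_bins, n_links))

# Large outage: traffic drops sharply on 9 of the 10 links for 10 bins.
outage = slice(100, 110)
X[outage, :9] -= 20.0

# PCA on the contaminated data; the top 2 axes define "normal".
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:2].T
residual = Xc - Xc @ P @ P.T

# Share of the outage's energy absorbed by the normal subspace.
absorbed = 1.0 - np.sum(residual[outage] ** 2) / np.sum(Xc[outage] ** 2)

# Alignment of the first principal axis with the outage direction.
outage_dir = np.r_[np.ones(9), 0.0] / 3.0
alignment = abs(Vt[0] @ outage_dir)
```

Because the outage carries far more variance than the daily trend, the first principal axis points almost exactly along it, and nearly all of its energy lands in the normal subspace rather than in the residual where detection happens.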
18. Identifying anomaly locations
- Spikes when state vector projected on anomaly subspace
- But network operators don't care about this
- They want the offending link, router, flow, etc.
- How do we find the original location of the anomaly?
19. Identifying anomaly locations
- Previous work used a simple heuristic
- Associate detected spike with the k flows with the largest contribution to the state vector v
- No clear a priori reason for this association
- Causes heavy hitter phenomena
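A toy example shows how this heuristic can favor heavy hitters. The flow volumes, the anomaly size, and the helper `top_k_flows` below are hypothetical, and ranking by deviation from a baseline is included only as a contrast, not as a method proposed in the talk.

```python
import numpy as np

def top_k_flows(v, k):
    """Indices of the k entries of v with the largest magnitude, largest first."""
    order = np.argsort(-np.abs(v), kind="stable")
    return order[:k].tolist()

# Hypothetical per-flow volumes: two large flows and three small ones.
baseline = np.array([1000.0, 800.0, 50.0, 40.0, 30.0])

# State vector at the time of the spike: the anomaly is on small flow 3.
v = baseline.copy()
v[3] += 60.0

# Heuristic from the slide: blame the k largest contributors to v.
blamed = top_k_flows(v, k=2)

# Contrast (assumption): rank by change relative to the baseline instead.
changed = top_k_flows(v - baseline, k=1)
```

Here `blamed` names the two largest flows even though neither changed, while the flow that actually carried the anomaly is missed.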
20. Outline
- Background and motivation
- Problems with approach
- Conclusion
- Addressable problems, versus
- Fundamental problems
- Future directions
21. Conclusion: addressable
- PCA is sensitive to its parameters
- More robust methodology required
- Training: defining normalcy (topk, whichk)
- Detection: tuning threshold
- Identification: better heuristic
- Previous work used the same data and optimized parameter settings as Lakhina et al.
- But these concerns might be addressable
22. Conclusion: fundamental
- We don't know what normal is
- Disconnect between objective functions
- PCA finds variance
- We seek periodicity
- PCA's strengths can be weaknesses
- Transformation good at detecting correlations
- Causes difficulty in identifying anomaly location
- Are other methods more appropriate?
- We require a standardized evaluation framework
23. Defining normalcy
- Large anomalies can cause a spike in the first few PCs
- Diminishes effectiveness
- But we can presumably smooth these out (WMA)
- But first PCs aren't always periodic
- whichk instead of topk?
- Initial results suggest this might be challenging also
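If WMA here stands for a weighted moving average (an assumption; the slide does not expand the acronym), smoothing a spike out of a series could look like the sketch below. The window length and the linearly increasing weights are illustrative choices, not parameters from the study.

```python
import numpy as np

def wma(x, window=5):
    """Weighted moving average with linearly increasing weights.

    The newest sample in each window gets the largest weight. The output
    has len(x) - window + 1 points.
    """
    w = np.arange(1, window + 1, dtype=float)
    w /= w.sum()
    # np.convolve flips its kernel, so pass the weights reversed.
    return np.convolve(x, w[::-1], mode="valid")

# A flat series with one large spike (hypothetical data).
x = np.zeros(20)
x[10] = 10.0
smoothed = wma(x, window=5)
```

The spike is attenuated rather than removed: its peak drops to the newest-sample weight times the original height, and its energy is spread over the following window.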
24. Fundamental disconnect between objective functions
- PCA is optimal at finding orthogonal vectors ordered by captured variance
- But variance need not correspond to normalcy (i.e. periodicity)
- When do they coincide?
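A small synthetic case where the two objectives diverge is sketched below. The data, the group of flows sharing a high-variance aperiodic factor, and the spectral `periodicity` score are all assumptions introduced for illustration; the score is one possible measure, not one used in the study.

```python
import numpy as np

def periodicity(u):
    """Share of spectral power (excluding DC) in the single strongest frequency."""
    power = np.abs(np.fft.rfft(u - u.mean())) ** 2
    return power[1:].max() / power[1:].sum()

rng = np.random.default_rng(5)
t = np.arange(288)

# Flows 0-3 share a high-variance, aperiodic factor; flows 4-7 share a
# lower-variance daily trend (hypothetical data).
X = np.zeros((288, 8))
X[:, :4] += (3.0 * rng.standard_normal(288))[:, None]
X[:, 4:] += np.sin(2 * np.pi * t / 144)[:, None]
X += 0.1 * rng.standard_normal(X.shape)

Xc = X - X.mean(axis=0)
U, s, _ = np.linalg.svd(Xc, full_matrices=False)

variance_share = s**2 / np.sum(s**2)
scores = [periodicity(U[:, i]) for i in range(2)]
```

PCA ranks the aperiodic factor first because it carries the most variance, while the periodic trend that one would call normal comes second; a selection rule based only on captured variance cannot tell them apart.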
25. Thanks! Questions?
- Augustin Soule, Thomson
- http://www.thlab.net/asoule