Statistical Based Anomaly Detection

Provided by: wwwnetC

Learn more at: http://www-net.cs.umass.edu

Transcript and Presenter's Notes

1
Statistical Based Anomaly Detection

Based on the joint work of AT&T, Rutgers and NISS
Presented by Jinghua Hu

2
Outline

Introduction
Statistics Based Anomaly Detection
Data Sets
Models
Summary
References

3
Introduction

Intrusion Detection
Misuse Detection
Anomaly Detection
Modeling Methodology
Rule based
Specification based
Profile based

4
Introduction

Profile Based Anomaly Detection
Pattern-based profiles
Fixed-length patterns (U. of New Mexico)
Variable-length patterns (IBM Zurich)
Sequence-Match (Purdue)
Statistics-based Profiles
SRI
AT&T, Rutgers and NISS

5
SRI NIDES

Data are collected from various sources in the
system
Build frequency-based statistical profiles from
the long-term distribution of each feature
Compare short-term distributions to the profiles
to get individual scores
Integrate the individual scores into an overall
abnormality score
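
The four steps above can be sketched as follows. This is a minimal illustration, not the NIDES algorithm itself: the chi-square-like distance, the add-one smoothing, and the plain sum used to combine scores are all assumptions, as are the function names. Events are assumed to come from the listed categories.

```python
from collections import Counter

def distribution(events, categories):
    """Relative frequency of each category, with add-one smoothing (assumed)."""
    counts = Counter(events)
    total = len(events) + len(categories)
    return {c: (counts[c] + 1) / total for c in categories}

def feature_score(long_term, short_term):
    """Chi-square-like distance between short-term and long-term distributions."""
    return sum((short_term[c] - long_term[c]) ** 2 / long_term[c]
               for c in long_term)

def overall_score(feature_scores):
    """Combine per-feature scores; a plain sum is used here as a placeholder."""
    return sum(feature_scores)

categories = ["ls", "cd", "vi"]
profile = distribution(["ls", "cd", "ls", "vi"], categories)    # long-term
recent_same = distribution(["ls", "cd", "ls", "vi"], categories)
recent_odd = distribution(["vi", "vi", "vi", "vi"], categories)

score_same = feature_score(profile, recent_same)
score_odd = feature_score(profile, recent_odd)
```

Recent activity that matches the profile scores zero, and the score grows as the short-term distribution drifts away from it.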

6
AT&T, Rutgers and NISS

Data Source
Unix command sequences for 50 users, without
arguments or timestamps
Cut into fixed-length blocks
Blocks 1-50: clean data, used for training
Blocks 51-150: partly contaminated with other
users' data blocks
Unit of analysis: command blocks
Goal: detecting masqueraders
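
A small sketch of the data preparation described above. The block length of 100 commands is an assumption (the slide only says "fixed-length"), and both function names are hypothetical.

```python
def split_into_blocks(commands, block_size=100):
    """Cut a command sequence into consecutive fixed-length blocks.

    A trailing partial block is dropped. block_size=100 is an assumed value.
    """
    n_full = len(commands) // block_size
    return [commands[i * block_size:(i + 1) * block_size]
            for i in range(n_full)]

def split_train_test(blocks, n_train=50):
    """First n_train blocks are clean training data; the rest are test blocks."""
    return blocks[:n_train], blocks[n_train:]

commands = ["ls"] * 15000            # placeholder command stream for one user
blocks = split_into_blocks(commands)
train_blocks, test_blocks = split_train_test(blocks)
```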

7
AT&T, Rutgers and NISS

Static Statistical Models
Frequency based model
Uniqueness model
Dynamic Statistical Models
Markov Chain
Hybrid High-order Markov
Other Methods
Data Compression
IPAM
Sequence-Match (from Purdue)

8
Static Statistical Models

Frequency-based model
Build a probability distribution over commands
Hypothesis
Commands are generated by the probability
distribution of the given user
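
One way to make the frequency-based hypothesis concrete is to score a test block by its log-likelihood under the user's command distribution. The sketch below assumes additive smoothing and a small floor probability for unseen commands; neither is specified in the slides, and the function names are illustrative.

```python
import math
from collections import Counter

def fit_command_distribution(training_commands, vocabulary, alpha=1.0):
    """Smoothed probability of each command in the user's training data."""
    counts = Counter(training_commands)
    total = len(training_commands) + alpha * len(vocabulary)
    return {c: (counts[c] + alpha) / total for c in vocabulary}

def block_log_likelihood(block, dist, floor=1e-6):
    """Log-likelihood of a block; commands outside the vocabulary get `floor`."""
    return sum(math.log(dist.get(c, floor)) for c in block)

vocabulary = ["ls", "cd", "vi", "gcc"]
dist = fit_command_distribution(["ls", "cd", "ls", "vi"] * 10, vocabulary)

own = block_log_likelihood(["ls", "ls", "cd"], dist)
foreign = block_log_likelihood(["gcc", "gcc", "gcc"], dist)
```

A block drawn from commands the user rarely types gets a much lower log-likelihood, which can then be compared against a threshold.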

9
Uniqueness Model

Motivation
Commands not previously seen in the user's training
data may indicate a masquerader
Rarely used commands may be more discriminative
than frequently used ones
Popularity of commands
Uniquely used: used by only a single user
Unpopular: used by few users

10
Uniqueness Model

Uniqueness Scores
Need command usage information of the whole group
of users for training
Assign a uniqueness index to each command based
on the number of users in the group who have used
the command before

11
Uniqueness Model

Test Score
Check testing data command-by-command to compute
a test score
Incorporates multiple factors, including
Uniqueness index of commands
Weights based on whether the command was seen
before in the given user's training data
Command usage relative to other users
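
The slides do not give the scoring formula, so the following is only one plausible way to combine the three factors listed above. The uniqueness index is taken as 1 minus the fraction of users who have used the command; a command seen in the user's own training data gets a negative weight proportional to the user's share of its usage, and an unseen command gets weight +1. All of these choices, and the function name, are assumptions for illustration. Every user is assumed to have non-empty training data.

```python
from collections import Counter

def uniqueness_score(block, user, training):
    """Average weighted uniqueness of the commands in a test block.

    training maps each user name to that user's list of training commands.
    Higher scores suggest a masquerader.
    """
    n_users = len(training)
    counts = {u: Counter(cmds) for u, cmds in training.items()}
    score = 0.0
    for cmd in block:
        users_with_cmd = sum(1 for u in counts if counts[u][cmd] > 0)
        uniqueness = 1.0 - users_with_cmd / n_users
        if counts[user][cmd] > 0:
            # Seen before by this user: lowers the score, scaled by the
            # user's usage relative to the whole group.
            v_user = counts[user][cmd] / len(training[user])
            v_all = sum(counts[u][cmd] / len(training[u]) for u in counts)
            weight = -v_user / v_all
        else:
            weight = 1.0
        score += weight * uniqueness
    return score / len(block)

training = {
    "alice": ["ls", "cd", "emacs"],
    "bob": ["ls", "cd", "vi"],
    "carol": ["ls", "gcc", "make"],
}
own = uniqueness_score(["emacs", "ls", "cd"], "alice", training)
masq = uniqueness_score(["gcc", "make", "xterm"], "alice", training)
```

Commands used by everyone (here `ls`) contribute nothing, which matches the motivation that rarely used commands are the discriminative ones.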

12
Markov Chain

Motivation
Dynamic models contain more information
Model transition probability of commands
Hypothesis
The one-step transition probabilities of the
commands conform to the state transition
probability matrix of the given user
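
A minimal sketch of the one-step model: estimate a transition matrix from a user's training commands, then compute the log-likelihood of a test block under it. The additive smoothing and the function names are assumptions, and all commands are assumed to belong to a fixed vocabulary. The log-likelihood ratio used in the slides would compare this quantity against an alternative model, which is not shown here.

```python
import math
from collections import defaultdict

def fit_transition_matrix(commands, vocabulary, alpha=1.0):
    """Smoothed one-step transition probabilities P(next | previous)."""
    counts = defaultdict(lambda: defaultdict(float))
    for prev, nxt in zip(commands, commands[1:]):
        counts[prev][nxt] += 1
    matrix = {}
    for a in vocabulary:
        row_total = sum(counts[a].values()) + alpha * len(vocabulary)
        matrix[a] = {b: (counts[a][b] + alpha) / row_total for b in vocabulary}
    return matrix

def transition_log_likelihood(block, matrix):
    """Sum of log transition probabilities over consecutive command pairs."""
    return sum(math.log(matrix[a][b]) for a, b in zip(block, block[1:]))

vocabulary = ["ls", "cd", "vi"]
matrix = fit_transition_matrix(["cd", "ls"] * 20, vocabulary)

own = transition_log_likelihood(["cd", "ls", "cd", "ls"], matrix)
foreign = transition_log_likelihood(["vi", "vi", "vi", "vi"], matrix)
```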

13
Markov Chain

Test Statistic
Log-Likelihood Ratio
Problems with the standard model
Large dimension of parameter space without enough
supporting data
High computational cost in training
Alternative Hypothesis
Reduced dimension

14
Markov Chain

Techniques for dimension reduction
Principal Component Regression
Choose linear combinations of user deviations
from the state transition matrix that have maximum
variance and are uncorrelated
Alternative Test Statistics
Fisher Score Statistic
Bayes Factor

15
Hybrid High-order Markov Chain

Motivation
Multi-step transitions may be more realistic in
modeling user behavior
A high-order Markov Chain provides richer
information than a 1-step Markov Chain
Test Statistics
Log-Likelihood Ratio

16
Hybrid High-order Markov Chain

Techniques for dimension reduction
Restrict attention to a subset of the most used
commands
MTD (mixture transition distribution): represent
high-order state transition probabilities as linear
combinations of one-step transition probabilities
For rarely used commands, use statistical-independence
models instead of High-order Markov Chains
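
The MTD idea can be illustrated with a short sketch: the probability of the next command given the last few commands is a weighted mix of one-step transition probabilities, one term per lag. The transition table, the lag weights, and the function name below are made-up illustrative values, not estimates from the study.

```python
def mtd_probability(history, nxt, one_step, lag_weights):
    """P(nxt | history) under a mixture transition distribution.

    history: recent commands, most recent last.
    one_step[a][b]: one-step probability of b following a.
    lag_weights[i]: weight of lag i + 1; the weights must sum to 1.
    """
    assert abs(sum(lag_weights) - 1.0) < 1e-9
    prob = 0.0
    for lag, weight in enumerate(lag_weights, start=1):
        prev = history[-lag]
        prob += weight * one_step[prev][nxt]
    return prob

one_step = {
    "ls": {"ls": 0.2, "cd": 0.8},
    "cd": {"ls": 0.6, "cd": 0.4},
}
history = ["cd", "ls"]               # "ls" is the most recent command
p_cd = mtd_probability(history, "cd", one_step, [0.7, 0.3])
p_ls = mtd_probability(history, "ls", one_step, [0.7, 0.3])
```

With l lags the model needs one transition matrix plus l weights, instead of a table whose size grows exponentially with l.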

17
Data Compression

Motivation
Users tend to repeat parts of their own past
activity
Test data appended to the historical training
data should compress more readily when the test
data does come from the same user rather than
a masquerader

18
Data Compression

Test Score
Calculate the number of additional bytes needed
to compress the test data when appended to the
training data
Uses the Lempel-Ziv algorithm for data compression
An alarm is raised when the score is above a
threshold
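
A rough sketch of the compression score. It uses Python's `zlib` (DEFLATE, a Lempel-Ziv variant combined with Huffman coding) as a stand-in; the compressor used in the original study may differ, and the helper names and sample data are assumptions.

```python
import zlib

def compressed_size(commands):
    """Number of bytes after compressing a newline-joined command list."""
    data = "\n".join(commands).encode("utf-8")
    return len(zlib.compress(data, 9))

def compression_score(training, test_block):
    """Additional bytes needed when the test block is appended to training data."""
    return compressed_size(training + test_block) - compressed_size(training)

training = ["ls", "cd", "vi", "make"] * 200
own_block = ["ls", "cd", "vi", "make"] * 25
# Synthetic block of 100 distinct commands the user never typed.
foreign_block = [f"cmd{(i * 7919) % 1000}" for i in range(100)]

own_score = compression_score(training, own_block)
foreign_score = compression_score(training, foreign_block)
```

The block that repeats the user's history adds far fewer bytes than the unfamiliar one, so a threshold on the score separates the two.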

19
IPAM

Incremental Probabilistic Action Modeling
Based on 1-step command transition probabilities
estimated from the training data
Use an exponential updating scheme to update the
transition probabilities
Prediction
Predict the next command by choosing the one
corresponding to the highest transition
probability

20
IPAM

Test Score
A prediction is labeled as good if the next
command turns out to be among the top-4 predicted
commands
The fraction of good predictions of the test data
forms the score
An alarm is raised when the score is below the
threshold
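
The update rule and the top-4 score can be sketched as below. This is a simplified reading of IPAM: on each observed transition, the row for the previous command is decayed by `alpha` and the observed next command gains `1 - alpha`. The value `alpha=0.8` and the class and function names are assumptions, and details of the original method (such as how new rows are initialized) are omitted.

```python
from collections import defaultdict

class IPAM:
    """Simplified incremental transition table with exponential updating."""

    def __init__(self, alpha=0.8):
        self.alpha = alpha
        self.table = defaultdict(dict)   # table[prev][next] -> weight

    def update(self, prev, nxt):
        row = self.table[prev]
        for cmd in row:
            row[cmd] *= self.alpha       # decay older evidence
        row[nxt] = row.get(nxt, 0.0) + (1.0 - self.alpha)

    def predict(self, prev, k=4):
        """Top-k most probable next commands after `prev`."""
        row = self.table.get(prev, {})
        return sorted(row, key=row.get, reverse=True)[:k]

def ipam_score(model, block, k=4):
    """Fraction of transitions whose next command is among the top-k predictions."""
    pairs = list(zip(block, block[1:]))
    if not pairs:
        return 0.0
    good = sum(1 for prev, nxt in pairs if nxt in model.predict(prev, k))
    return good / len(pairs)

model = IPAM()
train = ["cd", "ls", "vi"] * 20
for prev, nxt in zip(train, train[1:]):
    model.update(prev, nxt)

own = ipam_score(model, ["cd", "ls", "vi", "cd", "ls"])
foreign = ipam_score(model, ["gcc", "make", "gcc", "make"])
```

A low fraction of good predictions, as for the foreign block here, is what triggers the alarm.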

21
Sequence-Match

Based on pattern matching
Profiles represented by sequences of length N
Using a window sliding along testing data,
compute a similarity measure between the most
recent N commands and the profiles
The measure is the count of matches where
adjacent matches get higher weights
The maximum of all similarity values forms the
test score
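
A minimal sketch of the similarity measure described above, assuming that each match contributes the length of the current run of adjacent matches (so a run of three matches contributes 1 + 2 + 3). The exact weighting in the Purdue work may differ, and the function names are illustrative.

```python
def similarity(seq_a, seq_b):
    """Count position-wise matches, weighting runs of adjacent matches more."""
    total = 0
    run = 0
    for a, b in zip(seq_a, seq_b):
        run = run + 1 if a == b else 0   # reset the run on a mismatch
        total += run
    return total

def sequence_match_score(window, profile):
    """Best similarity between the window and any profile sequence."""
    return max(similarity(window, p) for p in profile)

def sliding_scores(commands, profile, n):
    """Score every length-n window of the test data against the profile."""
    return [sequence_match_score(commands[i:i + n], profile)
            for i in range(len(commands) - n + 1)]

profile = [("a", "b", "c")]
full = similarity(["a", "b", "c"], ["a", "b", "c"])
broken = similarity(["a", "x", "c"], ["a", "b", "c"])
scores = sliding_scores(["a", "b", "c", "d"], profile, 3)
```

Two isolated matches score less than two adjacent ones, which is how the measure rewards preserved command order.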

22
Summary

Test Results
Masqueraders fall into two major groups
Either easy to detect by all methods
Or hard to detect and thus missed by all
False Alarm
Different methods are prone to raising false alarms
for different users
False alarms often appear in consecutive blocks

25
Summary

Performance Comparison
Dynamic models work better than static models,
but not significantly
Uniqueness method dominates when a false-alarm
rate in the 1%-5% range is desired
Compression Method seems to be inferior
Dynamic models are similar to pattern-based
profiles, with the additional information of
probabilities associated with transitions

27
Summary

Correlations between methods
Highly correlated: Uniqueness and Hybrid
High-order Markov
Correlated: IPAM and Sequence-Match
1-step Markov Chain can be associated with
either group
Data Compression stands alone

28
Correlations between Methods

(Diagram of the methods: Uniqueness, IPAM, Markov Chain, Hybrid High-order Markov, Sequence-Match, Data Compression)

29
Summary

Profile Updating
Profile updating helps the models to adapt to
profile shift
Uniqueness method and Hybrid High-order Markov
method benefit from updating
Detection methods can be made vulnerable by
updating during testing
Compression method works better without updating

30
References

D. Anderson, T. F. Lunt, H. Javitz, A. Tamaru, A. Valdes. Detecting Unusual Program Behavior Using the Statistical Component of the Next-generation Intrusion Detection Expert System (NIDES). Computer Science Laboratory, SRI-CSL-95-06, May 1995.
Theus, M., Schonlau, M. Intrusion Detection Based on Structural Zeroes. Statistical Computing & Graphics Newsletter, Vol. 9, No. 1, 12-17. 1998.
DuMouchel, W., Schonlau, M. A fast computer intrusion detection algorithm based on hypothesis testing of command transition probabilities. KDD98, New York, pp. 189-193. 1998.
Schonlau, M., DuMouchel, W., Ju, W., Karr, A., Theus, M., Vardi, Y. Computer Intrusion: Detecting Masquerades. Statistical Science, February 2001.

31
References

DuMouchel, W., Schonlau, M. A Comparison of Test Statistics for Computer Intrusion Detection Based on Principal Components Regression of Transition Probabilities. Proceedings of the 30th Symposium on the Interface: Computing Science and Statistics, 30, 404-413. 1999.
B. D. Davison and H. Hirsh. Predicting Sequences of User Actions. In Predicting the Future: AI Approaches to Time Series Problems, pages 5-12. AAAI Press, 1998.
Wen-Hua Ju and Yehuda Vardi. Profiling UNIX Users And Processes Based on Rarity of Occurrence Statistics with Applications to Computer Intrusion Detection. RAID 2001.
Lane, T. and Brodley, C. E. Temporal Sequence Learning and Data Reduction for Anomaly Detection. In Proceedings of the Fifth ACM Conference on Computer and Communications Security, pp. 150-158. 1998.