Title: Learning to Detect Computer Intrusions with Extremely Few False Alarms
1. Learning to Detect Computer Intrusions with (Extremely) Few False Alarms
Mark Shavlik
2. Two Basic Approaches for Intrusion Detection Systems (IDS)
- Pattern Matching
  - If packet contains "site exec", then sound alarm
  - Famous example: SNORT.org
  - Weakness: don't (yet) have patterns for new attacks
- Anomaly Detection
  - Usually based on statistics measured during normal behavior
  - Weakness: does anomaly imply intrusion?
- Both approaches often suffer from too many false alarms
  - Admins ignore an IDS when flooded by false alarms
3. How to Get Training Examples for Machine Learning?
- Ideally, get measurements during
  - Normal Operation vs.
  - Intrusions
- However, hard to define the space of possible intrusions
- Instead, learn from positive examples only
  - Learn what's normal and define all else as anomalous
4. Behavior-Based Intrusion Detection
- Need to go beyond looking solely at external network traffic and log files
  - File-access patterns
  - Typing behavior
  - Choice of programs run
  - ...
- Like the human immune system, continually monitor and notice foreign behavior
5. Our General Approach
- Identify unique characteristics of each user's/server's behavior
- Every second, measure 100s of Windows 2000 properties
  - in/out network traffic, programs running, keys pressed, kernel usage, etc.
- Predict Prob( normal | measurements )
- Raise alarm if recent measurements seem unlikely for this user/server
6. Goal: Choose a Feature Space that Widely Separates the User from the General Population
Choose a separate set of features for each user
[Figure: measurements of the specific user vs. the general population, plotted in the chosen space of possible measurements]
7. What We're Measuring (in Windows 2000)
- Performance Monitor (Perfmon) data
  - File bytes written per second
  - TCP/IP/UDP/ICMP segments sent per second
  - System calls per second
  - # of processes, threads, events, ...
- Event-Log entries
- Programs running, CPU usage, working-set size
  - MS Office, Wordpad, Notepad
  - Browsers: IE, Netscape
  - Program development tools, ...
- Keystroke and mouse events
8. Temporal Aggregates
- Actual Value Measured
- Average of the Previous 10 Values
- Average of the Previous 100 Values
- Difference between Current Value and Previous Value
- Difference between Current Value and Average of Last 10
- Difference between Current Value and Average of Last 100
- Difference between Averages of Previous 10 and Previous 100
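To make these aggregates concrete, here is a minimal Python sketch that computes the seven features above for one raw measurement stream; the function and variable names are illustrative assumptions, not the authors' code.

```python
from collections import deque

def temporal_aggregates(history, value):
    """Compute the seven temporal aggregates listed above for one raw
    measurement; `history` holds the most recent values (newest last)."""
    prev = history[-1] if history else value
    last10 = list(history)[-10:] or [value]
    last100 = list(history)[-100:] or [value]
    avg10 = sum(last10) / len(last10)
    avg100 = sum(last100) / len(last100)
    features = {
        "actual": value,
        "avg_prev_10": avg10,
        "avg_prev_100": avg100,
        "diff_prev": value - prev,
        "diff_avg_10": value - avg10,
        "diff_avg_100": value - avg100,
        "diff_avg_10_vs_100": avg10 - avg100,
    }
    history.append(value)  # deque(maxlen=100) silently drops old values
    return features

# Usage: one deque per raw Windows 2000 property, updated once per second.
history = deque(maxlen=100)
for raw in [3.0, 5.0, 4.0, 9.0]:
    feats = temporal_aggregates(history, raw)
```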
9. Using (Naïve) Bayesian Networks
- Learning network structure is too CPU-intensive
  - Plus, naïve Bayes frequently works best (a generic scoring sketch follows this list)
- Test-set results
  - 59.2% of intrusions detected
  - About 2 false alarms per day per user
- This paper's approach
  - 93.6% detected
  - 0.3 false alarms per day per user
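For reference, the naïve Bayes baseline amounts to scoring how probable the current measurements are under a per-user "normal behavior" model that treats features as independent. The sketch below is a generic illustration of that idea, not the authors' implementation; the dictionary layout and the probability floor are assumptions.

```python
import math

def naive_bayes_log_score(example, feature_probs, floor=1e-6):
    """Log-probability of an example under a 'normal behavior' naive Bayes
    model: sum of per-feature log-probabilities, assuming independence.
    feature_probs[i] maps a discretized value of feature i to its probability
    estimated from this user's normal training data."""
    score = 0.0
    for i, value in enumerate(example):
        score += math.log(feature_probs[i].get(value, floor))
    return score  # raise an alarm when this falls below a tuned threshold
```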
10. Our Intrusion-Detection Template
[Figure: sliding window over the last W (window width) measurements along the time axis (in seconds)]
- If score(current measurements) > T, then raise a "mini alarm"
- If # of mini alarms in the window > N, then predict an intrusion
- Use the tuning set to choose good per-user values for T and N
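A minimal sketch of this template, assuming one score per second of measurements; T, N, and W are the per-user parameters chosen on the tuning set, and the function name is illustrative.

```python
def detect_intrusions(scores, T, N, W):
    """Sliding-window rule from the template above: each score above T is a
    'mini alarm'; once W scores have accumulated, predict an intrusion if the
    window holds more than N mini alarms, then clear the window (deciding only
    on full windows, as discussed later, helps limit false alarms)."""
    alarm_times, window = [], []
    for t, score in enumerate(scores):
        window.append(score > T)
        if len(window) == W:            # only decide when the window is full
            if sum(window) > N:
                alarm_times.append(t)   # second at which intrusion is predicted
            window.clear()
    return alarm_times
```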
11. Methodology for Training and Evaluating Learned Models
[Flowchart: each stream of measurements is scored by every learned model. An alarm raised by the model of User X on User X's own data counts as a false alarm; an alarm raised by the model of User Y on another user's data counts as a detected intrusion.]
12. Learning to Score Windows 2000 Measurements (done for each user)
- Initialize the weight on each feature to 1
- For each training example do
  - Set weightedVotesFOR = 0, weightedVotesAGAINST = 0
  - If measurement i is unlikely (i.e., low prob), then add weight_i to weightedVotesFOR, else add weight_i to weightedVotesAGAINST
  - If weightedVotesFOR > weightedVotesAGAINST, then raise a mini alarm
  - If the decision about intrusion is incorrect, multiply by ½ the weights on all measurements that voted incorrectly (Winnow algorithm)
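The following Python sketch mirrors the Winnow-style pseudocode above. The "unlikely" probability cutoff, the demotion factor, and all names are illustrative assumptions rather than values taken from the paper.

```python
def winnow_vote(prob_of_value, weights, unlikely=0.05, beta=0.5, is_intrusion=None):
    """One weighted vote over all measurements at a single time step.
    prob_of_value[i] is the estimated probability of measurement i's current
    value for this user; weights[i] is initialized to 1.0 before training."""
    votes_for = sum(w for p, w in zip(prob_of_value, weights) if p < unlikely)
    votes_against = sum(w for p, w in zip(prob_of_value, weights) if p >= unlikely)
    mini_alarm = votes_for > votes_against

    # Training-time Winnow update: if the decision was wrong, halve the weight
    # of every measurement whose vote disagreed with the true label.
    if is_intrusion is not None and mini_alarm != is_intrusion:
        for i, p in enumerate(prob_of_value):
            voted_for_intrusion = p < unlikely
            if voted_for_intrusion != is_intrusion:
                weights[i] *= beta
    return mini_alarm
```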
13. Choosing Good Parameter Values
- For each user
  - Use training data to estimate probabilities and to weight individual measurements
  - Try 20 values for T and 20 values for N
  - For each T × N pairing, compute detection and false-alarm rates on the tuning set
  - Select the T × N pairing whose false-alarm rate is less than the spec (e.g., 1 per day) and whose detection rate is highest
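A minimal sketch of that tuning-set grid search; `evaluate` is a placeholder assumed to return the detection rate and false-alarm rate for one (T, N) pairing on a user's tuning set.

```python
def choose_T_and_N(evaluate, T_candidates, N_candidates, max_false_alarms_per_day=1.0):
    """Pick the (T, N) pairing that meets the false-alarm spec and has the
    highest detection rate; returns None if no pairing meets the spec."""
    best_pair, best_detection = None, -1.0
    for T in T_candidates:
        for N in N_candidates:
            detection_rate, false_alarm_rate = evaluate(T, N)
            if false_alarm_rate < max_false_alarms_per_day and detection_rate > best_detection:
                best_pair, best_detection = (T, N), detection_rate
    return best_pair
```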
14. Experimental Data
- Subjects
  - Insiders: 10 employees at Shavlik Technologies
  - Outsiders: 6 additional Shavlik employees
- Unobtrusively collected data for 6 weeks
  - 7 GBytes archived
- Task: Are the current measurements from user X?
15. Training, Tuning, and Testing Sets
- Very important in machine learning not to use testing data to optimize parameters!
  - Otherwise one can "tune" to zero false alarms and high detection rates!
- Train Set: first two weeks of data
  - Build a (statistical) model
- Tune Set: middle two weeks of data
  - Choose good parameter settings
- Test Set: last two weeks of data
  - Evaluate the frozen model
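One way to realize this chronological split in code; `week_of` is an assumed helper returning the 1-based week in which a record was collected.

```python
def split_by_weeks(records, week_of):
    """Train on weeks 1-2, tune on weeks 3-4, test on weeks 5-6,
    matching the chronological split described above."""
    train = [r for r in records if week_of(r) <= 2]
    tune = [r for r in records if 3 <= week_of(r) <= 4]
    test = [r for r in records if week_of(r) >= 5]
    return train, tune, test
```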
16. Experimental Results on the Test Set
17. Highly Weighted Measurements (% of time in Top Ten across users' experiments)
- Number of Semaphores (43)
- Logon Total (43)
- Print Jobs (41)
- System Driver Total Bytes (39)
- CMD Handle Count (35)
- Excel Handle Count (26)
- Number of Mutexes (25)
- Errors Access Permissions (24)
- Files Opened Total (23)
- TCP Connections Passive (23)
- Notepad Processor Time (21)
- 73 measurements occur > 10% of the time
18. Confusion Matrix of Detection Rates (3 of 10 Subjects Shown)
[Table: owners A, B, C (rows) vs. intruders A, B, C (columns); the off-diagonal detection rates shown are 25, 91, 94, 100, 100, and 100]
19. Some Questions
- What if user behavior changes? (called "concept drift" in machine learning)
  - One solution: assign a half-life to the counts used to compute probabilities
  - Multiply counts by f < 1 each day (10/20 vs. 1000/2000); see the sketch after this list
- CPU and memory demands too large?
  - Measuring features and updating counts takes < 1% of the CPU
  - Tuning of parameters needs to be done off-line
- How often to check for intrusions?
  - Only check when the window is full, then clear the window
  - Else too many false alarms
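A minimal sketch of the half-life idea, assuming the per-feature probabilities are stored as counts over discretized values; the decay factor f = 0.95 is an illustrative value, not one reported here.

```python
def decay_counts(value_counts, total, f=0.95):
    """Daily decay of the counts behind a probability estimate: multiplying
    every count and the total by f < 1 leaves today's probabilities
    (count / total) unchanged but lets new behavior shift them faster,
    e.g., 10/20 adapts much more quickly than 1000/2000."""
    for value in value_counts:
        value_counts[value] *= f
    return value_counts, total * f
```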
20. Future Directions
- Measure the system while applying various known intrusion techniques
  - Compare to measurements during normal operation
  - Train on known methods 1, ..., N-1
  - Test using data from known method N
- Analyze simultaneous measurements from a network of computers
- Analyze the impact of an intruder's behavior changing the recorded statistics
- Current results: prob. of detecting an intruder in the first W sec
21. Some Related Work on Anomaly Detection
- Machine learning for intrusion detection
  - Lane & Brodley (1998)
  - Ghosh et al. (1999)
  - Lee et al. (1999)
  - Warrender et al. (1999)
  - Agarwal & Joshi (2001)
  - Typically Unix-based
  - Streams of programs invoked or network traffic analyzed
- Analysis of keystroke dynamics
  - Monrose & Rubin (1997)
  - For authenticating passwords
22. Conclusions
- Can accurately characterize individual user behavior using simple models based on measuring many system properties
- Such profiles can provide protection without too many false alarms
- Separate data into train, tune, and test sets
- Let the data decide good parameter settings, on a per-user basis (including which measurements to use)
23. Acknowledgements
- DARPA's Insider Threat Active Profiling (ITAP) program within the ATIAS program
- Mike Fahland for help with data collection
- Shavlik, Inc. employees who allowed collection of their usage data
24. Using Relative Probabilities
Alarm based on the ratio:
Prob( keystrokes | machine owner ) / Prob( keystrokes | population )
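A minimal sketch of scoring with this ratio; the dictionary layout, probability floor, and names are illustrative assumptions.

```python
def relative_prob_score(value, owner_probs, population_probs, floor=1e-6):
    """Ratio of the owner's probability for an observed value to the general
    population's probability for the same value. A low ratio flags values that
    are rare for this owner yet common for everyone else, which is the signal
    used to raise an alarm."""
    p_owner = max(owner_probs.get(value, 0.0), floor)
    p_population = max(population_probs.get(value, 0.0), floor)
    return p_owner / p_population
```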
25. Value of Relative Probabilities
- Using relative probabilities
  - Separates "rare for this user" from "rare for everyone"
- Example of variance reduction
  - Reduce variance in a measurement by comparing it to another (e.g., paired t-tests)
26. Tradeoff between False Alarms and Detected Intrusions (ROC Curve)
[Figure: ROC curve of detected intrusions vs. false alarms, with the false-alarm spec marked]
Note: the left-most value results from ZERO tune-set false alarms
27. Conclusions
- Can accurately characterize individual user behavior using simple models based on measuring many system properties
- Such profiles can provide protection without too many false alarms
- Separate data into train, tune, and test sets
- Let the data decide good parameter settings, on a per-user basis (including which measurements to use)
- Normalize probabilities by general-population probabilities
  - Separates "rare for this user (or server)" from "rare for everyone"
28. Outline
- Approaches for Building Intrusion-Detection Systems
- A Bit More on What We Measure
- Experiments with Windows 2000 Data
- Wrap-up