Title: Learning to Detect Computer Intrusions with Extremely Few False Alarms
1. Learning to Detect Computer Intrusions with (Extremely) Few False Alarms
Mark Shavlik
2. Two Basic Approaches for Intrusion Detection Systems (IDS)
- Pattern Matching
  - If packet contains "site exec", then sound alarm
  - Famous example: SNORT.org
  - Weakness: don't (yet) have patterns for new attacks
- Anomaly Detection
  - Usually based on statistics measured during normal behavior
  - Weakness: does anomaly imply intrusion?
- Both approaches often suffer from too many false alarms
  - Admins ignore an IDS when flooded by false alarms
3. How to Get Training Examples for Machine Learning?
- Ideally, get measurements during
  - Normal Operation vs.
  - Intrusions
- However, hard to define the space of possible intrusions
- Instead, learn from positive examples only
  - Learn what's normal and define all else as anomalous
4. Behavior-Based Intrusion Detection
- Need to go beyond looking solely at external network traffic and log files
  - File-access patterns
  - Typing behavior
  - Choice of programs run
  - ...
- Like the human immune system, continually monitor and notice foreign behavior
5. Our General Approach
- Identify unique characteristics of each user's/server's behavior
- Every second, measure 100s of Windows 2000 properties
  - in/out network traffic, programs running, keys pressed, kernel usage, etc.
- Predict Prob( normal | measurements )
- Raise alarm if recent measurements seem unlikely for this user/server
6. Goal: Choose a Feature Space that Widely Separates the User from the General Population
Choose a separate set of features for each user
[Figure: measurements of the specific user vs. the general population, plotted in the chosen space of possible measurements]
7. What We're Measuring (in Windows 2000)
- Performance Monitor (Perfmon) data
  - File bytes written per second
  - TCP/IP/UDP/ICMP segments sent per second
  - System calls per second
  - # of processes, threads, events, ...
- Event-Log entries
- Programs running, CPU usage, working-set size
  - MS Office, Wordpad, Notepad
  - Browsers: IE, Netscape
  - Program development tools, ...
- Keystroke and mouse events
8. Temporal Aggregates
- Actual Value Measured
- Average of the Previous 10 Values
- Average of the Previous 100 Values
- Difference between Current Value and Previous Value
- Difference between Current Value and Average of Last 10
- Difference between Current Value and Average of Last 100
- Difference between Averages of Previous 10 and Previous 100
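To make these aggregates concrete, here is a minimal Python sketch that computes the seven features above for one raw measurement stream; the function and variable names are illustrative assumptions, not the authors' code.

```python
from collections import deque

def temporal_aggregates(history, value):
    """Compute the seven temporal aggregates listed above for one raw
    measurement; `history` holds the most recent values (newest last)."""
    prev = history[-1] if history else value
    last10 = list(history)[-10:] or [value]
    last100 = list(history)[-100:] or [value]
    avg10 = sum(last10) / len(last10)
    avg100 = sum(last100) / len(last100)
    features = {
        "actual": value,
        "avg_prev_10": avg10,
        "avg_prev_100": avg100,
        "diff_prev": value - prev,
        "diff_avg_10": value - avg10,
        "diff_avg_100": value - avg100,
        "diff_avg_10_vs_100": avg10 - avg100,
    }
    history.append(value)  # deque(maxlen=100) silently drops old values
    return features

# Usage: one deque per raw Windows 2000 property, updated once per second.
history = deque(maxlen=100)
for raw in [3.0, 5.0, 4.0, 9.0]:
    feats = temporal_aggregates(history, raw)
```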
9. Using (Naïve) Bayesian Networks
- Learning network structure is too CPU-intensive
  - Plus, naïve Bayes frequently works best (a generic scoring sketch follows this list)
- Test-set results
  - 59.2% of intrusions detected
  - About 2 false alarms per day per user
- This paper's approach
  - 93.6% detected
  - 0.3 false alarms per day per user
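For reference, the naïve Bayes baseline amounts to scoring how probable the current measurements are under a per-user "normal behavior" model that treats features as independent. The sketch below is a generic illustration of that idea, not the authors' implementation; the dictionary layout and the probability floor are assumptions.

```python
import math

def naive_bayes_log_score(example, feature_probs, floor=1e-6):
    """Log-probability of an example under a 'normal behavior' naive Bayes
    model: sum of per-feature log-probabilities, assuming independence.
    feature_probs[i] maps a discretized value of feature i to its probability
    estimated from this user's normal training data."""
    score = 0.0
    for i, value in enumerate(example):
        score += math.log(feature_probs[i].get(value, floor))
    return score  # raise an alarm when this falls below a tuned threshold
```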
10. Our Intrusion-Detection Template
[Figure: sliding window over the last W (window width) measurements along the time axis (in seconds)]
- If score(current measurements) > T, then raise a "mini alarm"
- If # of mini alarms in the window > N, then predict an intrusion
- Use the tuning set to choose good per-user values for T and N
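A minimal sketch of this template, assuming one score per second of measurements; T, N, and W are the per-user parameters chosen on the tuning set, and the function name is illustrative.

```python
def detect_intrusions(scores, T, N, W):
    """Sliding-window rule from the template above: each score above T is a
    'mini alarm'; once W scores have accumulated, predict an intrusion if the
    window holds more than N mini alarms, then clear the window (deciding only
    on full windows, as discussed later, helps limit false alarms)."""
    alarm_times, window = [], []
    for t, score in enumerate(scores):
        window.append(score > T)
        if len(window) == W:            # only decide when the window is full
            if sum(window) > N:
                alarm_times.append(t)   # second at which intrusion is predicted
            window.clear()
    return alarm_times
```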
11. Methodology for Training and Evaluating Learned Models
[Flowchart: each stream of measurements is scored by every learned model. An alarm raised by the model of User X on User X's own data counts as a false alarm; an alarm raised by the model of User Y on another user's data counts as a detected intrusion.]
12. Learning to Score Windows 2000 Measurements (done for each user)
- Initialize the weight on each feature to 1
- For each training example do
  - Set weightedVotesFOR = 0, weightedVotesAGAINST = 0
  - If measurement i is unlikely (i.e., low prob), then add weight_i to weightedVotesFOR, else add weight_i to weightedVotesAGAINST
  - If weightedVotesFOR > weightedVotesAGAINST, then raise a mini alarm
  - If the decision about intrusion is incorrect, multiply by ½ the weights on all measurements that voted incorrectly (Winnow algorithm)
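The following Python sketch mirrors the Winnow-style pseudocode above. The "unlikely" probability cutoff, the demotion factor, and all names are illustrative assumptions rather than values taken from the paper.

```python
def winnow_vote(prob_of_value, weights, unlikely=0.05, beta=0.5, is_intrusion=None):
    """One weighted vote over all measurements at a single time step.
    prob_of_value[i] is the estimated probability of measurement i's current
    value for this user; weights[i] is initialized to 1.0 before training."""
    votes_for = sum(w for p, w in zip(prob_of_value, weights) if p < unlikely)
    votes_against = sum(w for p, w in zip(prob_of_value, weights) if p >= unlikely)
    mini_alarm = votes_for > votes_against

    # Training-time Winnow update: if the decision was wrong, halve the weight
    # of every measurement whose vote disagreed with the true label.
    if is_intrusion is not None and mini_alarm != is_intrusion:
        for i, p in enumerate(prob_of_value):
            voted_for_intrusion = p < unlikely
            if voted_for_intrusion != is_intrusion:
                weights[i] *= beta
    return mini_alarm
```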
13. Choosing Good Parameter Values
- For each user
  - Use training data to estimate probabilities and to weight individual measurements
  - Try 20 values for T and 20 values for N
  - For each T × N pairing, compute detection and false-alarm rates on the tuning set
  - Select the T × N pairing whose false-alarm rate is less than the spec (e.g., 1 per day) and whose detection rate is highest
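A minimal sketch of that tuning-set grid search; `evaluate` is a placeholder assumed to return the detection rate and false-alarm rate for one (T, N) pairing on a user's tuning set.

```python
def choose_T_and_N(evaluate, T_candidates, N_candidates, max_false_alarms_per_day=1.0):
    """Pick the (T, N) pairing that meets the false-alarm spec and has the
    highest detection rate; returns None if no pairing meets the spec."""
    best_pair, best_detection = None, -1.0
    for T in T_candidates:
        for N in N_candidates:
            detection_rate, false_alarm_rate = evaluate(T, N)
            if false_alarm_rate < max_false_alarms_per_day and detection_rate > best_detection:
                best_pair, best_detection = (T, N), detection_rate
    return best_pair
```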
14. Experimental Data
- Subjects
  - Insiders: 10 employees at Shavlik Technologies
  - Outsiders: 6 additional Shavlik employees
- Unobtrusively collected data for 6 weeks
  - 7 GBytes archived
- Task: Are the current measurements from user X?
15. Training, Tuning, and Testing Sets
- Very important in machine learning not to use testing data to optimize parameters!
  - Otherwise one can "tune" to zero false alarms and high detection rates!
- Train Set: first two weeks of data
  - Build a (statistical) model
- Tune Set: middle two weeks of data
  - Choose good parameter settings
- Test Set: last two weeks of data
  - Evaluate the frozen model
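One way to realize this chronological split in code; `week_of` is an assumed helper returning the 1-based week in which a record was collected.

```python
def split_by_weeks(records, week_of):
    """Train on weeks 1-2, tune on weeks 3-4, test on weeks 5-6,
    matching the chronological split described above."""
    train = [r for r in records if week_of(r) <= 2]
    tune = [r for r in records if 3 <= week_of(r) <= 4]
    test = [r for r in records if week_of(r) >= 5]
    return train, tune, test
```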
16. Experimental Results on the Test Set
17. Highly Weighted Measurements (% of time in Top Ten across users' experiments)
- Number of Semaphores (43)
- Logon Total (43)
- Print Jobs (41)
- System Driver Total Bytes (39)
- CMD Handle Count (35)
- Excel Handle Count (26)
- Number of Mutexes (25)
- Errors Access Permissions (24)
- Files Opened Total (23)
- TCP Connections Passive (23)
- Notepad Processor Time (21)
- 73 measurements occur > 10% of the time
18. Confusion Matrix of Detection Rates (3 of 10 Subjects Shown)
[Table: owners A, B, C (rows) vs. intruders A, B, C (columns); the off-diagonal detection rates shown are 25, 91, 94, 100, 100, and 100]
19. Some Questions
- What if user behavior changes? (called "concept drift" in machine learning)
  - One solution: assign a half-life to the counts used to compute probabilities
  - Multiply counts by f < 1 each day (10/20 vs. 1000/2000); see the sketch after this list
- CPU and memory demands too large?
  - Measuring features and updating counts takes < 1% of the CPU
  - Tuning of parameters needs to be done off-line
- How often to check for intrusions?
  - Only check when the window is full, then clear the window
  - Else too many false alarms
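A minimal sketch of the half-life idea, assuming the per-feature probabilities are stored as counts over discretized values; the decay factor f = 0.95 is an illustrative value, not one reported here.

```python
def decay_counts(value_counts, total, f=0.95):
    """Daily decay of the counts behind a probability estimate: multiplying
    every count and the total by f < 1 leaves today's probabilities
    (count / total) unchanged but lets new behavior shift them faster,
    e.g., 10/20 adapts much more quickly than 1000/2000."""
    for value in value_counts:
        value_counts[value] *= f
    return value_counts, total * f
```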
20. Future Directions
- Measure the system while applying various known intrusion techniques
  - Compare to measurements during normal operation
  - Train on known methods 1, ..., N-1
  - Test using data from known method N
- Analyze simultaneous measurements from a network of computers
- Analyze the impact of an intruder's behavior changing the recorded statistics
- Current results: prob. of detecting an intruder in the first W sec
21. Some Related Work on Anomaly Detection
- Machine learning for intrusion detection
  - Lane & Brodley (1998)
  - Ghosh et al. (1999)
  - Lee et al. (1999)
  - Warrender et al. (1999)
  - Agarwal & Joshi (2001)
  - Typically Unix-based
  - Streams of programs invoked or network traffic analyzed
- Analysis of keystroke dynamics
  - Monrose & Rubin (1997)
  - For authenticating passwords
22. Conclusions
- Can accurately characterize individual user behavior using simple models based on measuring many system properties
- Such profiles can provide protection without too many false alarms
- Separate data into train, tune, and test sets
- Let the data decide good parameter settings, on a per-user basis (including which measurements to use)
23. Acknowledgements
- DARPA's Insider Threat Active Profiling (ITAP) program within the ATIAS program
- Mike Fahland for help with data collection
- Shavlik, Inc. employees who allowed collection of their usage data
24. Using Relative Probabilities
Alarm based on the ratio:
Prob( keystrokes | machine owner ) / Prob( keystrokes | population )
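A minimal sketch of scoring with this ratio; the dictionary layout, probability floor, and names are illustrative assumptions.

```python
def relative_prob_score(value, owner_probs, population_probs, floor=1e-6):
    """Ratio of the owner's probability for an observed value to the general
    population's probability for the same value. A low ratio flags values that
    are rare for this owner yet common for everyone else, which is the signal
    used to raise an alarm."""
    p_owner = max(owner_probs.get(value, 0.0), floor)
    p_population = max(population_probs.get(value, 0.0), floor)
    return p_owner / p_population
```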
25. Value of Relative Probabilities
- Using relative probabilities
  - Separates "rare for this user" from "rare for everyone"
- Example of variance reduction
  - Reduce variance in a measurement by comparing it to another (e.g., paired t-tests)
26. Tradeoff between False Alarms and Detected Intrusions (ROC Curve)
[Figure: ROC curve of detected intrusions vs. false alarms, with the false-alarm spec marked]
Note: the left-most value results from ZERO tune-set false alarms
27. Conclusions
- Can accurately characterize individual user behavior using simple models based on measuring many system properties
- Such profiles can provide protection without too many false alarms
- Separate data into train, tune, and test sets
- Let the data decide good parameter settings, on a per-user basis (including which measurements to use)
- Normalize probabilities by general-population probabilities
  - Separates "rare for this user (or server)" from "rare for everyone"
28. Outline
- Approaches for Building Intrusion-Detection Systems
- A Bit More on What We Measure
- Experiments with Windows 2000 Data
- Wrap-up