1
Learning to Detect Computer Intrusions with
(Extremely) Few False Alarms
  • Jude Shavlik
  • Mark Shavlik
2
Two Basic Approaches for Intrusion Detection
Systems (IDS)
  • Pattern Matching
  • If packet contains "site exec", then sound
    alarm
  • Famous example: SNORT.org
  • Weakness: don't (yet) have patterns for new
    attacks
  • Anomaly Detection
  • Usually based on statistics measured during
    normal behavior
  • Weakness: does anomaly imply intrusion?
  • Both approaches often suffer from too many false
    alarms
  • Admins ignore IDS when flooded by false alarms

3
How to Get Training Examples for Machine Learning?
  • Ideally, get measurements during
  • Normal Operation vs.
  • Intrusions
  • However, hard to define space of possible
    intrusions
  • Instead, learn from positive examples only
  • Learn what's normal and define all else as
    anomalous

4
Behavior-Based Intrusion Detection
  • Need to go beyond looking solely at external
    network traffic and log files
  • File-access patterns
  • Typing behavior
  • Choice of programs run
  • Like the human immune system, continually monitor
    and notice "foreign" behavior

5
Our General Approach
  • Identify unique characteristics of each
    user's/server's behavior
  • Every second, measure 100s of Windows 2000
    properties
  • in/out network traffic, programs running, keys
    pressed, kernel usage, etc.
  • Predict Prob( normal | measurements )
  • Raise alarm if recent measurements seem unlikely
    for this user/server

6
Goal: Choose Feature Space that Widely
Separates User from General Population
Choose a separate set of features for each user
[Figure: a specific user vs. the general population, plotted over the possible measurements in the chosen feature space]
7
What We're Measuring (in Windows 2000)
  • Performance Monitor (Perfmon) data
  • File bytes written per second
  • TCP/IP/UDP/ICMP segments sent per second
  • System calls per second
  • # of processes, threads, events, ...
  • Event-Log entries
  • Programs running, CPU usage, working-set size
  • MS Office, Wordpad, Notepad
  • Browsers: IE, Netscape
  • Program development tools, ...
  • Keystroke and mouse events

8
Temporal Aggregates
  • Actual Value Measured
  • Average of the Previous 10 Values
  • Average of the Previous 100 Values
  • Difference between Current Value and Previous
    Value
  • Difference between Current Value and Average of
    Last 10
  • Difference between Current Value and Average of
    Last 100
  • Difference between Averages of Previous 10 and
    Previous 100

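A minimal sketch of how these temporal aggregates might be computed from one measurement's per-second stream (an illustrative reimplementation under assumptions, not the authors' code; the function name and feature keys are made up):

```python
from collections import deque

def temporal_aggregates(values):
    """For one raw measurement stream (one value per second), yield the
    seven derived features listed above."""
    last10, last100 = deque(maxlen=10), deque(maxlen=100)
    prev = None
    for v in values:
        avg10 = sum(last10) / len(last10) if last10 else v
        avg100 = sum(last100) / len(last100) if last100 else v
        yield {
            "actual": v,                        # actual value measured
            "avg10": avg10,                     # average of the previous 10 values
            "avg100": avg100,                   # average of the previous 100 values
            "diff_prev": v - (prev if prev is not None else v),
            "diff_avg10": v - avg10,            # current value minus average of last 10
            "diff_avg100": v - avg100,          # current value minus average of last 100
            "diff_avg10_100": avg10 - avg100,   # average of prev 10 minus average of prev 100
        }
        last10.append(v)
        last100.append(v)
        prev = v
```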
9
Using (Naïve) Bayesian Networks
  • Learning network structure too CPU-intensive
  • Plus, naïve Bayes frequently works best
  • Testset results:
  • 59.2% of intrusions detected
  • About 2 false alarms per day per user
  • This paper's approach:
  • 93.6% detected
  • 0.3 false alarms per day per user

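For context, a minimal sketch of how a naïve Bayes model might score one second of measurements (assuming per-feature probabilities estimated from the user's training data; the helper names are hypothetical):

```python
import math

def naive_bayes_log_score(measurements, feature_prob):
    """Sum of per-feature log probabilities, treating features as independent
    (naive Bayes); a very low score marks the second as anomalous."""
    return sum(math.log(max(feature_prob(i, v), 1e-12))  # clamp to avoid log(0)
               for i, v in enumerate(measurements))
```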
10
Our Intrusion-Detection Template
[Figure: sliding window over the last W (window width) measurements along the time axis (in sec)]
If score(current measurements) > T, then raise a
"mini alarm". If # of "mini alarms" in window > N,
then predict intrusion.
Use tuning set to choose, per user, good values for
T and N (see the sketch below).
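A minimal sketch of this template (function and variable names are placeholders; score() stands for the per-second scoring described on the following slides):

```python
def predict_intrusions(measurement_stream, score, T, N, W):
    """Raise a "mini alarm" whenever score(...) > T; once W seconds have been
    seen, predict an intrusion if more than N mini alarms occurred, then clear
    the window (per the "only check when window full" note on slide 19)."""
    window = []
    for t, measurements in enumerate(measurement_stream):
        window.append(1 if score(measurements) > T else 0)
        if len(window) == W:
            if sum(window) > N:
                yield t            # second at which an intrusion is predicted
            window.clear()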
11
Methodology for Training and Evaluating
Learned Models
[Diagram: each user's held-out data stream is scored by every learned model; an alarm from the data owner's own model counts as a false alarm, while an alarm from another user's model (for which the owner plays the intruder) counts as a detected intrusion]
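A minimal sketch of the evaluation loop the diagram describes (names are hypothetical; detect() stands for the per-user detector from slide 10):

```python
def evaluate(models, test_streams, detect):
    """models[u]: detector learned for user u; test_streams[u]: u's held-out data.
    An alarm from the data owner's own model is a false alarm; an alarm from
    another user's model (where the owner plays the intruder) is a detection."""
    false_alarms, detections, trials = 0, 0, 0
    for owner, stream in test_streams.items():
        if detect(models[owner], stream):
            false_alarms += 1                  # owner tripped their own model
        for other, model in models.items():
            if other != owner:
                trials += 1
                if detect(model, stream):      # simulated intruder was caught
                    detections += 1
    return false_alarms, (detections / trials if trials else 0.0)
```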
12
Learning to Score Windows 2000 Measurements (done
for each user)
  • Initialize weights on each feature to 1
  • For each training example do
  • Set weightedVotesFOR = 0; weightedVotesAGAINST = 0
  • If measurement i is unlikely (i.e., low prob),
    then add weight_i to weightedVotesFOR, else
    add weight_i to weightedVotesAGAINST
  • If weightedVotesFOR > weightedVotesAGAINST,
    then raise "mini alarm"
  • If decision about intrusion incorrect,
    multiply weights by ½ on all measurements
    that voted incorrectly (Winnow algorithm;
    see the sketch below)

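A minimal sketch of this Winnow-style update (an illustrative reimplementation under assumptions: is_unlikely() stands for the per-measurement probability test, and labeled examples stand in for the training stream):

```python
def train_winnow(examples, is_unlikely, num_features, demotion=0.5):
    """examples: iterable of (measurements, is_intrusion) pairs.
    is_unlikely(i, value): True if measurement i has low probability for
    this user.  Weights start at 1 and are halved (Winnow) on every
    measurement that voted incorrectly."""
    weights = [1.0] * num_features
    for measurements, is_intrusion in examples:
        unlikely = [is_unlikely(i, v) for i, v in enumerate(measurements)]
        votes_for = sum(w for w, u in zip(weights, unlikely) if u)          # votes "intrusion"
        votes_against = sum(w for w, u in zip(weights, unlikely) if not u)  # votes "normal"
        raised_mini_alarm = votes_for > votes_against
        if raised_mini_alarm != is_intrusion:        # decision was wrong:
            for i, u in enumerate(unlikely):
                if u != is_intrusion:                # halve weights that voted incorrectly
                    weights[i] *= demotion
    return weights
```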
13
Choosing Good Parameter Values
  • For each user
  • Use training data to estimate probabilities and
    weight individual measurements
  • Try 20 values for T and 20 values for N
  • For each T × N pairing, compute detection and
    false-alarm rates on the tuning set
  • Select the T × N pairing whose false-alarm rate
    is less than spec (e.g., 1 per day) and that
    has the highest detection rate

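A minimal sketch of this grid search (the two rate functions are placeholders for running the detector over the tuning set with the given T and N):

```python
def choose_T_and_N(T_values, N_values, false_alarm_rate, detection_rate,
                   spec_false_alarms_per_day=1.0):
    """Try every T x N pairing; keep the one whose tuning-set false-alarm
    rate is under spec and whose detection rate is highest."""
    best_pair, best_detection = None, -1.0
    for T in T_values:
        for N in N_values:
            if false_alarm_rate(T, N) < spec_false_alarms_per_day:
                det = detection_rate(T, N)
                if det > best_detection:
                    best_pair, best_detection = (T, N), det
    return best_pair
```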
14
Experimental Data
  • Subjects
  • Insiders: 10 employees at Shavlik Technologies
  • Outsiders: 6 additional Shavlik employees
  • Unobtrusively collected data for 6 weeks
  • 7 GBytes archived
  • Task: Are current measurements from user X?

15
Training, Tuning, and Testing Sets
  • Very important in machine learning to not use
    testing data to optimize parameters!
  • Otherwise, one can "tune" to zero false alarms and
    high detection rates!
  • Train Set: first two weeks of data
  • Build a (statistical) model
  • Tune Set: middle two weeks of data
  • Choose good parameter settings
  • Test Set: last two weeks of data
  • Evaluate frozen model

16
Experimental Results on the Testset
17
Highly Weighted Measurements (% of time in Top
Ten across users' experiments)
  • Number of Semaphores (43%)
  • Logon Total (43%)
  • Print Jobs (41%)
  • System Driver Total Bytes (39%)
  • CMD Handle Count (35%)
  • Excel Handle Count (26%)
  • Number of Mutexes (25%)
  • Errors Access Permissions (24%)
  • Files Opened Total (23%)
  • TCP Connections Passive (23%)
  • Notepad Processor Time (21%)
  • 73 measurements occur > 10% of the time

18
Confusion Matrix: Detection Rates (3 of 10
Subjects Shown)
[Table: owners A, B, C (rows) vs. intruders A, B, C (columns); off-diagonal detection rates range from 25% to 100%]
19
Some Questions
  • What if user behavior changes? (Called "concept
    drift" in machine learning)
  • One solution:
  • Assign a half-life to the counts used to compute
    probs (sketched after this slide)
  • Multiply counts by f < 1 each day (10/20 vs.
    1000/2000)
  • CPU and memory demands too large?
  • Measuring features and updating counts: < 1% CPU
  • Tuning of parameters needs to be done off-line
  • How often to check for intrusions?
  • Only check when window full, then clear window
  • Else too many false alarms

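A minimal sketch of the half-life idea (a hypothetical counter class; the decay factor f < 1 is applied to every count once per day):

```python
class DecayingCount:
    """Probability estimate from counts that decay daily, so recent behavior
    dominates: 1000/2000 decays toward 10/20, i.e., the same probability but
    far quicker to change when the user's habits change."""
    def __init__(self, f=0.5):
        self.f = f            # daily decay factor, f < 1
        self.hits = 0.0       # times this feature value was observed
        self.total = 0.0      # total observations

    def observe(self, matched):
        if matched:
            self.hits += 1.0
        self.total += 1.0

    def end_of_day(self):
        self.hits *= self.f
        self.total *= self.f

    def prob(self):
        return self.hits / self.total if self.total else 0.0
```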
20
Future Directions
  • Measure system while applying various known
    intrusion techniques
  • Compare to measurements during normal operation
  • Train on known methods 1, ..., N-1
  • Test using data from known method N
  • Analyze simultaneous measurements from a network
    of computers
  • Analyze impact of intruder's behavior changing
    the recorded statistics
  • Current results: prob. of detecting intruder in
    first W sec

21
Some Related Work on Anomaly Detection
  • Machine learning for intrusion detection
  • Lane & Brodley (1998)
  • Ghosh et al. (1999)
  • Lee et al. (1999)
  • Warrender et al. (1999)
  • Agarwal & Joshi (2001)
  • Typically Unix-based
  • Streams of programs invoked or network traffic
    analyzed
  • Analysis of keystroke dynamics
  • Monrose & Rubin (1997)
  • For authenticating passwords

22
Conclusions
  • Can accurately characterize individual user
    behavior using simple models based on measuring
    many system properties
  • Such profiles can provide protection without
    too many false alarms
  • Separate data into train, tune, and test sets
  • Let the data decide good parameter settings, on a
    per-user basis (including which measurements to use)

23
Acknowledgements
  • DARPA's Insider Threat Active Profiling (ITAP)
    program within the ATIAS program
  • Mike Fahland for help with data collection
  • Shavlik, Inc. employees who allowed collection of
    their usage data

24
Using Relative Probabilities
Alarm based on the ratio:
Prob( keystrokes | machine owner ) / Prob( keystrokes | population )
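A minimal sketch of this normalization (the two probability functions are placeholders for the per-user model and a model of the general population):

```python
def relative_prob(measurements, prob_given_owner, prob_given_population):
    """Ratio of how likely the measurements are for the machine's owner to
    how likely they are for everyone; a low ratio means the behavior is
    "rare for this user" rather than merely "rare for everyone"."""
    return prob_given_owner(measurements) / prob_given_population(measurements)
```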
25
Value of Relative Probabilities
  • Using relative probabilities
  • Separates "rare for this user" from "rare for
    everyone"
  • Example of variance reduction
  • Reduce variance in a measurement by comparing to
    another (e.g., paired t-tests)

26
Tradeoff between False Alarms and Detected
Intrusions (ROC Curve)
[Figure: ROC curve of detection rate vs. false alarms, with the false-alarm "spec" marked]
Note: left-most value results from ZERO tune-set
false alarms
27
Conclusions
  • Can accurately characterize individual user
    behavior using simple models based on measuring
    many system properties
  • Such profiles can provide protection without
    too many false alarms
  • Separate data into train, tune, and test sets
  • Let the data decide good parameter settings, on a
    per-user basis (including which measurements to use)
  • Normalize probs by general-population probs
  • Separate "rare for this user (or server)" from
    "rare for everyone"

28
Outline
  • Approaches for Building Intrusion-Detection
    Systems
  • A Bit More on What We Measure
  • Experiments with Windows 2000 Data
  • Wrapup