Learning Program Behavior for Intrusion Detection

1
Learning Program Behavior for Intrusion Detection
  • Yihua Liao
  • Advisor: Prof. Rao Vemuri
  • Feb. 20, 2002

2
Outline
  • Machine learning and Computer Security
  • Related work on modeling program behavior
  • K-Nearest Neighbor classifier and text
    categorization
  • Experiments with DARPA BSM data
  • Neural Networks
  • Conclusions

3
Machine Learning and Security
  • To improve automatically with experience
  • Good at learning user, system or network
    behavior, extracting usage patterns and rules,
    and classifying new instances.
  • Security-related applications: access control,
    malicious code, misuse and anomaly detection, etc.

4
Model user and program behavior
  • User behavior
  • - Insider threats
  • - UNIX shell commands, login events, etc.
  • - Concept drift, privacy issues
  • Program behavior
  • - Intrusions often occur when a program is misused.
  • - Program profiles provide concise and stable
    tracks

5
Modeling program behavior
  • Program policy specification (Ko et al. 1994)
  • - Determine intended behavior; write security
    specifications for monitored programs.
  • Static analysis (Wagner et al. 2001)
  • - Use a non-deterministic pushdown automaton
    (NDPDA) to find possible system call
    sequences from source code, and check for
    compliance at runtime.
  • - Challenges: dynamic linking, threads, large
    overhead

6
Modeling program behavior (cont.)
  • Learn program behavior profiles from previous
    executions. (Forrest, Lee, Ghosh, etc.)
  • Short sequences of system calls
  • Profiles for individual programs
  • Time-consuming training and testing process

7
Short sequences of system calls
  • open read mmap mmap open close
  • Unique sequences for window size 3
  • open read mmap
  • read mmap mmap
  • mmap mmap open
  • mmap open close
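
The window extraction on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the cited work; the function name `unique_windows` is hypothetical.

```python
def unique_windows(calls, size=3):
    """Return the distinct length-`size` windows of `calls`, in first-seen order."""
    seen = []
    for i in range(len(calls) - size + 1):
        window = tuple(calls[i:i + size])
        if window not in seen:
            seen.append(window)
    return seen


# The trace from the slide.
trace = "open read mmap mmap open close".split()
for window in unique_windows(trace):
    print(" ".join(window))
```

For this six-call trace every window is distinct, so the four windows listed on the slide are printed.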

8
Analogy
  • word ↔ system call
  • text document ↔ list of system calls issued by a
    program
  • different categories ↔ normal/intrusive

9
Text categorization
  • Transform each document into a vector
  • open read mmap mmap open close
  • open: 2, read: 1, mmap: 2, close: 1
  • Frequency of word i in document k: $f_{ik}$
  • Word-by-document matrix $A = (a_{ik})$
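
As a rough sketch of this transformation, the snippet below counts system calls per trace and arranges the counts as a word-by-document matrix (rows are system calls, columns are traces). The second trace and the function name `build_matrix` are made up for illustration.

```python
from collections import Counter


def build_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][k] is the count of word i in document k."""
    vocab = sorted({call for doc in documents for call in doc})
    counts = [Counter(doc) for doc in documents]
    matrix = [[c[word] for c in counts] for word in vocab]
    return vocab, matrix


docs = [
    "open read mmap mmap open close".split(),  # trace from the slide
    "open close exit".split(),                 # hypothetical second trace
]
vocab, matrix = build_matrix(docs)
for word, row in zip(vocab, matrix):
    print(word, row)
```

The first column reproduces the counts on the slide: open 2, read 1, mmap 2, close 1.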

10
Weighting techniques
  • - Frequency weighting
  • $a_{ik} = f_{ik}$
  • - Term frequency-inverse document frequency
    weighting (tf-idf)
  • $N_i$: number of documents in which word i
    occurs at least once.
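
The tf-idf formula itself did not survive in this transcript. The sketch below therefore assumes one common textbook form, a_ik = f_ik · log(N / N_i), where N is the total number of documents; the variant used in the original slides may differ (for example, by normalizing each document vector). The function name `tfidf` is hypothetical.

```python
import math


def tfidf(matrix):
    """Weight a word-by-document count matrix with an assumed f_ik * log(N / N_i) scheme."""
    n_docs = len(matrix[0])
    weighted = []
    for row in matrix:
        # n_i plays the role of N_i: documents in which this word occurs at least once.
        n_i = sum(1 for f in row if f > 0)
        idf = math.log(n_docs / n_i) if n_i else 0.0
        weighted.append([f * idf for f in row])
    return weighted


# Rows are words, columns are documents (illustrative counts).
counts = [
    [2, 0],  # a word that occurs only in the first document
    [1, 1],  # a word that occurs in every document
]
for row in tfidf(counts):
    print(row)
```

Under this scheme a word that appears in every document gets weight zero, which is why system calls common to all processes contribute little to the similarity between two processes.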

11
K-Nearest Neighbor classifier
  • Use the class labels of k most similar neighbors
    to predict the class of new document.
  • A cutoff threshold is needed.
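
A minimal kNN sketch follows. The slide does not say which similarity measure or decision rule was used, so two assumptions are made here: cosine similarity between frequency vectors, and a decision based on the fraction of intrusive labels among the k nearest neighbors compared against a cutoff. All names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(query, training, k=3, threshold=0.5):
    """training: list of (vector, label) pairs, label 1 = intrusive, 0 = normal."""
    ranked = sorted(training, key=lambda item: cosine(query, item[0]), reverse=True)
    neighbors = ranked[:k]
    score = sum(label for _, label in neighbors) / len(neighbors)
    return "intrusive" if score > threshold else "normal"


# Toy system-call frequency vectors (made up for illustration).
training = [
    ([2, 1, 2, 1], 0),
    ([2, 1, 1, 1], 0),
    ([1, 1, 2, 1], 0),
    ([0, 0, 0, 5], 1),
    ([0, 1, 0, 6], 1),
]
print(knn_predict([2, 1, 2, 1], training))
print(knn_predict([0, 0, 1, 7], training))
```

Raising the threshold makes the classifier less willing to flag a process, trading detection rate for fewer false positives.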

12
Advantages
  • Limited system-call vocabulary. No dimension
    reduction techniques needed.
  • Simple binary categorization problem
  • kNN does not rely on prior knowledge and is
    computationally efficient.

13
Experiments
  • Data set: 1998 DARPA BSM data
  • Provides a large sample of network-based attacks
    embedded in normal background traffic.
  • TCPDUMP and BSM audit data collected on a
    simulated network.

14
(No Transcript)
15
BSM events
  • header,118,2,open(2) - read,,Mon Jun 01 08:12:17 1998, 925767180 msec
  • path,/usr/lib/libdl.so.1
  • attribute,100755,bin,bin,8388614,96882,0
  • subject,2104,root,100,2104,100,501,431,24 1 135.8.60.182
  • return,success,4
  • trailer,118
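
To show how a system call name might be pulled out of such a record, here is a small parsing sketch. It assumes comma-separated token lines like those above, that the event name is the fourth field of the header token, and that the process ID is the seventh field of the subject token (the usual Solaris BSM layout, stated here as an assumption). The function name `parse_bsm_record` is hypothetical.

```python
def parse_bsm_record(lines):
    """Extract a few fields from one text-format BSM audit record."""
    record = {}
    for line in lines:
        fields = line.strip().split(",")
        token = fields[0]
        if token == "header":
            event = fields[3]                       # e.g. "open(2) - read"
            record["event"] = event
            record["syscall"] = event.split("(")[0].strip()
        elif token == "path":
            record["path"] = fields[1]
        elif token == "subject":
            record["pid"] = int(fields[6])          # assumed position of the process ID
        elif token == "return":
            record["status"] = fields[1]
    return record


sample = [
    "header,118,2,open(2) - read,,Mon Jun 01 08:12:17 1998, 925767180 msec",
    "path,/usr/lib/libdl.so.1",
    "attribute,100755,bin,bin,8388614,96882,0",
    "subject,2104,root,100,2104,100,501,431,24 1 135.8.60.182",
    "return,success,4",
    "trailer,118",
]
print(parse_bsm_record(sample))
```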

16
More on data set
  • DARPA data was labeled with session numbers; 400-500
    sessions per day.
  • Individual sessions can be extracted from the logs.
  • Each session consists of one or more processes.
  • Generate a list of system calls for every
    process; 50 distinct system calls in total.
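
One simple way to build the per-process lists is to group audit events by process ID while preserving their order. The sketch below assumes the events have already been reduced to (pid, system call) pairs; the function name `calls_by_process` and the event data are illustrative only.

```python
from collections import defaultdict


def calls_by_process(events):
    """Group (pid, syscall) pairs into one ordered system-call list per process."""
    traces = defaultdict(list)
    for pid, syscall in events:
        traces[pid].append(syscall)
    return dict(traces)


# Interleaved events from two processes (made-up data).
events = [
    (994, "close"),
    (994, "execve"),
    (995, "open"),
    (994, "open"),
    (995, "close"),
]
for pid, calls in calls_by_process(events).items():
    print(pid, " ".join(calls))
```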

17
Process ID 994: close execve open mmap open
mmap mmap munmap mmap mmap close open mmap
close open mmap mmap munmap mmap close close
munmap open ioctl access chown ioctl access
chmod close close close close close exit
18
(No Transcript)
19
Training and testing data
  • Training data
  • 606 distinct process vectors from 4 simulation
    days. 50 distinct system calls.
  • Testing data
  • - 35 attack sessions, including U-2-R, R-2-U,
    probe, and several intrusion scenarios.
  • - 5285 normal processes from one simulation
    day.

20
Result
21
Result
22
Result
23
Remarks on kNN
  • Suitable for dynamic environments
  • Instance-based (lazy) learning:
    computation at query time
  • All attributes used
  • Efficient memory indexing

24
Neural networks
  • Feed-forward multi-layer network with
    backpropagation algorithm.
  • Randomly generated data to train network. All
    data are anomalous by default.
  • Normal data cause network to recognize a
    particular area of input space as normal.
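
The training idea on this slide can be illustrated with a very small feed-forward network written in plain Python. Everything here is an assumption made for the sake of a runnable sketch: the two-dimensional inputs, the single hidden layer of four sigmoid units, the squared-error loss, the learning rate, and the location of the "normal" cluster. It is not the network from the original experiments.

```python
import math
import random


def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One hidden layer, sigmoid units, trained by backpropagation (squared error)."""

    def __init__(self, n_in, n_hidden, rng):
        # The last weight in each row is the bias.
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xs = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xs))) for row in self.w1]
        out = sigmoid(sum(w * v for w, v in zip(self.w2, hidden + [1.0])))
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        xs = list(x) + [1.0]
        hidden, out = self.forward(x)
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas are computed from the output weights before they are updated.
        delta_hidden = [delta_out * self.w2[j] * h * (1.0 - h) for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden + [1.0]):
            self.w2[j] -= lr * delta_out * h
        for j, row in enumerate(self.w1):
            for i, v in enumerate(xs):
                row[i] -= lr * delta_hidden[j] * v


rng = random.Random(0)
# Random points cover the input space and are labeled anomalous (target 1).
anomalous = [([rng.random(), rng.random()], 1.0) for _ in range(200)]
# Observed "normal" data carve out one region (target 0); the cluster center is made up.
normal = [([rng.gauss(0.2, 0.03), rng.gauss(0.2, 0.03)], 0.0) for _ in range(200)]
data = anomalous + normal

net = TinyMLP(n_in=2, n_hidden=4, rng=rng)
for _ in range(50):
    rng.shuffle(data)
    for x, target in data:
        net.train_step(x, target)

print("anomaly score near the normal region:", net.forward([0.2, 0.2])[1])
print("anomaly score far from it:", net.forward([0.9, 0.9])[1])
```

The output is an anomaly score between 0 and 1; a cutoff on that score would then play the same role as the threshold in the kNN classifier.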

25
Summary
  • Frequencies of system calls are used to
    characterize program behavior.
  • The k-Nearest Neighbor classifier can effectively
    detect intrusive program behavior. No individual
    program profiles are needed. A low false positive
    rate can be achieved.
  • kNN is suitable for dynamic environments and
    real-time intrusion detection.

26
Reference
  • http://wwwcsif.cs.ucdavis.edu/liaoy/knn_ss02.pdf
  • (old data analysis)