Title: Learning Program Behavior for Intrusion Detection
1. Learning Program Behavior for Intrusion Detection
- Yihua Liao
- Advisor: Prof. Rao Vemuri
- Feb. 20, 2002
2. Outline
- Machine learning and Computer Security
- Related work on modeling program behavior
- K-Nearest Neighbor classifier and text categorization
- Experiments with DARPA BSM data
- Neural Networks
- Conclusions
3. Machine Learning and Security
- To improve automatically with experience
- Good at learning user, system, or network behavior, extracting usage patterns and rules, and classifying new instances.
- Security-related applications: access control, malicious code, misuse and anomaly detection, etc.
4. Model user and program behavior
- User behavior
- Insider threats
- UNIX shell commands, login events, etc.
- Concept drift, privacy issues
- Program behavior
- Intrusions often occur when programs are misused.
- Program profiles provide concise and stable tracks.
5. Modeling program behavior
- Program policy specification (Ko et al. 1994)
- - Determine intended behavior; write security specifications for monitored programs.
- Static analysis (Wagner et al. 2001)
- - Use NDPDA to find possible system call sequences from source code, and check for compliance at runtime.
- - Challenges: dynamic linking, threads, large overhead
6. Modeling program behavior (cont.)
- Learn program behavior profiles from previous executions (Forrest, Lee, Ghosh, etc.)
- Short sequences of system calls
- Profiles for individual programs
- Time-consuming training and testing process
7. Short sequences of system calls
- open read mmap mmap open close
- Unique sequences for window size 3
- open read mmap
- read mmap mmap
- mmap mmap open
- mmap open close
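As a quick illustration, a minimal Python sketch (the helper name and data layout are assumptions, not part of the original work) of extracting the unique length-3 sequences with a sliding window:

```python
def unique_sequences(calls, window=3):
    """Slide a fixed-size window over a system-call trace and collect
    the distinct subsequences in order of first appearance."""
    seen, result = set(), []
    for i in range(len(calls) - window + 1):
        seq = tuple(calls[i:i + window])
        if seq not in seen:
            seen.add(seq)
            result.append(seq)
    return result

trace = ["open", "read", "mmap", "mmap", "open", "close"]
for seq in unique_sequences(trace):
    print(" ".join(seq))
# open read mmap
# read mmap mmap
# mmap mmap open
# mmap open close
```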
8. Analogy
- word ↔ system call
- text document ↔ list of system calls issued by a program
- different categories ↔ normal / intrusive
9. Text categorization
- Transform each document into a vector
- open read mmap mmap open close
- open: 2, read: 1, mmap: 2, close: 1
- Frequency of word i in document k: f_ik
- Word-by-document matrix A = (a_ik)
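A small sketch of how the frequency vectors and the word-by-document matrix A could be built (hypothetical helper, assuming each process is already reduced to a list of system-call names):

```python
from collections import Counter

def frequency_matrix(processes, vocabulary):
    """Build the word-by-document matrix A, where entry (i, k) is the
    number of times system call i occurs in process (document) k."""
    counts = [Counter(p) for p in processes]
    return [[c[word] for c in counts] for word in vocabulary]

processes = [["open", "read", "mmap", "mmap", "open", "close"]]
vocab = ["open", "read", "mmap", "close"]
print(frequency_matrix(processes, vocab))   # [[2], [1], [2], [1]]
```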
10. Weighting techniques
- Frequency weighting: a_ik = f_ik
- Term frequency / inverse document frequency weighting (tf-idf): a_ik = f_ik × log(N / N_i)
- N_i: number of documents for which the word occurs at least once (N: total number of documents)
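A sketch of tf-idf weighting, assuming the standard form a_ik = f_ik × log(N / N_i) is what the slide intends (the helper name is hypothetical, building on the frequency-vector idea above):

```python
import math
from collections import Counter

def tfidf_matrix(processes, vocabulary):
    """Weight each raw frequency f_ik by log(N / N_i): N is the number
    of documents, N_i the number of documents containing word i."""
    n_docs = len(processes)
    counts = [Counter(p) for p in processes]
    matrix = []
    for word in vocabulary:
        n_i = sum(1 for c in counts if c[word] > 0)
        idf = math.log(n_docs / n_i) if n_i else 0.0
        matrix.append([c[word] * idf for c in counts])
    return matrix
```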
11. K-Nearest Neighbor classifier
- Use the class labels of the k most similar neighbors to predict the class of a new document.
- A cutoff threshold is needed.
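A minimal sketch of the decision rule, assuming cosine similarity between process vectors and a cutoff on the average similarity to the k nearest normal training vectors; the exact similarity measure and thresholding scheme are assumptions here, not taken from the slides:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_anomalous(new_vec, normal_vecs, k=5, threshold=0.8):
    """Flag a new process if its average similarity to the k most
    similar normal training vectors falls below the cutoff."""
    sims = sorted((cosine(new_vec, v) for v in normal_vecs), reverse=True)
    return sum(sims[:k]) / k < threshold
```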
12. Advantages
- Limited system-call vocabulary; no dimension-reduction techniques needed.
- Simple binary categorization problem
- kNN doesn't rely on prior knowledge and is computationally efficient.
13. Experiments
- Data set: 1998 DARPA BSM data
- Provides a large sample of network-based attacks embedded in normal background traffic.
- TCPDUMP and BSM audit data collected on a simulated network.
14. (Figure slide; no transcript)
15. BSM events
- header,118,2,open(2) - read,,Mon Jun 01 08:12:17 1998, 925767180 msec
- path,/usr/lib/libdl.so.1
- attribute,100755,bin,bin,8388614,96882,0
- subject,2104,root,100,2104,100,501,431,24 1 135.8.60.182
- return,success,4
- trailer,118
16. More on data set
- DARPA data was labeled with sessions; 400-500 sessions per day.
- Individual sessions can be extracted from the logs.
- Each session consists of one or more processes.
- Generate a list of system calls for every process; 50 distinct system calls (see the sketch below).
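A rough sketch of turning a praudit-style text log (as on the BSM events slide) into one system-call list per process; the field positions and helper name are assumptions for illustration only:

```python
from collections import defaultdict

def syscalls_by_process(lines):
    """Group system-call names by process id, one BSM record at a time.
    Assumes the event name is the 4th field of the header token and the
    pid is one of the subject-token fields (position assumed)."""
    processes = defaultdict(list)
    current_call = None
    for line in lines:
        fields = line.strip().split(",")
        if fields[0] == "header":
            current_call = fields[3].split("(")[0]   # "open(2) - read" -> "open"
        elif fields[0] == "subject" and current_call:
            pid = fields[6]                          # assumed pid position
            processes[pid].append(current_call)
        elif fields[0] == "trailer":
            current_call = None
    return processes
```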
17. Process id 994: close execve open mmap open mmap mmap munmap mmap mmap close open mmap close open mmap mmap munmap mmap close close munmap open ioctl access chown ioctl access chmod close close close close close exit
18. (Figure slide; no transcript)
19. Training and testing data
- Training data
- - 606 distinct process vectors from 4 simulation days; 50 distinct system calls.
- Testing data
- - 35 attack sessions, including U-2-R, R-2-U, probe, and several intrusion scenarios.
- - 5285 normal processes from one simulation day.
20. Result (figure; no transcript)
21. Result (figure; no transcript)
22. Result (figure; no transcript)
23. Remarks on kNN
- Suitable for dynamic environments
- Instance-based learning (lazy learning): computation deferred to query time
- All attributes are used
- Efficient memory indexing
24. Neural networks
- Feed-forward multi-layer network with the backpropagation algorithm.
- Randomly generated data are used to train the network; all data are anomalous by default.
- Normal data cause the network to recognize a particular area of the input space as normal (a rough sketch follows below).
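A rough sketch of this training scheme using a small scikit-learn MLP; the layer sizes, amount of random data, and sampling range are illustrative assumptions rather than the original configuration:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_anomaly_net(normal_vectors, n_random=2000, seed=0):
    """Feed-forward network trained with backpropagation: randomly
    generated vectors are labeled anomalous (1), observed normal
    vectors are labeled normal (0)."""
    rng = np.random.default_rng(seed)
    normal = np.asarray(normal_vectors, dtype=float)
    random_data = rng.uniform(0.0, normal.max(), size=(n_random, normal.shape[1]))
    X = np.vstack([normal, random_data])
    y = np.concatenate([np.zeros(len(normal)), np.ones(n_random)])
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=seed)
    net.fit(X, y)
    return net
```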
25. Summary
- Frequencies of system calls are used to characterize program behavior.
- The K-Nearest Neighbor classifier can effectively detect intrusive program behavior; no individual program profiles are needed, and a low false positive rate can be achieved.
- kNN is suitable for dynamic environments and real-time intrusion detection.
26. Reference
- http://wwwcsif.cs.ucdavis.edu/liaoy/knn_ss02.pdf (old data analysis)