Anomaly Detection Based on a Rough Set Approach

1
Anomaly Detection Based on a Rough Set Approach
  • Jianchao Han
  • Department of Computer Science
  • California State University, Dominguez Hills

2
Agenda
  • Introduction
  • Rough Set Theory
  • Feature Selection
  • Relative Attribute Dependency
  • The First Illustrative Example
  • Generalized Rough Set Model
  • The Second Illustrative Example
  • Conclusion
  • Future Work

3
Introduction
  • Two Complementary Network Security Techniques
  • Protection
  • Purpose: guard hardware, software, and user data
    against threats from both outsiders and malicious
    insiders
  • Approaches: authentication, filtering, encryption,
    and firewalls
  • Intrusion Detection
  • Purpose: collect information from networks, hosts,
    and file systems, then analyze the information to
    find suspicious activity
  • Approaches: misuse detection and anomaly detection

4
Intrusion Detection
  • Misuse or signature detection
  • Principle
  • Large databases of attack signatures are needed
  • Compare the collected information against the
    databases
  • Find patterns signaling well-known attacks
  • Disadvantages
  • incapable of detecting new attacks
  • Anomaly detection
  • Explores deviations from normal system behavior
  • Everything that does not match the stored normal
    behavior profile is considered a suspicious action
  • Disadvantages
  • incapable of discerning intent
  • only signals unusual events, generating false
    alarms

5
Intrusion Detection System (IDS)
  • Host-based IDS
  • Use information derived from a single host
  • Network-based IDS
  • Exploit information obtained from a whole segment
    of a local network
  • Intrusion detection system
  • Must be capable of distinguishing between normal
    (not security-critical) and abnormal user
    activities
  • Discover malicious attempts in time
  • However, translating user behaviors into a
    consistent security-related decision is often not
    simple, because many behavior patterns are
    unpredictable and unclear

6
Data Mining Techniques
  • Data Mining Techniques have been extensively
    applied in IDSs
  • Misuse detection systems use
  • Rule-based expert systems
  • Model-based reasoning systems
  • Genetic algorithms
  • Association rules
  • Fuzzy logic
  • Anomaly detection systems use
  • Statistical analysis
  • Sequence analysis
  • Neural networks
  • Decision trees
  • Artificial immune system

7
Data Reduction
  • Data Reduction
  • Horizontal reduction: sampling
  • Vertical reduction: feature selection
  • Feature Selection
  • Statistical feature selection
  • Significant feature selection
  • Rough set feature selection
  • Search Process
  • Top-down search
  • Bottom-up search
  • Exhaustive search
  • Heuristic search

8
Rough Set Based Feature Selection
  • Bottom-up search
  • Brute-force search
  • Heuristic search
  • Rough Set Theory
  • Introduced by Pawlak in the 1980s
  • An efficient tool for data mining, concept
    generation, induction, and classification

9
Rough Set Theory -- IS
  • Information system (IS)
  • IS = < U, C, D, {Va | a ∈ C ∪ D}, f >,
  • where U = {u1, u2, ..., un} is a non-empty set of
    tuples, called the data table
  • C is a non-empty set of condition attributes
  • D is a non-empty set of decision attributes
  • C ∩ D = Ø
  • Va is the domain of attribute a, with at least two
    elements
  • f is a function U × (C ∪ D) → V = ∪a∈C∪D Va
    (a code representation is sketched below)
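A minimal sketch of how such an information system could be represented in code (Python; the class and field names are mine, not the presentation's):

from dataclasses import dataclass

@dataclass
class InformationSystem:
    """IS = <U, C, D, {Va}, f>: rows are dicts mapping attribute
    name to value, so f(u, a) is simply u[a], and Va is implicitly
    the set of values that attribute a takes."""
    U: list  # tuples (rows), each a dict: attribute -> value
    C: list  # condition attribute names
    D: list  # decision attribute names

# A hypothetical two-row example:
is_table = InformationSystem(
    U=[{"Protocol": "tcp", "Label": "normal"},
       {"Protocol": "udp", "Label": "anomaly"}],
    C=["Protocol"],
    D=["Label"],
)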

10
Approximation
  • Let A ⊆ C ∪ D, ti, tj ∈ U, and let X be a subset
    of U
  • Define RA = {<ti, tj> ∈ U × U | ∀a ∈ A,
    ti[a] = tj[a]}
  • The indiscernibility relation, denoted IND, is an
    equivalence relation on U; RA is an equivalence
    relation on U
  • The approximation space (U, IND) partitions U into
    equivalence classes A* = {A1, A2, ..., Am} induced
    by RA
  • Lower approximation or positive region of X:
    LowA(X) = ∪{Ai ∈ A* | Ai ⊆ X, 1 ≤ i ≤ m}
  • Upper approximation of X based on A:
    UppA(X) = ∪{Ai ∈ A* | Ai ∩ X ≠ Ø, 1 ≤ i ≤ m}
  • Boundary area of X:
    BoundaryA(X) = UppA(X) − LowA(X)
  • Negative region of X:
    NegA(X) = ∪{Ai ∈ A* | Ai ⊆ U − X, 1 ≤ i ≤ m}
    (all four regions are computed in the sketch below)
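As a hedged illustration, the four regions can be computed directly from the partition induced by RA (Python; function names are mine):

def partition(U, A):
    """Equivalence classes of RA: indices of rows that agree on
    every attribute in A."""
    classes = {}
    for i, row in enumerate(U):
        key = tuple(row[a] for a in A)
        classes.setdefault(key, set()).add(i)
    return list(classes.values())

def approximations(U, A, X):
    """X is a set of row indices. Returns the lower and upper
    approximations, boundary, and negative region of X w.r.t. A,
    each as a set of row indices."""
    lower, upper = set(), set()
    for Ai in partition(U, A):
        if Ai <= X:    # Ai entirely inside X -> positive region
            lower |= Ai
        if Ai & X:     # Ai overlaps X -> upper approximation
            upper |= Ai
    boundary = upper - lower
    negative = set(range(len(U))) - upper  # classes inside U - X
    return lower, upper, boundary, negative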

11
Core Attributes and Reduct
  • Let D* = {D1, D2, ..., Dk} be the set of
    elementary sets partitioned by RD
  • Approximation aggregation:
  • LowA(D*) = ∪j=1..k LowA(Dj)
  • UppA(D*) = ∪j=1..k UppA(Dj)
  • a ∈ C is a core attribute of C if
    LowC(D*) ≠ LowC−{a}(D*), and a dispensable
    attribute otherwise
  • R ⊆ C is a reduct of C in U w.r.t. D if
    LowR(D*) = LowC(D*) and
  • ∀B ⊂ R, LowB(D*) ≠ LowC(D*)

12
Calculation of Reducts
  • Finding all reducts is NP-hard
  • Traditional method: decision matrix
  • Some new methods still suffer from intensive
    computation of
  • either discernibility functions
  • or positive regions
  • Our method
  • New equivalent definition of reducts
  • Count distinct tuples (rows) in the IS table
  • Efficient algorithm for finding reducts

13
Relative Attribute Dependency
  • Let P ⊆ C ∪ D; ΠP(U) denotes the projection of U
    on P, with duplicate rows removed
  • Let Q ⊆ C; the degree of relative dependency of Q
    on D over U is δQ(D) = |ΠQ(U)| / |ΠQ∪D(U)|
  • |ΠX(U)| is the number of equivalence classes in
    U/IND(X), so δQ(D) can be computed by counting
    distinct tuples (see the sketch below)
  • Theorem. Assume U is consistent. R ⊆ C is a reduct
    of C with respect to D if and only if
  • 1) δR(D) = δC(D) = 1
  • 2) ∀a ∈ R, δR−{a}(D) < 1

14
Computation Model (RAD)
  • Input: a decision table U, condition attribute set
    C, and decision attribute set D
  • Output: a minimum reduct R of the condition
    attribute set C with respect to D in U
  • Computation: find a subset R of C such that
    δR(D) = δC(D) and R is minimal, i.e.,
    ∀a ∈ R, δR−{a}(D) < δC(D)

15
A Heuristic Algorithm for Finding Optimal Reducts
  • Given the partition U/IND(D) = {D1, ..., Dk} of U,
    the entropy, or expected information, based on the
    partition U/IND({q}) = {Q1, ..., Qm} by a condition
    attribute q is given by
    E(q) = Σi (|Qi| / |U|) I(Qi),
    where I(Qi) = −Σj pij log2 pij and
    pij = |Qi ∩ Dj| / |Qi| (see the sketch below)
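A sketch of this computation (Python; assumes the standard expected-information form reconstructed above):

import math

def entropy(U, q, D):
    """E(q): expected information of the partition by attribute q
    with respect to the decision partition U/IND(D)."""
    blocks = {}
    for row in U:
        blocks.setdefault(row[q], []).append(tuple(row[d] for d in D))
    e = 0.0
    for labels in blocks.values():
        weight = len(labels) / len(U)          # |Qi| / |U|
        for d in set(labels):
            p = labels.count(d) / len(labels)  # p_ij
            e -= weight * p * math.log2(p)
    return e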

16
Algorithm Design
  • R ← C, Q ← Ø
  • For each attribute q ∈ C do
  •   Compute the entropy E(q) of q
  •   Q ← Q ∪ {<q, E(q)>}
  • While Q ≠ Ø do
  •   q ← the attribute in Q with the largest E(q)
  •   Q ← Q − {<q, E(q)>}
  •   If δR−{q}(D) = δC(D) Then
  •     R ← R − {q}
  • Return R

Algorithm complexity (the analysis is not reproduced in
this transcript)
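Putting the pieces together, a runnable sketch of the loop above (Python; reuses dependency() and entropy() from the earlier sketches, and assumes removal is attempted in decreasing order of E(q), which matches the test order in the example that follows):

def find_reduct(U, C, D):
    """Backward elimination: start from R = C and drop an attribute
    whenever its removal keeps the dependency at delta_C(D)."""
    R = list(C)
    full = dependency(U, C, D)  # delta_C(D); 1.0 if U is consistent
    for q in sorted(C, key=lambda a: entropy(U, a, D), reverse=True):
        trial = [a for a in R if a != q]
        if trial and dependency(U, trial, D) == full:
            R = trial  # q is dispensable
    return R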
17
An Illustrative Example
  • Protocol -- tcp, udp, icmp, etc.
  • Service -- http, telnet, ftp, etc.
  • Flag -- whether a connection is normal or an error
  • Land -- whether the connection is from/to the same
    host/port
  • Label -- whether the service request is normal or
    anomalous

18
Elementary Sets
  • U/IND(D) = U/{Label} = {{t1, t3, t6},
    {t2, t4, t5, t7, t8}}
  • U/{Protocol} = {{t1, t2, t6, t8}, {t4, t5, t7},
    {t3}}
  • U/{Service} = {{t1, t4, t8}, {t2, t3, t5, t6, t7}}
  • U/{Flag} = {{t1, t3, t4, t5, t6}, {t2, t7, t8}}
  • U/{Land} = {{t1, t3, t5, t6}, {t2, t4, t7, t8}}

19
Information Entropy
  • Attribute test order: Service, Flag, Protocol,
    Land (decreasing E(q); the table of entropy values
    is not reproduced in this transcript)

20
Algorithm Computation
  • Attribute test order: Service, Flag, Protocol, Land
  • Algorithm computation:
  • Test Service
  •   δR−{Service}(D) = 6/6 = 1 ⇒ Service can be
      removed
  • Test Flag
  •   δR−{Service,Flag}(D) = 5/5 = 1 ⇒ Flag can be
      removed
  • Test Protocol
  •   δR−{Service,Flag,Protocol}(D) = 2/3 < 1 ⇒
      Protocol cannot be removed
  • Test Land
  •   δR−{Service,Flag,Land}(D) = 3/4 < 1 ⇒ Land
      cannot be removed
  • Algorithm result: Reduct = {Protocol, Land}
    (reproduced by the usage sketch below)
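The attribute values themselves are not reproduced in this transcript, so the following usage example runs the earlier sketches on a hypothetical table whose induced partitions match the elementary sets of slide 18; under that assumption it reproduces the 6/6, 5/5, 2/3, and 3/4 ratios above:

# Hypothetical attribute values, chosen only so that the induced
# partitions match the elementary sets on slide 18.
rows = [  # (Protocol, Service, Flag, Land, Label)
    ("tcp",  "http",   "SF", 0, "normal"),   # t1
    ("tcp",  "telnet", "S0", 1, "anomaly"),  # t2
    ("icmp", "telnet", "SF", 0, "normal"),   # t3
    ("udp",  "http",   "SF", 1, "anomaly"),  # t4
    ("udp",  "telnet", "SF", 0, "anomaly"),  # t5
    ("tcp",  "telnet", "SF", 0, "normal"),   # t6
    ("udp",  "telnet", "S0", 1, "anomaly"),  # t7
    ("tcp",  "http",   "S0", 1, "anomaly"),  # t8
]
names = ["Protocol", "Service", "Flag", "Land", "Label"]
U = [dict(zip(names, r)) for r in rows]
C, D = names[:4], names[4:]

print(dependency(U, ["Protocol", "Flag", "Land"], D))  # 6/6 = 1.0
print(dependency(U, ["Protocol", "Land"], D))          # 5/5 = 1.0
print(dependency(U, ["Land"], D))                      # 2/3
print(dependency(U, ["Protocol"], D))                  # 3/4
print(find_reduct(U, C, D))                            # ['Protocol', 'Land']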

21
Limitations of Rough Set Theory
  • All tuples are treated with equal importance
  • In real-life applications, some typical examples
    may be more important than others
  • All training examples must be crisply classified
  • Some examples may not be confidently assigned to
    either class
  • The lower and upper approximations of a concept
    are defined based on the strict set inclusion
    operation
  • There is no tolerance for noisy data in the
    classification

22
Generalized Rough Set Model
  • The proposed Generalized Rough Set (GRS) model is
    defined as an uncertain information system
  • Uncertain information system (UIS)
  • UIS = < U, C, D, {Va | a ∈ C ∪ D}, f, g, d >,
  • where U, C, D, Va, and f are the same as in an IS
  • g is a function U → [0, 1] that indicates the
    certainty of a tuple being a positive example
  • d is a function U → [0, 1] that assigns each tuple
    an importance factor representing the importance
    of the tuple for the classification task
  • d × g contributes to the positive class, while
    d × (1 − g) contributes to the negative class

23
Deal With Noisy Data
  • Put some boundary examples in either the positive
    region or the negative region
  • Two classification factors, P and N, both between
    0 and 1, are introduced to solve this problem
  • Let E be a non-empty equivalence class. The
    classification ratios of E with respect to the
    positive class Pclass and the negative class
    Nclass, CP(E) and CN(E), are defined from the
    d × g and d × (1 − g) contributions of the tuples
    in E
  • CP(E) is the certainty of classifying E into the
    positive region, while CN(E) is the certainty of
    classifying E into the negative region

24
Classification with GRS
  • For the pre-specified precision thresholds P and N:
  • E is classified into the positive class if
    CP(E) ≥ P
  • E is classified into the negative class if
    CN(E) ≥ N
  • Otherwise, E is put into the boundary region
  • Classification error rate
  • If the tuples in E are classified into the positive
    class, the classification error rate is 1 − CP(E)
  • If the tuples in E are classified into the negative
    class, the classification error rate is 1 − CN(E)
    (see the sketch below)

25
Approximation in GRS
  • Let RP,N be the indiscernibility relation based on
    a set of condition attributes B, and R*P,N = {E1,
    E2, ..., En} be the collection of equivalence
    classes of RP,N
  • Assume X ⊆ U. The positive lower approximation,
    upper approximation, and boundary region of X with
    respect to the precision thresholds P and N
    collect the classes Ei by their classification
    ratios, paralleling the rule on the previous
    slide: LowP,N(X) = ∪{Ei | CP(Ei) ≥ P},
    UppP,N(X) = ∪{Ei | CN(Ei) < N}, and
    BoundaryP,N(X) = UppP,N(X) − LowP,N(X)
  • The condition dependency degree γ(C, D, P, N) is
    then defined on these regions and is used in the
    example that follows

26
Finding Detection Rules With GRS
  • Input: UIS = < U, C, D, {Va | a ∈ C ∪ D}, f, g, d >
    and classification precision factors P and N
  • Output: a set of classification rules of the form
  • IF <condition>
  • THEN <conclusion> <Certainty-Factor>
  • Procedure outline
  • Compute the dependency degree of the decision
    attribute set D on the condition attribute set C
  • Find the generalized attribute reducts of the
    condition attribute set C according to the
    attribute dependency
  • Construct classification rules with certainty
    factors, as sketched below
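A hedged sketch of the last step (Python; taking a rule's certainty factor from the classification ratios of the matching class is my reading of the example on the next slides, not something the slides state explicitly):

def build_rules(classes):
    """classes: list of (condition, E) pairs, where condition is a
    readable string over reduct attributes and E is the list of
    (d, g) pairs of the equivalence class it describes."""
    rules = []
    for condition, E in classes:
        cp, cn = classification_ratios(E)
        rules.append(f"IF {condition} THEN D=1 (normal), CF={cp:.2f}")
        rules.append(f"IF {condition} THEN D=0 (abnormal), CF={cn:.2f}")
    return rules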

27
An Example
  • The data table (not reproduced in this transcript)
  • P = 0.85, N = 0.80
  • Classification ratios using elementary sets (table
    not reproduced)
  • Dependency: γ(C, D, 0.85, 0.80) = 0.64

28
An Example (Cont.)
  • Dropping C1: C′ = {C2, C3}
  • Classification ratios using C′ (table not
    reproduced)
  • Dependency: γ(C′, D, 0.85, 0.80) = 0.64
  • Since γ(C′, D, 0.85, 0.80) = γ(C, D, 0.85, 0.80),
    C′ is a reduct of C

29
Classification Rules
  • Using the reduct, the positive rules are
  • Rule 1: If C2 = 0 then D = 1 (normal) with
    CF = 0.85
  • Rule 2: If C2 = 1 then D = 1 (normal) with
    CF = 0.47
  • Rule 3: If C2 = 2 then D = 1 (normal) with
    CF = 0.10
  • The negative rules are
  • Rule 4: If C2 = 0 then D = 0 (abnormal) with
    CF = 0.05
  • Rule 5: If C2 = 1 then D = 0 (abnormal) with
    CF = 0.33
  • Rule 6: If C2 = 2 then D = 0 (abnormal) with
    CF = 0.85

30
Conclusion and Future Work
  • Conclusion
  • Using rough set theory to design intrusion
    detection systems has attracted much attention
  • Information collected for IDSs contains noisy data
  • A fast algorithm to select significant features
  • The GRS model can deal with some noisy data
  • Future work
  • Refine the model
  • Improve the algorithm
  • Apply it to our future network security project