Anomaly Detection Based on a Rough Set Approach

1
Anomaly Detection Based on a Rough Set Approach
  • Jianchao Han
  • Department of Computer Science
  • California State University, Dominguez Hills

2
Agenda
  • Introduction
  • Rough Set Theory
  • Feature Selection
  • Relative Attribute Dependency
  • The First Illustrative Example
  • Generalized Rough Set Model
  • The Second Illustrative Example
  • Conclusion
  • Future Work

3
Introduction
  • Two Complementary Network Security Techniques
  • Protection
  • Purpose: guard hardware, software, and user data
    against threats from both outsiders and malicious
    insiders
  • Approaches: authentication, filtering, encryption,
    and firewalls
  • Intrusion Detection
  • Purpose: collect information from networks, hosts,
    and file systems, then analyze the information to
    find suspicious activity
  • Approaches: misuse detection and anomaly detection

4
Intrusion Detection
  • Misuse or signature detection
  • Principle
  • Large databases of attack signatures are needed
  • Compare the collected information against the
    databases
  • Find patterns signaling well-known attacks
  • Disadvantages
  • incapable of detecting new attacks
  • Anomaly detection
  • Explores deviations from normal system behavior
  • Everything that does not match the stored normal
    behavior profile is considered a suspicious action
  • Disadvantages
  • incapable of discerning intent
  • only signals unusual events, generating false
    alarms

5
Intrusion Detection System (IDS)
  • Host-based IDS
  • Use information derived from a single host
  • Network-based IDS
  • Exploit information obtained from a whole segment
    of a local network
  • Intrusion detection system
  • Must be capable of distinguishing between normal
    (not security-critical) and abnormal user
    activities
  • Discover malicious attempts in time
  • However, translating user behaviors into a
    consistent security-related decision is often not
    simple, because many behavior patterns are
    unpredictable and unclear

6
Data Mining Techniques
  • Data Mining Techniques have been extensively
    applied in IDSs
  • Misuse detection systems use
  • Rule-based expert systems
  • Model-based reasoning systems
  • Genetic algorithms
  • Association rules
  • Fuzzy logic
  • Anomaly detection systems use
  • Statistical analysis
  • Sequence analysis
  • Neural networks
  • Decision trees
  • Artificial immune system

7
Data Reduction
  • Data Reduction
  • Horizontal reduction: sampling
  • Vertical reduction: feature selection
  • Feature Selection
  • Statistical feature selection
  • Significant feature selection
  • Rough set feature selection
  • Search Process
  • Top-down search
  • Bottom-up search
  • Exhaustive search
  • Heuristic search

8
Rough Set Based Feature Selection
  • Bottom-up search
  • Brute-force search
  • Heuristic search
  • Rough Set Theory
  • Introduced by Pawlak in the 1980s
  • An efficient tool for data mining, concept
    generation, induction, and classification

9
Rough Set Theory -- IS
  • Information system (IS)
  • IS = < U, C, D, {Va | a ∈ C ∪ D}, f >,
  • where U = {u1, u2, ..., un} is a non-empty set of
    tuples, called the data table
  • C is a non-empty set of condition attributes
  • D is a non-empty set of decision attributes
  • C ∩ D = Ø
  • Va is the domain of attribute a, with at least two
    elements
  • f is a function U × (C ∪ D) → V = ∪a∈C∪D Va
    (a code representation is sketched below)
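A minimal sketch of how such an information system could be represented in code (Python; the class and field names are mine, not the presentation's):

from dataclasses import dataclass

@dataclass
class InformationSystem:
    """IS = <U, C, D, {Va}, f>: rows are dicts mapping attribute
    name to value, so f(u, a) is simply u[a], and Va is implicitly
    the set of values that attribute a takes."""
    U: list  # tuples (rows), each a dict: attribute -> value
    C: list  # condition attribute names
    D: list  # decision attribute names

# A hypothetical two-row example:
is_table = InformationSystem(
    U=[{"Protocol": "tcp", "Label": "normal"},
       {"Protocol": "udp", "Label": "anomaly"}],
    C=["Protocol"],
    D=["Label"],
)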

10
Approximation
  • Let A ⊆ C ∪ D, ti, tj ∈ U, and let X be a subset
    of U
  • Define RA = {<ti, tj> ∈ U × U | ∀a ∈ A,
    ti[a] = tj[a]}
  • The indiscernibility relation, denoted IND, is an
    equivalence relation on U; RA is an equivalence
    relation on U
  • The approximation space (U, IND) partitions U into
    equivalence classes A* = {A1, A2, ..., Am} induced
    by RA
  • Lower approximation or positive region of X:
    LowA(X) = ∪{Ai ∈ A* | Ai ⊆ X, 1 ≤ i ≤ m}
  • Upper approximation of X based on A:
    UppA(X) = ∪{Ai ∈ A* | Ai ∩ X ≠ Ø, 1 ≤ i ≤ m}
  • Boundary area of X:
    BoundaryA(X) = UppA(X) − LowA(X)
  • Negative region of X:
    NegA(X) = ∪{Ai ∈ A* | Ai ⊆ U − X, 1 ≤ i ≤ m}
    (all four regions are computed in the sketch below)
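As a hedged illustration, the four regions can be computed directly from the partition induced by RA (Python; function names are mine):

def partition(U, A):
    """Equivalence classes of RA: indices of rows that agree on
    every attribute in A."""
    classes = {}
    for i, row in enumerate(U):
        key = tuple(row[a] for a in A)
        classes.setdefault(key, set()).add(i)
    return list(classes.values())

def approximations(U, A, X):
    """X is a set of row indices. Returns the lower and upper
    approximations, boundary, and negative region of X w.r.t. A,
    each as a set of row indices."""
    lower, upper = set(), set()
    for Ai in partition(U, A):
        if Ai <= X:    # Ai entirely inside X -> positive region
            lower |= Ai
        if Ai & X:     # Ai overlaps X -> upper approximation
            upper |= Ai
    boundary = upper - lower
    negative = set(range(len(U))) - upper  # classes inside U - X
    return lower, upper, boundary, negative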

11
Core Attributes and Reduct
  • Let D* = {D1, D2, ..., Dk} be the set of
    elementary sets partitioned by RD
  • Approximation aggregation:
  • LowA(D*) = ∪j=1..k LowA(Dj)
  • UppA(D*) = ∪j=1..k UppA(Dj)
  • a ∈ C is a core attribute of C if
    LowC(D*) ≠ LowC−{a}(D*), and a dispensable
    attribute otherwise
  • R ⊆ C is a reduct of C in U w.r.t. D if
    LowR(D*) = LowC(D*) and
  • ∀B ⊂ R, LowB(D*) ≠ LowC(D*)

12
Calculation of Reducts
  • Finding all reducts is NP-hard
  • Traditional method: decision matrix
  • Some new methods still suffer from intensive
    computation of
  • either discernibility functions
  • or positive regions
  • Our method
  • New equivalent definition of reducts
  • Count distinct tuples (rows) in the IS table
  • Efficient algorithm for finding reducts

13
Relative Attribute Dependency
  • Let P ⊆ C ∪ D; ΠP(U) denotes the projection of U
    on P, with duplicate rows removed
  • Let Q ⊆ C; the degree of relative dependency of Q
    on D over U is δQ(D) = |ΠQ(U)| / |ΠQ∪D(U)|
  • |ΠX(U)| is the number of equivalence classes in
    U/IND(X), so δQ(D) can be computed by counting
    distinct tuples (see the sketch below)
  • Theorem. Assume U is consistent. R ⊆ C is a reduct
    of C with respect to D if and only if
  • 1) δR(D) = δC(D) = 1
  • 2) ∀a ∈ R, δR−{a}(D) < 1

14
Computation Model (RAD)
  • Input: a decision table U, condition attribute set
    C, and decision attribute set D
  • Output: a minimum reduct R of the condition
    attribute set C with respect to D in U
  • Computation: find a subset R of C such that
    δR(D) = δC(D) and R is minimal, i.e.,
    ∀a ∈ R, δR−{a}(D) < δC(D)

15
A Heuristic Algorithm for Finding Optimal Reducts
  • Given the partition U/IND(D) = {D1, ..., Dk} of U,
    the entropy, or expected information, based on the
    partition U/IND({q}) = {Q1, ..., Qm} by a condition
    attribute q is given by
    E(q) = Σi (|Qi| / |U|) I(Qi),
    where I(Qi) = −Σj pij log2 pij and
    pij = |Qi ∩ Dj| / |Qi| (see the sketch below)
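A sketch of this computation (Python; assumes the standard expected-information form reconstructed above):

import math

def entropy(U, q, D):
    """E(q): expected information of the partition by attribute q
    with respect to the decision partition U/IND(D)."""
    blocks = {}
    for row in U:
        blocks.setdefault(row[q], []).append(tuple(row[d] for d in D))
    e = 0.0
    for labels in blocks.values():
        weight = len(labels) / len(U)          # |Qi| / |U|
        for d in set(labels):
            p = labels.count(d) / len(labels)  # p_ij
            e -= weight * p * math.log2(p)
    return e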

16
Algorithm Design
  • R ← C, Q ← Ø
  • For each attribute q ∈ C do
  •   Compute the entropy E(q) of q
  •   Q ← Q ∪ {<q, E(q)>}
  • While Q ≠ Ø do
  •   q ← the attribute in Q with the largest E(q)
  •   Q ← Q − {<q, E(q)>}
  •   If δR−{q}(D) = δC(D) Then
  •     R ← R − {q}
  • Return R

Algorithm complexity (the analysis is not reproduced in
this transcript)
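Putting the pieces together, a runnable sketch of the loop above (Python; reuses dependency() and entropy() from the earlier sketches, and assumes removal is attempted in decreasing order of E(q), which matches the test order in the example that follows):

def find_reduct(U, C, D):
    """Backward elimination: start from R = C and drop an attribute
    whenever its removal keeps the dependency at delta_C(D)."""
    R = list(C)
    full = dependency(U, C, D)  # delta_C(D); 1.0 if U is consistent
    for q in sorted(C, key=lambda a: entropy(U, a, D), reverse=True):
        trial = [a for a in R if a != q]
        if trial and dependency(U, trial, D) == full:
            R = trial  # q is dispensable
    return R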
17
An Illustrative Example
  • Protocol -- tcp, udp, icmp, etc.
  • Service -- http, telnet, ftp, etc.
  • Flag -- whether a connection is normal or an error
  • Land -- whether the connection is from/to the same
    host/port
  • Label -- whether the service request is normal or
    anomalous

18
Elementary Sets
  • U/IND(D) = U/{Label} = {{t1, t3, t6},
    {t2, t4, t5, t7, t8}}
  • U/{Protocol} = {{t1, t2, t6, t8}, {t4, t5, t7},
    {t3}}
  • U/{Service} = {{t1, t4, t8}, {t2, t3, t5, t6, t7}}
  • U/{Flag} = {{t1, t3, t4, t5, t6}, {t2, t7, t8}}
  • U/{Land} = {{t1, t3, t5, t6}, {t2, t4, t7, t8}}

19
Information Entropy
  • Attribute test order: Service, Flag, Protocol,
    Land (decreasing E(q); the table of entropy values
    is not reproduced in this transcript)

20
Algorithm Computation
  • Attribute test order: Service, Flag, Protocol, Land
  • Algorithm computation:
  • Test Service
  •   δR−{Service}(D) = 6/6 = 1 ⇒ Service can be
      removed
  • Test Flag
  •   δR−{Service,Flag}(D) = 5/5 = 1 ⇒ Flag can be
      removed
  • Test Protocol
  •   δR−{Service,Flag,Protocol}(D) = 2/3 < 1 ⇒
      Protocol cannot be removed
  • Test Land
  •   δR−{Service,Flag,Land}(D) = 3/4 < 1 ⇒ Land
      cannot be removed
  • Algorithm result: Reduct = {Protocol, Land}
    (reproduced by the usage sketch below)
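The attribute values themselves are not reproduced in this transcript, so the following usage example runs the earlier sketches on a hypothetical table whose induced partitions match the elementary sets of slide 18; under that assumption it reproduces the 6/6, 5/5, 2/3, and 3/4 ratios above:

# Hypothetical attribute values, chosen only so that the induced
# partitions match the elementary sets on slide 18.
rows = [  # (Protocol, Service, Flag, Land, Label)
    ("tcp",  "http",   "SF", 0, "normal"),   # t1
    ("tcp",  "telnet", "S0", 1, "anomaly"),  # t2
    ("icmp", "telnet", "SF", 0, "normal"),   # t3
    ("udp",  "http",   "SF", 1, "anomaly"),  # t4
    ("udp",  "telnet", "SF", 0, "anomaly"),  # t5
    ("tcp",  "telnet", "SF", 0, "normal"),   # t6
    ("udp",  "telnet", "S0", 1, "anomaly"),  # t7
    ("tcp",  "http",   "S0", 1, "anomaly"),  # t8
]
names = ["Protocol", "Service", "Flag", "Land", "Label"]
U = [dict(zip(names, r)) for r in rows]
C, D = names[:4], names[4:]

print(dependency(U, ["Protocol", "Flag", "Land"], D))  # 6/6 = 1.0
print(dependency(U, ["Protocol", "Land"], D))          # 5/5 = 1.0
print(dependency(U, ["Land"], D))                      # 2/3
print(dependency(U, ["Protocol"], D))                  # 3/4
print(find_reduct(U, C, D))                            # ['Protocol', 'Land']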

21
Limitations of Rough Set Theory
  • All tuples are treated with equal importance
  • In real-life applications, some typical examples
    may be more important than others
  • All training examples must be crisply classified
  • Some examples may not be confidently assigned to
    either class
  • The lower and upper approximations of a concept
    are defined based on the strict set inclusion
    operation
  • There is no tolerance for noisy data in the
    classification

22
Generalized Rough Set Model
  • The proposed Generalized Rough Set (GRS) model is
    defined as an uncertain information system
  • Uncertain information system (UIS)
  • UIS = < U, C, D, {Va | a ∈ C ∪ D}, f, g, d >,
  • where U, C, D, Va, and f are the same as in an IS
  • g is a function U → [0, 1] that indicates the
    certainty of a tuple being a positive example
  • d is a function U → [0, 1] that assigns each tuple
    an importance factor representing the importance
    of the tuple for the classification task
  • d × g contributes to the positive class, while
    d × (1 − g) contributes to the negative class

23
Deal With Noisy Data
  • Put some boundary examples in either the positive
    region or the negative region
  • Two classification factors, P and N, both between
    0 and 1, are introduced to solve this problem
  • Let E be a non-empty equivalence class. The
    classification ratios of E with respect to the
    positive class Pclass and the negative class
    Nclass, CP(E) and CN(E), are defined from the
    d × g and d × (1 − g) contributions of the tuples
    in E
  • CP(E) is the certainty of classifying E into the
    positive region, while CN(E) is the certainty of
    classifying E into the negative region

24
Classification with GRS
  • For the pre-specified precision thresholds P and N:
  • E is classified into the positive class if
    CP(E) ≥ P
  • E is classified into the negative class if
    CN(E) ≥ N
  • Otherwise, E is put into the boundary region
  • Classification error rate
  • If the tuples in E are classified into the positive
    class, the classification error rate is 1 − CP(E)
  • If the tuples in E are classified into the negative
    class, the classification error rate is 1 − CN(E)
    (see the sketch below)

25
Approximation in GRS
  • Let RP,N be the indiscernibility relation based on
    a set of condition attributes B, and R*P,N = {E1,
    E2, ..., En} be the collection of equivalence
    classes of RP,N
  • Assume X ⊆ U. The positive lower approximation,
    upper approximation, and boundary region of X with
    respect to the precision thresholds P and N
    collect the classes Ei by their classification
    ratios, paralleling the rule on the previous
    slide: LowP,N(X) = ∪{Ei | CP(Ei) ≥ P},
    UppP,N(X) = ∪{Ei | CN(Ei) < N}, and
    BoundaryP,N(X) = UppP,N(X) − LowP,N(X)
  • The condition dependency degree γ(C, D, P, N) is
    then defined on these regions and is used in the
    example that follows

26
Finding Detection Rules With GRS
  • Input: UIS = < U, C, D, {Va | a ∈ C ∪ D}, f, g, d >
    and classification precision factors P and N
  • Output: a set of classification rules of the form
  • IF <condition>
  • THEN <conclusion> <Certainty-Factor>
  • Procedure outline
  • Compute the dependency degree of the decision
    attribute set D on the condition attribute set C
  • Find the generalized attribute reducts of the
    condition attribute set C according to the
    attribute dependency
  • Construct classification rules with certainty
    factors, as sketched below
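A hedged sketch of the last step (Python; taking a rule's certainty factor from the classification ratios of the matching class is my reading of the example on the next slides, not something the slides state explicitly):

def build_rules(classes):
    """classes: list of (condition, E) pairs, where condition is a
    readable string over reduct attributes and E is the list of
    (d, g) pairs of the equivalence class it describes."""
    rules = []
    for condition, E in classes:
        cp, cn = classification_ratios(E)
        rules.append(f"IF {condition} THEN D=1 (normal), CF={cp:.2f}")
        rules.append(f"IF {condition} THEN D=0 (abnormal), CF={cn:.2f}")
    return rules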

27
An Example
  • The data table (not reproduced in this transcript)
  • P = 0.85, N = 0.80
  • Classification ratios using elementary sets (table
    not reproduced)
  • Dependency: γ(C, D, 0.85, 0.80) = 0.64

28
An Example (Cont.)
  • Dropping C1: C′ = {C2, C3}
  • Classification ratios using C′ (table not
    reproduced)
  • Dependency: γ(C′, D, 0.85, 0.80) = 0.64
  • Since γ(C′, D, 0.85, 0.80) = γ(C, D, 0.85, 0.80),
    C′ is a reduct of C

29
Classification Rules
  • Using the reduct, the positive rules are
  • Rule 1: If C2 = 0 then D = 1 (normal) with
    CF = 0.85
  • Rule 2: If C2 = 1 then D = 1 (normal) with
    CF = 0.47
  • Rule 3: If C2 = 2 then D = 1 (normal) with
    CF = 0.10
  • The negative rules are
  • Rule 4: If C2 = 0 then D = 0 (abnormal) with
    CF = 0.05
  • Rule 5: If C2 = 1 then D = 0 (abnormal) with
    CF = 0.33
  • Rule 6: If C2 = 2 then D = 0 (abnormal) with
    CF = 0.85

30
Conclusion and Future Work
  • Conclusion
  • Using rough set theory to design intrusion
    detection systems has attracted much attention
  • Information collected for IDSs contains noisy data
  • A fast algorithm to select significant features
  • The GRS model can deal with some noisy data
  • Future work
  • Refine the model
  • Improve the algorithm
  • Apply it to our future network security project