An Enhanced Support Vector Machine Model for Intrusion Detection

1
An Enhanced Support Vector Machine Model for
Intrusion Detection
  • J. T. Yao, S. L. Zhao, L. Fan
  • Department of Computer Science
  • University of Regina
  • jtyao_at_cs.uregina.ca

2
Intrusion Detection Systems
  • Intrusion detection: the art of detecting
    inappropriate, incorrect, or anomalous activity.
  • Intrusion detection system: a set of processes,
    procedures, tools, software, hardware and databases
    having intrusion detection technologies that fit
    together.
  • Misuse: an attack that originates from the internal
    network
  • Intrusion: an attack from the outside

3
IDS Functional Components
  • Information Source: provides a stream of event
    records.
  • Analysis Engine: analyzes event records and detects
    intrusions.
  • Decision Maker: decides the reactions to intrusions.

4
Detection Methods
  • Misuse detection: detecting intrusions by known
    intrusion signatures.
  • Anomaly detection: mining normal event patterns from
    event records, then using these patterns to
    distinguish normal events from intrusion events.
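
The difference between the two methods can be illustrated with a minimal Python sketch. The event strings, the signature set, and all function names below are hypothetical and chosen only for illustration; a real system would work with far richer patterns than exact string matches.

```python
# Hypothetical signature set; each event record is reduced to one string.
KNOWN_SIGNATURES = {"login_failed_x100", "port_scan"}

def misuse_detect(event, signatures=KNOWN_SIGNATURES):
    """Flag an event only if it matches a known intrusion signature."""
    return event in signatures

def learn_normal(training_events):
    """Collect the patterns seen in records assumed to be intrusion-free."""
    return set(training_events)

def anomaly_detect(event, normal_patterns):
    """Flag any event that does not match a learned normal pattern."""
    return event not in normal_patterns

normal = learn_normal(["login_ok", "file_read", "logout"])
print(misuse_detect("port_scan"))              # matches a known signature
print(misuse_detect("new_exploit"))            # unknown attack is missed
print(anomaly_detect("new_exploit", normal))   # but it deviates from normal
```

In this toy setting, misuse detection misses an attack that has no signature yet, while anomaly detection flags it because it was never seen among the normal events.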

5
Candidate AI Techniques
  • Expert Systems
  • Hidden Markov Model
  • Fuzzy logic
  • Classification
  • Support Vector Machines (SVM)

6
Support Vector Machines
  • A machine learning method based on statistical
    learning theory.
  • Classifies data by a set of support vectors that
    represent data patterns.
  • Finds a discriminant function that classifies new
    data.

7
Benefits of Using SVM
  • Good generalization ability
  • Capability of handling a large number of features

8
Problem of SVM on IDS
  • All features are treated equally
  • Noise features (some features cause noise during
    classification)
  • Redundant features
  • A large number of features affects the performance
    of both the training process and the detection
    process

9
Thoughts of Solution
  • Reducing the number of features while keeping the
    useful information
  • Calculating the importance of features
  • Treating features differently based on their
    importance

10
An Enhanced SVM Model
  • Use rough sets to calculate reducts
  • Calculate feature weights from reducts
  • Remove redundant features based on weights
  • Apply weights to SVM kernel

11
Calculate Weights from Reducts
  • The principles of calculation are:
  • If a feature is not in any reduct, its weight is 0.
  • The more times a feature appears in the reducts, the
    more important the feature is.
  • The fewer features a reduct contains, the more
    important these features are.
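
One weighting rule that satisfies all three principles can be sketched in Python as follows. The rule itself (each reduct adds 1/|reduct| to every feature it contains, followed by scaling to a maximum of 1) is an assumption made for illustration, not necessarily the exact formula used in the presentation; the reducts and the function name are hypothetical.

```python
def weights_from_reducts(reducts, n_features):
    """Return one weight per feature index, scaled so the largest is 1.

    Assumed rule: each reduct adds 1/len(reduct) to every feature in it,
    so frequent features and features from small reducts score higher.
    """
    weights = [0.0] * n_features
    for reduct in reducts:
        for f in reduct:
            weights[f] += 1.0 / len(reduct)
    top = max(weights)
    return [w / top for w in weights] if top > 0 else weights

# Hypothetical reducts over 5 features (indices 0..4).
reducts = [{0, 1}, {0, 2, 3}, {0, 1, 3}]
w = weights_from_reducts(reducts, 5)
print(w)
```

Feature 4 appears in no reduct and keeps weight 0, so it can be removed. Feature 0 appears in every reduct and receives the largest weight.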

12
Apply Weights to Kernel Function
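  • The training result of SVM is the decision function
    $f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b\right)$
  • where $l$ is the number of training records,
    $\alpha_i$ ($0 \le \alpha_i \le C$) are the Lagrange
    multipliers, $y_i \in \{-1, +1\}$ is the label
    associated with the training data, $C$ is a constant,
    $K(x_i, x)$ is the kernel function, the $x_i$ with
    $\alpha_i > 0$ are called the set of Support Vectors,
    and $b$ is a bias term. The weight $w$ is a diagonal
    matrix with one feature weight per diagonal entry.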
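
A minimal Python sketch of how a diagonal weight matrix might enter the decision function is given below. It assumes that the weights are applied to both kernel arguments, that is, the kernel is evaluated as K(w x_i, w x), and it uses an RBF kernel as the example. The support vectors, multipliers, bias, and weights are made-up values, and the function names are hypothetical.

```python
import math

def weighted_rbf(x, z, w, gamma):
    """RBF kernel on weighted inputs: exp(-gamma * ||w*x - w*z||^2).

    w holds the diagonal of the weight matrix, one entry per feature.
    """
    sq = sum((wi * (xi - zi)) ** 2 for wi, xi, zi in zip(w, x, z))
    return math.exp(-gamma * sq)

def decide(x, support_vectors, alphas, labels, b, w, gamma):
    """Sign of sum_i alpha_i * y_i * K(w x_i, w x) + b."""
    s = sum(a * y * weighted_rbf(sv, x, w, gamma)
            for sv, a, y in zip(support_vectors, alphas, labels))
    return 1 if s + b >= 0 else -1

# Made-up values for illustration only.
svs = [[0.0, 0.0], [1.0, 1.0]]
alphas = [1.0, 1.0]
labels = [-1, 1]
w = [1.0, 0.0]   # the second feature has weight 0, so it is ignored
print(decide([0.9, -5.0], svs, alphas, labels, 0.0, w, gamma=1.0))
```

Because the second weight is 0, the large value in the second coordinate of the query point has no influence, and the point is classified by its first coordinate alone.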

13
Weights Are Independent of Kernel Functions
  • Weights can be applied to any known kernel function
  • Restrict w > 0 to make sure the enhanced kernel
    function satisfies Mercer's condition
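
The independence from the kernel choice can be sketched as a wrapper that rescales the inputs before calling any existing kernel. This is an illustrative Python sketch under the assumption that weighting means evaluating the kernel on w*x and w*z; the names `with_weights`, `linear`, and `rbf` are hypothetical helpers, not part of any library.

```python
import math

def linear(x, z):
    """Plain linear kernel: the dot product of x and z."""
    return sum(a * b for a, b in zip(x, z))

def rbf(x, z, gamma=0.5):
    """Plain RBF kernel with a fixed example gamma."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def with_weights(kernel, w):
    """Wrap any kernel so that it is evaluated on weighted inputs."""
    if any(wi <= 0 for wi in w):
        raise ValueError("all weights must be positive")
    def weighted(x, z):
        xs = [wi * xi for wi, xi in zip(w, x)]
        zs = [wi * zi for wi, zi in zip(w, z)]
        return kernel(xs, zs)
    return weighted

w = [2.0, 0.5]
k_lin = with_weights(linear, w)
k_rbf = with_weights(rbf, w)
print(k_lin([1.0, 2.0], [3.0, 4.0]))  # linear kernel on rescaled inputs
print(k_rbf([1.0, 2.0], [1.0, 2.0]))  # identical points
```

The wrapped kernel is the original kernel evaluated on rescaled inputs, so it remains symmetric in its two arguments. The positivity check mirrors the restriction on the weights and assumes that zero-weight features have already been removed.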

14
Experiment Procedures

15
Experiment Data Set
  • KDD (Knowledge Discovery in Databases) Cup 1999
    data set.
  • Feature-value format.
  • 41 features for each record.
  • The original data set contains 744 MB of data with
    4,940,000 connection records.

16
Experiment Data Set 2
  • UNM (University of New Mexico) data set.
  • Sequence-based.
  • A trace is generated each time a user accesses a
    certain UNIX process.

17
KDD training results of conventional SVM with
different values of gamma (Table 1)

Training result     Exp1      Exp2      Exp3
Training records    50,000    50,000    50,000
Features            41        41        41
Kernel type         RBF       RBF       RBF
Value of gamma      (not preserved)
Generated SVs       6,948     1,868     1,057

18
KDD test results of conventional SVM with
different values of gamma (Table 2)

Test result         Exp1      Exp2      Exp3
Test records        10,000    10,000    10,000
Features            41        41        41
Value of gamma      (not preserved)
# of misclassified  44        63        211
Accuracy (%)        99.56     99.37     97.89
False positives     37        52        176
False negatives     7         11        35
CPU seconds         49.53     11.34     8.32
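
The accuracy figures in Table 2 follow from the false positive and false negative counts. The short Python check below makes the arithmetic explicit. The function name is hypothetical, and it assumes that every misclassified record is either a false positive or a false negative.

```python
def accuracy_percent(n_records, false_pos, false_neg):
    """Accuracy in percent, given the two kinds of misclassification."""
    misclassified = false_pos + false_neg
    return round(100.0 * (n_records - misclassified) / n_records, 2)

# (false positives, false negatives) for Exp1..Exp3; 10,000 test records each.
for fp, fn in [(37, 7), (52, 11), (176, 35)]:
    print(fp + fn, accuracy_percent(10_000, fp, fn))
```

The printed pairs reproduce the misclassification counts and accuracies in the table.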

19
Comparisons of the experimental results on the
KDD dataset (Table 3)

Test result         Records   Features  Precision FN rate   CPU sec
Test set 1
Conventional SVM    10,000    41        99.82     7.69      222.28
Enhanced SVM        10,000    16        99.86     6.39      77.63
Improvement                   60.0      0.4       16.9      66.0
Test set 2
Conventional SVM    10,000    41        99.80     8.25      227.03
Enhanced SVM        10,000    16        99.85     6.91      78.93
Improvement                   60.0      0.5       16.2      65.0
Test set 3
Conventional SVM    10,000    41        99.88     7.45      230.27
Enhanced SVM        10,000    16        99.91     5.49      77.85
Improvement                   60        0.3       26.3      66.0

20
Comparisons of the experimental results on the
UNM lpr dataset (Table 4)

Test result         Records   Features  Precision FN rate   CPU sec
Test set 1
Conventional SVM    2,000     467       100       0         1.62
Enhanced SVM        2,000     9         100       0         0.28
Improvement                   98                            83
Test set 2
Conventional SVM    2,000     467       100       0         1.71
Enhanced SVM        2,000     9         100       0         0.29
Improvement                   98                            83
Test set 3
Conventional SVM    2,000     467       100       0         1.59
Enhanced SVM        2,000     9         100       0         0.25
Improvement                   98                            84
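
The Improvement rows of Table 4 are consistent with a relative reduction expressed in percent. The Python sketch below assumes that interpretation and takes the last numeric column to be CPU time, as the "CPU" header suggests; the function name is hypothetical.

```python
def improvement_percent(conventional, enhanced):
    """Relative reduction from the conventional to the enhanced value."""
    return round(100.0 * (conventional - enhanced) / conventional)

print(improvement_percent(467, 9))         # number of features
for conv, enh in [(1.62, 0.28), (1.71, 0.29), (1.59, 0.25)]:
    print(improvement_percent(conv, enh))  # CPU time, test sets 1 to 3
```

Under this reading, the computed values match the feature and CPU improvements listed for the three lpr test sets.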

21
Experiment Results
  • A larger value of gamma results in a larger number of
    Support Vectors being generated.
  • A larger number of SVs results in higher detection
    accuracy and higher computation costs.
  • The improvement of the enhanced SVM is consistent
    across all six test sets.

22
Experiment Results 2
  • The enhanced SVM outperforms the conventional SVM in
    precision, false negative rate and CPU time on the
    KDD dataset.
  • The enhanced SVM is 80% faster on the lpr dataset.

23
Experiment Results 3
  • Although generated from a small training set, the
    decision boundary is consistent for the whole data
    set.
  • The test results show little difference between
    the small and the full-size training sets, which
    demonstrates the good generalization ability of SVM.

24
Conclusion
  • An enhanced SVM model is introduced.
  • Features are reduced and weighted.
  • It has good generalization ability.
  • It performs better than the conventional SVM in both
    experiments.
