Unsupervised Intrusion Detection Using Clustering Approach


1
Unsupervised Intrusion Detection Using
Clustering Approach
  • Muhammet Kabukçu
  • Sefa Kiliç
  • Ferhat Kutlu
  • Teoman Toraman

2
Outline
  • Introduction
  • Using Clustering for Intrusion Detection
  • Methodology
  • Overall Summary
  • Conclusion
  • References

3
Introduction
  • Intrusion detection is the process of monitoring
    the events occurring in a computer system or
    network and analyzing them for signs of possible
    incidents.
  • Incidents are violations or imminent threats of
    violation of:
  • computer security policies,
  • acceptable use policies,
  • standard security practices.

4
Introduction
  • An intrusion detection system (IDS) is software
    that automates the intrusion detection process.
  • IDSs primarily focus on identifying possible
    incidents and detecting when an attacker has
    successfully compromised a system by exploiting
    a vulnerability in the system.

5
Introduction
6
Signature-Based Detection
  • A signature is a pattern that corresponds to a
    known threat (e.g. a telnet attempt with a
    username of "root", which is a violation of an
    organization's security policy).
  • Signature-based detection is the process of
    comparing signatures against observed events to
    identify possible incidents.
  • Advantage: Very effective at detecting known
    threats.
  • Disadvantage: Ineffective at detecting
    previously unknown threats.

7
Anomaly-Based Detection
  • The process of comparing definitions of what
    activity is considered normal against observed
    events to identify significant deviations.
  • Capable of detecting previously unknown threats.
  • Uses host or network-specific profiles.

8
Detection by Stateful Protocol Analysis
  • The process of comparing predetermined profiles
    of generally accepted definitions of benign
    protocol activity for each protocol state against
    observed events to identify deviations.
  • Relies on vendor-developed universal profiles
    that specify how particular protocols should and
    should not be used.

9
Using Clustering for Intrusion Detection
  • Methods other than Signature-Based Detection use
    data mining and machine learning algorithms to
    train on labeled network data.
  • For training data, there are two major paradigms:
  • Misuse Detection and Anomaly Detection.

Which one to use?
10
Using Clustering for Intrusion Detection - Misuse
Detection
  • In misuse detection, machine learning algorithms
    are used with labeled data.
  • By using the extracted features from labeled
    network traffic, network data is classified.
  • Detection models are retrained on new data that
    includes new types of attacks.

11
Using Clustering for Intrusion Detection - Anomaly
Detection
  • In anomaly detection,
  • models are built by training on normal data,
  • deviations from the normal model are searched for.
  • Generating purely normal data is very difficult
    and costly in practice.
  • It is very hard to guarantee that there are no
    attacks during the time the traffic is collected
    from the network.

12
Using Clustering for Intrusion Detection
  • Misuse detection needs labeled data; anomaly
    detection needs purely normal data.
  • Instead, use a mechanism that detects intrusions
    by training on unlabeled data.
  • Find the intrusions buried within that data.

13
Using Clustering for Intrusion Detection
  • Assumptions of the unsupervised anomaly detection
    algorithm:
  • The intrusions are rare with respect to normal
    network traffic.
  • The intrusions are different from normal network
    traffic.
  • As a result:
  • The intrusions will appear as outliers in the
    data.

[Diagram labels: A Set of Unlabeled Data; Unsupervised Anomaly Detection Algorithm; Detected Intrusion Clusters; Connection Comparison with Detected Clusters; Detected Malicious Attacks]
14
Using Clustering for Intrusion Detection
  • The unsupervised anomaly detection algorithm
    groups the unlabeled data instances into clusters
    using a simple distance-based metric.

15
Using Clustering for Intrusion Detection
  • Once the data is clustered, all instances that
    appear in small clusters are labeled as
    anomalies, because:
  • Normal instances should form large clusters
    compared to the intrusions.
  • Malicious intrusions and normal instances are
    qualitatively different, so they do not fall into
    the same cluster.

[Diagram labels: Intrusion cluster; Normal cluster]
16
Methodology
  • Description of the dataset
  • Metric Normalization
  • Clustering Algorithm
  • Portnoy et al.
  • Y-means Algorithm
  • Labeling Clusters
  • Intrusion Detection

17
Description of the dataset
  • KDD Cup 1999 Data
  • Main attack categories
  • DOS: Denial of Service (e.g. SYN flood)
  • R2L: Unauthorized access from a remote machine
    (e.g. guessing a password)
  • U2R: Unauthorized access to local superuser
    (root) privileges (e.g. various buffer overflow
    attacks)
  • Probing: Surveillance and other probing (e.g.
    port scanning)
  • In total, 24 attack types in the training data and
    14 additional ones in the test data...

18
Metric Normalization
  • Euclidean Metric (for distance computation)
  • Feature Normalization (to eliminate the
    difference in the scale of features)
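
The two ingredients above can be sketched as follows. This is a minimal illustration, not the presenters' code: the slide does not give the normalization formula, so standardization to zero mean and unit standard deviation is assumed here, and the function names `normalize` and `euclidean` are hypothetical.

```python
import math
from statistics import mean, pstdev

def normalize(rows):
    """Rescale every feature column to zero mean and unit standard deviation.

    Assumption: z-score standardization; the slide only says that features
    are normalized to remove differences in scale.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two features on very different scales (made-up values).
raw = [[0.0, 100.0], [1.0, 300.0], [2.0, 500.0]]
scaled = normalize(raw)
```

Without normalization the second feature would dominate every distance; after it, both columns contribute on the same scale.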

19
Clustering Algorithm (Portnoy et al.)
[Diagram: an instance Xi from the training set at distances d1, d2, d3 from the existing clusters; the set of clusters is initially empty]
  • d1, the distance to the nearest cluster, is
    selected.
  • If d1 < W (predefined threshold value), then Xi
    is assigned to that cluster.
  • Else, a new cluster is created and Xi is
    assigned to it.
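
The single-pass procedure on this slide might look like the sketch below. It rests on assumptions the slide does not settle: each cluster is represented by the first instance assigned to it, and the name `single_pass_cluster` and the dictionary layout are hypothetical.

```python
import math

def single_pass_cluster(instances, width):
    """Cluster instances in one pass using a fixed width threshold W.

    Assumption: a cluster's center is the first instance assigned to it
    and is never updated afterwards.
    """
    clusters = []  # start with an empty set of clusters
    for x in instances:
        nearest, d1 = None, math.inf
        for cluster in clusters:
            d = math.dist(x, cluster["center"])
            if d < d1:
                nearest, d1 = cluster, d
        if nearest is not None and d1 < width:
            nearest["members"].append(x)  # close enough: join the nearest cluster
        else:
            clusters.append({"center": x, "members": [x]})  # otherwise open a new one
    return clusters

# Made-up 2-D points: three close together and one far away.
points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (0.0, 0.4)]
found = single_pass_cluster(points, width=1.0)
```

The result depends on W and on the order of the instances, which is the weakness the next slide points out.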
20
Clustering Algorithm (Portnoy et al.)
  • Advantage: No need to know the initial number of
    clusters.
  • Disadvantage: W must be known in advance, and a
    poor choice may label instances wrongly in some
    cases.
  • However

21
Clustering Algorithm (Y-means Algorithm)
  • 3 main parts
  • assigning instances to k clusters
  • splitting clusters
  • merging clusters

22
Clustering Algorithm (Y-means Algorithm)
1. assigning instances to k clusters
[Diagram: the instances of the dataset are assigned to k clusters, and each cluster centroid is redefined]
k: number of clusters, n: number of instances, 1 < k < n
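
One round of the assignment step can be sketched as a plain k-means pass. This is only an illustration under assumptions: the slide does not say how empty clusters are handled (here their centroid is left unchanged), and `assign_and_update` is a hypothetical name.

```python
import math

def assign_and_update(instances, centroids):
    """Assign each instance to its nearest centroid, then redefine the centroids."""
    groups = [[] for _ in centroids]
    for x in instances:
        nearest = min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))
        groups[nearest].append(x)
    new_centroids = []
    for members, old in zip(groups, centroids):
        if members:
            # The new centroid is the per-feature mean of the cluster's members.
            new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centroids.append(old)  # assumption: keep an empty cluster's centroid
    return groups, new_centroids

# Made-up data with k = 2 initial centroids.
data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
groups, centroids = assign_and_update(data, [(0.0, 0.0), (10.0, 10.0)])
```

In practice the pass is repeated until the assignments stop changing.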
23
Clustering Algorithm (Y-means Algorithm)
2. splitting clusters
t (normal threshold) = 2.32σ, where σ is the standard deviation
[Diagram: an instance Xi at distance di from the cluster centroid; the confident area is the region within distance t]
  • If di > t, Xi is an outlier.
  • New clusters are created first from the
    farthest outliers.
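
The outlier test can be sketched as below. The slide does not define how σ is computed, so this sketch assumes it is the root-mean-square distance of a cluster's members from its centroid; `split_candidates` is a hypothetical name.

```python
import math

def split_candidates(members, centroid, factor=2.32):
    """Return the threshold t and the outliers of one cluster, farthest first.

    Assumption: sigma is the root-mean-square distance of the members
    from the centroid; t = factor * sigma.
    """
    distances = [math.dist(x, centroid) for x in members]
    sigma = math.sqrt(sum(d * d for d in distances) / len(distances))
    t = factor * sigma
    outliers = [(d, x) for d, x in zip(distances, members) if d > t]
    outliers.sort(key=lambda pair: pair[0], reverse=True)  # farthest outliers first
    return t, [x for _, x in outliers]

# Made-up cluster: nine instances at the origin and one far away.
cluster = [(0.0, 0.0)] * 9 + [(10.0, 0.0)]
t, outliers = split_candidates(cluster, centroid=(1.0, 0.0))
```

Here the nine near instances lie at distance 1 and the far one at distance 9, so σ is 3, t is 6.96, and only the far instance falls outside the confident area.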
24
Clustering Algorithm (Y-means Algorithm)
3. merging clusters
  • If an instance Xi is in the confident area of
    two clusters, these clusters are merged.
25
Labeling Clusters
  • Our first assumption: # of normal instances >>
    # of intrusions.
  • Label instances in large clusters as normal.
  • Label instances in small clusters as intrusions.
  • Start labeling clusters as normal, largest
    first, until 99% of the data is labeled as
    normal; label the rest as intrusions.

[Diagram labels: Normal cluster; Intrusion cluster]
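
The labeling rule can be sketched as follows. It assumes clusters are visited from largest to smallest and that a cluster is labeled normal as long as less than 99% of the data has been covered so far; the function name `label_clusters` and the sizes are made up for illustration.

```python
def label_clusters(cluster_sizes, normal_fraction=0.99):
    """Map each cluster id to 'normal' or 'intrusion' based on cluster size.

    Assumption: clusters are taken largest first and labeled normal until
    the normal clusters cover at least `normal_fraction` of all instances.
    """
    total = sum(cluster_sizes.values())
    labels, covered = {}, 0
    by_size = sorted(cluster_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for cluster_id, size in by_size:
        if covered < normal_fraction * total:
            labels[cluster_id] = "normal"
            covered += size
        else:
            labels[cluster_id] = "intrusion"
    return labels

# Hypothetical cluster sizes for 1000 training instances.
labels = label_clusters({"a": 900, "b": 95, "c": 3, "d": 2})
```

With these sizes, clusters a and b together cover 995 of 1000 instances, so the two small clusters c and d are labeled as intrusions.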
26
Intrusion Detection
  • For test instance x,
  • Measure the distance to each cluster.
  • Select the nearest cluster C.
  • If C is a normal cluster, label x as normal.
  • Otherwise, label x as an intrusion.
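
The detection step above is a nearest-cluster lookup. A minimal sketch, assuming each cluster is represented by its centroid and an already assigned label; the name `classify` and the example clusters are hypothetical.

```python
import math

def classify(x, clusters):
    """Label a test instance with the label of its nearest cluster.

    `clusters` is a list of (centroid, label) pairs.
    """
    _, label = min(clusters, key=lambda c: math.dist(x, c[0]))
    return label

# Two made-up clusters produced by the labeling step.
clusters = [((0.0, 0.0), "normal"), ((10.0, 10.0), "intrusion")]
result_near_origin = classify((1.0, 1.0), clusters)
result_far = classify((9.0, 8.0), clusters)
```

Test instances should be normalized with the same statistics as the training data before the distances are measured.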

27
Overall Summary
  • IDS and IDS Technologies
  • Using Clustering for Intrusion Detection
  • Methodology
  • Description of the dataset
  • Metric Normalization
  • Clustering Algorithm
  • Labeling Clusters
  • Intrusion Detection
  • Conclusion
  • Unsupervised clustering is chosen.
  • KDD Cup 1999 Data is used.
  • The Y-means algorithm is used to create the ID
    system.

28
References
  • [1] KDD Cup 1999 data.
    http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
  • [2] Y. Guan and A. A. Ghorbani. Y-means: A
    clustering method for intrusion detection. In
    Proceedings of Canadian Conference on Electrical
    and Computer Engineering, pages 1083-1086, 2003.
  • [3] L. Portnoy, E. Eskin, and S. Stolfo.
    Intrusion detection with unlabeled data using
    clustering. In Proceedings of ACM CSS Workshop on
    Data Mining Applied to Security (DMSA-2001),
    2001.
  • [4] K. Scarfone and P. Mell. Guide to intrusion
    detection and prevention systems (IDPS), 2007.

29
  • Questions?