Unsupervised Intrusion Detection Using Clustering Approach - PowerPoint PPT Presentation

Slides: 30

Provided by: Fer132


Transcript and Presenter's Notes

Title: Unsupervised Intrusion Detection Using Clustering Approach

1
Unsupervised Intrusion Detection Using
Clustering Approach

Muhammet Kabukçu
Sefa Kiliç
Ferhat Kutlu
Teoman Toraman

2
Outline

Introduction
Using Clustering for Intrusion Detection
Methodology
Overall Summary
Conclusion
References

3
Introduction

Intrusion detection is the process of monitoring
the events occurring in a computer system or
network and analyzing them for signs of possible
incidents.

Incidents are violations or imminent threats of
violation of
computer security policies,
acceptable use policies,
standard security practices.

4
Introduction

An intrusion detection system (IDS) is software
that automates the intrusion detection process.

IDSs primarily focus on identifying
possible incidents and detecting when an attacker
has successfully compromised a system by
exploiting a vulnerability in the system.

5
Introduction
6
Signature-Based Detection

A signature is a pattern that corresponds to a
known threat (e.g. a telnet attempt with a
username of "root", which is a violation of an
organization's security policy).
Signature-based detection is the process of
comparing signatures against observed events to
identify possible incidents.
Advantage: Very effective at detecting known
threats.
Disadvantage: Ineffective at detecting
previously unknown threats.

7
Anomaly-Based Detection

The process of comparing definitions of what
activity is considered normal against observed
events to identify significant deviations.
Capable of detecting previously unknown threats.
Uses host or network-specific profiles.

8
Detection by Stateful Protocol Analysis

The process of comparing predetermined profiles
of generally accepted definitions of benign
protocol activity for each protocol state against
observed events to identify deviations.
Relies on vendor-developed universal profiles
that specify how particular protocols should and
should not be used.

9
Using Clustering for Intrusion Detection

Methods other than signature-based detection use
data mining and machine learning algorithms to
train on labeled network data.
For training data, there are two major paradigms:
misuse detection and anomaly detection.

Which one to use?
10
Using Clustering for Intrusion Detection
- Misuse Detection -

In misuse detection, machine learning algorithms
are used with labeled data.
By using the extracted features from labeled
network traffic, network data is classified.
By using new data that includes new types of
attacks, detection models are retrained.

11
Using Clustering for Intrusion Detection
- Anomaly Detection -

In anomaly detection,
models are built by training on normal data,
and deviations from the normal model are then detected.

Generating purely normal data is
very difficult and costly in practice.
It is very hard to guarantee that
there are no attacks during the time
the traffic is collected from the
network.

12
Using Clustering for Intrusion Detection

Misuse Detection Anomaly Detection.

Use a mechanism that detects intrusions by
training a model on unlabeled data.
Find intrusions buried within that data.

13
Using Clustering for Intrusion Detection
[Diagram: A Set of Unlabeled Data -> Unsupervised Anomaly Detection Algorithm -> Detected Intrusion Clusters -> Connection Comparison with Detected Clusters -> Detected malicious attacks]

Assumptions for the unsupervised anomaly detection
algorithm:
The intrusions are rare with respect to normal
network traffic.
The intrusions are different from normal network
traffic.
As a result:
The intrusions will appear as outliers in the
data.
14
Using Clustering for Intrusion Detection

The unsupervised anomaly
detection algorithm groups
the unlabeled data instances
into clusters using a
simple distance-based metric.

15
Using Clustering for Intrusion Detection

Once data is clustered, all of the
instances that appear in
small clusters are labeled as
anomalies because:
Normal instances should form large clusters
compared to the intrusions.
Malicious intrusions and normal instances are
qualitatively different, so they do not fall into
the same cluster.

[Figure: an intrusion cluster and a normal cluster]
16
Methodology

Description of the dataset
Metric & Normalization
Clustering Algorithm
Portnoy et al.
Y-means Algorithm
Labeling Clusters
Intrusion Detection

17
Description of the dataset

KDD Cup 1999 Data
Main attack categories:
DOS: Denial of Service (e.g. SYN flood)
R2L: Unauthorized access from a remote machine
(e.g. guessing password)
U2R: Unauthorized access to local superuser
(root) privileges (e.g. various buffer overflow
attacks)
Probing: Surveillance and other probing (e.g.
port scanning)
In total, 24 attack types in training data and 14
additional ones in test data...

18
Metric & Normalization

Euclidean Metric (for distance computation)
Feature Normalization (to eliminate the
difference in the scale of features)
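
The two steps named on this slide can be sketched as follows. This is a minimal illustration, not the presenters' code: it assumes z-score standardization (subtract the feature mean, divide by the standard deviation), since the slide does not say which normalization is used, and the function names `normalize` and `euclidean` are hypothetical.

```python
import math
from statistics import mean, pstdev


def normalize(rows):
    """Z-score each feature column (assumed normalization scheme)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Guard against constant features, whose standard deviation is zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy connection records: (duration in seconds, bytes sent).
raw = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]]
scaled = normalize(raw)
```

Without normalization, the byte counts would dominate every distance simply because their values are larger; after scaling, both features contribute on a comparable scale.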

19
Clustering Algorithm (Portnoy et al.)

Start with an empty set of clusters.
For each instance Xi in the training set, measure its distances d1, d2, d3, ... to the existing clusters.
The smallest distance, d1, is selected.
- if d1 < W (predefined threshold value),
then Xi is assigned to that cluster.
- else, a new cluster is created, then Xi is
assigned to it.
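
A minimal sketch of this single-pass, fixed-width clustering is given below. It assumes that each cluster is represented by the first instance assigned to it and that this center is never updated; the slide does not state how a cluster's position is defined, so treat that as an assumption, along with the function and variable names.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fixed_width_clustering(instances, width):
    """Single pass: join the nearest cluster if closer than `width`, else start a new one."""
    clusters = []  # each cluster: {"center": vector, "members": [vectors]}
    for x in instances:
        # Nearest existing cluster, or None while the set of clusters is empty.
        nearest = min(clusters, key=lambda c: euclidean(x, c["center"]), default=None)
        if nearest is not None and euclidean(x, nearest["center"]) < width:
            nearest["members"].append(x)
        else:
            # Assumption: a new cluster is centered on the instance that created it.
            clusters.append({"center": x, "members": [x]})
    return clusters


training_set = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0], [0.1, 0.1]]
clusters = fixed_width_clustering(training_set, width=1.0)
sizes = [len(c["members"]) for c in clusters]
```

On this toy set the call yields two clusters with 4 and 1 members; the lone instance at (5, 5) is farther than W from the first cluster and therefore starts its own.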
20
Clustering Algorithm (Portnoy et al.)

Advantage: No need to know the initial no. of
clusters.
Disadvantage: Need to know W, which may label
instances wrong in some cases.
However

21
Clustering Algorithm (Y-means Algorithm)

3 main parts
assigning instances to k clusters
splitting clusters
merging clusters

22
Clustering Algorithm (Y-means Algorithm)
1. assigning instances to k clusters
Assign the instances in the dataset to k clusters, then redefine each cluster centroid.
k = no. of clusters, n = no. of instances, 1 < k < n
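
One pass of this step can be sketched as below, assuming the usual k-means rule that each instance goes to its nearest centroid and each centroid is then recomputed as the mean of its members. Leaving the centroid of an empty cluster unchanged is a simplification made here, and `assign_and_update` is a hypothetical name.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_and_update(instances, centroids):
    """One pass: assign each instance to its nearest centroid, then recompute centroids."""
    groups = [[] for _ in centroids]
    for x in instances:
        nearest = min(range(len(centroids)), key=lambda i: euclidean(x, centroids[i]))
        groups[nearest].append(x)
    new_centroids = []
    for group, old in zip(groups, centroids):
        if group:
            new_centroids.append([sum(col) / len(group) for col in zip(*group)])
        else:
            new_centroids.append(old)  # simplification: keep an empty cluster's centroid
    return groups, new_centroids


dataset = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
groups, centroids = assign_and_update(dataset, centroids=[[1.0, 1.0], [9.0, 9.0]])
```

In practice the pass would be repeated until the assignments stop changing.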
23
Clustering Algorithm (Y-means Algorithm)
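2. splitting clusters
t (normal threshold) = 2.32σ, where σ is the standard
deviation
di is the distance of Xi (instance) from the cluster centroid.

if di > t, Xi is an outlier.
New clusters are created starting with the
farthest outliers.

[Figure: the confident area is the region within t of the cluster centroid]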
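
The outlier test can be sketched as follows. The slide does not say how the standard deviation is computed, so this sketch assumes it is the root-mean-square distance of the cluster's members from the centroid; the function name and the toy data are illustrative only.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_outliers(members, centroid, factor=2.32):
    """Return members farther than t = factor * sigma from the centroid, farthest first."""
    distances = [euclidean(x, centroid) for x in members]
    # Assumption: sigma is the root-mean-square distance to the centroid.
    sigma = math.sqrt(sum(d * d for d in distances) / len(distances))
    threshold = factor * sigma
    outliers = [(d, x) for d, x in zip(distances, members) if d > threshold]
    outliers.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in outliers]


# Toy cluster: nine instances at distance 1 and one at distance 10
# from a centroid that is taken as given.
members = [[1.0, 0.0]] * 9 + [[10.0, 0.0]]
outliers = find_outliers(members, centroid=[0.0, 0.0])
```

The farthest outlier in the returned list would be the first candidate for the centroid of a new cluster.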
24
Clustering Algorithm (Y-means Algorithm)

3. merging clusters

If an instance Xi is in the confident area of two clusters,
merge these clusters back.
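
A minimal sketch of the merge test, assuming the confident area of a cluster is the region within 2.32 standard deviations of its centroid and that each cluster is described by a hypothetical (centroid, sigma) pair:

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def in_confident_area(x, centroid, sigma, factor=2.32):
    """True if x lies within t = factor * sigma of the centroid."""
    return euclidean(x, centroid) <= factor * sigma


def should_merge(instances, cluster_a, cluster_b):
    """True if any instance falls inside the confident areas of both clusters."""
    return any(
        in_confident_area(x, *cluster_a) and in_confident_area(x, *cluster_b)
        for x in instances
    )


# Each cluster is a hypothetical (centroid, sigma) pair.
a = ([0.0, 0.0], 1.0)
b = ([3.0, 0.0], 1.0)
c = ([20.0, 0.0], 1.0)
points = [[1.5, 0.0]]
```

Here `should_merge(points, a, b)` is true because the point lies within 2.32 of both centroids, while `should_merge(points, a, c)` is false.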
25
Labeling Clusters

Our first assumption:
# of normal instances >> # of intrusions
Label instances in large clusters as normal.
Label instances in small clusters as intrusions.
Start labeling as normal; once 99% of the data is
labeled as normal, label the rest as
intrusion.
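
The labeling rule can be sketched as below, assuming the 99 on the slide means 99% of the data and that clusters are visited from largest to smallest. The function name, the label strings, and the cluster sizes are illustrative.

```python
def label_clusters(cluster_sizes, normal_fraction=0.99):
    """Label clusters largest-first as normal until the fraction is covered."""
    total = sum(cluster_sizes)
    order = sorted(range(len(cluster_sizes)), key=lambda i: cluster_sizes[i], reverse=True)
    labels = [None] * len(cluster_sizes)
    covered = 0
    for i in order:
        if covered < normal_fraction * total:
            labels[i] = "normal"
            covered += cluster_sizes[i]
        else:
            labels[i] = "intrusion"
    return labels


# Four clusters holding 1000 instances in total.
labels = label_clusters([4, 900, 93, 3])
```

With these sizes, the clusters of 900 and 93 instances are labeled normal (together they cover more than 99% of the data) and the two smallest clusters are labeled intrusion.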

[Figure: a normal cluster and an intrusion cluster]
26
Intrusion Detection

For test instance x,
Measure the distance to each cluster.
Select the nearest cluster C.
If C is normal cluster, label x as normal,
Otherwise label x as intrusion.
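
The detection step above can be sketched as a nearest-cluster lookup. This assumes the distance to a cluster is measured to its centroid, which the slide does not state; the names and toy values are illustrative.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(x, centroids, labels):
    """Return the label of the cluster whose centroid is nearest to x."""
    nearest = min(range(len(centroids)), key=lambda i: euclidean(x, centroids[i]))
    return labels[nearest]


centroids = [[0.0, 0.0], [8.0, 8.0]]
labels = ["normal", "intrusion"]
near_normal = classify([0.5, 0.2], centroids, labels)
near_intrusion = classify([7.0, 9.0], centroids, labels)
```

A test instance would need the same feature normalization as the training data before this lookup.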

27
Overall Summary

IDS & IDS Technologies
Using Clustering for Intrusion Detection
Methodology
Description of the dataset
Metric & Normalization
Clustering Algorithm
Labeling Clusters
Intrusion Detection
Conclusion
Unsupervised clustering is chosen.
KDD Cup 1999 Data
The Y-means algorithm is used for creating the ID system.

28
References

[1] KDD Cup 1999 data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
[2] Y. Guan and A. A. Ghorbani. Y-means: A
clustering method for intrusion detection. In
Proceedings of Canadian Conference on Electrical
and Computer Engineering, pages 1083-1086, 2003.
[3] L. Portnoy, E. Eskin, and S. Stolfo.
Intrusion detection with unlabeled data using
clustering. In Proceedings of ACM CSS Workshop on
Data Mining Applied to Security (DMSA-2001),
2001.
[4] K. Scarfone and P. Mell. Guide to intrusion
detection and prevention systems (IDPS), 2007.