Title: Combined Data Mining Approach for Intrusion Detection
1Combined Data Mining Approach for Intrusion
Detection
Urko Zurutuza, R. Uribeetxeberria, E. Azketa, G. Gil, J. Lizarraga, M. Fernández
2Overview
- Introduction: Anatomy of an attack, IDS alarms, Problem description and proposal for solution
- Model Description: Data Preprocessing, Alarm Clustering, Cluster Association Rules, Attack Scenario Generation
- Experimental Results: DARPA 2000 Intrusion Detection Data Set, Real Scenario
- Conclusions and Further Work
3Introduction
Computer Security research group of Mondragon
University
- Security in embedded systems
- Audit and evaluation mechanisms
- Intrusion detection
- Honeypots
4Introduction Anatomy of an Attack
[Figure: anatomy of an attack. Labels: Reconnaissance, Scanning, Enumeration, Perimeter vulnerability evaluation, Exploitation, Gaining access, Privilege escalation, Pilfering, Maintaining access, Covering tracks, Denial of Service. Also labeled: IDS alarms.]
5Introduction
Problem description
- A single global attack generates:
  - A large quantity of alarms
  - Many different alarms
  - Many false positives
- Consequences for system administrators:
  - Difficulty in understanding all the information
  - Too much false information
  - Too many alarms to analyze
Solution proposal
- Remove false positives to isolate real alerts
- Find relationships among real alerts to identify known attack steps
- Find relationships among different attack steps to identify global attack scenarios
- The overall process can be performed automatically using data mining techniques
  - Data mining tool: WEKA (Java API)
6Features describing multiple alarms
[Figure: model pipeline. Step 1 labels: Pre-processing, Selected features, Clustering, Clusters, Association, Clusters described by main feature. Step 2 labels: Comparison against the description of traffic patterns and known attack patterns, Chronological ordering, Attack sequence.]
7Model Description
Prelude database
10Experimental results DARPA 2000 Dataset
IDS Alarm logging architecture
11Experimental results DARPA 2000 Dataset
1.- CLUSTERING
12Experimental results DARPA 2000 Dataset
2.- ASSOCIATION
1. address_source=172.16.114.1 4318 ==> alert_ident=472 ip_len=56 4318 conf:(1)
2. alert_ident=472 4318 ==> address_source=172.16.114.1 ip_len=56 4318 conf:(1)
3. address_source=172.16.114.1 alert_ident=472 4318 ==> ip_len=56 4318 conf:(1)
4. address_source=172.16.114.1 ip_len=56 4318 ==> alert_ident=472 4318 conf:(1)
5. alert_ident=472 ip_len=56 4318 ==> address_source=172.16.114.1 4318 conf:(1)
6. alert_ident=472 4318 ==> ip_len=56 4318 conf:(1)
7. address_source=172.16.114.1 4318 ==> ip_len=56 4318 conf:(1)
Summarize and drop redundant rules!!
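The slides do not say how the rules are summarized. One possible reading, sketched here as an assumption, is to merge every rule into the set of attribute values it covers and to drop any set that is contained in a larger set with the same alarm count. The `Rule` type, `parse_rule` and `summarize` are hypothetical names, and the input lines follow the `==>` and `conf:(1)` notation of WEKA's Apriori output.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Tuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    support: int        # number of alarms matching the whole rule
    confidence: float


def parse_rule(line: str) -> Rule:
    """Parse 'a=1 4318 ==> b=2 c=3 4318 conf:(1)' into a Rule."""
    left, right = line.split("==>")
    left_tokens = left.split()
    right_tokens = right.split()
    # Left side: items followed by a count. Right side: items, count, conf.
    antecedent = frozenset(left_tokens[:-1])
    consequent = frozenset(right_tokens[:-2])
    support = int(right_tokens[-2])
    confidence = float(right_tokens[-1][len("conf:("):-1])
    return Rule(antecedent, consequent, support, confidence)


def summarize(rules: List[Rule]) -> List[Tuple[FrozenSet[str], int]]:
    """Keep one itemset per group of rules that describe the same alarms."""
    itemsets: Dict[FrozenSet[str], int] = {}
    for rule in rules:
        items = rule.antecedent | rule.consequent
        itemsets[items] = max(rule.support, itemsets.get(items, 0))
    # An itemset is redundant if a strict superset covers the same alarm count.
    return [
        (items, support)
        for items, support in itemsets.items()
        if not any(items < other and support == other_support
                   for other, other_support in itemsets.items())
    ]


lines = [
    "address_source=172.16.114.1 4318 ==> alert_ident=472 ip_len=56 4318 conf:(1)",
    "alert_ident=472 4318 ==> address_source=172.16.114.1 ip_len=56 4318 conf:(1)",
    "address_source=172.16.114.1 alert_ident=472 4318 ==> ip_len=56 4318 conf:(1)",
    "address_source=172.16.114.1 ip_len=56 4318 ==> alert_ident=472 4318 conf:(1)",
    "alert_ident=472 ip_len=56 4318 ==> address_source=172.16.114.1 4318 conf:(1)",
    "alert_ident=472 4318 ==> ip_len=56 4318 conf:(1)",
    "address_source=172.16.114.1 4318 ==> ip_len=56 4318 conf:(1)",
]
summary = summarize([parse_rule(line) for line in lines])
```

Under this assumption the seven rules collapse into a single description: source address 172.16.114.1, alert identifier 472 and IP length 56, seen together in 4318 alarms.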
13Experimental results DARPA 2000 Dataset
2.- ASSOCIATION
Summarized attack rules
3.- COMPARISON
Codified abstract-level rules
14Experimental results DARPA 2000 Dataset
3.- COMPARISON
Codified abstract-level rules
Vs.
Attack Traffic Pattern rules
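The slides do not show how the two rule sets are compared. A minimal sketch, assuming the comparison is a subset test between a codified rule and each known pattern, could look as follows. The attribute encodings, the pattern names and the function `matching_patterns` are all hypothetical and only illustrate the idea.

```python
def matching_patterns(rule_items, patterns):
    """Return the names of patterns whose required items all appear in the rule."""
    return sorted(
        name for name, required in patterns.items() if required <= rule_items
    )


# Hypothetical abstract-level encoding of one summarized rule.
rule = frozenset({"source=internal", "protocol=icmp", "length=small"})

# Hypothetical pattern library; a real one would be built from descriptions
# of known traffic and known attacks.
patterns = {
    "icmp_sweep": frozenset({"protocol=icmp", "length=small"}),
    "telnet_login": frozenset({"protocol=tcp", "port=23"}),
}

matches = matching_patterns(rule, patterns)
```

A rule that matches no pattern would remain unexplained and could be passed to an analyst.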
15Experimental results DARPA 2000 Dataset
4.- CHRONOLOGICAL ORDERING
Attack Scenario
Result: from 12,064 alarms to 6 meta-alarms explaining the attack scenario
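The ordering step can be sketched as a sort of the meta-alarms by the time of their earliest alarm. This is an assumption about how the scenario is assembled; the labels, timestamps and the function `order_meta_alarms` below are hypothetical and are not taken from the DARPA 2000 results.

```python
from datetime import datetime


def order_meta_alarms(meta_alarms):
    """Sort meta-alarms by the timestamp of their earliest alarm."""
    return sorted(meta_alarms, key=lambda m: min(m["times"]))


# Hypothetical meta-alarms, each holding the timestamps of its member alarms.
meta_alarms = [
    {"label": "meta-alarm B",
     "times": [datetime(2000, 1, 1, 10, 30), datetime(2000, 1, 1, 10, 45)]},
    {"label": "meta-alarm A",
     "times": [datetime(2000, 1, 1, 9, 50), datetime(2000, 1, 1, 11, 0)]},
    {"label": "meta-alarm C",
     "times": [datetime(2000, 1, 1, 11, 15)]},
]

scenario = [m["label"] for m in order_meta_alarms(meta_alarms)]
```

Sorting by the first occurrence keeps a long-running step, such as meta-alarm A here, at the position where it started.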
16Experimental results Real Scenario
17Experimental results Real Scenario
1.- CLUSTERING
- 26,660 IDS alarms accumulated in few hours
- EM Algorithm obtained 7 clusters
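To illustrate what EM clustering does, here is a minimal sketch for a Gaussian mixture over a single numeric feature. It is a simplification and not the WEKA implementation: the number of clusters `k` is fixed in advance, only one attribute is used, and the sample packet lengths are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_1d(values, k=2, iterations=100, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM; requires k >= 2."""
    n = len(values)
    lo, hi = min(values), max(values)
    # Deterministic start: means spread evenly between the minimum and maximum.
    means = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    overall_mean = sum(values) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in values) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            scores = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                      for j in range(k)]
            total = sum(scores)
            resp.append([s / total for s in scores] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var, min_var)
    return weights, means, variances


# Hypothetical packet lengths: one group of small packets, one of large packets.
ip_lengths = [56, 56, 57, 55, 56, 1500, 1498, 1500, 1499]
weights, means, variances = em_1d(ip_lengths, k=2)
```

Each alarm would then be assigned to the component with the highest responsibility, which gives the clusters that the later steps describe.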
18Experimental results Real Scenario
2.- ASSOCIATION
- The Apriori association rule algorithm extracts 1,133 rules
- Rule redundancy reduction decreases the rule set to 60
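The following sketch shows the Apriori idea on a toy alarm set and why one recurring pattern alone produces many overlapping rules. It is a plain-Python illustration, not the WEKA implementation; the function names, the second alarm type and the thresholds are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, count, confidence) tuples."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, count, confidence))
    return rules


# Toy alarms: three identical alarms plus one hypothetical unrelated alarm.
alarms = [
    {"address_source=172.16.114.1", "alert_ident=472", "ip_len=56"},
    {"address_source=172.16.114.1", "alert_ident=472", "ip_len=56"},
    {"address_source=172.16.114.1", "alert_ident=472", "ip_len=56"},
    {"address_source=10.0.0.5", "alert_ident=12", "ip_len=40"},
]
frequent = apriori(alarms, min_support=3)
rules = generate_rules(frequent, min_confidence=1.0)
```

In this toy case a single three-attribute pattern yields 7 frequent itemsets and 12 rules with confidence 1, which suggests why a redundancy reduction step shrinks the rule set so sharply.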
19Experimental results Real Scenario
3.- COMPARISON
Codified abstract-level rules
Attack Traffic Pattern rules
20Experimental results Real Scenario
4.- CHRONOLOGICAL ORDERING
Attack Scenario
Result: from 26,660 alarms to 6 meta-alarms explaining the attack scenario
21Conclusions and Further Work
22Thank You
Questions?