Benchmarking Anomaly-based Detection Systems

Provided by: csNorth

1
Benchmarking Anomaly-based Detection Systems
  • Ashish Gupta
  • Network Security
  • May 2004

2
Overview
  • The Motivation for this paper
  • Waldo example
  • The approach
  • Structure in data
  • Generating the data and anomalies
  • Injecting anomalies
  • Results
  • Training and Testing the method
  • Scoring
  • Presentation
  • The ROC curves: somewhat obvious

3
Motivation
  • Does anomaly detection depend on the
    regularity/randomness of the data?

4
Where's Waldo!
5
Where's Waldo!
6
Where's Waldo!
7
The aim
  • Hypothesis
  • Differences in data regularity affect anomaly
    detection
  • Different environments → different regularity
  • Regularity
  • Highly redundant or random?
  • Example of how the environment affects regularity

010101010101010101010101 or 0100011000101000100100101
8
Consequences
One IDS → different false alarm rates
Need a custom system/training for each environment?
Temporal effects: regularity may vary over time?
9
Structure in data
  • Measuring randomness

10
010101010101010101010101 or 0100011000101000100100101
Measuring Randomness

Relative Entropy
Sequential Dependence
Conditional Relative Entropy
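
To make the contrast between the two bit strings above concrete, here is a minimal sketch that estimates conditional entropy from bigram counts and divides it by log2 of the alphabet size. The function names and the normalization are assumptions for illustration; the slides do not give the exact formula used in the paper.

```python
import math
from collections import Counter

def conditional_entropy(seq: str) -> float:
    """Estimate H(next symbol | current symbol) in bits from bigram counts."""
    pairs = Counter(zip(seq, seq[1:]))   # counts of (current, next)
    firsts = Counter(seq[:-1])           # counts of the current symbol
    total = len(seq) - 1                 # number of bigrams
    h = 0.0
    for (a, _b), n_ab in pairs.items():
        p_ab = n_ab / total              # joint probability of the bigram
        p_b_given_a = n_ab / firsts[a]   # conditional probability of next given current
        h -= p_ab * math.log2(p_b_given_a)
    return h

def normalized_conditional_entropy(seq: str) -> float:
    """Hypothetical index: 0 = perfectly predictable, 1 = maximally random."""
    k = len(set(seq))
    if k < 2:
        return 0.0
    return conditional_entropy(seq) / math.log2(k)

regular = "010101010101010101010101"
irregular = "0100011000101000100100101"
print(normalized_conditional_entropy(regular))    # 0.0: every symbol is determined by its predecessor
print(normalized_conditional_entropy(irregular))  # strictly between 0 and 1
```

Only first-order dependence (one symbol of history) is measured here; longer histories would need n-gram counts instead of bigram counts.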
11
The benchmark datasets
  • Three types
  • Training data (the background data)
  • Anomalies
  • Testing data (background + anomalies)
  • Generating the sequences
  • 5 sets, each set → 11 files (for increasing
    regularity)
  • Each set → different alphabet size
  • Alphabet size → decides complexity
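
One way to picture a set of 11 files with graded regularity is sketched below. The generator is an assumption, not the paper's procedure: it walks a fixed cycle through the alphabet and, with probability `noise`, jumps to a random symbol instead. The alphabet "ABCD", the length, and the 0.0 to 1.0 noise grid are all hypothetical choices.

```python
import random

def generate_sequence(alphabet: str, length: int, noise: float, seed: int = 0) -> str:
    """Cycle through the alphabet; with probability `noise`, jump to a random symbol."""
    rng = random.Random(seed)
    idx = 0
    out = []
    for _ in range(length):
        out.append(alphabet[idx])
        if rng.random() < noise:
            idx = rng.randrange(len(alphabet))   # random jump: adds irregularity
        else:
            idx = (idx + 1) % len(alphabet)      # regular step: next symbol in the cycle
    return "".join(out)

# 11 noise levels (0.0, 0.1, ..., 1.0) give 11 sequences for one alphabet size.
noise_levels = [i / 10 for i in range(11)]
dataset = {noise: generate_sequence("ABCD", 1000, noise) for noise in noise_levels}

print(dataset[0.0][:12])  # ABCDABCDABCD
```

Repeating this with alphabets of different sizes would give several sets, mirroring the "each set → different alphabet size" idea.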

12
Anomaly Generation
  • What's a surprise?
  • Different from the expected probability
  • Types
  • Juxtapositional: different arrangements of data
  • 001001001001001001111
  • Temporal
  • Unexpected periodicities
  • Other types ?

13
Types in this paper
  • Foreign symbol
  • AAABABBBABABCBBABABBA
  • Foreign n-gram
  • AAABABAABAABAAABBBBA
  • Rare n-gram
  • AABBBABBBABBBABBBABBBABBAA
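
The three anomaly types can be told apart by comparing an n-gram against the training data. The sketch below assumes these working definitions: a foreign symbol never appears in training, a foreign n-gram uses known symbols but never occurs in training, and a rare n-gram occurs with relative frequency below a cutoff. The cutoff `rare_below` and the training string are hypothetical.

```python
from collections import Counter

def ngram_counts(seq: str, n: int) -> Counter:
    """Count every n-gram (sliding window of width n) in the sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def classify_ngram(gram: str, train_counts: Counter, train_alphabet: set,
                   rare_below: float = 0.05) -> str:
    """Label an n-gram relative to the training data (rare_below is an assumed cutoff)."""
    if any(ch not in train_alphabet for ch in gram):
        return "foreign symbol"
    count = train_counts.get(gram, 0)
    if count == 0:
        return "foreign n-gram"
    if count / sum(train_counts.values()) < rare_below:
        return "rare n-gram"
    return "common"

training = "AB" * 20 + "B" + "AB" * 20       # mostly alternating, with a single "BB"
counts = ngram_counts(training, 2)
alphabet = set(training)

for gram in ["AC", "AA", "BB", "AB"]:
    print(gram, "->", classify_ngram(gram, counts, alphabet))
# AC -> foreign symbol
# AA -> foreign n-gram
# BB -> rare n-gram
# AB -> common
```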

14
  • Injecting anomalies
  • Make sure not more than 0.24

15
The experiments
  • The Hypothesis is true

16
  • The hypothesis
  • Nature of normal background noise affects
    signal detection
  • The anomaly detector
  • To detect anomalous subsequences
  • Learning phase → n-gram probability table
  • Unexpected event → anomaly!
  • Anomaly threshold decides level of surprise

17
  • Example of anomaly detection

AAA 0.12
AAB 0.13
ABA 0.20
BAA 0.17
BBB 0.15
BBA 0.12
AAC → ANOMALY!
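
The table lookup above can be sketched in a few lines. This is an illustration under assumptions, not the detector from the paper: surprise is taken to be 1 minus the table probability (so an unseen n-gram scores 1), and the names `build_table`, `surprise`, and `detect` are hypothetical.

```python
from collections import Counter

def build_table(training: str, n: int = 3) -> dict:
    """Learning phase: relative frequency of every n-gram in the training data."""
    grams = [training[i:i + n] for i in range(len(training) - n + 1)]
    counts = Counter(grams)
    return {g: c / len(grams) for g, c in counts.items()}

def surprise(gram: str, table: dict) -> float:
    """Assumed surprise score: 0 = completely expected, 1 = never seen in training."""
    return 1.0 - table.get(gram, 0.0)

def detect(seq: str, table: dict, n: int = 3, threshold: float = 1.0) -> list:
    """Return (position, n-gram) for every window whose surprise reaches the threshold."""
    flagged = []
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        if surprise(gram, table) >= threshold:
            flagged.append((i, gram))
    return flagged

# Probability table copied from the slide (it is partial; values sum to 0.89).
SLIDE_TABLE = {"AAA": 0.12, "AAB": 0.13, "ABA": 0.20,
               "BAA": 0.17, "BBB": 0.15, "BBA": 0.12}

print(detect("AABAAC", SLIDE_TABLE))  # [(3, 'AAC')]
```

With the default threshold of 1.0 only n-grams missing from the table are flagged; lowering it also flags rare ones.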
18
Scoring
  • Event outcomes
  • Hits
  • Misses
  • False alarms
  • Threshold
  • Decides level of surprise
  • 0 → completely unsurprising, 1 → astonishing
  • Need to calibrate
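
A small sketch of the three event outcomes, assuming each test position has a surprise score in [0, 1] and a ground-truth label saying whether an anomaly was injected there. The scores and labels below are made up for illustration.

```python
def score(surprises, is_anomaly, threshold):
    """Count hits, misses, and false alarms at a given surprise threshold."""
    hits = misses = false_alarms = 0
    for s, truth in zip(surprises, is_anomaly):
        flagged = s >= threshold
        if truth and flagged:
            hits += 1            # injected anomaly, detected
        elif truth:
            misses += 1          # injected anomaly, not detected
        elif flagged:
            false_alarms += 1    # normal data, flagged anyway
    return {"hits": hits, "misses": misses, "false_alarms": false_alarms}

# Hypothetical surprise scores and ground truth for five test positions.
surprises = [0.1, 0.9, 0.4, 1.0, 0.7]
is_anomaly = [False, True, False, True, False]

print(score(surprises, is_anomaly, threshold=0.95))  # {'hits': 1, 'misses': 1, 'false_alarms': 0}
print(score(surprises, is_anomaly, threshold=0.5))   # {'hits': 2, 'misses': 0, 'false_alarms': 1}
```

The two calls show why the threshold needs calibrating: too high and anomalies are missed, too low and normal data raises alarms.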

19
Presentation of results
  • Presents two aspects
  • correct detections
  • false detections
  • Detector operates through a range of
    sensitivities
  • Higher sensitivity → more correct detections, but also more false detections
  • Need the right sensitivity
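
Sweeping the threshold across a range of sensitivities gives the points of an ROC curve. The sketch below assumes the same kind of input as a scored test run (one surprise value and one ground-truth label per position) and that both anomalous and normal positions are present; the data is hypothetical.

```python
def roc_points(surprises, is_anomaly, thresholds):
    """Return (false alarm rate, hit rate) for each threshold.

    Assumes at least one anomalous and one normal position, so neither rate divides by zero.
    """
    positives = sum(is_anomaly)
    negatives = len(is_anomaly) - positives
    points = []
    for t in thresholds:
        hits = sum(1 for s, a in zip(surprises, is_anomaly) if a and s >= t)
        false_alarms = sum(1 for s, a in zip(surprises, is_anomaly) if not a and s >= t)
        points.append((false_alarms / negatives, hits / positives))
    return points

# Hypothetical surprise scores and ground truth for five test positions.
surprises = [0.1, 0.9, 0.4, 1.0, 0.7]
is_anomaly = [False, True, False, True, False]

# Lower threshold = higher sensitivity.
points = roc_points(surprises, is_anomaly, thresholds=[1.0, 0.8, 0.5, 0.0])
for false_alarm_rate, hit_rate in points:
    print(false_alarm_rate, hit_rate)
```

Running the sweep separately on each regularity level and plotting the curves together is one way to check whether the curves overlap.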

20
(No Transcript)
21
Interpretation
  • Nothing overlaps → regularity affects detection!

22
  • What does this mean?
  • Detection metrics are data dependent
  • Cannot say
  • "My XYZ product will flag 75 percent of
    anomalies with a 10 percent false hit rate!"
  • Sir, are you sure?

23
Real world data
  • Regularity index for system calls for different
    users

24
  • Is this surprising ?
  • What about network traffic ?

25
Conclusions
Anomaly detection effectiveness depends on data structure
Evaluation is data dependent
26
Conclusions
Change in regularity → use a different system, or change the parameters
27
Quirks ?
  • Assumes rather naïve detection systems
  • Simple retraining will not suffice
  • An intelligent detection can take this into
    account.
  • What is really an anomaly ?
  • If data is highly irregular, won't randomness
    produce some anomalies by itself?
  • Anomaly is a relative term
  • Here anomalies are generated independently