Data Mining for Analysis of Rare Events: A Case of Computer Security and Other Applications
1
Data Mining for Analysis of Rare Events A Case
of Computer Security and Other Applications
  • Jozef Zurada
  • Department of Computer Information Systems
  • College of Business
  • University of Louisville
  • Louisville, Kentucky
  • USA
  • email: jmzura01@louisville.edu

2
Outline
  • Introduction to Knowledge Discovery in Databases
    and Data Mining
  • Data Mining Tools, Techniques, and Tasks
  • High-dimensional data
  • Feature and value reduction, and sampling
  • Rare Events
  • What are they?
  • What are the application domains exhibiting these
    characteristics?
  • What are the limitations of standard data mining
    techniques?
  • Major Techniques for Detecting Rare Events
  • Supervised (Classification) techniques -
    Predictive Modeling
  • Tree based approaches, Neural networks
  • Unsupervised Techniques
  • Anomaly/Outlier Detection, Clustering
  • Other Data Mining Techniques: Association Rules
  • Case Study: Intrusion Detection Systems
  • What are the general types/categories of cyber
    attacks
  • Data Mining architecture for Intrusion Detection
    Systems
  • Conclusion and Questions

3
What is KDD?
  • Finding/extracting interesting information from
    data stored in large databases/data warehouses
  • Interesting
  • non-trivial
  • implicit
  • previously unknown (novel)
  • easily understood
  • rule length, number of conditions in a rule
  • potentially useful (actionable)
  • Information
  • patterns
  • rules
  • correlations
  • relationships hidden in data
  • descriptions of rare events
  • detection of outliers/anomalies/rare events
  • prediction of events
  • Interesting patterns represent knowledge

4
Measures of Pattern Interestingness
  • Objective
  • Rule support
  • Represents the percentage of transactions from a
    transaction database that the given rule
    satisfies
  • Probability P(X∩Y), where X∩Y indicates that a
    transaction contains both X and Y
  • support(X⇒Y) = P(X∩Y)
  • Rule confidence
  • Assesses the degree of certainty of the detected
    association
  • Conditional probability P(Y|X), that is, the
    probability that a transaction containing X also
    contains Y
  • confidence(X⇒Y) = P(Y|X)
  • Subjective
  • based on user beliefs in the data
  • Each measure associated with a threshold
    controlled by the user
  • Rules that do not satisfy a confidence threshold
    of, say, 50% are considered uninteresting
  • reflect noise, exceptions, or minority cases
  • Objective measures are combined with subjective
    measures
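To make the support and confidence arithmetic above concrete, here is a minimal Python sketch; the toy transaction database is invented for illustration:

```python
# Minimal sketch: rule support and confidence over a toy
# transaction database (all transactions below are made up).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]

def support(itemset, db):
    # Fraction of transactions containing every item in the itemset.
    return sum(itemset <= t for t in db) / len(db)

def confidence(x, y, db):
    # confidence(X => Y) = P(Y|X) = support(X and Y) / support(X)
    return support(x | y, db) / support(x, db)

x, y = {"bread"}, {"milk"}
print("support:", support(x | y, transactions))       # 0.5
print("confidence:", confidence(x, y, transactions))  # ~0.67
```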

5
Steps in the KDD Process
  • Understanding the application domain
  • relevant prior knowledge and goals of application
  • Data cleaning, integration, and preprocessing
    (60% of the effort)
  • Creating a target data set
  • data selection and transformation
  • feature and data reduction
  • selection of variables, sampling of rows
  • Applying the DM technique(s) - the core of KDD
  • choosing task classification, prediction,
    clustering
  • choosing the algorithm
  • search for patterns of interest
  • Interpreting and evaluating mined patterns
  • Use of discovered knowledge

6
A KDD Process
[Diagram: the KDD pipeline flows from Databases through Data Integration and Data Cleaning into a Data Warehouse, then Selection of Task-relevant Data, Data Mining, and Pattern Evaluation]
7
A KDD Process
  • These activities are iterative, interactive and
    have a user-friendly character
  • End-user has to accept/reject the results
    produced by the KDD system

8
KDD Integration of Many Disciplines
  • Database Technology
  • Statistics
  • Machine Learning / Artificial Intelligence
  • Information Science
  • High-Performance Computing
  • Visualization
  • Pattern Recognition
  • Neural Networks
  • Fuzzy Logic
  • Evolutionary Computing
  • Graph Theory

9
Data Mining Techniques
  • Neural Networks
  • Decision Trees
  • Fuzzy Systems (Logic, Rules)
  • Genetic Algorithms
  • Association Rules
  • Memory-based Reasoning (k-Nearest Neighbor)
  • Deviation/Anomaly Detection
  • Allow one to
  • learn from data
  • understand something new
  • answer tough questions
  • locate a problem
  • Can be complemented by traditional statistical
    techniques, OLAP, and SQL queries

10
Unsupervised DM Techniques
  • Use unsupervised learning
  • no target or class variable
  • groups input data records into classes based on
    self-similarities in the data
  • The goal is not specific
  • "Tell me something interesting about the data"
  • What common characteristics/profiles do
    terrorists share?
  • What is the activity pattern of a typical
    network intruder?
  • No constraints on a DM system
  • No indications of what the user expects and what
    kind of discovery could be of interest
  • Examples clustering, finding association rules,
    deviation detection, neural networks

11
Supervised DM Techniques
  • Use supervised learning
  • classification, prediction
  • target (dependent) variable has a clearly defined
    label
  • Attempt to
  • predict a specific data value
  • weight, height, age
  • classify/categorize an item into a fixed set of
    known classes
  • (yes/no, friend/foe, healthy/bankrupt,
    legitimate/illegitimate)
  • Goal is specific
  • Ex. Will this company go bankrupt?
  • Is this individual a friend or a foe
    (terrorist)?
  • Is this credit card transaction legitimate
    or fraudulent?
  • Is someone trying to access a computer
    network an intruder or not?

12
Classification Task
  • Deals with discrete outcomes: intruder/non-intruder,
    legitimate/fraudulent, friend/foe
  • Learning a function that classifies a data item
    into one of several predefined classes
  • set of rules
  • mathematical equation
  • set of weights
  • Training set consists of pre-classified examples
  • Newly presented object is assigned a class
  • A network system administrator can use the
    classifier to decide whether a person accessing
    the network is an intruder or not

13
Clustering Task
  • Unsupervised learning
  • Segmenting a heterogeneous population into a number
    of more homogeneous clusters or groups
  • No predefined classes which will be used for
    training
  • The records are grouped together based on
    self-similarity
  • It is up to you what meaning, if any, to attach
    to the resulting classes
  • It is often done as a prelude to some other form
    of DM (classification)
  • Often based on computing the distances between
    data points
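As a minimal sketch of distance-based clustering (assuming scikit-learn is available; the 2-D data is synthetic):

```python
# Minimal sketch: k-means groups records by self-similarity
# (assumes scikit-learn; interpreting the clusters is up to you).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 1, (50, 2)),   # one synthetic group
                  rng.normal(5, 1, (50, 2))])  # a second, shifted group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(kmeans.labels_[:5])
print(kmeans.cluster_centers_)
```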

14
Optimization Task
  • Finding one or a series of optimal solutions from
    among a very large number of possible solutions
  • Traditional mathematical techniques may break
    down because of billions of combinations

15
High-Dimensionality Data
  • Data/dimensionality reduction
  • number of features
  • number of samples
  • number of values for the features
  • Gains of data reduction
  • Improved predictive/descriptive accuracy
  • Model better understood
  • Uses fewer rules, weights, variables
  • Fewer features
  • In the next round of data collection, irrelevant
    features can be discarded

16
Data Preparation
  • Always done, regardless of the DM task and
    technique
  • Depends on
  • amounts of data
  • DM task (classification, clustering/segmentation)
  • types of values (numeric or categorical) for
    features/variables
  • behavior of data with respect to time
  • Normalization
  • data values scaled to a specific range, e.g. [0, 1],
    or standardized as z-scores
  • Reasons
  • features with larger values outweigh features
    with smaller values
  • clustering techniques based on computing the
    distance between data points
  • neural networks learn better
  • prevents saturation of neurons
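Both scalings in a short sketch (plain NumPy; the feature values are made up):

```python
# Minimal sketch: min-max scaling to [0, 1] and z-score scaling.
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

minmax = (x - x.min()) / (x.max() - x.min())  # values in [0, 1]
zscore = (x - x.mean()) / x.std()             # mean 0, std 1

print(minmax)
print(zscore)
```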

17
Data Preparation
  • Data Smoothing/Rounding
  • Minor differences between the values of a feature
    are often unimportant
  • Binning
  • placing values in different intervals by
    consulting their neighbors
  • Transformation of features
  • Reduces the number of features

18
Data Preparation
  • Outlier detection
  • Samples inconsistent with respect to the
    remaining data
  • Not an easy subject
  • Some applications focus on outlier detection;
    others do not
  • Ex. detecting fraudulent credit card transactions
  • 1 out of 10,000 transactions is fraudulent.
  • In many classes of DM applications, we remove
    them
  • Careful with the automatic removal of outliers
  • Methods for outlier detection
  • Visualization for 2-D, 3-D or 4-D
  • Based on mean and variance of feature
  • Distance-based
  • multidimensional samples
  • calculate the distance between all samples in an
    n-dim dataset
  • outliers are those samples which do not have
    enough neighbors
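A minimal sketch of this neighbor-count rule (the radius r and the min_neighbors threshold are arbitrary illustrative choices):

```python
# Minimal sketch: distance-based outlier detection; a sample is an
# outlier if it has too few neighbors within radius r.
import numpy as np

def distance_outliers(X, r=1.5, min_neighbors=3):
    # Pairwise Euclidean distances between all samples.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Neighbors within r, excluding the point itself.
    neighbors = (d < r).sum(axis=1) - 1
    return np.where(neighbors < min_neighbors)[0]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 3)),
               [[8.0, 8.0, 8.0]]])  # one planted outlier
print(distance_outliers(X))         # expected to include index 30
```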

19
Sampling
  • Millions of cases: often 20,000 or so is enough
  • Sample has the same probability distribution as
    the population
  • Random sampling
  • with replacement
  • without replacement
  • Stratified sampling
  • Initial data set is split into non-overlapping
    subsets (strata)
  • sampling is performed on each stratum
    independently of the others
  • Incremental sampling
  • Increasingly larger random subsets to observe the
    trends in performances of the tool and to stop
    when no progress is made
  • How many samples?
  • No simple answer as to how many is "enough"
  • The number depends on
  • algorithms
  • number of classes the algorithm predicts
  • number of variables in a data set
  • required reliability of the results
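A rough sketch of stratified sampling (the class labels and the 10% rate are invented for illustration):

```python
# Rough sketch: draw from each stratum independently, without
# replacement, so rare strata are not lost in the sample.
import random

def stratified_sample(records, key, fraction, seed=0):
    random.seed(seed)
    strata = {}
    for rec in records:
        strata.setdefault(rec[key], []).append(rec)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(random.sample(group, k))  # without replacement
    return sample

records = [{"cls": "normal"}] * 9900 + [{"cls": "intrusion"}] * 100
print(len(stratified_sample(records, "cls", 0.1)))  # 990 + 10 = 1000
```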

20
Feature Reduction
  • Hundreds of features
  • many irrelevant, correlated, redundant
  • Feature selection is often a search-space problem
  • Small number of features ⇒ can be searched
    exhaustively (all combinations)
  • 20 features ⇒ 2^20 > 1,000,000 combinations

21
Feature Reduction Methods
  • Independent examination of features based on the
    mean and variance
  • Test features separately one feature at a time
  • Assumes the examined feature is normally distributed
  • Assumes the given feature is independent of the others
  • Examines one feature at a time without taking
    into account the relationship to other features
  • Collective examination of features based on
    feature means and covariances
  • tests all features together
  • features have normally distributed values
  • impractical and computationally prohibitive
  • yields huge search space

22
Feature Reduction Methods
  • Principal component analysis (PCA)
  • Very popular, well-established, frequently used
  • Complex in terms of calculations
  • Components that contribute the least to the
    variation in the data set are eliminated
  • Entropy measure
  • Called unsupervised feature selection
  • no output feature containing a class label
  • Removing an irrelevant feature from a set may not
    change the information content of the data set
  • Information content is measured by entropy
  • Features on numeric or categorical scale
  • Numeric - normalized Euclidean distance
  • Categorical - Hamming distance
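A minimal PCA sketch (assuming scikit-learn; the data here is random noise, so real data would show a clearer drop-off in explained variance):

```python
# Minimal sketch: keep only the components needed to explain 95%
# of the variance, discarding the rest.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))        # 200 samples, 20 features

pca = PCA(n_components=0.95)          # retain 95% of the variance
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)
print(pca.explained_variance_ratio_)
```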

23
Neural Networks
  • Enable one to acquire, store, and utilize
    experiential knowledge
  • Try to emulate biological neurological systems
  • Try to mimic/approximate the way the human brain
    functions and processes information
  • Used successfully for the following tasks
  • Classification
  • Clustering
  • Optimization
  • Implemented as mathematical models of the human
    brain

24
Neural Networks
  • Characterized by three properties
  • Computational property
  • built of neurons
  • summation node and activation function
  • organized in layers
  • interconnected using weights
  • Architecture of the network
  • Feed-forward NN with error back-propagation
  • classification, prediction
  • Kohonen network
  • clustering (segmentation)
  • Learning property
  • supervised mode (with a teacher)
  • unsupervised mode (without a teacher)
  • Knowledge is encoded in the network's weights

25
(No Transcript)
26
Decision Trees
  • Useful for classification tasks
  • Learn from data, like neural networks
  • Operation based on the algorithms that
  • make the clusters at the node purer and purer by
    progressively reducing disorder (impurity) in the
    original data set
  • impurity is measured by entropy
  • find the optimum number of splits and determine
    where to partition the data to maximize the
    information gain
  • Nodes, branches and leaves indicate the
    variables, conditions, and outcomes, respectively
  • Most predictive variable placed at the top node
    of the tree
  • Model is represented in the form of explicit and
    understandable rule-like relationships among
    variables
  • Each rule represents a unique path from the root
    to each leaf
  • Not as robust as neural networks at detecting
    complex nonlinear relationships between
    variables
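The impurity arithmetic behind those splits can be sketched directly (the labels and the split below are hypothetical):

```python
# Minimal sketch: entropy as the impurity measure, and the
# information gain of a candidate split.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, left, right):
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

labels = ["normal"] * 8 + ["intrusion"] * 2
left, right = labels[:8], labels[8:]          # a hypothetical perfect split
print(entropy(labels))                        # ~0.72
print(information_gain(labels, left, right))  # ~0.72 (pure children)
```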

27
(No Transcript)
28
Fuzzy Logic
  • Enables one to build fuzzy systems
  • Knowledge is encoded in fuzzy sets and fuzzy
    rules
  • Fuzzy rules enable one to reason or describe a
    process in terms of approximations
  • Fuzzy sets: sets without clearly defined
    boundaries
  • Can produce very accurate results
  • Fast response time
  • Knowledge about the fuzzy rules and fuzzy sets
  • elicited from domain experts
  • generated from the given data
  • neuro-fuzzy systems

29
Genetic Algorithms
  • Solve problems (mainly optimization) by borrowing
    a technique from nature
  • Use Darwin's three basic principles
  • Survival of the fittest (reproduction)
  • Cross-breeding (crossovers)
  • Mutation
  • to create approximate solutions for problems
  • selecting the fitness function and encoding genomes
    is often difficult
  • Example
  • You work for a shipping firm and have to make
    shipments to 6 different towns. You have one
    vehicle and your task is to minimize the distance
    traveled. The vehicle can visit each city only once
    and can start from any city.

30
Traveling Salesman Problem (TSP)
N = 6 cities: number of unique paths = (N-1)!/2 = (6-1)!/2 = 5!/2 = (1·2·3·4·5)/2 = 60
N = 25 cities: number of unique paths = (25-1)!/2 = 24!/2 = (1·2·3·…·23·24)/2 ≈ 3.1×10^23 paths (a very large number). It would take the fastest computer millions of years to calculate all possible solutions (paths): computationally intractable.
[Diagram: a sample tour through six cities: Chicago, New York, San Francisco, Los Angeles, Miami, and Mexico City]
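A few lines of Python confirm how quickly the path count explodes:

```python
# Minimal sketch: number of unique TSP tours, (N-1)!/2.
from math import factorial

for n in (6, 10, 25):
    print(n, "cities:", factorial(n - 1) // 2, "unique paths")
# 6 cities -> 60; 25 cities -> about 3.1e23, far beyond brute
# force, which is why approximate methods such as genetic
# algorithms are used instead.
```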
31
Rare Events
  • We are drowning in the massive amount of data
    that are being collected, while starving for
    knowledge at the same time
  • Despite the enormous amount of data, particular
    events of interest are still quite rare
  • Rare events are events that occur very
    infrequently, i.e. their frequency ranges from
    0.01% to 10%
  • However, when they occur, their consequences can
    be quite dramatic, often in a negative sense

32
Applications of Rare Cases
  • Network intrusion detection
  • Number of intrusions on the network is typically
    a very small fraction of the total network
    traffic
  • Credit card fraud transaction
  • Millions of legitimate transactions are stored,
    while only a very small percentage is fraudulent
  • Medical diagnostics
  • When classifying the pixels in mammogram images,
    cancerous pixels represent only a very small
    fraction of the entire image

33
Applications of Rare Cases
  • Web mining
  • < 3% of all people visiting Amazon.com make a
    purchase
  • Identifying passengers at airports (through
    biometrics) and screening their luggage
  • Only an extremely small number of passengers is
    suspected of hostile activities; the same applies
    to passengers' luggage that may contain
    explosives
  • Fraud detection
  • auto insurance: detecting people who stage
    accidents to collect on insurance
  • Profiling Individuals
  • finding clusters of model terrorists who share
    similar characteristics
  • Money laundering, Financial fraud, Churn analysis

34
Key Technical Challenges for Detecting Rare Events
  • Large data size
  • High dimensionality
  • Temporal nature of the data
  • Skewed class distribution
  • Rare events are underrepresented in the data set
    (minority class)
  • Data preprocessing
  • On-line analysis

35
Limitations of Standard Data Mining Schemes
  • Many classic data mining issues and methods apply
    in the domain of rare cases
  • Limitations
  • Standard approaches for feature selection and
    construction, computing distances between
    samples, and sampling do not work well for rare
    case analysis
  • While most normal events are similar to each
    other, rare events are quite different from one
    another
  • Regular network traffic is fairly standard, while
    suspicious traffic varies from the standard in
    many different ways
  • Metrics used to evaluate normal event detection
    methods
  • Overall classification accuracy is not
    appropriate for evaluating methods for rare event
    detection
  • In many applications data keeps arriving in
    real-time, and there is a need to detect rare
    events on the fly, with models built only on the
    events seen so far

36
Computer Security
  • Broad and extremely important field
  • Generally encompasses two aspects
  • How computers can be used to secure the
    information contained within organizations
  • Detection and/or prevention of unauthorized
    access or attacks on computers, networks,
    operating system, data, and applications local to
    an organization
  • How computers can be used to detect hostile
    activity in a sensitive geographical area (such
    as in an airport)
  • Involves computer vision technology
  • Identifying patterns of activities that can
    suggest a friend or foe

37
Computer Security
  • The ability of a computer system to protect
    information and system resources with respect to
  • Confidentiality: prevention of unauthorized
    disclosure of information
  • Integrity: prevention of unauthorized
    modification of information
  • Availability: prevention of unauthorized
    withholding of information
  • Intrusion: cyber attack that tries to bypass
    security mechanisms
  • Outsider: attack on the system from the Internet
  • Hackers, spies, kiddies
  • Stealing, spying, probing (to collect information
    about the host)
  • DoS attacks, viruses, worms
  • Insider (employee): attempt to gain and misuse
    non-authorized privileges

38
Taxonomy of Computer Attacks
  • Intrusions can be classified according to several
    categories
  • Attack type
  • DoS, worms/trojan horses
  • Number of network connections involved in the
    attack
  • single connection cyber attacks
  • multiple connections cyber attacks
  • Source of the attack
  • multiple locations vs. single location
    (coordinated/distributed attacks)
  • inside vs. outside
  • Target of the attack
  • Single or many different destinations
  • Environment (network, host, P2P (peer-to-peer),
    wireless networks, ..)
  • Less secure physical layer
  • No traffic concentration points for monitoring
    packets
  • Automation (manual, automated, semi-automated
    attack)
  • Need to analyze network data from several sites
    to detect these attacks

39
Prevention Existing Security Mechanisms
  • Security protocols and policies
  • IPSec: security at the IP layer
  • Source authentication
  • Encryption
  • Secure Socket Layer (SSL)
  • Source authentication
  • Encryption
  • Host based protections
  • Regularly installing patches, defending accounts,
    integrity checks
  • Firewalls
  • Control the flow of traffic between networks
  • Block traffic from and to the Internet
  • Monitor communication between networks and
    examine each packet to see if it should be let
    through
  • All the above mechanisms are insufficient due to
  • Security holes, insider attacks, multiple levels
    of data confidentiality within an organization
  • Sophistication of cyber attacks, their severity,
    and increased intruders' knowledge
  • Data mining can help
  • It is not a cure for all problems

40
Motivation - Data Mining for Intrusion Detection
  • Increased interest in data mining based intrusion
    detection
  • Attacks for which it is difficult to build
    signatures
  • Attack stealthiness
  • Unforeseen/Unknown/Emerging attacks
  • Distributed/coordinated attacks
  • Data mining approaches for intrusion detection
  • Misuse detection
  • Supervised learning
  • Anomaly detection
  • Unsupervised learning
  • Summarization of attacks using association rules

41
Motivation - Data Mining for Intrusion Detection
  • Data mining approaches for intrusion detection
  • Misuse detection
  • Supervised learning
  • Based on extensive knowledge of patterns
    associated with known attacks provided by human
    experts
  • Building predictive models from labeled data sets
    (instances are labeled as "normal" or
    "intrusive") to identify known intrusions
  • Major advantages
  • High accuracy in detecting many kinds of known
    attacks
  • Produce models that can be easily understood
  • Major limitations
  • Cannot detect unknown and emerging attacks
  • The data has to be labeled
  • Signature database has to be manually revised for
    each new type of discovered attack
  • Major approaches pattern (signature) matching,
    expert systems, neural networks, decision trees,
    logistic regression, memory-based reasoning
  • SNORT system

42
Motivation - Data Mining for Intrusion Detection
  • Data mining approaches for intrusion detection
  • Anomaly detection
  • Unsupervised learning
  • Based on profiles that represent normal behavior
    of users, hosts, or networks, and detecting
    attacks as significant deviations from this
    profile
  • Major benefit - potentially able to recognize
    unforeseen attacks
  • Major limitation - possible high false alarm
    rate, since detected deviations do not
    necessarily represent actual attacks
  • Major approaches statistical methods, expert
    systems, clustering, neural networks, outlier
    detection schemes, deviation/anomaly detection
  • Analyze each event to determine how similar (or
    dissimilar) it is to the majority
  • Success depends on the choice of similarity
    measures, dimension weighting
  • Summarization of attacks using association rules

43
IDS Information Source
  • Host-based IDS
  • base the decisions on information obtained from a
    single host (e.g. system log data, system calls
    data)
  • Network-based IDS
  • make decisions according to the information and
    data obtained by monitoring the traffic in the
    network to which the hosts are connected
  • Wireless network IDS
  • detect intrusions by analyzing traffic between
    mobile nodes
  • Application Logs
  • detect intrusions by analyzing, for example,
    database logs (database misuse) or web logs
  • IDS Sensor Alerts
  • analysis of low-level sensor alarms
  • Analysis of alarms generated by other IDSs

44
Data Sources in Network Intrusion Detection
  • Network traffic data is usually collected using
    network sniffers
  • Tcpdump
  • 08:02:15.471817 0:10:7b:38:46:33 0:10:7b:38:46:33
    loopback 60
  • 0000 0100 0000 0000 0000 0000 0000 0000
  • 0000 0000 0000 0000 0000 0000 0000 0000
  • 0000 0000 0000 0000 0000 0000 0000
  • 08:02:19.391039 172.16.112.100.3055 >
    172.16.112.10.ntp v1 client strat 0 poll 0 prec
    0
  • 08:02:19.391456 172.16.112.10.ntp >
    172.16.112.100.3055 v1 server strat 5 poll 4
    prec -16 (DF)
  • net-flow tools
  • Source and destination IP address, Source and
    destination ports, Type of service, Packet and
    byte counts, Start and end time, Input and output
    interface numbers, TCP flags, Routing information
    (next-hop address, source autonomous system (AS)
    number, destination AS number)
  • 0624.12439.344 0624.12448.292 211.59.18.101
    4350 160.94.179.138 1433 6 2 3 144
  • 0624.9110.667 0624.9119.635 24.201.13.122
    3535 160.94.179.151 1433 6 2 3 132
  • 0624.12440.572 0624.12449.496 211.59.18.101
    4362 160.94.179.150 1433 6 2 3 152
  • Collected data are in the form of network
    connections or network packets (a network
    connection may contain several packets)

45
Projects Data Mining in Intrusion Detection
  • MADAM ID (Mining Audit Data for Automated Models
    for Intrusion Detection) Columbia University,
    Georgia Tech, Florida Tech
  • ADAM (Audit Data Analysis and Mining) - George
    Mason University
  • MINDS (University of Minnesota)
  • Intelligent Intrusion Detection IIDS
    (Mississippi State University)
  • Data Mining for Network Intrusion Detection
    (MITRE corporation)
  • Institute for Security Technology Studies (ISTS),
    Dartmouth College
  • Intrusion Detection Techniques (Arizona State
    University)
  • Agent based data mining system (Iowa State
    University)
  • IDDM (Intrusion Detection using Data Mining
    Techniques) Department of Defense, Australia

46
Data Preprocessing for Data Mining in ID
  • Converting the data from the monitored system
    (computer network, host machine, …) into data
    (features) that will be used in data mining
    models
  • For misuse detection, labeling data examples as
    normal or intrusive may require enormous time
    from human experts
  • Building data mining models
  • Misuse detection models
  • Anomaly detection models
  • Analysis and summarization of results

47
(No Transcript)
48
(No Transcript)
49
Misuse Detection - Evaluation of Rare Class
Problems: F-value
  • Accuracy is not a sufficient metric for evaluation
  • Accuracy = (TN+TP)/(TN+FP+FN+TP)
  • Ex. Network traffic data set with 99.99% of
    normal data and 0.01% of intrusions
  • Trivial classifier that labels everything with
    the normal class can achieve 99.99% accuracy!!!
  • Focus on both recall and precision
  • Recall (R) = TP/(TP+FN)
  • Precision (P) = TP/(TP+FP)
  • F-measure = 2RP/(R+P)
  • C: rare class (attacks)
  • NC: normal class (normal connections)
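A minimal sketch of these metrics (the TP/FP/FN counts are made up):

```python
# Minimal sketch: recall, precision, and F-measure from
# confusion-matrix counts.
def f_measure(tp, fp, fn):
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return recall, precision, 2 * recall * precision / (recall + precision)

# Hypothetical detector: 10 true attacks, 6 caught, 20 false alarms.
print(f_measure(tp=6, fp=20, fn=4))  # (0.6, ~0.23, ~0.33)
# The trivial "everything is normal" classifier has recall 0,
# so its F-measure collapses despite 99.99% accuracy.
```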

50
Misuse Detection - Evaluation of Rare Class
Problems: ROC
  • C: rare class (attacks)
  • NC: normal class (normal connections)
  • Standard measures for evaluating rare class
    problems
  • Detection rate (recall): ratio between the
    number of correctly detected rare events
    (attacks) and the total number of rare events
    (attacks)
  • False alarm (false positive) rate: ratio between
    the number of data records from the majority
    class (normal connections) that are misclassified
    as rare events (attacks) and the total number of
    data records from the majority class (normal
    connections)
  • ROC curve: trade-off between detection rate and
    false alarm rate

51
Misuse Detection - Evaluation of Rare Class
Problems: ROC
Area under the ROC curve (AUC) is computed using
a form of the trapezoid rule
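A minimal sketch of that computation (the ROC points below are invented):

```python
# Minimal sketch: AUC by the trapezoid rule over ROC points.
def auc_trapezoid(fpr, tpr):
    area = 0.0
    for i in range(1, len(fpr)):
        # Trapezoid between consecutive (false alarm, detection) points.
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
    return area

fpr = [0.0, 0.1, 0.3, 1.0]      # false alarm rate
tpr = [0.0, 0.6, 0.8, 1.0]      # detection rate (recall)
print(auc_trapezoid(fpr, tpr))  # 0.8
```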
52
Misuse Detection - Manipulating Data Records
  • Over-sampling the rare class
  • Make duplicates of the rare events until the
    data set contains as many examples as the
    majority class ⇒ balance the classes
  • Does not add information but increases the
    misclassification cost
  • SMOTE (Synthetic Minority Over-sampling
    TEchnique)
  • Synthetically generates minority class examples
  • When generating an artificial minority class
    example, distinguish two types of features
  • Continuous
  • Nominal (Categorical) features
  • Down-sizing (undersampling) the majority class
  • Sample the data records from majority class
  • Randomly
  • Near misses examples
  • Examples far from minority class examples (far
    from decision boundaries)
  • Introduce sampled data records into the original
    data set instead of original data records from
    the majority class
  • Usually results in a general loss of information
    and potentially overly general rules
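A rough sketch of the SMOTE idea for continuous features (this is the interpolation step only, not the full published algorithm; k and the data are illustrative):

```python
# Rough sketch: synthesize a minority example by interpolating
# between a minority point and one of its k nearest minority
# neighbors.
import numpy as np

def smote_like(minority, n_new, k=5, seed=0):
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        d = np.linalg.norm(minority - minority[i], axis=1)
        nn = np.argsort(d)[1:k + 1]   # k nearest neighbors (skip self)
        j = rng.choice(nn)
        gap = rng.random()            # random point along the segment
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)

minority = np.random.default_rng(1).normal(0, 1, (20, 4))
print(smote_like(minority, n_new=40).shape)  # (40, 4)
```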

53
Unsupervised Techniques Anomaly Detection
  • Build models of normal behavior and detect
    anomalies as deviations from it
  • Possible high false alarm rate - previously
    unseen (yet legitimate) data records may be
    recognized as anomalies
  • Two types of techniques
  • with access to normal data
  • with NO access to normal data (not known what is
    normal)

54
Outlier Detection Schemes
  • Outlier is defined as a data point which is very
    different from the rest of the data based on some
    measure
  • Detect novel attacks/intrusions by identifying
    them as deviations from normal behavior
  • Identify normal behavior
  • Construct useful set of features
  • Define similarity function
  • Use outlier detection algorithm
  • Statistics based approaches
  • Distance based approaches
  • Nearest neighbor approaches
  • Clustering based approaches
  • Density based schemes

55
Distance-based Outlier Detection Scheme
  • k-Nearest Neighbor approach
  • For each data point d compute the distance to the
    k-th nearest neighbor dk
  • Sort all data points according to the distance dk
  • Outliers are points that have the largest
    distance dk and therefore are located in the more
    sparse neighborhoods
  • Usually data points that have top n distance dk
    are identified as outliers
  • n: a user parameter
  • Not suitable for datasets that have modes with
    varying density
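A minimal sketch of this scoring (k and n are the user parameters mentioned above):

```python
# Minimal sketch: score each point by the distance to its k-th
# nearest neighbor; the n largest scores are flagged as outliers.
import numpy as np

def knn_outliers(X, k=3, n=2):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d_k = np.sort(d, axis=1)[:, k]    # column 0 is the point itself
    return np.argsort(d_k)[::-1][:n]  # indices of the n largest d_k

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (40, 2)),
               [[9.0, 9.0], [-9.0, 9.0]]])  # two planted outliers
print(knn_outliers(X))                      # expected: indices 40, 41
```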

56
Distance-based Outlier Detection Scheme
  • Mahalanobis-distance based approach
  • Mahalanobis distance is more appropriate for
    computing distances with skewed distributions
  • The measure takes the shape of the data
    distribution into account
  • Example
  • In Euclidean space, data point p1 is closer to
    the origin than data point p2
  • When computing Mahalanobis distance, data points
    p1 and p2 are equally distant from the origin
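The distance itself in a minimal sketch (plain NumPy; the skewed 2-D data is synthetic):

```python
# Minimal sketch: Mahalanobis distance from the data mean,
# d_M(x) = sqrt((x - mu)^T S^{-1} (x - mu)), where S is the
# sample covariance matrix.
import numpy as np

def mahalanobis(X):
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0], [[4.0, 1.5], [1.5, 1.0]], size=200)
print(mahalanobis(X)[:5])
```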

57
Model based outlier detection schemes
  • Use a prediction model to learn the normal
    behavior
  • Every deviation from the learned prediction model
    can be treated as an anomaly or potential intrusion
  • Recent approaches
  • Neural networks
  • Unsupervised Support Vector Machines (SVMs)

58
Neural networks for outlier detection
  • Use a replicator 4-layer feed-forward neural
    network (RNN) with the same number of input and
    output nodes
  • Input variables are also the output variables, so
    that the RNN forms a compressed model of the data
    during training
  • A measure of outlyingness is the reconstruction
    error of individual data points
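A rough sketch of the idea using a generic autoencoder in place of the specific 4-layer replicator design (assumes scikit-learn's MLPRegressor; the data and planted anomalies are synthetic):

```python
# Rough sketch: train a network to reproduce its input through a
# narrow hidden layer; rows it reconstructs poorly are outliers.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 8)),
               rng.normal(6, 1, (5, 8))])  # 5 planted anomalies

net = MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000,
                   random_state=0).fit(X, X)  # input == target
recon_error = ((net.predict(X) - X) ** 2).mean(axis=1)
print(np.argsort(recon_error)[::-1][:5])  # likely the planted rows
```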

59
Conclusions
  • Data mining analysis of rare events requires
    special attention
  • Many real-world applications exhibit a
    needle-in-the-haystack type of problem
  • Current state-of-the-art data mining techniques
    are still insufficient for efficiently handling
    rare events
  • Need for designing better and more accurate data
    mining models

60
References
  • Han, J., and Kamber, M., Data Mining: Concepts and
    Techniques, Morgan Kaufmann Publishers, 2001
  • Kantardzic, M., Data Mining: Concepts, Models,
    Methods, and Algorithms, IEEE Press/Wiley, 2003
  • Lazarevic, A., Srivastava, J., Kumar, V., Data
    Mining for Computer Security Applications, IEEE
    ICDM 2003 Tutorial
  • Kantardzic, M., and Zurada, J. (Eds.), Next
    Generation of Data Mining Applications, IEEE
    Press/Wiley, 2005
  • Tan, P., Steinbach, M., Kumar, V., Introduction
    to Data Mining, Addison Wesley, 2005
  • Zurada, J., Knowledge Discovery and Data Mining,
    Lecture Notes on Blackboard, Spring 2005

61
Thank you for attending this lecture!
  • Questions/Discussion?