1
Discriminative Frequent Pattern Analysis for
Effective Classification
  • Hong Cheng, Xifeng Yan, Jiawei Han and Chih-Wei
    Hsu
  • ICDE 2007

2
Outline
  • Introduction
  • The Framework of Frequent Pattern-based
    Classification
  • Experimental Results
  • Conclusion

3
  • How does frequent pattern-based classification
    achieve both high scalability and high accuracy
    on large datasets?
  • What is the strategy for setting the minimum
    support threshold?
  • Given a set of frequent patterns, how should we
    select high-quality ones for effective
    classification?

4
Introduction
  • Using frequent patterns without feature
    selection results in a huge feature space.
  • This can slow down model learning and cause
    classification accuracy to deteriorate.
  • An effective and efficient feature selection
    algorithm is proposed to select a set of frequent
    and discriminative patterns for classification.

5
Frequent Pattern vs. Single Feature
The discriminative power of some frequent
patterns is higher than that of single features.
Fig. 1. Information Gain vs. Pattern Length: (a) Austral, (b) Cleve, (c) Sonar
6
The Framework of Frequent Pattern-based
Classification
  • It includes three steps:
  • Feature generation
  • Feature selection
  • Model learning

7
Problem Formulation
  • Given a dataset D = {s_1, ..., s_n} and a set of
    frequent patterns F = {α_1, α_2, ..., α_d} mined
    from D, where d = |F|.
  • Let x be the feature vector of a data point s,
    with x_i = 1 if pattern α_i is contained in s
    and x_i = 0 otherwise.
  • The dataset is represented in B^d as
    D' = {x_1, ..., x_n}, where B = {0, 1}
    (sketched below).
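A minimal sketch of this mapping in Python (the transactions and patterns below are illustrative, not taken from the paper's datasets):

```python
def to_feature_vector(s, patterns):
    """Map a data point s (a set of items) to a binary vector x in B^d,
    where x_i = 1 iff pattern alpha_i is contained in s."""
    return [1 if p <= s else 0 for p in patterns]

# Illustrative transactions and mined patterns (not the paper's data).
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "d"}]
patterns = [frozenset({"a", "c"}), frozenset({"b"})]

dataset = [to_feature_vector(s, patterns) for s in transactions]
print(dataset)  # [[1, 1], [1, 0], [0, 1]]
```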

8
Discriminative Power vs. Pattern Frequency
  • This paper demonstrates that the discriminative
    power of low-support features is limited.
  • Low-support features can even harm
    classification accuracy due to overfitting.

9
Cont.
  • The discriminative power of a pattern is closely
    related to its support.

For a pattern represented by a random variable X,
IG(C|X) = H(C) − H(C|X). Let θ = P(x = 1) be the
support of the pattern and q = P(c_1 | x = 1).
Given a dataset with a fixed class distribution,
H(C) is a constant, so the upper bound IG_ub(C|X)
is determined by the lower bound of H(C|X).
For a fixed θ, H(C|X) reaches its lower bound when
q = 0 or 1; this lower bound approaches H(C) as
θ → 0. Therefore, the discriminative power of
low-frequency patterns is bounded by a small value.
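A small numeric sketch of this bound (my own illustration, not the paper's code): for binary classes with prior p = P(c_1), setting q to 0 or 1 minimizes H(C|X), and the resulting IG_ub(θ) visibly shrinks as the support θ shrinks.

```python
import math

def h(p):
    """Binary entropy in bits; h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def ig_upper_bound(theta, p):
    """Upper bound on IG(C|X) for a pattern with support theta and class
    prior p; assumes 0 < theta <= max(p, 1 - p) so one case is feasible."""
    candidates = []
    if theta <= p:          # q = 1: every supporting point belongs to c1
        candidates.append((1 - theta) * h((p - theta) / (1 - theta)))
    if theta <= 1 - p:      # q = 0: no supporting point belongs to c1
        candidates.append((1 - theta) * h(p / (1 - theta)))
    return h(p) - min(candidates)

for theta in (0.001, 0.01, 0.1, 0.3):
    print(theta, round(ig_upper_bound(theta, p=0.5), 4))
```

With p = 0.5 the bound is about 0.001 bits at θ = 0.001 but roughly 0.4 bits at θ = 0.3, matching the claim above.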
10
Empirical Results
Fig. 2. Information Gain vs. Pattern Frequency: (a) Austral, (b) Breast, (c) Sonar
11
Set min_sup
  • A subset of high quality features is selected
    for classification, with information gain
    IG(C|X) ≥ a threshold IG_0.
  • Because IG_ub(θ) increases with the support θ,
    features with support θ < θ_0, where
    IG_ub(θ_0) = IG_0, can be skipped.
  • The major steps:
  • Compute the information gain of the single features.
  • Choose an information gain threshold IG_0.
  • Find θ_0 such that IG_ub(θ_0) = IG_0.
  • Mine frequent patterns with min_sup = θ_0,
    as sketched below.
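A hedged sketch of the threshold inversion, reusing ig_upper_bound from the previous snippet (the bisection approach, the IG_0 value, and the function name are my own illustration, not the paper's exact procedure):

```python
def find_min_sup(ig0, p, lo=1e-9, iters=60):
    """Bisection for theta_0 with IG_ub(theta_0) = IG_0, using the fact
    that the bound increases with theta on (0, min(p, 1 - p)].
    Assumes IG_0 is attainable on that interval."""
    hi = min(p, 1 - p)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if ig_upper_bound(mid, p) < ig0:
            lo = mid
        else:
            hi = mid
    return hi

# Patterns with support below theta_0 cannot reach IG_0, so min_sup = theta_0.
print(round(find_min_sup(ig0=0.05, p=0.5), 4))  # roughly 0.048
```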

12
Feature Selection
  • Given a set of frequent patterns, both
    non-discriminative and redundant patterns exist.
  • We want to single out the discriminative patterns
    and remove the redundant ones.
  • The notion of Maximal Marginal Relevance (MMR)
    is borrowed from information retrieval; a
    selection sketch follows.
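MMR greedily picks the candidate that best trades off relevance against redundancy with what has already been selected. A minimal sketch (the relevance scores, Jaccard redundancy measure, λ weight, and toy patterns are all illustrative assumptions, not the paper's exact instantiation):

```python
def mmr_select(patterns, relevance, redundancy, k, lam=0.7):
    """Greedily pick k patterns, trading relevance (e.g., information
    gain) against redundancy with the already-selected set."""
    selected, remaining = [], list(patterns)
    while remaining and len(selected) < k:
        def score(pat):
            red = max((redundancy(pat, s) for s in selected), default=0.0)
            return lam * relevance[pat] - (1 - lam) * red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example: redundancy as Jaccard similarity between itemsets.
def jaccard(a, b):
    return len(a & b) / len(a | b)

pats = [frozenset("ac"), frozenset("ab"), frozenset("bd")]
rel = {pats[0]: 0.30, pats[1]: 0.28, pats[2]: 0.10}
print(mmr_select(pats, rel, jaccard, k=2))  # picks {'a','c'}, then {'a','b'}
```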

13
Experimental Results
14
Scalability Tests
15
Conclusion
  • An effective and efficient feature selection
    algorithm is proposed to select a set of frequent
    and discriminative patterns for classification.
  • Scalability issue: it is computationally
    infeasible to generate all feature combinations
    and filter them with an information gain
    threshold.
  • An efficient method (DDPMine, with FP-tree
    pruning) is presented in H. Cheng, X. Yan,
    J. Han, and P. S. Yu, "Direct Discriminative
    Pattern Mining for Effective Classification",
    ICDE'08.