Decomposition Methodology in Data Mining: A Feature Set Decomposition Approach

Transcript and Presenter's Notes
1
Decomposition Methodology in Data Mining:
A Feature Set Decomposition Approach
  • Lior Rokach
  • Faculty of Engineering
  • Tel-Aviv University


2
Agenda
  • Introduction and Motivation
  • The Elementary Decomposition Methodology.
  • Feature Set Decomposition.
  • Mining Manufacturing Data With F-Measure.
  • Meta-Decomposer.
  • Conclusions and Future Work.

3
Data Mining
  • Fayyad et al. (1996): "The nontrivial process of
    identifying valid, novel, potentially useful, and
    ultimately understandable patterns in data."
  • Friedman (1997): "An automatic exploratory data
    analysis of large databases."
  • Hand (1998): "A secondary data analysis of large
    databases."
  • Maimon (2003): "A general-purpose mathematical tool
    for modeling and analyzing large and complex
    datasets."

4
Supervised Learning/Classification
5
Illustrative Example
6
Classifier (Classification Model)
e.g., by if-then rules:
if Employment == Employee and Ownership == House then -> None
if Employment == Employee and Ownership == None then -> Silver
if Employment == Employee and Ownership == Tenement and Volume <= 1300 then -> None
if Employment == Employee and Ownership == Tenement and Volume > 1300 then -> Silver
if Employment == None and Education < 12 then -> Silver
if Employment == None and Education >= 12 and Education < 15 then -> None
if Employment == None and Ownership == Tenement and Education >= 15 then -> Silver
if Employment == None and Ownership == House and Education >= 15 then -> Gold
if Employment == None and Ownership == None and Education >= 15 then -> None
if Employment == Self then -> Gold
7
Scalability To Large Classification Tasks
  • High number of records
  • High number of features
  • High number of classes
  • Heterogeneity
  • More

8
Issues in Scalability
  • Computational complexity: since most learning
    algorithms have a computational complexity that
    is greater than linear in the number of
    attributes or tuples, the execution time needed
    to process such databases can become an
    important issue.
  • Classification performance: high dimensionality
    increases the size of the search space in an
    exponential manner. This in turn increases the
    chance that the algorithm will find inaccurate
    models that are not valid in general.
  • Storage complexity: in most data mining
    algorithms, the whole data set must be read
    from magnetic storage into the computer's main
    memory before the process begins. This causes
    problems, since main memory is much smaller
    than magnetic disk storage.
  • Large models: comprehensibility and maintenance
    problems.

9
Approaches For Large Problems
  • Sampling methods
  • Massively parallel processing
  • Efficient storage methods
  • Dimension reduction
  • More

10
Agenda
  • Introduction and Motivation.
  • The Elementary Decomposition Methodology
  • Feature Set Decomposition.
  • Mining Manufacturing Data With F-Measure.
  • Meta-Decomposer.
  • Conclusions and Future Work.

11
The Engineering Approach Decomposition
  • The purpose of decomposition methodology is to
    break a complex problem down into several
    smaller, less complex, and more manageable
    sub-problems that are solvable by existing tools,
    and then to combine their solutions to obtain a
    solution to the original problem, without loss
    of model performance.

12
Decomposition Advantages
  • Reduced computational complexity
  • Suitability for parallel or distributed
    computation
  • Ability to use different solution techniques for
    individual sub-problems
  • Modularity: easier maintenance, and support of
    the evolutionary computation concept
  • Feasibility
  • Clearer (more understandable) results
  • Increased performance (classification accuracy)

13
Why Can Decomposition Improve Performance?
The Bias-Variance Tradeoff
  • The intrinsic error is the error generated by the
    stochastic nature of the problem. It is a lower
    bound on the error of any learning algorithm
    (the Bayes optimal classifier).
  • The bias error is the persistent or systematic
    error that the learning algorithm is expected to
    make.
  • The variance error captures random variation of
    the algorithm from one training set to another;
    that is, it measures the sensitivity of the
    algorithm to the actual training set.
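
The slide's own formula is not transcribed; as a sketch, the familiar squared-loss form of this three-way decomposition (Bauer & Kohavi work with a zero-one-loss analogue) is, for a predictor \hat{f}_D trained on a random training set D, target f(x), and noise variance \sigma^2:

E[(y - \hat{f}_D(x))^2]
  = \sigma^2                                      % intrinsic error
  + (E_D[\hat{f}_D(x)] - f(x))^2                  % squared bias
  + E_D[(\hat{f}_D(x) - E_D[\hat{f}_D(x)])^2]     % variance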

14
Bias and Variance (Bauer & Kohavi, 1999)
Intrinsic Error
15
Bias and Variance
  • Simple methods (like Naïve Bayes) tend to have
    high bias error and low variance error (shown
    by Bauer & Kohavi, 1999).
  • Complex methods (like decision trees) tend to
    have low bias error and high variance error
    (shown by Dietterich & Kong, 1995).

16
Bias and Variance Tradeoff
(Figure: error vs. model complexity, with Naïve Bayes at the low-complexity end and decision trees at the high-complexity end; Hansen (2000), deterministic case.)
17
Bias and Variance Tradeoff
(Figure: the bias-variance tradeoff, with the optimal model complexity marked.)
18
Issues in Decomposition
  • What types of elementary decomposition methods
    exist in classification induction?
  • Which elementary decomposition type performs best
    for which problem? What factors should one take
    into account when choosing the appropriate
    decomposition type?
  • Given an elementary type, how should we infer the
    best decomposition structure automatically?
  • How should the sub-problems be recomposed to
    represent the original concept?

19
Various Elementary Decomposition Approaches
20
Illustrative Example
21
Function Decomposition
First, a new concept named "wealth" is defined as
follows:
Then the original concept is expressed in terms of it:
if Wealth == Rich then -> Gold
if Wealth == Poor and Employment == Employee then -> Silver
if Wealth == Poor and Employment == None then -> None
otherwise -> Silver
22
Concept Aggregation Decomposition
Initially we check whether the customer is
willing to purchase at all:
if Ownership == Tenement and Volume … and Employment == Employee then -> No, else -> Yes
Then, what type of insurance:
if Employment == Self then -> Gold
if Ownership == House and Aggregation == Yes then -> Gold
if Employment == Employee then -> Silver
if Ownership == Tenement and Employment == None then -> Silver
if Ownership == None and Employment == None then -> Silver
23
Feature (Attribute) Decomposition
Using the attributes Ownership and Volume:
if Ownership == House then -> Gold
if Ownership == Tenement and Volume > 1000 and Volume <= 1300 then -> None
if Ownership == None then -> Silver
if Ownership == Tenement and (Volume <= 1000 or Volume > 1300) then -> Silver
Using the attributes Employment and Education:
if Employment == Employee and Education > 12 then -> Silver
if Employment == Employee and Education <= 12 then -> None
if Employment == Self then -> Gold
if Employment == None then -> Silver
24
Sample Decomposition
Model induced from the first half:
if Employment == Self then -> Gold
if Volume <= 1100 then -> None
if Volume > 1100 and Employment == Employee then -> Silver
if Employment == None and Education > 12 then -> Silver
if Employment == None and Education <= 12 then -> None
Model induced from the second half:
if Employment == Employee and Ownership == House then -> None
if Employment == Employee and Ownership == None then -> Silver
if Employment == Employee and Ownership == Tenement and Volume <= 1300 then -> None
if Employment == Employee and Ownership == Tenement and Volume > 1300 then -> Silver
if Employment == None then -> Silver
if Employment == Self then -> Gold
25
Space Decomposition
Model induced for Education >= 15:
if Volume > 1000 then -> Gold
if Volume <= 1000 then -> Silver
Model induced for Education < 15:
if Employment == Employee and Ownership == House then -> None
if Employment == Employee and Ownership == None then -> Silver
if Employment == Employee and Ownership == Tenement and Volume <= 1300 then -> None
if Employment == Employee and Ownership == Tenement and Volume > 1300 then -> Silver
if Employment == None then -> Silver
if Employment == Self then -> Gold
26
The Decomposer's Characteristics
  • Structure Acquiring Method
  • Mutual Exclusiveness
  • Exhaustiveness
  • The Inducer Usage
  • Combiner Usage
  • Sub-classifiers' Relationship

27
Structure Acquiring Method
  • Manually, based on an expert's knowledge of a
    specific domain (Michie, 1995).
  • Arbitrarily (Domingos, 1995).
  • Due to some restriction, as in distributed data
    mining (DDM).
  • Induced by a suitable algorithm (Zupan, 1997).

28
Mutually Exclusive
  • Mutual exclusiveness imposes a restriction on the
    problem space.
  • However:
  • ME tends to reduce execution time.
  • Smaller models mean better comprehensibility and
    easier maintenance of the solution.
  • It helps avoid some of the error-correlation
    problems that characterize non-mutually-exclusive
    decompositions.

29
Exhaustiveness
  • This property indicates whether all data elements
    should be used in the decomposition. For instance,
    an exhaustive feature set decomposition is one in
    which each feature participates in at least one
    subset.

30
The Intrinsic Inducer Usage
  • This property indicates the relation between the
    decomposer and the inducer used.
  • Inducer-free: does not use inducers at all.
  • Inducer-dependent: developed for a certain
    inducer.
  • Inducer-independent: not developed for a specific
    inducer, but uses the same inducer on all
    components.
  • Inducer-chooser: given a set of inducers, the
    system is capable of using the most appropriate
    inducer on each sub-problem.

31
Combiner Usage
  • This property specifies the relation between the
    decomposer and the combiner:
  • Combiner-dependent: developed specifically for a
    certain combination method.
  • Combiner-independent: the combination method is
    provided as an input.
  • Combiner-chooser.

32
The relationship between the sub-classifiers
  • This property indicates whether the various
    sub-classifiers are dependent.
  • Dependent: the outcome of a certain classifier
    may affect the creation of the next classifier.
  • Independent: each classifier is built
    independently.

33
Other Decomposition Frameworks
  • Sharkey (1999): neural networks; Space, Sample,
    and Concept Aggregation decomposition.
  • Kusiak (2000): manual decomposition.
  • None of these frameworks considers the
    coexistence of various decomposition methods,
    namely:
  • When should we prefer one method over another?
    (Note that Hansen (2000) compared Sample vs.
    Space.)
  • Is it possible to solve a given problem using a
    hybridization of several methods?

34
Relations to Other Fields
  • Ensemble models:
  • Greedy with respect to accuracy.
  • Not comprehensible.
  • Each component can be reliably used to solve the
    original problem.
  • Distributed data mining (DDM):
  • Homogeneous DDM corresponds to sample
    decomposition.
  • Heterogeneous DDM corresponds to feature
    decomposition.
  • The partition is set by the environment.

35
Agenda
  • Introduction and Motivation.
  • The Elementary Decomposition Methodology.
  • Feature Set Decomposition.
  • Mining Manufacturing Data With F-Measure.
  • Meta-Decomposer.
  • Conclusions and Future Work.

36
Classification Using Feature (Attribute) Set
Decomposition
37
Notation
38
Problem Formulation Feature Set Decomposition
39
Naïve Bayes Combination
40
Justification for Using Naïve/Simple Bayes
Composition (Combination)
  • Well suited to feature set decomposition.
  • Understandable.
  • Despite its simplicity, it tends (in many cases)
    to outperform more complicated methods like
    decision trees or neural networks.
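
As a sketch of the combination rule (writing G_1, ..., G_k for the feature subsets and y for the class variable), each sub-classifier contributes its posterior estimate and the contributions are combined Naïve-Bayes style:

% An instance x, whose projection onto subset G_i is x_{G_i},
% is classified by
v_{NB}(x) = \operatorname*{argmax}_{c \in \mathrm{dom}(y)}
  \hat{P}(y = c) \prod_{i=1}^{k}
  \frac{\hat{P}(y = c \mid x_{G_i})}{\hat{P}(y = c)}

where each \hat{P}(y = c \mid x_{G_i}) comes from the classifier induced on subset G_i.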

41
Simple Example (Illustrating the Concepts)
  • A training set containing 20 examples, created
    according to:
  • Uniform distribution.
  • No noise.
  • No irrelevant/redundant attributes.

42
(No Transcript)
43
Optimal Decision Tree
The minimal optimal tree (the solution is not unique).
Classification accuracy: 100%
44

Actual Decision Tree Generated by C4.5
Classification accuracy: 93.75%
45
Two Decision Trees Generated Using Feature
Decomposition
Classification accuracy: 100%
46
Naïve Bayes in Feature Decomposition
Classification accuracy: 68.75%
47
When Feature Set Decomposition With Naïve Bayes
Combination Is Preferable: Theoretical Cases
  • Conditional independence
  • DNF
  • Additive
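
For the first case, a one-line justification (a sketch, assuming the subsets G_1, ..., G_k partition the features): if

P(x | y = c) = \prod_{i=1}^{k} P(x_{G_i} | y = c),

then by Bayes' rule the Naïve Bayes combination above recovers the true posterior exactly:

P(y = c | x) \propto P(y = c) \prod_{i=1}^{k} \frac{P(y = c | x_{G_i})}{P(y = c)}.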

48
Algorithms for Feature Set Decomposition
  • Greedy Space Searching Methods
  • Serial Search
  • Multi-Search
  • Caching mechanism
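
A minimal sketch of the serial greedy search in Python (the `evaluate` callback stands in for any of the performance evaluation methods listed on slide 51; the multi-search and caching mechanisms are omitted):

from typing import Callable, List

def greedy_serial_search(
    features: List[str],
    evaluate: Callable[[List[List[str]]], float],  # higher is better
) -> List[List[str]]:
    """Greedily grow a feature set decomposition, one feature at a time."""
    subsets: List[List[str]] = []
    remaining = list(features)
    best_score = evaluate(subsets)  # score of the empty decomposition
    improved = True
    while remaining and improved:
        improved = False
        best_move = None
        # Try adding each unused feature to each existing subset,
        # or opening a new subset for it.
        for f in remaining:
            for target in range(len(subsets) + 1):
                candidate = [list(s) for s in subsets] + [[]]
                candidate[target].append(f)
                candidate = [s for s in candidate if s]
                score = evaluate(candidate)
                if score > best_score:
                    best_score, best_move, improved = score, (f, target), True
        if best_move is not None:
            f, target = best_move
            if target == len(subsets):
                subsets.append([f])
            else:
                subsets[target].append(f)
            remaining.remove(f)
    return subsets

Features left in `remaining` once no move improves the score are simply dropped, which matches the non-exhaustive nature of the decomposition.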

49
Computational Complexity
50
Classifier Representation: Oblivious Decision Trees
51
Performance Evaluation Methods
  • Wrapper Approach
  • Conditional Entropy (Maximum Likelihood)
  • VC-Dimension

52
Generalization Error using VC Dimension
53
Definition of VC-Dimension
  • The VC-dimension of a hypothesis space H is the
    size of the largest set of points that is
    shattered by that hypothesis space.
  • A set of points S is shattered by a hypothesis
    space H if and only if, for every dichotomy of S,
    there exists some hypothesis in H consistent with
    that dichotomy.
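
The thesis's specific bound for decomposed oblivious decision trees is not transcribed here; for orientation, the generic bound such results build on states that, for a hypothesis space of VC-dimension d and m training examples, with probability at least 1 - \delta,

\varepsilon(h) \le \hat{\varepsilon}(h)
  + \sqrt{\frac{d\left(\ln\frac{2m}{d} + 1\right) + \ln\frac{4}{\delta}}{m}}

so a decomposition that shrinks d tightens the gap between training and generalization error.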

54
Theorem: VC-Dimension Bounds for Decomposed
Oblivious Decision Trees
55
Proof
56
Corollary 1 for Naïve Bayes
57
Corollary 2: Generalization Error Bound under
Conditional Independence
58
DNF Results
59
INDEP Results
60
Structural Similarity Measure
61
Experimental Results: Comparison to
Single-Classifier Methods
62
Experimental Results: Comparison to Ensemble Methods
63
Accuracy-Complexity Tradeoff
64
Accuracy-Complexity Tradeoff
65
Comparison to Ensemble Methods with Equivalent
Complexities
66
When Is Feature Set Decomposition Beneficial?
The results are statistically significant according
to the Kruskal-Wallis one-way analysis of variance
by ranks.
67
Benchmark for Datasets with Nodes/Sample Ratio > 0.2
68
Result
  • Compared to single-classifier methods, the
    proposed algorithm is never worse, and is better
    in about 80% of the cases.
  • Compared to ensemble methods, the proposed
    algorithm is usually better.

69
Agenda
  • Introduction and Motivation.
  • The Elementary Decomposition Methodology.
  • Feature Set Decomposition.
  • Mining Manufacturing Data With F-Measure
  • Meta-Decomposer.
  • Conclusions and Future Work.

70
Improving Manufacturing Quality
  • The goal is to find the relation between the
    quality measure (the target attribute) and the
    input attributes (the manufacturing process data).
  • The quality measure is usually presented as a
    binary attribute ("Passed"/"Not Passed").
  • The input features include the characteristics
    of:
  • the production line (when the machine was tuned,
    etc.)
  • raw materials
  • environment (temperature, etc.)
  • human resources

71
Task Special Characteristics
  • Skewed Distribution
  • Many Relevant Features
  • Small Datasets

72
Precision and Recall
(Confusion matrix: Classified vs. Actual)
73
The F-Measure
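
The slide's formulas are not transcribed; the standard definitions, in terms of confusion-matrix counts and taking "Not Passed" as the positive class (an assumption here), are:

Precision = \frac{TP}{TP + FP}, \qquad
Recall = \frac{TP}{TP + FN},

F_{\beta} = \frac{(\beta^{2} + 1) \cdot Precision \cdot Recall}
                 {\beta^{2} \cdot Precision + Recall}

where \beta controls the weight of recall relative to precision (\beta = 1 gives the harmonic mean).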
74
F-Measure Splitting Criterion
75
Simple Example For F-Measure Splitting Criterion
76
The F-Measure Evaluation Criterion for Feature
Set Decomposition
77
Illustrative Results
78
Agenda
  • Introduction and Motivation.
  • The Elementary Decomposition Methodology.
  • Feature Set Decomposition.
  • Mining Manufacturing Data With F-Measure.
  • Meta-Decomposer.
  • Conclusions and Future Work.

79
Meta-Decomposer
  • Based on dataset characteristics, the
    meta-decomposer decides whether to decompose the
    problem and which elementary decomposition to
    use.
  • The idea is to train an inducer on previous
    results, yielding a meta-classifier that predicts
    which decomposition method will perform well
    (see the sketch below).
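
A minimal sketch of this idea in Python (the meta-features, labels, and the choice of a decision tree as meta-inducer are illustrative assumptions, not the thesis's exact design):

from sklearn.tree import DecisionTreeClassifier

# One row per previously mined dataset: simple meta-features
# [n_records, n_attributes, n_classes] ...
meta_features = [
    [1000, 10, 2],
    [50000, 60, 2],
    [300, 25, 5],
]
# ... and the decomposition method that performed best on it.
best_method = ["none", "feature_set", "sample"]

meta_classifier = DecisionTreeClassifier().fit(meta_features, best_method)

# For a new dataset, predict which decomposition to apply.
print(meta_classifier.predict([[20000, 40, 3]]))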

80
Meta-Data Generation Phase
81
Meta Induction Phase
82
Meta-Decomposer Usage
83
Meta-Decomposer Results
Improvement: 6.8%
Improvement: 7.1%
84
Agenda
  • Introduction and Motivation
  • The Elementary Decomposition Methodology.
  • Feature Set Decomposition.
  • Mining Manufacturing Data With F-Measure.
  • Meta-Decomposer.
  • Conclusions and Future Work

85
Conclusions
  • Feature set decomposition improves decision-tree
    accuracy when there are many contributing
    features relative to the training set size,
    without increasing model complexity.
  • The meta-decomposer is capable of choosing the
    best elementary decomposition for a given
    problem.
  • The feature set decomposition F-measure
    evaluation criterion can be useful for
    classification problems in quality assurance.
86
Limitations
  • The algorithmic framework has no backtracking
    capabilities (for instance removing a single
    feature from a subset or removing an entire
    subset).
  • The search currently begins from an empty
    decomposition structure, which may be the reason
    why the number of features in each subset is
    relatively small.
  • The proposed F-Measure evaluation criterion
    applies only to binary target attributes.

87
Future Research
  • Recursively decompose a classification task using
    elementary decomposition methods.
  • How can we utilize prior knowledge for improving
    decomposing methodology?
  • Examining how the feature set decomposition
    concept can be implemented using other inducers,
    such as Support Vector Machines.
  • A further theoretical investigation is required
    in order to better understand under what
    circumstances the feature set decomposition
    methodology is appropriate and its relation to
    other decomposition paradigms.
  • Extending the meta-learning schema to investigate
    other decomposition methods, specifically
    Function Decomposition and Concept Aggregation.
  • Checking whether the meta-learning results remain
    valid when different implementations of the
    decomposition algorithms are used.