Title: Decomposition Methodology in Data Mining: A Feature Set Decomposition Approach
1. Decomposition Methodology in Data Mining: A Feature Set Decomposition Approach
- Lior Rokach
- Faculty of Engineering
- Tel-Aviv University
2. Agenda
- Introduction and Motivation
- The Elementary Decomposition Methodology
- Feature Set Decomposition
- Mining Manufacturing Data with F-Measure
- Meta-Decomposer
- Conclusions and Future Work
3. Data Mining
- Fayyad et al. (1996): The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
- Friedman (1997): An automatic exploratory data analysis of large databases.
- Hand (1998): A secondary data analysis of large databases.
- Maimon (2003): A general-purpose mathematical tool for modeling and analyzing large and complex datasets.
4. Supervised Learning / Classification
5. Illustrative Example
6. Classifier (Classification Model)
e.g., expressed by if-then rules:
if Employment = Employee and Ownership = House then → None
if Employment = Employee and Ownership = None then → Silver
if Employment = Employee and Ownership = Tenement and Volume < 1300 then → None
if Employment = Employee and Ownership = Tenement and Volume ≥ 1300 then → Silver
if Employment = None and Education < 12 then → Silver
if Employment = None and Education ≥ 12 and Education < 15 then → None
if Employment = None and Ownership = Tenement and Education ≥ 15 then → Silver
if Employment = None and Ownership = House and Education ≥ 15 then → Gold
if Employment = None and Ownership = None and Education ≥ 15 then → None
if Employment = Self then → Gold
7. Scalability to Large Classification Tasks
- High number of records
- High number of features
- High number of classes
- Heterogeneity
- More
8. Issues in Scalability
- Computational complexity: Most learning algorithms have a computational complexity that is worse than linear in the number of attributes or tuples, so the execution time needed to process large databases can become a serious issue.
- Classification performance: High dimensionality increases the size of the search space exponentially, and with it the chance that the algorithm will find inaccurate models that do not generalize.
- Storage complexity: In most data mining algorithms, the whole dataset must be read from magnetic storage into the computer's main memory before processing begins. This is a problem, since main-memory capacity is much smaller than magnetic-disk capacity.
- Large models: comprehensibility and maintenance problems.
9. Approaches for Large Problems
- Sampling methods
- Massively parallel processing
- Efficient storage methods
- Dimension reduction
- More
10. Agenda
- Introduction and Motivation
- The Elementary Decomposition Methodology
- Feature Set Decomposition
- Mining Manufacturing Data with F-Measure
- Meta-Decomposer
- Conclusions and Future Work
11. The Engineering Approach: Decomposition
- The purpose of decomposition methodology is to break a complex problem down into several smaller, less complex, and more manageable sub-problems that are solvable by existing tools, and then to join their solutions in order to solve the original problem, without loss of model performance.
12. Decomposition Advantages
- Reduced computational complexity
- Suitability for parallel or distributed computation
- Ability to use different solution techniques for individual sub-problems
- Modularity: easier maintenance, and support for the evolutionary-computation concept
- Feasibility
- Clearer (more understandable) results
- Increased performance (classification accuracy)
13. Why Can Decomposition Improve Performance? The Bias-Variance Tradeoff
- The intrinsic error is the error generated by the stochastic nature of the problem. It is a lower bound for any learning algorithm (the Bayes optimal classifier).
- The bias error is the persistent, systematic error that the learning algorithm is expected to make.
- The variance error captures the random variation of the algorithm from one training set to another; that is, it measures the sensitivity of the algorithm to the actual training set.
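For reference, the decomposition can be written compactly for squared loss (a standard textbook form; the classification-error analogue used in the deck follows Bauer & Kohavi, 1999). Here $\hat{f}_D$ denotes the model induced from training set $D$:

$$ \mathbb{E}_{D}\!\left[\big(y - \hat{f}_D(x)\big)^2\right] = \underbrace{\sigma^2}_{\text{intrinsic error}} + \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - \mathbb{E}[y \mid x]\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}_D\!\left[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\right]}_{\text{variance}} $$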
14. Bias and Variance (Bauer & Kohavi, 1999)
(Figure: error decomposition chart; labeled component: intrinsic error.)
15. Bias and Variance
- Simple methods (like naïve Bayes) tend to have a high bias error and a low variance error (shown by Bauer & Kohavi, 1999).
- Complex methods (like decision trees) tend to have a low bias error and a high variance error (shown by Dietterich & Kong, 1995).
16. Bias and Variance Tradeoff
(Figure: bias and variance as a function of model complexity, with naïve Bayes at the low-complexity end and decision trees at the high-complexity end; Hansen (2000), deterministic case.)
17. Bias and Variance Tradeoff
(Figure: the same tradeoff curve with the optimal complexity point marked.)
18. Issues in Decomposition
- What types of elementary decomposition methods exist in classification induction?
- Which elementary decomposition type performs best for which problem? What factors should one take into account when choosing the appropriate decomposition type?
- Given an elementary type, how should the best decomposition structure be inferred automatically?
- How should the sub-problems be recomposed to represent the original concept learning?
19. Various Elementary Decomposition Approaches
20. Illustrative Example
21. Function Decomposition
First, a new intermediate concept named "Wealth" is defined as follows:
Then the original concept is expressed in terms of it:
if Wealth = Rich then → Gold
if Wealth = Poor and Employment = Employee then → Silver
if Wealth = Poor and Employment = None then → None
if Wealth = Else then → Silver
22. Concept Aggregation Decomposition
Initially we check whether the customer is willing to purchase at all:
if Ownership = Tenement and Volume ≥ 1300 and Employment = Employee then → No, else → Yes
Then, which type of insurance:
if Employment = Self then → Gold
if Ownership = House and Aggregation = Yes then → Gold
if Employment = Employee then → Silver
if Ownership = Tenement and Employment = None then → Silver
if Ownership = None and Employment = None then → Silver
23. Feature (Attribute) Decomposition
Using the attributes Ownership and Volume:
if Ownership = House then → Gold
if Ownership = Tenement and Volume ≥ 1000 and Volume < 1300 then → None
if Ownership = None then → Silver
if Ownership = Tenement and (Volume < 1000 or Volume ≥ 1300) then → Silver
Using the attributes Employment and Education:
if Employment = Employee and Education ≥ 12 then → Silver
if Employment = Employee and Education < 12 then → None
if Employment = Self then → Gold
if Employment = None then → Silver
24. Sample Decomposition
Model induced from the first half of the training set:
if Employment = Self then → Gold
if Volume < 1100 then → None
if Volume ≥ 1100 and Employment = Employee then → Silver
if Employment = None and Education > 12 then → Silver
if Employment = None and Education ≤ 12 then → None
Model induced from the second half:
if Employment = Employee and Ownership = House then → None
if Employment = Employee and Ownership = None then → Silver
if Employment = Employee and Ownership = Tenement and Volume < 1300 then → None
if Employment = Employee and Ownership = Tenement and Volume ≥ 1300 then → Silver
if Employment = None then → Silver
if Employment = Self then → Gold
25. Space Decomposition
Model induced for Education ≥ 15:
if Volume ≥ 1000 then → Gold
if Volume < 1000 then → Silver
Model induced for Education < 15:
if Employment = Employee and Ownership = House then → None
if Employment = Employee and Ownership = None then → Silver
if Employment = Employee and Ownership = Tenement and Volume < 1300 then → None
if Employment = Employee and Ownership = Tenement and Volume ≥ 1300 then → Silver
if Employment = None then → Silver
if Employment = Self then → Gold
26. The Decomposer's Characteristics
- Structure acquiring method
- Mutual exclusiveness
- Exhaustiveness
- Inducer usage
- Combiner usage
- The relationship between the sub-classifiers
27. Structure Acquiring Method
- Manually, based on an expert's knowledge of a specific domain (Michie, 1995).
- Arbitrarily (Domingos, 1995).
- Imposed by external restrictions (as in distributed data mining).
- Induced by a suitable algorithm (Zupan, 1997).
28. Mutual Exclusiveness
- Mutual exclusiveness imposes a restriction on the problem space.
- However:
- It tends to reduce execution time.
- It yields smaller models, with better comprehensibility and easier maintenance of the solution.
- It helps avoid some of the error-correlation problems that characterize non-mutually-exclusive decompositions.
29. Exhaustiveness
- This property indicates whether all data elements should be used in the decomposition. For instance, an exhaustive feature set decomposition is one in which each feature participates in at least one subset.
30. The Inducer Usage
- This property describes the relation between the decomposer and the inducer used:
- Inducer-free: does not use inducers at all.
- Inducer-dependent: developed for a specific inducer.
- Inducer-independent: not developed for a specific inducer, but uses the same inducer on all components.
- Inducer-chooser: given a set of inducers, the system is capable of using the most appropriate inducer for each sub-problem.
31. Combiner Usage
- This property specifies the relation between the decomposer and the combiner:
- Combiner-dependent: developed specifically for a certain combination method.
- Combiner-independent: the combination method is provided as an input.
- Combiner-chooser: selects the most appropriate combination method.
32. The Relationship Between the Sub-Classifiers
- This property indicates whether the various sub-classifiers are dependent:
- Dependent: the outcome of one classifier may affect the creation of the next classifier.
- Independent: each classifier is built independently.
33. Other Decomposition Frameworks
- Sharkey (1999): neural networks; space and sample decomposition; concept aggregation.
- Kusiak (2000): manual decomposition.
- None of these frameworks considers the coexistence of various decomposition methods, namely:
- When should we prefer one method over another? (Note that Hansen (2000) compared sample vs. space decomposition.)
- Is it possible to solve a given problem using a hybridization of several methods?
34. Relations to Other Fields
- Ensemble models:
- Accuracy-greedy.
- Not comprehensible.
- Each component can be reliably used to solve the original problem.
- Distributed data mining:
- Homogeneous case: sample decomposition.
- Heterogeneous case: sample decomposition combined with feature decomposition.
- The partition is set by the environment.
35. Agenda
- Introduction and Motivation
- The Elementary Decomposition Methodology
- Feature Set Decomposition
- Mining Manufacturing Data with F-Measure
- Meta-Decomposer
- Conclusions and Future Work
36. Classification Using Feature (Attribute) Set Decomposition
37. Notation
38. Problem Formulation: Feature Set Decomposition
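The formal statement on this slide was not transcribed; the following is a sketch of the problem in its usual form, with notation assumed rather than taken from the thesis. Given a training set $S$ over a feature set $A = \{a_1, \ldots, a_n\}$ and a target attribute $y$, find a set $Z = \{G_1, \ldots, G_\omega\}$ of mutually exclusive feature subsets $G_k \subseteq A$ whose combined classifier minimizes the generalization error:

$$ Z^{*} = \arg\min_{Z} \; \varepsilon\Big(C\big(I(S, G_1), \ldots, I(S, G_\omega)\big)\Big), \qquad G_i \cap G_j = \emptyset \;\; (i \neq j), $$

where $I(S, G_k)$ is the classifier induced on the projection of $S$ onto $G_k$, and $C$ is the combining procedure.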
39. Naïve Bayes Combination
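The combination formula itself did not survive the transcript; assuming the subsets are treated as conditionally independent given the class (the naïve Bayes assumption applied at the subset level), the standard combination is

$$ \hat{P}(y \mid x) \;\propto\; \hat{P}(y) \prod_{k=1}^{\omega} \frac{\hat{P}\big(y \mid \pi_{G_k}(x)\big)}{\hat{P}(y)}, $$

where $\pi_{G_k}(x)$ is the projection of instance $x$ onto subset $G_k$; the predicted class is the $y$ maximizing this product.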
40. Justification for Using Naïve/Simple Bayes Combination
- Well suited to feature set decomposition.
- Understandable.
- Despite its simplicity, it tends (in many cases) to outperform more complicated methods such as decision trees or neural networks.
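A minimal sketch of this combination in Python (not the thesis implementation): one decision tree is induced per feature subset, and their class-conditional probability estimates are multiplied with the prior correction shown above. The class name, the use of scikit-learn trees as sub-inducers, and the way subsets are supplied are all illustrative assumptions.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

class NaiveBayesCombiner:
    """Combines classifiers trained on disjoint feature subsets via
    P(y|x) ~ P(y)^(1-w) * prod_k P_k(y | x[G_k]),  w = number of subsets."""

    def __init__(self, subsets):
        self.subsets = subsets                    # list of column-index lists

    def fit(self, X, y):
        classes, counts = np.unique(y, return_counts=True)
        self.classes_ = classes
        self.prior_ = counts / counts.sum()       # empirical P(y)
        self.models_ = [DecisionTreeClassifier().fit(X[:, g], y)
                        for g in self.subsets]
        return self

    def predict(self, X):
        eps = 1e-9                                # guards against log(0)
        w = len(self.subsets)
        # log P(y)^(1-w), broadcast to every instance
        score = np.tile((1 - w) * np.log(self.prior_ + eps), (len(X), 1))
        for g, m in zip(self.subsets, self.models_):
            score += np.log(m.predict_proba(X[:, g]) + eps)
        return self.classes_[np.argmax(score, axis=1)]

With singleton subsets this degenerates to an ordinary naïve Bayes classifier, which is consistent with the naïve Bayes corollary later in the deck.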
41. Simple Example (Illustrating the Concepts)
- A training set containing 20 examples was created according to:
- A uniform distribution.
- No noise.
- No irrelevant/redundant attributes.
42. (No Transcript)
43. Optimal Decision Tree
Minimal optimal tree (the solution is not unique).
Classification accuracy: 100%.
44. Actual Decision Tree Generated by C4.5
Classification accuracy: 93.75%.
45. Two Decision Trees Generated Using Feature Decomposition
Classification accuracy: 100%.
46. Naïve Bayes in Feature Decomposition Terminology
Classification accuracy: 68.75%.
47. When Is Feature Set Decomposition with Naïve Bayes Combination Preferable? Theoretical Cases
- Conditional independence
- DNF
- Additive
48. Algorithms for Feature Set Decomposition
- Greedy space-searching methods
- Serial search
- Multi-search
- Caching mechanism
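A schematic sketch of the serial greedy search (one reading of the slide, not the thesis code): features are assigned one at a time, either to an existing subset or to a new singleton subset, keeping any move that improves the evaluation score. The `evaluate` callable is a placeholder for whichever criterion is plugged in (wrapper accuracy, conditional entropy, or the VC-dimension bound discussed below); memoizing its results is where the caching mechanism would enter.

def greedy_serial_search(features, evaluate):
    """features: iterable of feature indices.
    evaluate: callable(decomposition) -> score, higher is better;
    must also accept the empty decomposition []."""
    subsets = []                                  # current decomposition
    best = evaluate(subsets)
    improved = True
    while improved:
        improved = False
        placed = {f for g in subsets for f in g}  # features already assigned
        for f in features:
            if f in placed:
                continue
            # candidate moves: add f to each existing subset, or open a new one
            candidates = [subsets[:i] + [g + [f]] + subsets[i + 1:]
                          for i, g in enumerate(subsets)]
            candidates.append(subsets + [[f]])
            cand, score = max(((c, evaluate(c)) for c in candidates),
                              key=lambda cs: cs[1])
            if score > best:                      # keep only improving moves
                subsets, best, improved = cand, score, True
                placed.add(f)
    return subsets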
49. Computational Complexity
50. Classifier Representation: Oblivious Decision Trees
(An oblivious decision tree tests the same attribute at all nodes of a given level.)
51. Performance Evaluation Methods
- Wrapper approach
- Conditional entropy (maximum likelihood)
- VC-dimension
52. Generalization Error Using the VC-Dimension
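The bound on this slide was not transcribed; for context, one classical form of the VC-based generalization bound that such results build on states that, with probability at least $1 - \delta$, a hypothesis $h$ from a space of VC-dimension $d$ satisfies

$$ \varepsilon(h) \;\le\; \hat{\varepsilon}(h) + \sqrt{\frac{d\left(\ln\frac{2m}{d} + 1\right) + \ln\frac{4}{\delta}}{m}}, $$

where $m$ is the training-set size, $\varepsilon$ the generalization error, and $\hat{\varepsilon}$ the training error. The contribution on the following slides is a bound on $d$ for decomposed oblivious decision trees.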
53. Definition of the VC-Dimension
- The VC-dimension of a hypothesis space H is the size of the largest set of points that is shattered by that hypothesis space.
- A set of points S is shattered by a hypothesis space H if and only if, for every dichotomy of S, there exists some hypothesis in H consistent with that dichotomy.
54. Theorem: VC-Dimension Bounds for Decomposed Oblivious Decision Trees
55. Proof
56. Corollary 1: For Naïve Bayes
57. Corollary 2: Generalization Error Bound under Conditional Independence
58. DNF Results
59. INDEP Results
60. Structural Similarity Measure
61. Experimental Results: Comparison to Single-Classifier Methods
62. Experimental Results: Comparison to Ensemble Methods
63. Accuracy-Complexity Tradeoff
64. Accuracy-Complexity Tradeoff
65. Comparison to Ensemble Methods with Equivalent Complexities
66. When Is Feature Set Decomposition Beneficial?
The results are statistically significant according to the Kruskal-Wallis one-way analysis of variance by ranks.
67. Benchmark for Datasets with Nodes/Sample Ratio > 0.2
68. Results
- Compared to single-classifier methods, the proposed algorithm is never worse, and is better in about 80% of the cases.
- Compared to ensemble methods, the proposed algorithm is usually better.
69. Agenda
- Introduction and Motivation
- The Elementary Decomposition Methodology
- Feature Set Decomposition
- Mining Manufacturing Data with F-Measure
- Meta-Decomposer
- Conclusions and Future Work
70. Improving Manufacturing Quality
- The goal is to find the relation between the quality measure (the target attribute) and the input attributes (the manufacturing process data).
- The quality measure is usually represented as a binary attribute ("Passed"/"Not Passed").
- The input features include the characteristics of:
- The production line (e.g., whether the machine has been tuned)
- Raw materials
- Environment (temperature, etc.)
- Human resources
71. The Task's Special Characteristics
- Skewed distribution
- Many relevant features
- Small datasets
72. Precision and Recall
(Confusion matrix: rows = actual class, columns = classified class.)

                   Classified Positive   Classified Negative
Actual Positive            TP                    FN
Actual Negative            FP                    TN
73. The F-Measure
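The formula was not transcribed; the standard definitions it presumably showed, in terms of the confusion-matrix counts above:

$$ \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F_{\beta} = \frac{(1 + \beta^2) \cdot \text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}} $$

With $\beta = 1$ this is the harmonic mean of precision and recall, which is far more informative than plain accuracy under the skewed class distributions typical of quality data.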
74. F-Measure Splitting Criterion
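The criterion's exact definition was not transcribed; the sketch below shows one natural reading, in which a candidate split is scored by the size-weighted F-measure of its children. The weighting scheme and the per-child majority-class prediction are assumptions, not necessarily the thesis's definition.

def f_measure(tp, fp, fn, beta=1.0):
    """F-beta computed directly from confusion counts; 0 when undefined."""
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return (1 + beta**2) * tp / denom if denom else 0.0

def split_score(children, beta=1.0):
    """Score a candidate split by the instance-weighted F-measure of its
    child nodes. Each child is a (tp, fp, fn, tn) tuple of counts obtained
    when the child predicts its majority class; higher scores mean the
    split better isolates the minority ("Not Passed") cases."""
    total = sum(sum(c) for c in children)
    return sum((sum(c) / total) * f_measure(c[0], c[1], c[2], beta)
               for c in children)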
75. Simple Example of the F-Measure Splitting Criterion
76. The F-Measure Evaluation Criterion for Feature Set Decomposition
77. Illustrative Results
78. Agenda
- Introduction and Motivation
- The Elementary Decomposition Methodology
- Feature Set Decomposition
- Mining Manufacturing Data with F-Measure
- Meta-Decomposer
- Conclusions and Future Work
79. Meta-Decomposer
- Based on the dataset's characteristics, the meta-decomposer decides whether to decompose the problem at all, and which elementary decomposition to use.
- The idea is to train an inducer on previous results, yielding a meta-classifier that predicts which decomposition method will perform well.
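A schematic sketch of the meta-induction step. The meta-feature set and the choice of a random forest as the meta-inducer are illustrative assumptions; the thesis's actual dataset characterization may differ.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def meta_features(X, y):
    """Characterize a dataset by a few typical meta-attributes."""
    n, d = X.shape
    _, counts = np.unique(y, return_counts=True)
    return [n,                            # number of records
            d,                            # number of features
            d / n,                        # dimensionality-to-size ratio
            len(counts),                  # number of classes
            counts.max() / counts.min()]  # class imbalance

def train_meta_decomposer(past_runs):
    """past_runs: list of (X, y, winner) triples, where `winner` names the
    decomposition method that performed best on that dataset."""
    M = [meta_features(X, y) for X, y, _ in past_runs]
    labels = [winner for _, _, winner in past_runs]
    return RandomForestClassifier(random_state=0).fit(M, labels)

# Usage: meta.predict([meta_features(X_new, y_new)]) recommends a method,
# e.g. "feature-set", "sample", "space", or "none".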
80. Meta-Data Generation Phase
81. Meta-Induction Phase
82. Meta-Decomposer Usage
83. Meta-Decomposer Results
Improvement: 6.8%
Improvement: 7.1%
84. Agenda
- Introduction and Motivation
- The Elementary Decomposition Methodology
- Feature Set Decomposition
- Mining Manufacturing Data with F-Measure
- Meta-Decomposer
- Conclusions and Future Work
85. Conclusions
- Feature set decomposition improves decision-tree accuracy when there are many contributing features relative to the training-set size, without increasing model complexity.
- The meta-decomposer is capable of choosing the best elementary decomposition for a given problem.
- The F-measure evaluation criterion for feature set decomposition can be useful for classification problems in quality assurance.
86. Limitations
- The algorithmic framework has no backtracking capabilities (for instance, removing a single feature from a subset, or removing an entire subset).
- The search currently begins from an empty decomposition structure, which may be the reason why the number of features in each subset is relatively small.
- The proposed F-measure evaluation criterion is restricted to binary target attributes.
87. Future Research
- Recursively decompose a classification task using elementary decomposition methods.
- How can prior knowledge be utilized to improve the decomposition methodology?
- Examine how the feature set decomposition concept can be implemented with other inducers, such as Support Vector Machines.
- Further theoretical investigation is required to better understand under what circumstances the feature set decomposition methodology is appropriate, and how it relates to other decomposition paradigms.
- Extend the meta-learning scheme to cover other decomposition methods, specifically function decomposition and concept aggregation.
- Check whether the meta-learning results remain valid when different implementations of the decomposition algorithms are used.