Title: Decomposition Methodology in Data Mining: A Feature Set Decomposition Approach
1. Decomposition Methodology in Data Mining: A Feature Set Decomposition Approach
- Lior Rokach
- Faculty of Engineering
- Tel-Aviv University
2. Agenda
- Introduction and Motivation
- The Elementary Decomposition Methodology
- Feature Set Decomposition
- Mining Manufacturing Data with F-Measure
- Meta-Decomposer
- Conclusions and Future Work
3. Data Mining
- Fayyad et al. (1996): The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
- Friedman (1997): An automatic exploratory data analysis of large databases.
- Hand (1998): A secondary data analysis of large databases.
- Maimon (2003): A general-purpose mathematical tool for modeling and analyzing large and complex datasets.
4. Supervised Learning / Classification
5. Illustrative Example
6. Classifier (Classification Model)
e.g., expressed by if-then rules:
if Employment = Employee and Ownership = House then → None
if Employment = Employee and Ownership = None then → Silver
if Employment = Employee and Ownership = Tenement and Volume < 1300 then → None
if Employment = Employee and Ownership = Tenement and Volume ≥ 1300 then → Silver
if Employment = None and Education < 12 then → Silver
if Employment = None and Education ≥ 12 and Education < 15 then → None
if Employment = None and Ownership = Tenement and Education ≥ 15 then → Silver
if Employment = None and Ownership = House and Education ≥ 15 then → Gold
if Employment = None and Ownership = None and Education ≥ 15 then → None
if Employment = Self then → Gold
7. Scalability to Large Classification Tasks
- High number of records
- High number of features
- High number of classes
- Heterogeneity
- More
8. Issues in Scalability
- Computational complexity: Most learning algorithms have a computational complexity that is worse than linear in the number of attributes or tuples, so the execution time needed to process large databases can become a serious issue.
- Classification performance: High dimensionality increases the size of the search space exponentially, and with it the chance that the algorithm will find inaccurate models that do not generalize.
- Storage complexity: In most data mining algorithms, the whole dataset must be read from magnetic storage into the computer's main memory before processing begins. This is a problem, since main-memory capacity is much smaller than magnetic-disk capacity.
- Large models: comprehensibility and maintenance problems.
9. Approaches for Large Problems
- Sampling methods
- Massively parallel processing
- Efficient storage methods
- Dimension reduction
- More
10. Agenda
- Introduction and Motivation
- The Elementary Decomposition Methodology
- Feature Set Decomposition
- Mining Manufacturing Data with F-Measure
- Meta-Decomposer
- Conclusions and Future Work
11. The Engineering Approach: Decomposition
- The purpose of decomposition methodology is to break a complex problem down into several smaller, less complex, and more manageable sub-problems that are solvable by existing tools, and then to join their solutions in order to solve the original problem, without loss of model performance.
12. Decomposition Advantages
- Reduced computational complexity
- Suitability for parallel or distributed computation
- Ability to use different solution techniques for individual sub-problems
- Modularity: easier maintenance, and support for the evolutionary-computation concept
- Feasibility
- Clearer (more understandable) results
- Increased performance (classification accuracy)
13. Why Can Decomposition Improve Performance? The Bias-Variance Tradeoff
- The intrinsic error is the error generated by the stochastic nature of the problem. It is a lower bound for any learning algorithm (the Bayes optimal classifier).
- The bias error is the persistent, systematic error that the learning algorithm is expected to make.
- The variance error captures the random variation of the algorithm from one training set to another; that is, it measures the sensitivity of the algorithm to the actual training set.
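For reference, the decomposition can be written compactly for squared loss (a standard textbook form; the classification-error analogue used in the deck follows Bauer & Kohavi, 1999). Here $\hat{f}_D$ denotes the model induced from training set $D$:

$$ \mathbb{E}_{D}\!\left[\big(y - \hat{f}_D(x)\big)^2\right] = \underbrace{\sigma^2}_{\text{intrinsic error}} + \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - \mathbb{E}[y \mid x]\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}_D\!\left[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\right]}_{\text{variance}} $$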
14. Bias and Variance (Bauer & Kohavi, 1999)
(Figure: error decomposition chart; labeled component: intrinsic error.)
15. Bias and Variance
- Simple methods (like naïve Bayes) tend to have a high bias error and a low variance error (shown by Bauer & Kohavi, 1999).
- Complex methods (like decision trees) tend to have a low bias error and a high variance error (shown by Dietterich & Kong, 1995).
16. Bias and Variance Tradeoff
(Figure: bias and variance as a function of model complexity, with naïve Bayes at the low-complexity end and decision trees at the high-complexity end; Hansen (2000), deterministic case.)
17. Bias and Variance Tradeoff
(Figure: the same tradeoff curve with the optimal complexity point marked.)
18. Issues in Decomposition
- What types of elementary decomposition methods exist in classification induction?
- Which elementary decomposition type performs best for which problem? What factors should one take into account when choosing the appropriate decomposition type?
- Given an elementary type, how should the best decomposition structure be inferred automatically?
- How should the sub-problems be recomposed to represent the original concept learning?
19. Various Elementary Decomposition Approaches
20. Illustrative Example
21. Function Decomposition
First, a new intermediate concept named "Wealth" is defined as follows:
Then the original concept is expressed in terms of it:
if Wealth = Rich then → Gold
if Wealth = Poor and Employment = Employee then → Silver
if Wealth = Poor and Employment = None then → None
if Wealth = Else then → Silver
22. Concept Aggregation Decomposition
Initially we check whether the customer is willing to purchase at all:
if Ownership = Tenement and Volume ≥ 1300 and Employment = Employee then → No, else → Yes
Then, which type of insurance:
if Employment = Self then → Gold
if Ownership = House and Aggregation = Yes then → Gold
if Employment = Employee then → Silver
if Ownership = Tenement and Employment = None then → Silver
if Ownership = None and Employment = None then → Silver
23. Feature (Attribute) Decomposition
Using the attributes Ownership and Volume:
if Ownership = House then → Gold
if Ownership = Tenement and Volume ≥ 1000 and Volume < 1300 then → None
if Ownership = None then → Silver
if Ownership = Tenement and (Volume < 1000 or Volume ≥ 1300) then → Silver
Using the attributes Employment and Education:
if Employment = Employee and Education ≥ 12 then → Silver
if Employment = Employee and Education < 12 then → None
if Employment = Self then → Gold
if Employment = None then → Silver
24. Sample Decomposition
Model induced from the first half of the training set:
if Employment = Self then → Gold
if Volume < 1100 then → None
if Volume ≥ 1100 and Employment = Employee then → Silver
if Employment = None and Education > 12 then → Silver
if Employment = None and Education ≤ 12 then → None
Model induced from the second half:
if Employment = Employee and Ownership = House then → None
if Employment = Employee and Ownership = None then → Silver
if Employment = Employee and Ownership = Tenement and Volume < 1300 then → None
if Employment = Employee and Ownership = Tenement and Volume ≥ 1300 then → Silver
if Employment = None then → Silver
if Employment = Self then → Gold
25. Space Decomposition
Model induced for Education ≥ 15:
if Volume ≥ 1000 then → Gold
if Volume < 1000 then → Silver
Model induced for Education < 15:
if Employment = Employee and Ownership = House then → None
if Employment = Employee and Ownership = None then → Silver
if Employment = Employee and Ownership = Tenement and Volume < 1300 then → None
if Employment = Employee and Ownership = Tenement and Volume ≥ 1300 then → Silver
if Employment = None then → Silver
if Employment = Self then → Gold
26. The Decomposer's Characteristics
- Structure acquiring method
- Mutual exclusiveness
- Exhaustiveness
- Inducer usage
- Combiner usage
- The relationship between the sub-classifiers
27. Structure Acquiring Method
- Manually, based on an expert's knowledge of a specific domain (Michie, 1995).
- Arbitrarily (Domingos, 1995).
- Imposed by external restrictions (as in distributed data mining).
- Induced by a suitable algorithm (Zupan, 1997).
28. Mutual Exclusiveness
- Mutual exclusiveness imposes a restriction on the problem space.
- However:
- It tends to reduce execution time.
- It yields smaller models, with better comprehensibility and easier maintenance of the solution.
- It helps avoid some of the error-correlation problems that characterize non-mutually-exclusive decompositions.
29. Exhaustiveness
- This property indicates whether all data elements should be used in the decomposition. For instance, an exhaustive feature set decomposition is one in which each feature participates in at least one subset.
30. The Inducer Usage
- This property describes the relation between the decomposer and the inducer used:
- Inducer-free: does not use inducers at all.
- Inducer-dependent: developed for a specific inducer.
- Inducer-independent: not developed for a specific inducer, but uses the same inducer on all components.
- Inducer-chooser: given a set of inducers, the system is capable of using the most appropriate inducer for each sub-problem.
31. Combiner Usage
- This property specifies the relation between the decomposer and the combiner:
- Combiner-dependent: developed specifically for a certain combination method.
- Combiner-independent: the combination method is provided as an input.
- Combiner-chooser: selects the most appropriate combination method.
32. The Relationship Between the Sub-Classifiers
- This property indicates whether the various sub-classifiers are dependent:
- Dependent: the outcome of one classifier may affect the creation of the next classifier.
- Independent: each classifier is built independently.
33. Other Decomposition Frameworks
- Sharkey (1999): neural networks; space and sample decomposition; concept aggregation.
- Kusiak (2000): manual decomposition.
- None of these frameworks considers the coexistence of various decomposition methods, namely:
- When should we prefer one method over another? (Note that Hansen (2000) compared sample vs. space decomposition.)
- Is it possible to solve a given problem using a hybridization of several methods?
34. Relations to Other Fields
- Ensemble models:
- Accuracy-greedy.
- Not comprehensible.
- Each component can be reliably used to solve the original problem.
- Distributed data mining:
- Homogeneous case: sample decomposition.
- Heterogeneous case: sample decomposition combined with feature decomposition.
- The partition is set by the environment.
35. Agenda
- Introduction and Motivation
- The Elementary Decomposition Methodology
- Feature Set Decomposition
- Mining Manufacturing Data with F-Measure
- Meta-Decomposer
- Conclusions and Future Work
36. Classification Using Feature (Attribute) Set Decomposition
37. Notation
38. Problem Formulation: Feature Set Decomposition
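The formal statement on this slide was not transcribed; the following is a sketch of the problem in its usual form, with notation assumed rather than taken from the thesis. Given a training set $S$ over a feature set $A = \{a_1, \ldots, a_n\}$ and a target attribute $y$, find a set $Z = \{G_1, \ldots, G_\omega\}$ of mutually exclusive feature subsets $G_k \subseteq A$ whose combined classifier minimizes the generalization error:

$$ Z^{*} = \arg\min_{Z} \; \varepsilon\Big(C\big(I(S, G_1), \ldots, I(S, G_\omega)\big)\Big), \qquad G_i \cap G_j = \emptyset \;\; (i \neq j), $$

where $I(S, G_k)$ is the classifier induced on the projection of $S$ onto $G_k$, and $C$ is the combining procedure.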
39. Naïve Bayes Combination
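The combination formula itself did not survive the transcript; assuming the subsets are treated as conditionally independent given the class (the naïve Bayes assumption applied at the subset level), the standard combination is

$$ \hat{P}(y \mid x) \;\propto\; \hat{P}(y) \prod_{k=1}^{\omega} \frac{\hat{P}\big(y \mid \pi_{G_k}(x)\big)}{\hat{P}(y)}, $$

where $\pi_{G_k}(x)$ is the projection of instance $x$ onto subset $G_k$; the predicted class is the $y$ maximizing this product.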
40. Justification for Using Naïve/Simple Bayes Combination
- Well suited to feature set decomposition.
- Understandable.
- Despite its simplicity, it tends (in many cases) to outperform more complicated methods such as decision trees or neural networks.
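A minimal sketch of this combination in Python (not the thesis implementation): one decision tree is induced per feature subset, and their class-conditional probability estimates are multiplied with the prior correction shown above. The class name, the use of scikit-learn trees as sub-inducers, and the way subsets are supplied are all illustrative assumptions.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

class NaiveBayesCombiner:
    """Combines classifiers trained on disjoint feature subsets via
    P(y|x) ~ P(y)^(1-w) * prod_k P_k(y | x[G_k]),  w = number of subsets."""

    def __init__(self, subsets):
        self.subsets = subsets                    # list of column-index lists

    def fit(self, X, y):
        classes, counts = np.unique(y, return_counts=True)
        self.classes_ = classes
        self.prior_ = counts / counts.sum()       # empirical P(y)
        self.models_ = [DecisionTreeClassifier().fit(X[:, g], y)
                        for g in self.subsets]
        return self

    def predict(self, X):
        eps = 1e-9                                # guards against log(0)
        w = len(self.subsets)
        # log P(y)^(1-w), broadcast to every instance
        score = np.tile((1 - w) * np.log(self.prior_ + eps), (len(X), 1))
        for g, m in zip(self.subsets, self.models_):
            score += np.log(m.predict_proba(X[:, g]) + eps)
        return self.classes_[np.argmax(score, axis=1)]

With singleton subsets this degenerates to an ordinary naïve Bayes classifier, which is consistent with the naïve Bayes corollary later in the deck.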
41. Simple Example (Illustrating the Concepts)
- A training set containing 20 examples was created according to:
- A uniform distribution.
- No noise.
- No irrelevant/redundant attributes.
42. (No Transcript)
43. Optimal Decision Tree
Minimal optimal tree (the solution is not unique).
Classification accuracy: 100%.
44. Actual Decision Tree Generated by C4.5
Classification accuracy: 93.75%.
45. Two Decision Trees Generated Using Feature Decomposition
Classification accuracy: 100%.
46. Naïve Bayes in Feature Decomposition Terminology
Classification accuracy: 68.75%.
47. When Is Feature Set Decomposition with Naïve Bayes Combination Preferable? Theoretical Cases
- Conditional independence
- DNF
- Additive
48. Algorithms for Feature Set Decomposition
- Greedy space-searching methods
- Serial search
- Multi-search
- Caching mechanism
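A schematic sketch of the serial greedy search (one reading of the slide, not the thesis code): features are assigned one at a time, either to an existing subset or to a new singleton subset, keeping any move that improves the evaluation score. The `evaluate` callable is a placeholder for whichever criterion is plugged in (wrapper accuracy, conditional entropy, or the VC-dimension bound discussed below); memoizing its results is where the caching mechanism would enter.

def greedy_serial_search(features, evaluate):
    """features: iterable of feature indices.
    evaluate: callable(decomposition) -> score, higher is better;
    must also accept the empty decomposition []."""
    subsets = []                                  # current decomposition
    best = evaluate(subsets)
    improved = True
    while improved:
        improved = False
        placed = {f for g in subsets for f in g}  # features already assigned
        for f in features:
            if f in placed:
                continue
            # candidate moves: add f to each existing subset, or open a new one
            candidates = [subsets[:i] + [g + [f]] + subsets[i + 1:]
                          for i, g in enumerate(subsets)]
            candidates.append(subsets + [[f]])
            cand, score = max(((c, evaluate(c)) for c in candidates),
                              key=lambda cs: cs[1])
            if score > best:                      # keep only improving moves
                subsets, best, improved = cand, score, True
                placed.add(f)
    return subsets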
49. Computational Complexity
50. Classifier Representation: Oblivious Decision Trees
(An oblivious decision tree tests the same attribute at all nodes of a given level.)
51. Performance Evaluation Methods
- Wrapper approach
- Conditional entropy (maximum likelihood)
- VC-dimension
52. Generalization Error Using the VC-Dimension
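The bound on this slide was not transcribed; for context, one classical form of the VC-based generalization bound that such results build on states that, with probability at least $1 - \delta$, a hypothesis $h$ from a space of VC-dimension $d$ satisfies

$$ \varepsilon(h) \;\le\; \hat{\varepsilon}(h) + \sqrt{\frac{d\left(\ln\frac{2m}{d} + 1\right) + \ln\frac{4}{\delta}}{m}}, $$

where $m$ is the training-set size, $\varepsilon$ the generalization error, and $\hat{\varepsilon}$ the training error. The contribution on the following slides is a bound on $d$ for decomposed oblivious decision trees.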
53. Definition of the VC-Dimension
- The VC-dimension of a hypothesis space H is the size of the largest set of points that is shattered by that hypothesis space.
- A set of points S is shattered by a hypothesis space H if and only if, for every dichotomy of S, there exists some hypothesis in H consistent with that dichotomy.
54. Theorem: VC-Dimension Bounds for Decomposed Oblivious Decision Trees
55. Proof
56. Corollary 1: For Naïve Bayes
57. Corollary 2: Generalization Error Bound under Conditional Independence
58. DNF Results
59. INDEP Results
60. Structural Similarity Measure
61. Experimental Results: Comparison to Single-Classifier Methods
62. Experimental Results: Comparison to Ensemble Methods
63. Accuracy-Complexity Tradeoff
64. Accuracy-Complexity Tradeoff
65. Comparison to Ensemble Methods with Equivalent Complexities
66. When Is Feature Set Decomposition Beneficial?
The results are statistically significant according to the Kruskal-Wallis one-way analysis of variance by ranks.
67. Benchmark for Datasets with Nodes/Sample Ratio > 0.2
68. Results
- Compared to single-classifier methods, the proposed algorithm is never worse, and is better in about 80% of the cases.
- Compared to ensemble methods, the proposed algorithm is usually better.
69. Agenda
- Introduction and Motivation
- The Elementary Decomposition Methodology
- Feature Set Decomposition
- Mining Manufacturing Data with F-Measure
- Meta-Decomposer
- Conclusions and Future Work
70. Improving Manufacturing Quality
- The goal is to find the relation between the quality measure (the target attribute) and the input attributes (the manufacturing process data).
- The quality measure is usually represented as a binary attribute ("Passed"/"Not Passed").
- The input features include the characteristics of:
- The production line (e.g., whether the machine has been tuned)
- Raw materials
- Environment (temperature, etc.)
- Human resources
71. The Task's Special Characteristics
- Skewed distribution
- Many relevant features
- Small datasets
72. Precision and Recall
(Confusion matrix: rows = actual class, columns = classified class.)

                   Classified Positive   Classified Negative
Actual Positive            TP                    FN
Actual Negative            FP                    TN
73. The F-Measure
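The formula was not transcribed; the standard definitions it presumably showed, in terms of the confusion-matrix counts above:

$$ \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F_{\beta} = \frac{(1 + \beta^2) \cdot \text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}} $$

With $\beta = 1$ this is the harmonic mean of precision and recall, which is far more informative than plain accuracy under the skewed class distributions typical of quality data.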
74. F-Measure Splitting Criterion
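The criterion's exact definition was not transcribed; the sketch below shows one natural reading, in which a candidate split is scored by the size-weighted F-measure of its children. The weighting scheme and the per-child majority-class prediction are assumptions, not necessarily the thesis's definition.

def f_measure(tp, fp, fn, beta=1.0):
    """F-beta computed directly from confusion counts; 0 when undefined."""
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return (1 + beta**2) * tp / denom if denom else 0.0

def split_score(children, beta=1.0):
    """Score a candidate split by the instance-weighted F-measure of its
    child nodes. Each child is a (tp, fp, fn, tn) tuple of counts obtained
    when the child predicts its majority class; higher scores mean the
    split better isolates the minority ("Not Passed") cases."""
    total = sum(sum(c) for c in children)
    return sum((sum(c) / total) * f_measure(c[0], c[1], c[2], beta)
               for c in children)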
75. Simple Example of the F-Measure Splitting Criterion
76. The F-Measure Evaluation Criterion for Feature Set Decomposition
77. Illustrative Results
78. Agenda
- Introduction and Motivation
- The Elementary Decomposition Methodology
- Feature Set Decomposition
- Mining Manufacturing Data with F-Measure
- Meta-Decomposer
- Conclusions and Future Work
79. Meta-Decomposer
- Based on the dataset's characteristics, the meta-decomposer decides whether to decompose the problem at all, and which elementary decomposition to use.
- The idea is to train an inducer on previous results, yielding a meta-classifier that predicts which decomposition method will perform well.
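A schematic sketch of the meta-induction step. The meta-feature set and the choice of a random forest as the meta-inducer are illustrative assumptions; the thesis's actual dataset characterization may differ.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def meta_features(X, y):
    """Characterize a dataset by a few typical meta-attributes."""
    n, d = X.shape
    _, counts = np.unique(y, return_counts=True)
    return [n,                            # number of records
            d,                            # number of features
            d / n,                        # dimensionality-to-size ratio
            len(counts),                  # number of classes
            counts.max() / counts.min()]  # class imbalance

def train_meta_decomposer(past_runs):
    """past_runs: list of (X, y, winner) triples, where `winner` names the
    decomposition method that performed best on that dataset."""
    M = [meta_features(X, y) for X, y, _ in past_runs]
    labels = [winner for _, _, winner in past_runs]
    return RandomForestClassifier(random_state=0).fit(M, labels)

# Usage: meta.predict([meta_features(X_new, y_new)]) recommends a method,
# e.g. "feature-set", "sample", "space", or "none".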
80. Meta-Data Generation Phase
81. Meta-Induction Phase
82. Meta-Decomposer Usage
83. Meta-Decomposer Results
Improvement: 6.8%
Improvement: 7.1%
84. Agenda
- Introduction and Motivation
- The Elementary Decomposition Methodology
- Feature Set Decomposition
- Mining Manufacturing Data with F-Measure
- Meta-Decomposer
- Conclusions and Future Work
85. Conclusions
- Feature set decomposition improves decision-tree accuracy when there are many contributing features relative to the training-set size, without increasing model complexity.
- The meta-decomposer is capable of choosing the best elementary decomposition for a given problem.
- The F-measure evaluation criterion for feature set decomposition can be useful for classification problems in quality assurance.
86. Limitations
- The algorithmic framework has no backtracking capabilities (for instance, removing a single feature from a subset, or removing an entire subset).
- The search currently begins from an empty decomposition structure, which may be the reason why the number of features in each subset is relatively small.
- The proposed F-measure evaluation criterion is restricted to binary target attributes.
87. Future Research
- Recursively decompose a classification task using elementary decomposition methods.
- How can prior knowledge be utilized to improve the decomposition methodology?
- Examine how the feature set decomposition concept can be implemented with other inducers, such as Support Vector Machines.
- Further theoretical investigation is required to better understand under what circumstances the feature set decomposition methodology is appropriate, and how it relates to other decomposition paradigms.
- Extend the meta-learning scheme to cover other decomposition methods, specifically function decomposition and concept aggregation.
- Check whether the meta-learning results remain valid when different implementations of the decomposition algorithms are used.