Title: Project EMD-MLR Decision Tree Classifiers (Part 1)
1. Project EMD-MLR Decision Tree Classifiers (Part 1)
2. Presentation Outline
- Introduction to Pattern Recognition
- Introduction to the Decision Tree Classifier
- Important Tree Functions
- Growing Phase
- Pruning Phase
- Classify Phase
- Growing Phase
- Split Criteria
- Stopping Criteria
- Leaf Node Assignment
3. Presentation Outline
- Pruning Phase
- How the Pruning Phase Works
- Classify Phase
- Data with Categorical Attributes
- Data with non-uniform misclassification costs
- Computational Complexity of the Decision Tree
Classifier
4. Decision Tree Classifier Pruning Phase: How do we Prune a Tree?
- The tree that we normally grow is of larger size than needed
- The purpose of the growing phase is to build the largest tree possible, with the smallest misclassification error on the training set
- The purpose of the pruning phase is to produce a number of pruned versions of the tree (quite often more than one)
- These pruned versions of the largest tree
- have a smaller misclassification error (at least some of them do) than the largest tree on unseen data
- are easier to interpret than the largest tree is
5. Decision Tree Classifier Pruning Phase: How do we Prune a Tree?
- To prune the tree that was built in the growing phase we define a measure of tree goodness
- This measure depends on the misclassification error of the tree and on the size of the tree
- A smaller tree misclassification error makes this measure smaller
- A smaller tree size makes this measure smaller
- The measure, called the misclassification cost MC of a tree T, is defined as MC(T) = ME(T) + α · L(T), where α is a non-negative constant (α ≥ 0), ME(T) is the tree's misclassification error, and L(T) is the number of leaves in tree T
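Below is a minimal Python sketch of this measure; the function name and the idea of summarizing a candidate tree by its error and leaf count are illustrative choices, not something the slides prescribe.

```python
def misclassification_cost(error, num_leaves, alpha):
    """Cost-complexity measure MC(T) = ME(T) + alpha * L(T)."""
    assert alpha >= 0, "alpha must be a non-negative constant"
    return error + alpha * num_leaves

# Example with the numbers used on the next slides:
# the big tree has ME = 0 and 6 leaves, so at alpha = 0.05 its cost is 0.30
print(misclassification_cost(0.0, 6, 0.05))
```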
6. Decision Tree Classifier Pruning Phase: The Big Tree
7. Decision Tree Classifier Pruning Phase: The Big Tree (BT)
- The misclassification error of the big tree is equal to 0
- The number of leaves in the big tree is equal to 6
8. Decision Tree Classifier Pruning Phase: The BT's Misclassification Cost
9. Decision Tree Classifier Pruning Phase: Big Tree minus branches of node 9 (BT-9)
- The misclassification error of the big tree minus the branches of node 9 is equal to 1/10
- The number of leaves in the big tree minus the branches of node 9 is equal to 5
10. Decision Tree Classifier Pruning Phase: The (BT-9)'s Misclassification Cost
11. Decision Tree Classifier Pruning Phase: Comparisons of MC(BT) and MC(BT-9)
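Written out with the error and leaf counts from the preceding slides (α is the constant from the measure MC defined earlier), the comparison shown on this slide is:

MC(BT) = 0 + α · 6 = 6α
MC(BT-9) = 1/10 + α · 5

MC(BT-9) ≤ MC(BT)  ⟺  1/10 + 5α ≤ 6α  ⟺  α ≥ 1/10

So the pruned tree BT-9 becomes the cheaper tree once α reaches 1/10.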
12. Decision Tree Classifier Pruning Phase: Big Tree minus branches of node 7 (BT-7)
- The misclassification error of the big tree minus the branches of node 7 is equal to 1/10
- The number of leaves in the big tree minus the branches of node 7 is equal to 4
13. Decision Tree Classifier Pruning Phase: The (BT-7)'s Misclassification Cost
14. Decision Tree Classifier Pruning Phase: Comparisons of MC(BT) and MC(BT-7)
15. Decision Tree Classifier Pruning Phase: Big Tree minus branches of node 4 (BT-4)
- The misclassification error of the big tree minus the branches of node 4 is equal to 2/10
- The number of leaves in the big tree minus the branches of node 4 is equal to 3
16. Decision Tree Classifier Pruning Phase: The (BT-4)'s Misclassification Cost
17. Decision Tree Classifier Pruning Phase: Comparisons of MC(BT) and MC(BT-4)
18. Decision Tree Classifier Pruning Phase: Big Tree minus branches of node 3 (BT-3)
- The misclassification error of the big tree minus the branches of node 3 is equal to 2/10
- The number of leaves in the big tree minus the branches of node 3 is equal to 2
19. Decision Tree Classifier Pruning Phase: The (BT-3)'s Misclassification Cost
20. Decision Tree Classifier Pruning Phase: Comparisons of MC(BT) and MC(BT-3)
21. Decision Tree Classifier Pruning Phase: Big Tree minus branches of node 1 (BT-1)
- The misclassification error of the big tree minus the branches of node 1 is equal to 5/10
- The number of leaves in the big tree minus the branches of node 1 is equal to 1
22. Decision Tree Classifier Pruning Phase: The (BT-1)'s Misclassification Cost
23. Decision Tree Classifier Pruning Phase: Comparisons of MC(BT) and MC(BT-1)
24. Decision Tree Classifier Pruning Phase: Which node's branches do we prune?
- We find the pruned tree (BT-9, or BT-7, or BT-4, or BT-3, or BT-1) whose misclassification cost becomes smaller first, as compared to the misclassification cost of the big tree, as α increases (see the computation below)
- The pruned tree for which this event happens first is BT-3 (see next slide)
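The same comparison can be carried out for all candidate prunings at once. The short Python sketch below is illustrative (the names are not from the slides); the error and leaf counts are the ones listed on the preceding slides, and for each candidate it computes the value of α at which that candidate's cost first matches the cost of the big tree.

```python
# (training misclassification error, number of leaves), from the preceding slides
big_tree = (0.0, 6)
candidates = {
    "BT-9": (1/10, 5),
    "BT-7": (1/10, 4),
    "BT-4": (2/10, 3),
    "BT-3": (2/10, 2),
    "BT-1": (5/10, 1),
}

def critical_alpha(big, pruned):
    """alpha at which MC(pruned) = ME + alpha*L first drops to MC(big)."""
    error_increase = pruned[0] - big[0]   # error added by pruning
    leaves_removed = big[1] - pruned[1]   # leaves removed by pruning
    return error_increase / leaves_removed

for name, tree in candidates.items():
    print(name, round(critical_alpha(big_tree, tree), 3))
# BT-9 0.1, BT-7 0.05, BT-4 0.067, BT-3 0.05, BT-1 0.1
# BT-7 and BT-3 reach the big tree's cost at the same alpha (0.05);
# of the two, the slides keep the smaller tree, BT-3
```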
25. Decision Tree Classifier Pruning Phase: Comparisons of MC(BT), MC(BT-9), MC(BT-7), MC(BT-4), MC(BT-3), MC(BT-1)
26. Decision Tree Classifier Pruning Phase: The Chosen Pruned Tree BT-3
27. Decision Tree Classifier Pruning Phase: What Next?
- Pruning continues along the same lines
- But now we will apply additional pruning to the already pruned big tree
- That is, we will apply additional pruning to the tree BT-3
28. Decision Tree Classifier Pruning Phase: The Tree BT-3
29. Decision Tree Classifier Pruning Phase: BT-3 minus branches of node 1 (BT-3-1)
- The misclassification error of BT-3 minus the branches of node 1 is equal to 5/10
- The number of leaves in the BT-3 tree minus the branches of node 1 is equal to 1
30. Decision Tree Classifier Pruning Phase: The (BT-3-1)'s Misclassification Cost
31. Decision Tree Classifier Pruning Phase: Comparisons of MC(BT-3) and MC(BT-3-1)
32. Decision Tree Classifier Pruning Phase: Which node's branches do we prune?
- We find the pruned tree (BT-3-1) whose misclassification cost becomes smaller than the misclassification cost of BT-3 first, as α increases
- Here we have no competition amongst pruned trees
- So, at some appropriate value of α, this will happen (worked out below)
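Written out with the numbers from the preceding slides, the value of α at which this happens is:

MC(BT-3) = 2/10 + α · 2
MC(BT-3-1) = 5/10 + α · 1

MC(BT-3-1) ≤ MC(BT-3)  ⟺  5/10 + α ≤ 2/10 + 2α  ⟺  α ≥ 3/10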
33. Decision Tree Classifier Pruning Phase: The Tree BT-3-1
34. Decision Tree Classifier Pruning Phase: What Next?
- There is no more pruning that we can apply, since, through pruning, we have ended up with a tree consisting only of the root of the tree
- So we declare the pruning process complete
- The pruning process discovered two pruned trees
- These trees are the trees BT-3 and BT-3-1
35. Decision Tree Classifier Pruning Phase: The Pruned Trees BT-3 and BT-3-1
[Figure: the pruned trees BT-3 and BT-3-1; node 1 holds data points 1-5 (class A) and 6-10 (class B)]
36. Decision Tree Classifier Classify Phase
- We have already gone through the growing and the pruning phases of the decision tree
- We have discovered that three trees were worth storing for further consideration
- These are
- The Big Tree (BT)
- The Big Tree minus the branches of node 3 (BT-3)
- The Big Tree minus the branches of node 1, i.e., the root node only (BT-3-1)
- We are now ready to examine the performance of two trees (BT and BT-3) on unseen data
- The performance of the BT-3-1 tree on unseen data will not be examined
37. Decision Tree Classifier Classify Phase: Performance of BT on New Data (147/160)
38. Decision Tree Classifier Classify Phase: Performance of BT on New Data (12/160)
39. Decision Tree Classifier Classify Phase: Performance of BT-3 on New Data (138/160)
40. Decision Tree Classifier Classify Phase: Performance of BT-3 on New Data (22/160)
41. Decision Tree Classifier Classify Phase: Which one is the Best Tree?
- Out of the available possibilities (the unpruned big tree and its pruned versions) we choose
- the tree that has the smallest, or close to the smallest, classification error on new data, and
- the smallest size (number of leaves)
- For instance, in the previous example we could choose the pruned tree BT-3 as our preferred tree, because it has reasonable performance on the test set and fewer leaves
- Sometimes the choice is not that obvious (a simple selection rule is sketched below)
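One simple way to make this choice automatic is sketched below in Python. The rule itself (accept any tree whose test error is within a small tolerance of the best, then take the one with the fewest leaves) and the numbers in the example are illustrative assumptions, not something the slides prescribe.

```python
def pick_tree(candidates, tolerance=0.02):
    """candidates: tree name -> (test error, number of leaves).
    Return the smallest tree whose test error is within `tolerance` of the best."""
    best_error = min(error for error, _ in candidates.values())
    acceptable = {name: leaves for name, (error, leaves) in candidates.items()
                  if error <= best_error + tolerance}
    return min(acceptable, key=acceptable.get)

# Illustrative numbers only:
print(pick_tree({"BT": (0.08, 6), "BT-3": (0.12, 2)}, tolerance=0.05))  # BT-3
```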
42. Decision Tree Classifier Special Features: Data with non-uniform misclassification costs
43. Decision Tree Classifier Special Features: Non-uniform misclassification costs
- In this example, the misclassification cost of mistaking A for B is equal to 1
- The misclassification cost of mistaking B for A is equal to 2
- Hence, it is twice as expensive to predict A when B is true than the other way around
- There are no costs associated with making the correct prediction
Cost matrix (T = true class, P = predicted class):
P\T   A   B
A     0   2
B     1   0
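To see how such a cost matrix is used, here is a minimal Python sketch of labeling a leaf by the smallest total misclassification cost; the function and the example counts are illustrative, not taken from the slides.

```python
# cost[predicted][true]: cost of predicting `predicted` when `true` is the actual class
cost = {"A": {"A": 0, "B": 2},
        "B": {"A": 1, "B": 0}}

def best_label(counts, cost):
    """counts: class -> number of training points of that class in the leaf."""
    total = lambda predicted: sum(cost[predicted][true] * n for true, n in counts.items())
    return min(cost, key=total)

# A leaf holding 3 class-A points and 2 class-B points:
# predicting A costs 2*2 = 4, predicting B costs 1*3 = 3, so the leaf is labeled B
print(best_label({"A": 3, "B": 2}, cost))
```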
44. Decision Tree Classifier Special Features: Data with non-uniform misclassification costs
Misclassification cost for class B is twice as big as the misclassification cost for class A
45. Decision Tree Classifier Special Features: Tree grown from data with non-uniform costs
- The growing phase of the tree works in a similar fashion as if all misclassification costs were equal
- But now the tree operates as if the class B data have twice as much weight as the class A data (see the small example below)
- The figure on the next slide shows the fully grown tree
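A small worked example of this weighting (the counts are illustrative; the factor of two is the one from the cost matrix above): consider a node that holds 3 class-A points and 2 class-B points.

with uniform costs:        A: 3  vs.  B: 2          -> majority label A
with class B weighted ×2:  A: 3  vs.  B: 2 × 2 = 4  -> label B

This is the same decision the minimum-expected-cost rule sketched earlier makes for that node.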
46. Decision Tree Classifier Special Features: The big tree grown from data with non-uniform costs
Misclassification cost for class B is twice as big as the misclassification cost for class A
47. Decision Tree Classifier Special Features: The big tree grown from data with uniform costs
Misclassification cost for class B is the same as the misclassification cost for class A
48. Decision Tree Classifier Special Features: Pruned tree produced from data with non-uniform costs
Misclassification cost for class B is twice as big as the misclassification cost for class A
49. Decision Tree Classifier Special Features: Pruned tree produced from data with uniform costs
Misclassification cost for class B is the same as the misclassification cost for class A
50. Decision Tree Classifier Special Features: Differences (uniform vs. non-uniform costs)
- It turns out that, for this particular case, the grown and pruned trees show no major differences between uniform and non-uniform costs (when it is twice as costly to misclassify class B data as it is to misclassify class A data)
- But there are differences in the estimates of the misclassification errors for the trees grown with uniform costs versus the trees grown with non-uniform costs
- For instance, observe in the following figure the misclassification error of the right child of the root node of the tree for non-uniform and for uniform costs
51. Decision Tree Classifier Special Features: Differences (uniform vs. non-uniform costs)
Pruned tree for non-uniform cost
Pruned tree for uniform cost
Misclassification error of the right child is equal to 2/10
Misclassification error of the right child is equal to 2/15
52. Decision Tree Classifier Special Features: Non-uniform misclassification costs
- In this example, the misclassification cost of mistaking B for A is equal to 1
- The misclassification cost of mistaking A for B is equal to 2
- Hence, it is twice as expensive to predict B when A is true than the other way around
- There are no costs associated with making the correct prediction
Cost matrix (T = true class, P = predicted class):
P\T   A   B
A     0   1
B     2   0
53. Decision Tree Classifier Special Features: Data with non-uniform misclassification costs
54. Decision Tree Classifier Special Features: Tree grown from data with non-uniform costs
- The growing phase of the tree works in a similar fashion as if all misclassification costs were equal
- But now the tree operates as if the class A data have twice as much weight as the class B data
- The figure on the next slide shows the fully grown tree
55. Decision Tree Classifier Special Features: The big tree grown from data with non-uniform costs
Misclassification cost for class A is twice as big as the misclassification cost for class B
56. Decision Tree Classifier Special Features: The big tree grown from data with uniform costs
Misclassification cost for class A is the same as the misclassification cost for class B
57. Decision Tree Classifier Special Features: 1st Pruned tree produced from data with non-uniform costs
Misclassification cost for class A is twice as big as the misclassification cost for class B
58. Decision Tree Classifier Special Features: 2nd Pruned tree produced from data with non-uniform costs
Misclassification cost for class A is twice as big as the misclassification cost for class B
59. Decision Tree Classifier Special Features: Pruned tree produced from data with uniform costs
Misclassification cost for class A is the same as the misclassification cost for class B
60. Decision Tree Classifier Special Features: Differences (uniform vs. non-uniform costs)
- It turns out that, for this particular case of non-uniform costs, the grown and pruned trees are different from the ones grown and pruned for the case of uniform costs
- Furthermore, in this case of non-uniform costs, there are differences in the estimates of the misclassification errors for the trees grown and pruned, compared to the trees grown and pruned for the case of uniform costs
- For instance, observe in the following figure the misclassification error of the right child of the root node for non-uniform and for uniform costs
61. Decision Tree Classifier Special Features: Differences (uniform vs. non-uniform costs)
Pruned tree for non-uniform cost
Pruned tree for uniform cost
Misclassification error of the right child is equal to 2/10
Misclassification error of the left child is equal to 2/15
62. Decision Tree Classifier: Computational Complexity
- We assume that a training file is given to us for the training of the classifier (e.g., iris_train)
- N = number of data-points (rows) of our training set
- d = number of input attributes (columns) of our training set
- J = number of distinct classes that the data could belong to; this corresponds to the number of distinct elements of the last column of the training set (e.g., 1, 2, or 3 for iris_train)
63. Decision Tree Classifier: Computational Complexity
- Let us first find the complexity involved in checking the splits associated with the first attribute of our training set
- Let us also assume that the first attribute is a numerical attribute
- First, we sort the N data-points in our training set with respect to the first attribute. This step requires on the order of N · log2(N) operations
- Secondly, we find the possible splits with respect to this attribute (at most N - 1 of them, one between each pair of consecutive sorted values). For each one of these possible splits we find the corresponding gain in information, if that split were applied to the data. The complexity of this step is proportional to N (see the sketch below)
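A minimal Python sketch of these two steps for one numerical attribute; the split score used here is only a stand-in (the slides do not fix a particular split criterion at this point), and a real implementation would update the class counts incrementally so that step 2 stays proportional to N.

```python
def candidate_splits(values, labels):
    """Step 1: sort by the attribute (order N*log N).
    Step 2: score every threshold between consecutive distinct sorted values (order N thresholds)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    xs = [values[i] for i in order]
    ys = [labels[i] for i in order]
    splits = []
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue                              # no threshold between equal values
        threshold = (xs[k - 1] + xs[k]) / 2
        splits.append((threshold, split_score(ys[:k], ys[k:])))
    return splits

def split_score(left, right):
    """Illustrative score: total number of majority-class points on the two sides."""
    majority = lambda side: max(side.count(c) for c in set(side))
    return majority(left) + majority(right)

print(candidate_splits([2.0, 1.0, 3.0, 5.0], ["A", "A", "B", "B"]))
# [(1.5, 3), (2.5, 4), (4.0, 3)] -- the threshold 2.5 separates the classes best
```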
64. Decision Tree Classifier: Computational Complexity
- In review, the complexity of steps 1 and 2 is proportional to N · log2(N) + N, i.e., essentially N · log2(N)
- We have to apply steps 1 and 2 d times, to account for every attribute in our training set. Thus, the complexity of performing the first split with the decision tree classifier is proportional to d · N · log2(N)
65. Decision Tree Classifier: Computational Complexity
- We need to continue reapplying Steps 1 and 2 until the tree reaches the point where no node can be split any more (this happens when every remaining node either contains a single data-point or is pure)
- The complexity of reapplying these steps (if all the attributes are numerical), until the tree cannot grow any more, is proportional, in the best case, to d · N · (log2 N)^2
66. Decision Tree Classifier: Computational Complexity
- The complexity of reapplying these steps (if all the attributes are numerical), until the tree cannot grow any more, is proportional, in the worst case, to d · N^2 · log2(N)
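One way to arrive at bounds of this shape (a sketch, under the assumption that every node re-sorts its own portion of the data when Steps 1 and 2 are reapplied): the nodes at any one depth level of the tree partition the N data-points, so the total work at that level is at most about d · N · log2(N). The two bounds then differ only in the number of levels:

best case (balanced tree, about log2(N) levels):   d · N · log2(N) · log2(N) = d · N · (log2 N)^2
worst case (one point split off at a time, about N levels):   d · N · log2(N) · N = d · N^2 · log2(N)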
67. Decision Tree Classifier: Computational Complexity
- Example of the Decision Tree Classifier's Complexity
- N = 1000
- d = 10
- All attributes are numerical
- Best-case complexity is proportional to d · N · (log2 N)^2
- Worst-case complexity is proportional to d · N^2 · log2(N)
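Plugging these numbers into the bounds above (with log2(1000) ≈ 10) gives rough orders of magnitude:

best case:   d · N · (log2 N)^2 ≈ 10 · 1000 · 10^2 ≈ 10^6 operations
worst case:  d · N^2 · log2(N) ≈ 10 · 1000^2 · 10 ≈ 10^8 operations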
68. Decision Tree Classifier: Computational Complexity
- Example of the Decision Tree Classifier's Complexity
- N = 10000
- d = 50
- All attributes are numerical
- Best-case complexity is proportional to d · N · (log2 N)^2
- Worst-case complexity is proportional to d · N^2 · log2(N)
69. Decision Tree Classifier: Computational Complexity
- What happens if one or more of the input attributes are categorical attributes?
- For an attribute that is categorical with L possible distinct values we have to check either
- a number of splits equal to 2^(L-1) - 1 (if we do complete enumeration; see the sketch below), or
- a number of splits approximately equal to ... (if we use the IND criterion)
70. Decision Tree Classifier: Computational Complexity
- If the number of splits (2^(L-1) - 1 with complete enumeration, or the count given by the IND criterion) that we have to check for each categorical attribute does not exceed the number of splits (N) that we have to check for a numerical attribute, then the previous complexity formulas, d · N · (log2 N)^2 and d · N^2 · log2(N), are still valid
- The first formula above is the best-case scenario, while the second formula above is the worst-case scenario
71. Decision Tree Classifier: Computational Complexity
- Consider an example where we have one categorical attribute with L = 11 distinct values. Then (for a complete enumeration of splits) we have to check 2^10 - 1 = 1023 splits for that attribute
- As a result, the example with N = 1000, d = 10 (of which 9 are numerical attributes and the 10th attribute is a categorical attribute) has computational complexity proportional to a number in the interval between the best-case and worst-case estimates given earlier, since 1023 only barely exceeds N = 1000
72. Decision Tree Classifier: Computational Complexity
- Consider an example where we have one categorical attribute with L = 14 distinct values. Then (for a complete enumeration of splits) we have to check 2^13 - 1 = 8191 splits for that attribute
- As a result, the example with N = 10000, d = 50 (of which 49 are numerical attributes and the 50th attribute is a categorical attribute) has computational complexity proportional to a number in the interval between the best-case and worst-case estimates given earlier, since 8191 does not exceed N = 10000
Notice how a small increase in the number of distinct values of the categorical attribute (from 11 to 14) causes a significant increase in the number of split values that need to be examined (from 1023 to 8191)