Project EMD-MLR Decision Tree Classifiers Part 1 - PowerPoint PPT Presentation
Provided by: mlCec

Transcript and Presenter's Notes
1
Project EMD-MLR Decision Tree Classifiers (Part
1)
  • UCF
  • October 22, 2004

2
Presentation Outline
  • Introduction to Pattern Recognition
  • Introduction to the Decision Tree Classifier
  • Important Tree Functions
  • Growing Phase
  • Pruning Phase
  • Classify Phase
  • Growing Phase
  • Split Criteria
  • Stopping Criteria
  • Leaf Node Assignment

3
Presentation Outline
  • Pruning Phase
  • How the Pruning Phase Works
  • Classify Phase
  • Data with Categorical Attributes
  • Data with non-uniform misclassification costs
  • Computational Complexity of the Decision Tree
    Classifier

4
Decision Tree Classifier Pruning Phase: How do
we Prune a Tree?
  • The tree that we normally grow is larger than
    needed
  • The purpose of the growing phase is to build the
    largest tree possible, with the smallest
    misclassification error on the training set
  • The purpose of the pruning phase is to produce a
    number (quite often more than one) of pruned
    versions of the tree
  • These pruned versions of the largest tree
  • Have smaller misclassification error on unseen
    data (at least some of them) than the largest
    tree
  • Are easier to interpret than the largest tree

5
Decision Tree Classifier Pruning Phase: How do
we Prune a Tree?
  • To prune the tree that was built in the growing
    phase we define a measure of tree goodness
  • This measure depends on the misclassification
    error of the tree and on the size of the tree
  • Smaller tree misclassification error makes this
    measure smaller
  • Smaller tree size makes this measure smaller
  • The measure is defined as follows
  • MC_a(T) = R(T) + a * L(T), where a is a
    non-negative constant (a >= 0), R(T) is the
    tree's misclassification error, and L(T) is the
    number of leaves in tree T
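The measure can be written out as a one-line function; a minimal Python sketch (an illustration, not code from the slides) showing how the error term and the per-leaf penalty trade off:

```python
def misclassification_cost(error, n_leaves, alpha):
    # MC_a(T) = R(T) + a * L(T): the tree's misclassification error plus a
    # penalty of alpha per leaf; alpha >= 0 controls the preference for
    # smaller trees.
    return error + alpha * n_leaves

# The big tree of the following slides (error 0, 6 leaves) and the pruned
# tree BT-3 (error 2/10, 2 leaves) have equal cost at alpha = 0.05:
print(f"{misclassification_cost(0.0, 6, 0.05):.2f}")  # 0.30
print(f"{misclassification_cost(0.2, 2, 0.05):.2f}")  # 0.30
```

At alpha = 0 the measure is pure training error (the big tree wins); as alpha grows, smaller trees become preferable.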

6
Decision Tree Classifier Pruning Phase: The Big
Tree
7
Decision Tree Classifier Pruning Phase: The Big
Tree (BT)
  • The misclassification error of the big tree is
    equal to 0
  • The number of leaves in the big tree is equal
    to 6

8
Decision Tree Classifier Pruning Phase: The BT's
Misclassification Cost
9
Decision Tree Classifier Pruning Phase: Big Tree
minus branches of node 9 (BT-9)
  • The misclassification error of the big tree minus
    the branches of node 9 is equal to 1/10
  • The number of leaves in the big tree minus the
    branches of node 9 is equal to 5

10
Decision Tree Classifier Pruning Phase: The
(BT-9)'s Misclassification Cost
11
Decision Tree Classifier Pruning Phase:
Comparisons of MC(BT) and MC(BT-9)
12
Decision Tree Classifier Pruning Phase: Big Tree
minus branches of node 7 (BT-7)
  • The misclassification error of the big tree minus
    the branches of node 7 is equal to 1/10
  • The number of leaves in the big tree minus the
    branches of node 7 is equal to 4

13
Decision Tree Classifier Pruning Phase: The
(BT-7)'s Misclassification Cost
14
Decision Tree Classifier Pruning Phase:
Comparisons of MC(BT) and MC(BT-7)
15
Decision Tree Classifier Pruning Phase: Big Tree
minus branches of node 4 (BT-4)
  • The misclassification error of the big tree minus
    the branches of node 4 is equal to 2/10
  • The number of leaves in the big tree minus the
    branches of node 4 is equal to 3

16
Decision Tree Classifier Pruning Phase: The
(BT-4)'s Misclassification Cost
17
Decision Tree Classifier Pruning Phase:
Comparisons of MC(BT) and MC(BT-4)
18
Decision Tree Classifier Pruning Phase: Big Tree
minus branches of node 3 (BT-3)
  • The misclassification error of the big tree minus
    the branches of node 3 is equal to 2/10
  • The number of leaves in the big tree minus the
    branches of node 3 is equal to 2

19
Decision Tree Classifier Pruning Phase: The
(BT-3)'s Misclassification Cost
20
Decision Tree Classifier Pruning Phase:
Comparisons of MC(BT) and MC(BT-3)
21
Decision Tree Classifier Pruning Phase: Big Tree
minus branches of node 1 (BT-1)
  • The misclassification error of the big tree minus
    the branches of node 1 is equal to 5/10
  • The number of leaves in the big tree minus the
    branches of node 1 is equal to 1

22
Decision Tree Classifier Pruning Phase: The
(BT-1)'s Misclassification Cost
23
Decision Tree Classifier Pruning Phase:
Comparisons of MC(BT) and MC(BT-1)
24
Decision Tree Classifier Pruning Phase: Which
node's branches do we prune?
  • We find the pruned tree (BT-9, or BT-7, or BT-4,
    or BT-3, or BT-1) whose misclassification cost
    becomes smaller first, as compared to the
    misclassification cost of the big tree (as a
    increases)
  • The pruned tree for which this event happens
    first is BT-3 (see next slide)
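Using the error and leaf counts from the slides above, the winner can be found by computing, for each candidate, the critical value of a at which its cost first matches the big tree's; a small Python sketch (an illustration, not the slides' code):

```python
# Each pruned tree is (misclassification error, number of leaves), taken
# from the preceding slides; the big tree BT has error 0 and 6 leaves.
BT = (0.0, 6)
pruned = {
    "BT-9": (1/10, 5),
    "BT-7": (1/10, 4),
    "BT-4": (2/10, 3),
    "BT-3": (2/10, 2),
    "BT-1": (5/10, 1),
}

def critical_alpha(err, leaves, bt=BT):
    # MC_a(pruned) <= MC_a(BT) once a >= (err - R_BT) / (L_BT - leaves)
    return (err - bt[0]) / (bt[1] - leaves)

alphas = {name: critical_alpha(e, l) for name, (e, l) in pruned.items()}
# The tree that wins at the smallest a; ties go to the smaller tree.
winner = min(alphas, key=lambda n: (alphas[n], pruned[n][1]))
print(winner)  # BT-3
```

Note that BT-7 reaches the same critical value (a = 0.05) as BT-3; the tie is broken in favor of the smaller tree, matching the slides' choice of BT-3.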

25
Decision Tree Classifier Pruning Phase:
Comparisons of MC(BT), MC(BT-9), MC(BT-7),
MC(BT-4), MC(BT-3), MC(BT-1)
26
Decision Tree Classifier Pruning Phase: The
Chosen Pruned Tree BT-3
27
Decision Tree Classifier Pruning Phase: What
Next?
  • Pruning continues along the same lines
  • But now we will apply additional pruning to the
    already pruned big tree
  • That is, we will apply additional pruning to the
    tree BT-3

28
Decision Tree Classifier Pruning Phase: The Tree
BT-3
29
Decision Tree Classifier Pruning Phase: BT-3
minus branches of node 1 (BT-3-1)
  • The misclassification error of BT-3 minus the
    branches of node 1 is equal to 5/10
  • The number of leaves in BT-3 minus the branches
    of node 1 is equal to 1

30
Decision Tree Classifier Pruning Phase: The
(BT-3-1)'s Misclassification Cost
31
Decision Tree Classifier Pruning Phase:
Comparisons of MC(BT-3) and MC(BT-3-1)
32
Decision Tree Classifier Pruning Phase: Which
node's branches do we prune?
  • We find the pruned tree (BT-3-1) whose
    misclassification cost becomes smaller than the
    misclassification cost of BT-3 first (as a
    increases)
  • Here we have no competition amongst pruned trees
  • So, at some appropriate value of a, this will
    happen

33
Decision Tree Classifier Pruning Phase: The Tree
BT-3-1
34
Decision Tree Classifier Pruning Phase: What
Next?
  • There is no more pruning that we can apply,
    since, through pruning, we eventually ended up
    with a tree consisting only of the root of the
    tree
  • So we designate the pruning process complete
  • The pruning process discovered two pruned trees
  • These trees are BT-3 and BT-3-1

35
Decision Tree Classifier Pruning Phase: The
Pruned Trees BT-3 and BT-3-1
[Figure: the two pruned trees; root node 1, with
data points 1,2,3,4,5 labeled A and 6,7,8,9,10
labeled B]
36
Decision Tree Classifier Classify Phase
  • We have already gone through the growing and the
    pruning phases of the decision tree
  • We have discovered that three trees were worth
    storing for further consideration
  • These are
  • The Big Tree (BT)
  • The Big Tree minus the branches of node 3 (BT-3)
  • The tree BT-3 minus the branches of node 1
    (BT-3-1), the root node only
  • We are now ready to examine the performance of
    two trees (BT and BT-3) on unseen data
  • The performance of the BT-3-1 tree on unseen data
    will not be examined

37
Decision Tree Classifier Classify Phase:
Performance of BT on New Data 147/160
38
Decision Tree Classifier Classify Phase:
Performance of BT on New Data 12/160
39
Decision Tree Classifier Classify Phase:
Performance of BT-3 on New Data 138/160
40
Decision Tree Classifier Classify Phase:
Performance of BT-3 on New Data 22/160
41
Decision Tree Classifier Classify Phase: Which
one is the Best Tree?
  • Out of the available possibilities (the unpruned
    big tree and its pruned versions) we choose
  • The tree that has the smallest, or close to the
    smallest, classification error on new data, and
  • The smallest size (number of leaves)
  • For instance, in the previous example
  • We could choose the pruned tree BT-3 as our
    preferred tree because it has reasonable
    performance on the test set and fewer leaves
  • Sometimes the choice is not that obvious

42
Decision Tree Classifier Special Features: Data
with non-uniform misclassification costs
43
Decision Tree Classifier Special Features:
Non-uniform misclassification costs
  • In this example, the misclassification cost of
    mistaking A for B (predicting B when A is true)
    is 1
  • In this example, the misclassification cost of
    mistaking B for A (predicting A when B is true)
    is equal to 2
  • Hence, it is twice as expensive to predict A,
    while B is true, than the other way around
  • There are no costs associated with making the
    correct prediction

P\T   A   B
A     0   2
B     1   0
T = True class, P = Predicted class
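The cost matrix above lends itself to a direct lookup; a minimal Python sketch (illustrative, not from the slides) that totals the cost of a batch of predictions:

```python
# Misclassification cost matrix from the slide: cost[P][T], rows indexed by
# the predicted class P, columns by the true class T. Correct predictions
# cost 0; predicting A when B is true costs 2, predicting B when A is true
# costs 1.
cost = {"A": {"A": 0, "B": 2},
        "B": {"A": 1, "B": 0}}

def total_cost(predictions, truths):
    return sum(cost[p][t] for p, t in zip(predictions, truths))

# One B-predicted-as-A mistake (cost 2), one A-predicted-as-B mistake
# (cost 1), and one correct prediction (cost 0):
print(total_cost(["A", "B", "A"], ["B", "A", "A"]))  # 3
```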
44
Decision Tree Classifier Special Features: Data
with non-uniform misclassification costs
Misclassification cost for class B is twice as
big as the misclassification cost for class A
45
Decision Tree Classifier Special Features: Tree
grown from data with non-uniform costs
  • The growing phase of the tree works in a similar
    fashion as if we had all misclassification costs
    equal
  • But now, the tree operates as if the class B data
    have twice as much weight as the class A data
  • The figure on the next page shows the fully grown
    tree
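One plausible way to realize this weighting (an assumed sketch, not necessarily how the slides' software implements it) is to let class-B points count double when class frequencies are tallied at a node:

```python
from collections import Counter

# Assumed weights: misclassifying B is twice as costly, so B points count
# double in every node's class tally.
weights = {"A": 1.0, "B": 2.0}

def weighted_error(labels):
    # Weighted misclassification error of a node that predicts its
    # weighted-majority class.
    w = Counter()
    for y in labels:
        w[y] += weights[y]
    total = sum(w.values())
    return 1.0 - max(w.values()) / total

# A node with 5 A's and 2 B's: B counts double (weighted tally A=5, B=4),
# so the error is 4/9 rather than the unweighted 2/7.
print(round(weighted_error(["A"] * 5 + ["B"] * 2), 3))  # 0.444
```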

46
Decision Tree Classifier Special Features: The
big tree grown from data with non-uniform costs
Misclassification cost for class B is twice as
big as the misclassification cost for class A
47
Decision Tree Classifier Special Features: The
big tree grown from data with uniform costs
Misclassification cost for class B is the same as
the misclassification cost for class A
48
Decision Tree Classifier Special Features:
Pruned tree produced from data with non-uniform
costs
Misclassification cost for class B is twice as
big as the misclassification cost for class A
49
Decision Tree Classifier Special Features:
Pruned tree produced from data with uniform
costs
Misclassification cost for class B is the same as
the misclassification cost for class A
50
Decision Tree Classifier Special Features:
Differences (uniform vs. non-uniform costs)
  • It turns out that, for this particular pair of
    grown and pruned trees, there are no major
    differences between the cases of uniform and
    non-uniform costs (when it is twice as costly to
    misclassify class B as it is to misclassify
    class A)
  • But there are differences in the estimates of
    misclassification errors for the trees grown for
    uniform costs versus the trees grown for
    non-uniform costs
  • For instance, observe in the following figure
    the misclassification error of the right child
    of the root node of the tree for non-uniform and
    uniform costs

51
Decision Tree Classifier Special Features:
Differences (uniform vs. non-uniform costs)
Pruned tree for non-uniform cost
Pruned tree for uniform cost
Misclassification error of right child is equal
to 2/10
Misclassification error of right child is equal
to 2/15
52
Decision Tree Classifier Special Features:
Non-uniform misclassification costs
  • In this example, the misclassification cost of
    mistaking B for A (predicting A when B is true)
    is 1
  • In this example, the misclassification cost of
    mistaking A for B (predicting B when A is true)
    is equal to 2
  • Hence, it is twice as expensive to predict B,
    while A is true, than the other way around
  • There are no costs associated with making the
    correct prediction

P\T   A   B
A     0   1
B     2   0
T = True class, P = Predicted class
53
Decision Tree Classifier Special Features: Data
with non-uniform misclassification costs
54
Decision Tree Classifier Special Features: Tree
grown from data with non-uniform costs
  • The growing phase of the tree works in a similar
    fashion as if we had all misclassification costs
    equal
  • But now, the tree operates as if the class A data
    have twice as much weight as the class B data
  • The figure on the next page shows the fully grown
    tree

55
Decision Tree Classifier Special Features: The
big tree grown from data with non-uniform costs
Misclassification cost for class A is twice as
big as the misclassification cost for class B
56
Decision Tree Classifier Special Features: The
big tree grown from data with uniform costs
Misclassification cost for class A is the same as
the misclassification cost for class B
57
Decision Tree Classifier Special Features: 1st
Pruned tree produced from data with non-uniform
costs
Misclassification cost for class A is twice as
big as the misclassification cost for class B
58
Decision Tree Classifier Special Features: 2nd
Pruned tree produced from data with non-uniform
costs
Misclassification cost for class A is twice as
big as the misclassification cost for class B
59
Decision Tree Classifier Special Features:
Pruned tree produced from data with uniform
costs
Misclassification cost for class A is the same as
the misclassification cost for class B
60
Decision Tree Classifier Special Features:
Differences (uniform vs. non-uniform costs)
  • It turns out that for this particular case of
    non-uniform costs the grown and pruned trees are
    different from the ones grown and pruned for the
    case of uniform costs
  • Furthermore, in this case of non-uniform costs,
    there are differences in the estimates of
    misclassification errors for the trees grown and
    pruned, compared to the trees grown and pruned
    for the case of uniform costs
  • For instance, observe in the following figure
    the misclassification error of the right child
    of the root node for non-uniform and uniform
    costs

61
Decision Tree Classifier Special Features:
Differences (uniform vs. non-uniform costs)
Pruned tree for non-uniform cost
Pruned tree for uniform cost
Misclassification error of right child is equal
to 2/10
Misclassification error of left child is equal
to 2/15
62
Decision Tree Classifier: Computational Complexity
  • We assume that a training file is given to us for
    the training of the classifier (e.g., iris_train)
  • N: the number of data-points (rows) of our
    training set
  • d: the number of input attributes (columns) of
    our training set
  • J: the number of distinct classes that the data
    could belong to. This corresponds to the number
    of distinct elements of the last column of the
    training set (e.g., 1 or 2 or 3 for iris_train)

63
Decision Tree Classifier: Computational Complexity
  • Let us first find the complexity involved in
    checking the splits associated with the first
    attribute of our training set
  • Let us also assume that the first attribute is a
    numerical attribute
  • First, we sort the N data-points in our training
    set with respect to the first attribute. This
    step requires
  • N log N operations
  • Secondly, we find the number of possible splits
    with respect to this attribute. For each one of
    these possible splits we find the corresponding
    gain in information, if each and every one of
    these splits is applied to the data. The
    complexity of this step is proportional to
  • N (there are at most N - 1 candidate splits, and
    the gain for each can be updated incrementally)
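The two steps can be sketched concretely; an illustrative Python version (using Gini impurity as an assumed criterion in place of the slides' gain in information) in which the sort costs O(N log N) and the single sweep over the N - 1 candidate thresholds updates the class counts incrementally:

```python
from collections import Counter

def best_split(values, labels):
    # Step 1: sort the data by the attribute -- O(N log N)
    order = sorted(range(len(values)), key=lambda i: values[i])
    left, right = Counter(), Counter(labels)
    n = len(values)

    def gini(counts, m):
        # Impurity of a node holding m points with the given class counts
        return 1.0 - sum((v / m) ** 2 for v in counts.values()) if m else 0.0

    # Step 2: sweep the N - 1 thresholds, moving one point at a time from
    # the right partition to the left -- O(1)-ish work per threshold
    best = (float("inf"), None)
    for k, i in enumerate(order[:-1], 1):
        left[labels[i]] += 1
        right[labels[i]] -= 1
        score = (k * gini(left, k) + (n - k) * gini(right, n - k)) / n
        if score < best[0]:
            best = (score, (values[i] + values[order[k]]) / 2)
    return best

print(best_split([1.0, 2.0, 3.0, 4.0], ["A", "A", "B", "B"]))  # (0.0, 2.5)
```

The perfectly separable toy data yield impurity 0.0 at the midpoint threshold 2.5, found in one sorted pass.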

64
Decision Tree Classifier: Computational Complexity
  • In review, the complexity of steps 1 and 2 is
    proportional to
  • N log N
  • We have to apply steps 1 and 2 d times, to
    account for every attribute in our training set.
    Thus, the complexity of performing the first
    split with the decision tree classifier is
  • d N log N

65
Decision Tree Classifier: Computational Complexity
  • We need to continue reapplying Steps 1 and 2
    until the tree reaches the point where no node
    can be split any more (this happens when either a
    node has one data-point or a node is pure)
  • The complexity of reapplying these steps (if all
    the attributes are numerical), until the tree
    cannot grow any more, is proportional to (in the
    best case)
  • d N (log N)^2, since balanced growth gives about
    log N levels, each costing about d N log N

66
Decision Tree Classifier: Computational Complexity
  • The complexity of reapplying these steps (if all
    the attributes are numerical), until the tree
    cannot grow any more, is proportional to (in the
    worst case)
  • d N^2 log N, since degenerate growth can reach a
    depth of about N levels

67
Decision Tree Classifier: Computational Complexity
  • Example of the Decision Tree Classifier's
    Complexity
  • N = 1000
  • d = 10
  • All attributes are numerical
  • Best Case Complexity is proportional to
    d N (log2 N)^2, about 10^6
  • Worst Case Complexity is proportional to
    d N^2 log2 N, about 10^8
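Plugging numbers into these proportionalities makes the gap vivid; a quick Python sketch, assuming a best case proportional to d N (log2 N)^2 and a worst case proportional to d N^2 log2 N (standard forms for tree growing on numerical attributes):

```python
from math import log2

def best_case(N, d):
    # Balanced growth: about log2(N) levels, each costing about d*N*log2(N)
    return d * N * log2(N) ** 2

def worst_case(N, d):
    # Degenerate growth: depth about N levels
    return d * N ** 2 * log2(N)

for N, d in [(1000, 10), (10000, 50)]:
    print(f"N={N}, d={d}: best ~{best_case(N, d):.1e}, "
          f"worst ~{worst_case(N, d):.1e}")
```

The worst case exceeds the best case by roughly a factor of N / log N, which is why degenerate (deep, unbalanced) trees are expensive to grow.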

68
Decision Tree Classifier: Computational Complexity
  • Example of the Decision Tree Classifier's
    Complexity
  • N = 10000
  • d = 50
  • All attributes are numerical
  • Best Case Complexity is proportional to
    d N (log2 N)^2, about 10^8
  • Worst Case Complexity is proportional to
    d N^2 log2 N, about 7 x 10^10

69
Decision Tree Classifier: Computational Complexity
  • What happens if one or more of the input
    attributes are categorical attributes?
  • For an attribute that is categorical with L
    possible distinct values we have to check either
  • a number of splits equal to 2^(L-1) - 1 (if we
    do complete enumeration)
  • a substantially smaller number of splits (if we
    use the IND criterion)
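The complete-enumeration count, 2^(L-1) - 1 binary splits of L categorical values, can be verified by brute force; a short Python sketch (illustration only):

```python
from itertools import combinations

def count_binary_splits(L):
    # Enumerate every nonempty proper subset of the L values; a subset and
    # its complement describe the same split, so deduplicate with a set of
    # unordered partition pairs.
    values = range(L)
    seen = set()
    for r in range(1, L):
        for left in combinations(values, r):
            rest = tuple(v for v in values if v not in left)
            seen.add(frozenset((frozenset(left), frozenset(rest))))
    return len(seen)

# Each of the 2^L - 2 nonempty proper subsets is counted twice,
# giving (2^L - 2) / 2 = 2^(L-1) - 1 distinct splits:
for L in range(2, 8):
    assert count_binary_splits(L) == 2 ** (L - 1) - 1
print(count_binary_splits(11))  # 1023
```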

70
Decision Tree Classifier: Computational Complexity
  • If the number of splits that we have to check
    for each categorical attribute does not exceed
    the number of splits (N) that we have to check
    for a numerical attribute, then the previous
    complexity formulas
  • d N (log N)^2 and d N^2 log N
  • are still valid.
  • The first formula above is the best-case
    scenario, while the second formula above is the
    worst-case scenario

71
Decision Tree Classifier: Computational Complexity
  • Consider an example where we have one categorical
    attribute with L = 11 distinct values. Then (for
    a complete enumeration of splits)
  • 2^(L-1) - 1 = 2^10 - 1 = 1023
  • and as a result, the example with N = 1000,
    d = 10 (of which 9 are numerical attributes and
    the 10th attribute is a categorical attribute)
    has computational complexity proportional to a
    number in the interval between the best-case and
    the worst-case values

72
Decision Tree Classifier: Computational Complexity
  • Consider an example where we have one categorical
    attribute with L = 14 distinct values. Then (for
    complete enumeration of splits)
  • 2^(L-1) - 1 = 2^13 - 1 = 8191
  • and as a result, the example with N = 10000,
    d = 50 (of which 49 are numerical attributes and
    the 50th attribute is a categorical attribute)
    has computational complexity proportional to a
    number in the interval between the best-case and
    the worst-case values

Notice how a small increase in the number of
distinct values of the categorical attribute
causes a significant increase in the number of
split values that need to be examined