1
Project EMD-MLR Decision Tree Classifiers (Part
1)
  • UCF
  • October 15, 2004

2
Presentation Outline
  • Introduction to Pattern Recognition
  • Introduction to the Decision Tree Classifier
  • Important Tree Functions
  • Growing Phase
  • Pruning Phase
  • Classify Phase
  • Growing Phase
  • Split Criteria
  • Stopping Criteria
  • Leaf Node Assignment

3
Presentation Outline
  • Pruning Phase
  • How the Pruning Phase Works
  • Classify Phase
  • Data with Categorical Attributes
  • Data with non-uniform misclassification costs
  • Computational Complexity of the Decision Tree
    Classifier

4
Pattern Recognition
  • The ease with which we recognize a face,
    understand spoken words, read handwritten
    characters, identify our keys in our pocket by
    feel and decide whether an apple is ripe by its
    smell belies the astoundingly complex processes
    that underlie these acts of pattern recognition
    (Duda and Hart, 2001)

5
Pattern Recognition Definition
  • Pattern Recognition, the act of taking in raw
    data and taking an action based on the category
    of the pattern, has been crucial for our
    survival, and over the years we have tried to
    develop algorithms that duplicate the amazing
    ability of humans to recognize patterns (Duda and
    Hart, 2001)

6
Pattern Recognition Algorithms
  • These algorithms are referred to as Pattern
    Recognition Algorithms or Pattern
    Classification Algorithms
  • An example of a pattern recognition or pattern
    classification algorithm is the decision tree
    algorithm (decision tree classifier)

7
Pattern Recognition Example
  • To understand the complexity of a pattern
    recognition system let us consider a simple
    example, that of recognizing the type of an Iris
    plant

8
Pattern Recognition A case study Iris Data
  • Iris data consists of 150 data-points of three
    different types of flowers
  • Iris Virginica
  • Iris Setosa
  • Iris Versicolor
  • Analogy: Failed/Non-Failed Blades
  • Each datum has four attributes
  • Sepal Length (cm) (Feature 1)
  • Sepal Width (cm) (Feature 2)
  • Petal Length (cm) (Feature 3)
  • Petal Width (cm) (Feature 4)
  • Analogy: Operating Hours, Starts, etc.
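For reference, the Iris data described above can be inspected directly; a minimal sketch, assuming scikit-learn is available (this code is not part of the original slides):

```python
# Quick look at the Iris data: 150 data points, 4 attributes, 3 flower types.
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)       # (150, 4)
print(iris.feature_names)    # sepal length/width, petal length/width (in cm)
print(iris.target_names)     # ['setosa' 'versicolor' 'virginica']
```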

9
Pattern Recognition Components of a Pattern
Recognition System
Problem Data → Feature Extraction → Feature Selection → Pattern Classification → Pattern Classes
10
Pattern Recognition Feature Extraction,
Selection and Classification
  • The feature extraction module has the purpose of
    extracting (or collecting) some important
    information for the task at hand
  • The feature selection module has the purpose of
    extracting the features that are important to
    achieve the objective of interest
  • The classifier module has the purpose of
    classifying the data relying on the information
    conveyed by the features selected

11
Pattern Recognition Feature Extraction Iris
Data
  • In our case the features have already been
    extracted from the data and they are
  • Sepal Length (Feature 1)
  • Sepal Width (Feature 2)
  • Petal Length (Feature 3)
  • Petal Width (Feature 4)
  • Analogy: Features already extracted for the blade
    data include Operating Hours (OH), Various Types
    of Trips (TR), etc.

12
Pattern Recognition Feature Selection Iris Data
  • In this case we are trying to determine which
    features are the most important in recognizing
    (classifying) the type of the iris plant
  • Colored Scatter plots (2-D plots) of 2 features
    at a time might be useful (see next slide)
  • Analogy: Scatter Plots of Blade Feature Data,
    such as Operating Hours (OH), Trips (TR), Fired
    Aborts (FA)

13
Pattern Recognition Iris Data Feature Selection
14
Pattern Recognition Histogram of the Petal
Length Feature
15
Pattern Recognition Histogram of the Petal Width
Feature
16
Pattern Recognition Simple Classifier Model
(Model 1)
17
Pattern Recognition Simple Classifier Model
(Model 2)
18
Pattern Recognition More Complex Classifier
Model (Model 3)
19
Pattern Recognition Performance of Model 1
Separating Planes for testing data
20
Pattern Recognition Performance of Model 2
21
Pattern Recognition Performance of Model 3
22
Pattern Recognition Selection of a Classifier
Model
  • A classifier model is normally selected based on
    the following measures of goodness
  • Performance of the classifier model on previously
    unseen data
  • Simplicity of the classifier
  • Other measures of goodness might be of interest
    to the designer, such as
  • Computational Complexity of the Classifier
  • Robustness of the classifier in the presence of
    noise

23
Pattern Recognition System Selection of a
Classifier Model
  • An example of such a classifier model is the
  • Decision Tree Classifier

24
Decision Tree Classifier General Overview
  • The method for constructing a decision tree
    classifier from a collection of data is easy to
    understand
  • Data consist of data attributes (e.g., operating
    hours, number of starts) and the class label
    (scrapped versus non-scrapped blade).
  • Initially all the data belong to the same set,
    located at the root of the tree
  • Then a data attribute is chosen, and a test on
    this attribute is employed to split the data
    into smaller subsets with higher percentages of
    one class label

Decision Trees help you understand what type of
data attributes and attribute values lead to
certain class labels
25
Decision Tree Classifier Graphic Representation
of a Tree Classifier
Node 0: root node. Nodes 1, 2: children of node 0. Node 0: parent of nodes 1 and 2. Nodes 1, 3, 4: leaves of the tree.
(Figure: a tree with root node 0 branching to nodes 1 and 2, and node 2 branching to nodes 3 and 4; a branch and a node of the tree are labeled.)
26
Decision Tree Classifier Operational Phases of
the Tree Classifier
  • The decision tree has three distinct but
    interrelated phases. These are
  • Growing Phase
  • Pruning Phase
  • Test (Classify/Performance) Phase

27
Decision Tree Classifier Growing Phase of the
Decision Tree Classifier
(Figure: the example tree with root node 0, children 1 and 2, and leaves 3 and 4, as it is grown.)
28
Decision Tree Classifier Pruning Phase of the
Decision Tree Classifier
(Figure: the same example tree during the pruning phase.)
29
Decision Tree Classifier Advantages of a Decision
Tree Classifier
  • It requires easy-to-understand elements for its
    design, such as
  • A set of questions Q
  • A rule for selecting the best split at any node
  • A criterion for choosing the right size tree
  • It can be applied to any data structure through
    the appropriate formulation of the set of
    questions Q

30
Decision Tree Classifier Advantages of a Decision
Tree Classifier
  • It handles both ordered and categorical
    variables
  • Ordered Variable: a variable assuming values from
    the set {1, 2, 3, 4, 5, 6, ...}
  • Categorical Variable: a variable assuming values
    from the set {green, red, blue, orange, ...}
  • The final classification has a simple form which
    can be compactly stored to efficiently classify
    new data

31
Decision Tree Classifier Advantages of a Decision
Tree Classifier
  • It does automatic stepwise variable selection and
    complexity reduction
  • It gives, with no additional effort, not only a
    classification, but also an estimate of the
    misclassification probability for the object

32
Decision Tree Classifier Advantages of a Decision
Tree Classifier
  • It is invariant under all monotone
    transformations of the individual ordered
    variables
  • E.g., after multiplying a specific feature by a
    constant (say, changing its measurement units), the
    resulting decision tree remains unchanged
  • It is extremely robust to outliers and
    misclassified points in the training set used
    for the tree's design

33
Decision Tree Classifier Advantages of a Decision
Tree Classifier
  • The tree procedure gives easily understood and
    interpreted information regarding the predictive
    structure of the data
  • Given a decision tree, you can extract simple
    IF-THEN rules that show the thought process of
    the tree when it classifies
  • It has been used successfully in a variety of
    applications (see following slides)

34
Decision Tree Classifier Applications of a
Decision Tree Classifier
  • Medical Applications
  • Wisconsin Breast Cancer (predict whether a tissue
    sample taken from a patient is malignant or
    benign; two classes, nine numerical attributes)
  • BUPA Liver Disorders (predict whether or not a
    male patient has a liver disorder based on blood
    tests and alcohol consumption; two classes, six
    numerical attributes)

35
Decision Tree Classifier Applications of a
Decision Tree Classifier
  • Medical Applications
  • PIMA Indian Diabetes (the patients are females at
    least 21 years old, of Pima Indian heritage, living
    near Phoenix, Arizona; the problem is to predict
    whether a patient would test positive for
    diabetes; there are two classes, seven numerical
    attributes)
  • Heart Disease (the problem here is to predict the
    presence or absence of heart disease based on
    various medical tests; there are two classes,
    seven numerical attributes and six categorical
    attributes)

36
Decision Tree Classifier Applications of a
Decision Tree Classifier
  • Image Recognition Applications
  • Satellite Image (this dataset gives the
    multi-spectral values of pixels within 3x3
    neighborhoods in a satellite image, and the
    classification associated with the central pixel;
    the aim is to predict the classification given
    the multi-spectral values; there are six classes
    and thirty-six numerical attributes)
  • Image Segmentation (this is a database of seven
    outdoor images; every pixel should be classified
    as brickface, sky, foliage, cement, window, path,
    or grass; there are seven classes and nineteen
    numerical attributes)

37
Decision Tree Classifier Applications of a
Decision Tree Classifier
  • Other Applications
  • Boston Housing (this dataset gives housing values
    in Boston suburbs; there are three classes,
    twelve numerical attributes, one binary
    attribute)
  • Congressional Voting Records (this database gives
    the votes of each member of the U.S. House of
    Representatives of the 98th Congress on sixteen
    key issues; the problem is to classify a
    congressman as a Democrat or a Republican based
    on the sixteen votes; there are two classes,
    sixteen categorical attributes (yea, nay,
    neither))

38
Decision Tree Classifier Growing Phase
  • The growing phase of the tree revolves around
    three elements (sketched together in the code
    after this list)
  • The selection of the splits
  • The decision of when to designate a node terminal
    or to continue splitting it
  • How to determine the class assignments of the
    terminal nodes
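As an orientation for the slides that follow, here is a minimal sketch (our own, not taken from the presentation) of how the three elements can fit together in a recursive growing procedure. It assumes the Gini impurity measure (introduced a few slides later) and candidate split values at midpoints between consecutive attribute values, which is one common convention; the names grow, best_split, and gini are ours.

```python
from collections import Counter, namedtuple

Leaf = namedtuple("Leaf", "label")
Node = namedtuple("Node", "attribute value left right")

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_j p_j^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Element 1: score every (attribute, value) pair by the impurity decrease."""
    best = (None, None, -1.0)
    for a in range(len(rows[0])):
        values = sorted(set(r[a] for r in rows))
        for lo, hi in zip(values, values[1:]):
            v = (lo + hi) / 2.0                      # candidate split value
            left = [l for r, l in zip(rows, labels) if r[a] < v]
            right = [l for r, l in zip(rows, labels) if r[a] >= v]
            p_left = len(left) / len(labels)
            drop = gini(labels) - (p_left * gini(left) + (1 - p_left) * gini(right))
            if drop > best[2]:
                best = (a, v, drop)
    return best[:2]

def grow(rows, labels, min_records=2):
    # Element 2: stop when the node is pure or holds too few records.
    if len(labels) < min_records or len(set(labels)) == 1:
        return Leaf(Counter(labels).most_common(1)[0][0])   # Element 3: majority class
    a, v = best_split(rows, labels)
    if a is None:                                    # no usable split remains
        return Leaf(Counter(labels).most_common(1)[0][0])
    go_left = [r[a] < v for r in rows]
    return Node(a, v,
                grow([r for r, g in zip(rows, go_left) if g],
                     [l for l, g in zip(labels, go_left) if g], min_records),
                grow([r for r, g in zip(rows, go_left) if not g],
                     [l for l, g in zip(labels, go_left) if not g], min_records))

# Example: grow a tiny tree on four 2-D points.
tree = grow([(0.2, 0.7), (0.3, 0.8), (0.6, 0.2), (0.7, 0.9)], ["A", "A", "B", "B"])
```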

39
Decision Tree Classifier Selection of Splits
  • What is a split?
  • Each node of the tree represents a box (rectangle
    in 2 dimensions) in the feature space.
  • Growing of the tree can be accomplished by
    splitting the box into 2 new boxes.
  • The node t representing the original box becomes
    the parent node of the two nodes (children tL
    and tR) representing the 2 new boxes.
  • A rectangle can be split in two ways: across the
    x1 or the x2 dimension

40
Decision Tree Classifier Selection of Splits
  • What is a split? (continued)
  • A box in n dimensions can be split in many
    different ways.
  • The dimension along which we perform the split is
    called split attribute or split feature.
  • The specific value at which the split occurs is
    called split value.
  • What do we accomplish by splitting?
  • The growing of a tree, whose terminal nodes
    represent very specific rules.
  • Smaller rectangles that contain patterns, most of
    which are of the same class label, will provide
    us with very specific, accurate classification
    rules.
  • How do we select a good split?
  • We select a split attribute and corresponding
    split value so that the resulting children nodes
    are purer.

41
Decision Tree Classifier Selection of Splits
  • Define p(j|t) as the proportion of class j
    cases at node t of the tree
  • Define also i(t) as a measure of impurity for
    node t of the decision tree. Note that
  • i(t) is a nonnegative function
  • it depends on the probabilities p(1|t), ..., p(J|t),
    where J is the number of different classes
  • It achieves its maximum value when the p(j|t)
    are all equal to 1/J (equal to 1 for the
    two-class entropy impurity)
  • It achieves its minimum value (equal to 0) when
    one of the p(j|t) is equal to 1 and the
    rest of the p(j|t) are equal to 0

42
Decision Tree Classifier Examples of impurity
measures
  • The Entropy impurity measure
  • The Gini Function impurity measure

43
Decision Tree Classifier Entropy & Gini Impurity
Functions
44
Decision Tree Classifier Selection of Splits
  • Selection procedure
  • Given a node t select the split attribute n and
    the split value s so that the difference between
    the impurity of t and the average impurity of the
    children tL and tR is maximized, viz.
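In symbols (the standard form of the split-selection criterion, supplied here since the slide's equation is an image):

$$ \Delta i(s,t) \;=\; i(t) \;-\; \left[\, p_L\, i(t_L) \;+\; p_R\, i(t_R) \,\right] $$

where pL and pR are the proportions of the data at node t sent by the candidate split s to the left child tL and the right child tR; the split attribute n and split value s are chosen to maximize Δi(s, t).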

45
Decision Tree Classifier Selection of Splits An
Example
  • Example
  • Let a node t be represented by the rectangle
    0 ≤ x1 ≤ 10, 0 ≤ x2 ≤ 10, containing 50 patterns of
    class 1 (blue) and 50 patterns of class 2 (red).
  • Let's consider splitting t along the x1
    attribute.

46
Decision Tree Classifier Differences between
Impurity Functions
  • Difference in impurities versus x1
  • Best split value is 5.146 for both impurity
    functions

47
Decision Tree Classifier Differences between
Impurity Functions
  • Entropy and Gini impurities are qualitatively
    similar and, therefore, most often they give
    similar, if not identical, splits.

48
Decision Tree Classifier Resubstitution Error
as Impurity Function
  • Another candidate function that seems natural to
    be used as a node impurity measure would be the
    Resubstitution Error (misclassification error on
    the training set)
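The formula itself is an image on the slide; in the same notation, the standard form (the error made by assigning every pattern at the node to its majority class) is

$$ i_{\text{RE}}(t) \;=\; 1 - \max_{j}\, p(j\mid t) $$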

49
Decision Tree Classifier Resubstitution Error
as Impurity Function
  • Most of the time using the resubstitution error
    (RE) will provide the same best split as when
    using the Gini or entropy impurity (Case A)
  • However, there are a number of occasions where
    the difference in impurity Δi(n, st) as measured
    by the RE is locally flat (regions of the same
    value), implying a variety of equally good
    splits (Case B). Among those equally good splits
    there are usually one or two that intuitively
    seem more reasonable. These latter splits
    are typically easier to identify via the Gini or
    entropy impurities
  • The phenomenon of non-uniqueness of best splits,
    which rarely occurs when using the Gini or
    entropy impurities, makes the RE less
    suitable/convenient for determining splits

50
Decision Tree Classifier Resubstitution Error
as Impurity Function
  • Case A
  • RE, Gini, entropy suggest a unique best split.
  • It is quite common that all three impurity
    measures may suggest similar if not identical
    splits.

51
Decision Tree Classifier Resubstitution Error
as Impurity Function
  • Case B
  • RE claims all splits are equivalent!
  • Gini & entropy suggest only two equivalent
    splits, which intuitively are quite reasonable.

52
Decision Tree Classifier An Example
53
Decision Tree Classifier An Example
54
Decision Tree Classifier The first Split Level 0
  • We are at the root of the tree with data {1, 2,
    3, 4, 5} of class A and data {6, 7, 8, 9, 10} of
    class B
  • The possible x-splits and y-splits that we need
    to consider are evaluated on the next two slides
    (one common way to enumerate such candidate
    values is sketched below)
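A common way to enumerate candidate split values, used in the growing sketch earlier, is to take the midpoints between consecutive distinct coordinates of the data points at the node (our own illustration, not from the slides):

```python
def candidate_splits(coords):
    """Midpoints between consecutive distinct coordinate values."""
    values = sorted(set(coords))
    return [(a + b) / 2.0 for a, b in zip(values, values[1:])]

print(candidate_splits([0.1, 0.3, 0.3, 0.7]))   # [0.2, 0.5]
```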

55
Decision Tree Classifier Change in Impurity for
x-splits Level 0
56
Decision Tree Classifier Change in Impurity for
y-splits
57
Decision Tree Classifier Calculation of the
impurity difference
  • Best Split
  • Left Node Data: {1, 2, 4}; Right Node Data:
    {3, 5, 6, 7, 8, 9, 10}
  • Impurity of Parent
  • Impurity of left child

58
Decision Tree Classifier Calculation of the
Impurity Difference
  • Impurity of the right child
  • Average Impurity of the left and right child
  • Difference in Impurity
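The numerical values on this and the previous slide appear as images. A minimal sketch of the calculation for this split (left node {1, 2, 4}, all class A; right node {3, 5} of class A and {6, ..., 10} of class B), computing both the Gini and entropy versions since the transcript does not show which impurity measure the slide uses:

```python
from math import log2

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

for name, imp in (("Gini", gini), ("entropy", entropy)):
    parent = imp([5, 5])                # root node: 5 of class A, 5 of class B
    left = imp([3, 0])                  # left child {1, 2, 4}: pure class A
    right = imp([2, 5])                 # right child: 2 of class A, 5 of class B
    average = 0.3 * left + 0.7 * right  # children weighted by their share of the data
    print(name, parent, average, parent - average)
# Gini:    parent = 0.5, average ~ 0.286, difference ~ 0.214
# entropy: parent = 1.0, average ~ 0.604, difference ~ 0.396
```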

59
Decision Tree Classifier Picture of how the best
1st split looks
(Figure: the root node {1, 2, 3, 4, 5 : A; 6, 7, 8, 9, 10 : B} is split at x = 0.35; the child with x < 0.35 contains {1, 2, 4}, all of class A, and the child with x > 0.35 contains {3, 5} of class A and {6, 7, 8, 9, 10} of class B. Numerals denote data points, letters denote class labels.)
60
Decision Tree Classifier Picture of how another
1st split looks
Numerals → data, Letters → class label
61
Decision Tree Classifier Picture of how another
1st split looks
Numerals → data, Letters → class label
62
Decision Tree Classifier Picture of how another
1st split looks
Numerals → data, Letters → class label
63
Decision Tree Classifier The second Split Level 1
  • The left node (child) of the tree has data {1, 2,
    4}, all of the same classification (class A). So no
    further splitting of the data residing in the
    left node is needed.
  • The right node (child) of the tree has data {3,
    5, 6, 7, 8, 9, 10}, of which data {3, 5} are of
    class A and data {6, 7, 8, 9, 10} are of class B.
    So further splitting of the data residing in the
    right node is needed.
  • The possible x-splits and y-splits that we need
    to consider are evaluated on the next two slides

64
Decision Tree Classifier Change in Impurity for
x-splits Level 1, right node
65
Decision Tree Classifier Change in Impurity for
y-splits Level 1, right node
66
Decision Tree Classifier Picture of how 2nd
split looks
(Figure: the node {3, 5 : A; 6, 7, 8, 9, 10 : B} is split at x = 0.5 into a node containing {3, 5} of class A and {7, 9} of class B, and a node containing {6, 8, 10} of class B. Numerals denote data points, letters denote class labels.)
67
Decision Tree Classifier Picture of how 3rd split
looks
(Figure: the node {3, 5 : A; 7, 9 : B} is split at y = 0.5; the child with y > 0.5 contains {5 : A; 7, 9 : B} and the child with y < 0.5 contains {3 : A}. Numerals denote data points, letters denote class labels.)
68
Decision Tree Classifier Picture of how 4th split
looks
(Figure: the node {5 : A; 7, 9 : B} is split at x = 0.4 into a node containing {5 : A; 7 : B} and a node containing {9 : B}. Numerals denote data points, letters denote class labels.)
69
Decision Tree Classifier Picture of how 5th
split looks
(Figure: the node {5 : A; 7 : B} is split at y = 0.6 into a node containing {7 : B} and a node containing {5 : A}, completing the tree. Numerals denote data points, letters denote class labels.)
70
Decision Tree Classifier Understanding Split
Choices
Pr(Class 1) = 0.5, Pr(Class 2) = 0.5 at the parent node. Each child is characterized by its own Pr(Class 1) and Pr(Class 2); the quantities examined are the portion of class 1 data going to the left child, the portion of class 2 data going to the left child, the portion of class 1 data going to the right child, and the portion of class 2 data going to the right child.
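The plots on the following slides are not reproduced in the transcript. As a rough sketch of the quantity they explore, the code below (our own, assuming the Gini impurity) computes the decrease in impurity for a split that sends a fraction f1 of the class 1 data and a fraction f2 of the class 2 data to the left child; the child probabilities quoted on the example slides are read here as fractions of the whole dataset.

```python
def gini(p1, p2):
    """Gini impurity of a node with class probabilities p1 and p2 (p1 + p2 = 1)."""
    return 1.0 - p1 ** 2 - p2 ** 2

def impurity_decrease(prior1, prior2, f1, f2):
    """Gini decrease when a split sends a fraction f1 of the class 1 data and
    a fraction f2 of the class 2 data to the left child."""
    p_left = f1 * prior1 + f2 * prior2        # share of all data going left
    p_right = 1.0 - p_left
    left = gini(f1 * prior1 / p_left, f2 * prior2 / p_left)
    right = gini((1 - f1) * prior1 / p_right, (1 - f2) * prior2 / p_right)
    return gini(prior1, prior2) - (p_left * left + p_right * right)

# Numbers from the example two slides below: priors 0.5/0.5, one child holding
# 0.05 of all data as class 1 and 0.40 as class 2 (i.e., f1 = 0.1, f2 = 0.8).
print(impurity_decrease(0.5, 0.5, 0.1, 0.8))   # about 0.25
```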
71
Decision Trees Understanding Split Choices: Pr(Class 1) = 0.5, Pr(Class 2) = 0.5 (An Example)
72
Decision Trees Understanding Split Choices: Pr(Class 1) = 0.5, Pr(Class 2) = 0.5 (An Example)
Parent node: Pr(Class 1) = 0.5, Pr(Class 2) = 0.5. One child: Pr(Class 1) = 0.05, Pr(Class 2) = 0.4. Other child: Pr(Class 1) = 0.45, Pr(Class 2) = 0.1.
73
Decision Trees Understanding Split Choices: Pr(Class 1) = 0.5, Pr(Class 2) = 0.5 (An Example)
74
Decision Tree Classifier Understanding Split
Choices
Pr(Class 1) = 0.6, Pr(Class 2) = 0.4 at the parent node; the quantities examined are again the portions of class 1 and class 2 data going to the left child and to the right child.
75
Decision Trees Understanding Split Choices: Pr(Class 1) = 0.6, Pr(Class 2) = 0.4
76
Decision Tree Classifier Understanding Split
Choices
Pr(Class 1) = 0.7, Pr(Class 2) = 0.3 at the parent node; the quantities examined are again the portions of class 1 and class 2 data going to the left child and to the right child.
77
Decision Trees Understanding Split Choices: Pr(Class 1) = 0.7, Pr(Class 2) = 0.3
78
Decision Tree Classifier Understanding Split
Choices
Pr(Class 1) = 0.8, Pr(Class 2) = 0.2 at the parent node; the quantities examined are again the portions of class 1 and class 2 data going to the left child and to the right child.
79
Decision Trees Understanding Split Choices: Pr(Class 1) = 0.8, Pr(Class 2) = 0.2
80
Decision Tree Classifier Understanding Split
Choices
Pr(Class 1) = 0.9, Pr(Class 2) = 0.1 at the parent node; the quantities examined are again the portions of class 1 and class 2 data going to the left child and to the right child.
81
Decision Trees Understanding Split Choices: Pr(Class 1) = 0.9, Pr(Class 2) = 0.1
82
Decision Tree Classifier - Terminal Node Issue:
When does the tree stop growing?
  • Criterion 1 (Stop Min Records)
  • The number of records in a node is below a
    minimum number of records threshold
  • The minimum number of records criterion is
    checked first

(Figure: the example tree grown with Stop Beta = 0.0, Stop Purity = 100, Stop Min Records = 2; one of the nodes stops splitting with stop reason 'Reached Min Records'.)
83
Decision Tree Classifier - Terminal Node Issue:
When does the tree stop growing?
  • Criterion 2 (Stop Purity)
  • We have reached an acceptable purity level
  • The purity level stop criterion is checked second

(Figure: the first split of the example tree with Stop Beta = 0.0, Stop Purity = 100, Stop Min Records = 2; the pure node {1, 2, 4 : A} stops splitting with stop reason 'Reached Purity Level'.)
84
Decision Tree Classifier - Terminal Node Issue:
When does the tree stop growing?
  • Criterion 3 (Stop Beta)
  • The maximum difference in impurity between parent
    and children is smaller than an allowable
    difference threshold
  • The maximum difference in impurity stop criterion
    is checked third

(Figure: the example tree grown with Stop Beta = 0.3, Stop Purity = 100, Stop Min Records = 2; growth stops with stop reason 'Reached threshold for Beta'.)
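A minimal sketch (our own, not from the slides) of checking the three criteria in the order just described; the parameter names mirror the slide settings (Stop Min Records, Stop Purity, Stop Beta), while the exact comparison conventions are assumptions.

```python
from collections import Counter

def stop_reason(labels, best_impurity_decrease,
                min_records=2, purity=100.0, beta=0.0):
    """Return the reason for making this node terminal, or None to keep splitting."""
    # Criterion 1 (checked first): too few records in the node.
    if len(labels) < min_records:
        return "Reached Min Records"
    # Criterion 2 (checked second): acceptable purity (majority-class percentage).
    majority = Counter(labels).most_common(1)[0][1]
    if 100.0 * majority / len(labels) >= purity:
        return "Reached Purity Level"
    # Criterion 3 (checked third): the best achievable impurity decrease is too small.
    if best_impurity_decrease <= beta:
        return "Reached threshold for Beta"
    return None

print(stop_reason(["A", "A", "A"], 0.0))   # 'Reached Purity Level'
```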
85
Decision Tree Classifier Class Node Assignments
  • In the figure to the right the class assignment
    for the right node of the tree is Class B
    because the majority class is Class B.
  • The right node has 5 records from Class B and 2
    records from Class A

(Figure: the first split of the example tree at x = 0.35; the node {1, 2, 4 : A} gets Class Assignment = Majority Class = Class A, and the node {3, 5 : A; 6, 7, 8, 9, 10 : B} gets Class Assignment = Majority Class = Class B.)
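A minimal sketch (our own) of the majority-class assignment, also returning the resubstitution estimate of the leaf's misclassification probability mentioned among the advantages earlier:

```python
from collections import Counter

def leaf_assignment(labels):
    """Majority class of a terminal node and its estimated misclassification probability."""
    label, hits = Counter(labels).most_common(1)[0]
    return label, 1.0 - hits / len(labels)

# Right node of the example: 2 records of class A, 5 of class B.
print(leaf_assignment(["A", "A", "B", "B", "B", "B", "B"]))   # ('B', ~0.29)
```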