Title: Project EMD-MLR Decision Tree Classifiers (Part 1)
1. Project EMD-MLR Decision Tree Classifiers (Part 1)
2. Presentation Outline
- Introduction to Pattern Recognition
- Introduction to the Decision Tree Classifier
- Important Tree Functions
  - Growing Phase
  - Pruning Phase
  - Classify Phase
- Growing Phase
  - Split Criteria
  - Stopping Criteria
  - Leaf Node Assignment
3. Presentation Outline (continued)
- Pruning Phase
  - How the Pruning Phase Works
- Classify Phase
- Data with Categorical Attributes
- Data with Non-Uniform Misclassification Costs
- Computational Complexity of the Decision Tree Classifier
4. Pattern Recognition
- The ease with which we recognize a face, understand spoken words, read handwritten characters, identify our keys in our pocket by feel, and decide whether an apple is ripe by its smell belies the astoundingly complex processes that underlie these acts of pattern recognition (Duda and Hart, 2001).
5. Pattern Recognition: Definition
- Pattern recognition -- the act of taking in raw data and taking an action based on the category of the pattern -- has been crucial for our survival, and over the years we have tried to develop algorithms that duplicate this amazing ability of humans to recognize patterns (Duda and Hart, 2001).
6. Pattern Recognition: Algorithms
- These algorithms are referred to as Pattern Recognition Algorithms or Pattern Classification Algorithms.
- An example of a pattern recognition (pattern classification) algorithm is the decision tree algorithm (decision tree classifier).
7. Pattern Recognition: Example
- To understand the complexity of a pattern recognition system, let us consider a simple example: recognizing the type of an Iris plant.
8. Pattern Recognition: A Case Study -- Iris Data
- The Iris data consist of 150 data points of three different types of flowers:
  - Iris Virginica
  - Iris Setosa
  - Iris Versicolor
  - Analogy: Failed/Non-Failed Blades
- Each datum has four attributes (see the loading sketch after this slide):
  - Sepal Length (cm) (Feature 1)
  - Sepal Width (cm) (Feature 2)
  - Petal Length (cm) (Feature 3)
  - Petal Width (cm) (Feature 4)
  - Analogy: Operating Hours, Starts, etc.
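The slides describe the Iris data abstractly; purely as an illustration, here is a minimal sketch that inspects the same 150-sample, 4-feature dataset using scikit-learn's bundled copy (the use of scikit-learn is my assumption, not something the slides prescribe).

```python
# Illustrative sketch: inspecting the Iris data with scikit-learn's bundled copy.
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target          # X: (150, 4) feature matrix, y: class labels 0..2

print(iris.feature_names)              # sepal length/width, petal length/width (cm)
print(iris.target_names)               # ['setosa' 'versicolor' 'virginica']
print(X.shape, y.shape)                # (150, 4) (150,)
```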
9. Pattern Recognition: Components of a Pattern Recognition System
Problem Data → Feature Extraction → Feature Selection → Pattern Classification → Pattern Classes
10. Pattern Recognition: Feature Extraction, Selection, and Classification
- The feature extraction module has the purpose of extracting (or collecting) some important information for the task at hand.
- The feature selection module has the purpose of selecting the features that are important to achieve the objective of interest.
- The classifier module has the purpose of classifying the data, relying on the information conveyed by the selected features.
11. Pattern Recognition: Feature Extraction -- Iris Data
- In our case the features have already been extracted from the data; they are:
  - Sepal Length (Feature 1)
  - Sepal Width (Feature 2)
  - Petal Length (Feature 3)
  - Petal Width (Feature 4)
- Analogy: features already extracted for the blade data include Operating Hours (OH), various types of Trips (TR), etc.
12. Pattern Recognition: Feature Selection -- Iris Data
- In this case we are trying to determine which features are the most important for recognizing (classifying) the type of the Iris plant.
- Colored scatter plots (2-D plots) of two features at a time might be useful (see next slide).
- Analogy: scatter plots of blade feature data, such as Operating Hours (OH), Trips (TR), and Fired Aborts (FA).
13. Pattern Recognition: Iris Data Feature Selection
14. Pattern Recognition: Histogram of the Petal Length Feature
15. Pattern Recognition: Histogram of the Petal Width Feature
16. Pattern Recognition: Simple Classifier Model (Model 1)
17. Pattern Recognition: Simple Classifier Model (Model 2)
18. Pattern Recognition: More Complex Classifier Model (Model 3)
19. Pattern Recognition: Performance of Model 1 -- Separating Planes for Testing Data
20. Pattern Recognition: Performance of Model 2
21. Pattern Recognition: Performance of Model 3
22. Pattern Recognition: Selection of a Classifier Model
- A classifier model is normally selected based on the following measures of goodness:
  - Performance of the classifier model on previously unseen data
  - Simplicity of the classifier
- Other measures of goodness might be of interest to the designer, such as:
  - Computational complexity of the classifier
  - Robustness of the classifier in the presence of noise
23. Pattern Recognition System: Selection of a Classifier Model
- An example of such a classifier model is the Decision Tree Classifier.
24. Decision Tree Classifier: General Overview
- The method for constructing a decision tree classifier from a collection of data is easy to understand.
- The data consist of data attributes (e.g., operating hours, number of starts) and the class label (scrapped versus non-scrapped blade).
- Initially all the data belong to the same set, located at the root of the tree.
- Then a data attribute is chosen, and a test on this attribute is employed to split the data into smaller subsets with higher percentages of one-class labels.
- Decision trees help you understand what types of data attributes and attribute values lead to certain class labels (see the sketch below).
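To make the overview concrete, here is a hedged sketch of growing a decision tree on the Iris data; the use of scikit-learn's DecisionTreeClassifier and the particular settings are my assumptions, not the project's own implementation.

```python
# Illustrative sketch: growing a decision tree classifier on the Iris data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# All training data start at the root; the tree repeatedly picks an attribute
# and a split value that make the resulting subsets purer.
clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X_train, y_train)

print("accuracy on unseen data:", clf.score(X_test, y_test))
```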
25. Decision Tree Classifier: Graphic Representation of a Tree Classifier
- Node 0: root node. Nodes 1 and 2: children of node 0; node 0 is the parent of nodes 1 and 2. Nodes 1, 3, and 4: leaves of the tree.
[Figure: example tree with nodes 0-4, with one branch and one node labeled]
26. Decision Tree Classifier: Operational Phases of the Tree Classifier
- The decision tree has three distinct but interrelated phases (sketched below). These are:
  - Growing Phase
  - Pruning Phase
  - Test (Classify/Performance) Phase
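A purely illustrative skeleton of these three phases; the class and method names are hypothetical and only mirror the phase names listed above.

```python
# Hypothetical skeleton (names are my own) mirroring the three phases.
class DecisionTree:
    def grow(self, X, y):
        """Recursively split the training data until a stopping criterion fires."""
        ...

    def prune(self, X_val, y_val):
        """Cut back branches that do not improve performance on validation data."""
        ...

    def classify(self, X_new):
        """Drop each new pattern down the tree and return the leaf's class label."""
        ...
```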
27. Decision Tree Classifier: Growing Phase of the Decision Tree Classifier
[Figure: the example tree (nodes 0-4) being grown]
28. Decision Tree Classifier: Pruning Phase of the Decision Tree Classifier
[Figure: the example tree (nodes 0-4) being pruned]
29. Decision Tree Classifier: Advantages of a Decision Tree Classifier
- It requires easy-to-understand elements for its design, such as:
  - A set of questions Q
  - A rule for selecting the best split at any node
  - A criterion for choosing the right-size tree
- It can be applied to any data structure through the appropriate formulation of the set of questions Q.
30. Decision Tree Classifier: Advantages of a Decision Tree Classifier
- It handles both ordered and categorical variables.
  - Ordered variable: a variable assuming values from a set such as {1, 2, 3, 4, 5, 6, ...}
  - Categorical variable: a variable assuming values from a set such as {green, red, blue, orange, ...}
- The final classification has a simple form which can be compactly stored to efficiently classify new data.
31. Decision Tree Classifier: Advantages of a Decision Tree Classifier
- It does automatic stepwise variable selection and complexity reduction.
- It gives, with no additional effort, not only a classification but also an estimate of the misclassification probability for the object.
32. Decision Tree Classifier: Advantages of a Decision Tree Classifier
- It is invariant under all monotone transformations of the individual ordered variables.
  - E.g., after multiplying a specific feature by a constant (say, changing its measurement units), the resulting decision tree remains unchanged.
- It is extremely robust to outliers and misclassified points in the training set used for the tree's design.
33. Decision Tree Classifier: Advantages of a Decision Tree Classifier
- The tree procedure gives easily understood and interpreted information regarding the predictive structure of the data.
  - Given a decision tree, you can extract simple IF-THEN rules that show the thought process of the tree when it classifies (see the sketch below).
- It has been used successfully in a variety of applications (see the following slides).
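As an illustration of the IF-THEN structure mentioned above, the sketch below fits a small tree on the Iris data and prints its rules with scikit-learn's export_text; the tool choice and the depth limit are my assumptions.

```python
# Sketch: extracting the simple IF-THEN structure from a fitted tree.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# Each path from the root to a leaf reads as an IF-THEN rule on the features.
print(export_text(clf, feature_names=list(iris.feature_names)))
```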
34. Decision Tree Classifier: Applications of a Decision Tree Classifier
- Medical Applications
  - Wisconsin Breast Cancer (predict whether a tissue sample taken from a patient is malignant or benign; two classes, nine numerical attributes)
  - BUPA Liver Disorders (predict whether or not a male patient has a liver disorder based on blood tests and alcohol consumption; two classes, six numerical attributes)
35. Decision Tree Classifier: Applications of a Decision Tree Classifier
- Medical Applications
  - Pima Indian Diabetes (the patients are females at least 21 years old, of Pima Indian heritage, living near Phoenix, Arizona; the problem is to predict whether a patient would test positive for diabetes; there are two classes and seven numerical attributes)
  - Heart Disease (the problem here is to predict the presence or absence of heart disease based on various medical tests; there are two classes, seven numerical attributes, and six categorical attributes)
36. Decision Tree Classifier: Applications of a Decision Tree Classifier
- Image Recognition Applications
  - Satellite Image (this dataset gives the multi-spectral values of pixels within 3x3 neighborhoods in a satellite image, together with the classification associated with the central pixel; the aim is to predict the classification given the multi-spectral values; there are six classes and thirty-six numerical attributes)
  - Image Segmentation (this is a database of seven outdoor images; every pixel should be classified as brickface, sky, foliage, cement, window, path, or grass; there are seven classes and nineteen numerical attributes)
37. Decision Tree Classifier: Applications of a Decision Tree Classifier
- Other Applications
  - Boston Housing (this dataset gives housing values in Boston suburbs; there are three classes, twelve numerical attributes, and one binary attribute)
  - Congressional Voting Records (this database gives the votes of each member of the U.S. House of Representatives of the 98th Congress on sixteen key issues; the problem is to classify a congressman as a Democrat or a Republican based on the sixteen votes; there are two classes and sixteen categorical attributes (yea, nay, neither))
38. Decision Tree Classifier: Growing Phase
- The growing phase of the tree revolves around three elements:
  - The selection of the splits
  - The decision of when to designate a node as terminal or to continue splitting it
  - How to determine the class assignments of the terminal nodes
39. Decision Tree Classifier: Selection of Splits
- What is a split?
  - Each node of the tree represents a box (a rectangle in 2 dimensions) in the feature space.
  - Growing of the tree is accomplished by splitting the box into 2 new boxes.
  - The node t representing the original box becomes the parent node of the two nodes (children tL and tR) representing the 2 new boxes.
  - A rectangle can be split in two ways: across the x1 or the x2 dimension.
40. Decision Tree Classifier: Selection of Splits
- What is a split? (continued)
  - A box in n dimensions can be split in many different ways.
  - The dimension along which we perform the split is called the split attribute or split feature.
  - The specific value at which the split occurs is called the split value.
- What do we accomplish by splitting?
  - The growing of a tree whose terminal nodes represent very specific rules.
  - Smaller rectangles that contain patterns, most of which are of the same class label, will provide us with very specific, accurate classification rules.
- How do we select a good split?
  - We select a split attribute and corresponding split value so that the resulting children nodes are purer.
41. Decision Tree Classifier: Selection of Splits
- Define p(j|t) as the proportion of class j cases at node t of the tree.
- Define also i(t) as a measure of impurity for node t of the decision tree. Note that:
  - i(t) is a nonnegative function
  - it depends on the probabilities p(1|t), ..., p(J|t), where J is the number of different classes
  - it achieves its maximum value when the p(j|t) are all equal to 1/J
  - it achieves its minimum value (equal to 0) when one of the p(j|t) is equal to 1 and the rest of the p(j|t) are equal to 0
42. Decision Tree Classifier: Examples of Impurity Measures
- The entropy impurity measure: i(t) = -Σ_j p(j|t) log2 p(j|t)
- The Gini function impurity measure: i(t) = 1 - Σ_j p(j|t)^2 (both measures are implemented in the sketch below)
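A minimal sketch of the two impurity measures, written directly in terms of the class proportions p(j|t); the function names are my own.

```python
# Minimal sketch of the entropy and Gini impurity measures.
import numpy as np

def entropy_impurity(p):
    """i(t) = -sum_j p(j|t) * log2 p(j|t), with 0*log(0) taken as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def gini_impurity(p):
    """i(t) = 1 - sum_j p(j|t)^2."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.sum(p ** 2))

print(entropy_impurity([0.5, 0.5]))   # 1.0  (maximum for two equally likely classes)
print(gini_impurity([0.5, 0.5]))      # 0.5
print(gini_impurity([1.0, 0.0]))      # 0.0  (pure node)
```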
43. Decision Tree Classifier: Entropy and Gini Impurity Functions
44. Decision Tree Classifier: Selection of Splits
- Selection procedure:
  - Given a node t, select the split attribute n and the split value s so that the difference between the impurity of t and the average impurity of the children tL and tR is maximized, viz. Δi(n, s, t) = i(t) - pL·i(tL) - pR·i(tR), where pL and pR are the proportions of the data at t sent to tL and tR. (A sketch of this search follows.)
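A sketch (assuming the Gini impurity and a single ordered feature) of the selection rule above: scan the candidate split values s and keep the one that maximizes Δi(s, t) = i(t) - pL·i(tL) - pR·i(tR). The helper names and the toy data are my own.

```python
# Sketch of the split-selection rule for one candidate attribute.
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """x: one ordered feature, y: class labels. Returns (best split value, best delta_i)."""
    parent = gini(y)
    best_s, best_gain = None, -np.inf
    for s in np.unique(x)[:-1]:              # candidate thresholds
        left, right = y[x <= s], y[x > s]
        p_left = len(left) / len(y)
        gain = parent - p_left * gini(left) - (1 - p_left) * gini(right)
        if gain > best_gain:
            best_s, best_gain = s, gain
    return float(best_s), float(best_gain)

# Toy usage: the class changes between x = 4 and x = 6.
x = np.array([1, 2, 3, 4, 6, 7, 8, 9], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(best_split(x, y))                      # splits at 4.0 with delta_i = 0.5
```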
45. Decision Tree Classifier: Selection of Splits -- An Example
- Example:
  - Let a node t be represented by the rectangle 0 ≤ x1 ≤ 10, 0 ≤ x2 ≤ 10, containing 50 patterns of class 1 (blue) and 50 patterns of class 2 (red).
  - Let's consider splitting t along the x1 attribute.
46. Decision Tree Classifier: Differences between Impurity Functions
- Difference in impurities versus the x1 split value
- The best split value is 5.146 for both impurity functions.
47. Decision Tree Classifier: Differences between Impurity Functions
- The entropy and Gini impurities are qualitatively similar and, therefore, most often give similar, if not identical, splits.
48. Decision Tree Classifier: Resubstitution Error as Impurity Function
- Another candidate function that seems natural to use as a node impurity measure is the Resubstitution Error (the misclassification error on the training set).
49. Decision Tree Classifier: Resubstitution Error as Impurity Function
- Most of the time, using the resubstitution error (RE) will provide the same best split as using the Gini or entropy impurity (Case A).
- However, there are a number of occasions where the difference in impurity Δi(n, s, t), as measured by the RE, is locally flat (regions of equal values), implying a variety of equally good splits (Case B). Among those equally good splits there are usually one or two that intuitively seem more reasonable. These latter splits are typically easier to identify via the Gini or entropy impurities.
- The phenomenon of non-uniqueness of best splits, which rarely occurs when using the Gini or entropy impurities, makes the RE less suitable/convenient for determining splits. (A small comparison sketch follows.)
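A small illustration of the flat-gain phenomenon described above, using my own toy data and assuming the resubstitution-error impurity i(t) = 1 - max_j p(j|t): several thresholds tie under RE, while the Gini impurity singles out one of them.

```python
# Sketch contrasting resubstitution-error (RE) gains with Gini gains on the same splits.
import numpy as np

def gini(y):
    _, c = np.unique(y, return_counts=True)
    p = c / c.sum()
    return 1.0 - np.sum(p ** 2)

def resub_error(y):
    _, c = np.unique(y, return_counts=True)
    return 1.0 - c.max() / c.sum()

def gains(x, y, impurity):
    parent = impurity(y)
    out = {}
    for s in np.unique(x)[:-1]:
        left, right = y[x <= s], y[x > s]
        pl = len(left) / len(y)
        out[float(s)] = round(float(parent - pl * impurity(left) - (1 - pl) * impurity(right)), 3)
    return out

x = np.arange(1, 11, dtype=float)
y = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1])   # class 0 at x=1..6,8; class 1 at x=7,9,10
print(gains(x, y, resub_error))   # thresholds 6.0 and 8.0 tie at the maximum RE gain (0.2)
print(gains(x, y, gini))          # Gini gives a unique best split, at threshold 6.0
```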
50. Decision Tree Classifier: Resubstitution Error as Impurity Function
- Case A
  - RE, Gini, and entropy suggest a unique best split.
  - It is quite common that all three impurity measures suggest similar, if not identical, splits.
51. Decision Tree Classifier: Resubstitution Error as Impurity Function
- Case B
  - RE claims all splits are equivalent!
  - Gini and entropy suggest only two equivalent splits, which intuitively are quite reasonable.
52. Decision Tree Classifier: An Example
53. Decision Tree Classifier: An Example
54. Decision Tree Classifier: The First Split -- Level 0
- We are at the root of the tree, with data {1, 2, 3, 4, 5} of class A and data {6, 7, 8, 9, 10} of class B.
- The possible x-splits that we need to consider are ...
- The possible y-splits that we need to consider are ...
55. Decision Tree Classifier: Change in Impurity for x-splits -- Level 0
56. Decision Tree Classifier: Change in Impurity for y-splits
57. Decision Tree Classifier: Calculation of the Impurity Difference
- Best split:
  - Left node data: {1, 2, 4}; right node data: {3, 5, 6, 7, 8, 9, 10}
- Impurity of the parent
- Impurity of the left child
58. Decision Tree Classifier: Calculation of the Impurity Difference
- Impurity of the right child
- Average impurity of the left and right children
- Difference in impurity
(A worked version of this calculation is sketched below.)
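A worked version of the calculation on slides 57-58, assuming the Gini impurity is the measure being used (the slides do not state which impurity their numbers came from): the parent holds 5 records of class A and 5 of class B, the left child is {1, 2, 4}, and the right child is {3, 5, 6, 7, 8, 9, 10}.

```python
# Worked impurity-difference calculation for the example split (Gini assumed).
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

i_parent = gini([5, 5])          # 5 of class A, 5 of class B -> 0.5
i_left   = gini([3, 0])          # {1, 2, 4}: all class A     -> 0.0
i_right  = gini([2, 5])          # {3, 5} A and {6..10} B     -> 20/49 ~ 0.408

p_left, p_right = 3 / 10, 7 / 10
avg_children = p_left * i_left + p_right * i_right      # ~ 0.286
delta_i = i_parent - avg_children                       # ~ 0.214
print(i_parent, i_left, i_right, avg_children, delta_i)
```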
59. Decision Tree Classifier: Picture of How the Best 1st Split Looks
[Tree diagram]
Root: {1, 2, 3, 4, 5: A; 6, 7, 8, 9, 10: B}, split on x at 0.35 (branches x < 0.35 and x > 0.35)
  Left child: {1, 2, 4: A}
  Right child: {3, 5: A; 6, 7, 8, 9, 10: B}
(Numerals = data points, letters = class labels)
60. Decision Tree Classifier: Picture of How Another 1st Split Looks
[Figure: an alternative first split of the root node; numerals = data points, letters = class labels]
61. Decision Tree Classifier: Picture of How Another 1st Split Looks
[Figure: another alternative first split of the root node; numerals = data points, letters = class labels]
62. Decision Tree Classifier: Picture of How Another 1st Split Looks
[Figure: another alternative first split of the root node; numerals = data points, letters = class labels]
63. Decision Tree Classifier: The Second Split -- Level 1
- The left node (child) of the tree has data {1, 2, 4}, all of the same classification (class A). So, no further splitting of the data residing in the left node is needed.
- The right node (child) of the tree has data {3, 5, 6, 7, 8, 9, 10}, of which data {3, 5} are of class A and data {6, 7, 8, 9, 10} are of class B. So further splitting of the data residing in the right node is needed.
- The possible x-splits that we need to consider are ...
- The possible y-splits that we need to consider are ...
64. Decision Tree Classifier: Change in Impurity for x-splits -- Level 1, Right Node
65. Decision Tree Classifier: Change in Impurity for y-splits -- Level 1, Right Node
66. Decision Tree Classifier: Picture of How the 2nd Split Looks
[Tree diagram]
Root: {1, 2, 3, 4, 5: A; 6, 7, 8, 9, 10: B}
  Left child: {1, 2, 4: A}
  Right child: {3, 5: A; 6, 7, 8, 9, 10: B}, split on x at 0.5 (branches x < 0.5 and x > 0.5)
    {3, 5: A; 7, 9: B}
    {6, 8, 10: B}
(Numerals = data points, letters = class labels)
67. Decision Tree Classifier: Picture of How the 3rd Split Looks
[Tree diagram]
Root: {1, 2, 3, 4, 5: A; 6, 7, 8, 9, 10: B}
  {1, 2, 4: A}
  {3, 5: A; 6, 7, 8, 9, 10: B}
    {6, 8, 10: B}
    {3, 5: A; 7, 9: B}, split on y at 0.5 (branches y < 0.5 and y > 0.5)
      {5: A; 7, 9: B}
      {3: A}
(Numerals = data points, letters = class labels)
68. Decision Tree Classifier: Picture of How the 4th Split Looks
[Tree diagram]
Root: {1, 2, 3, 4, 5: A; 6, 7, 8, 9, 10: B}
  {1, 2, 4: A}
  {3, 5: A; 6, 7, 8, 9, 10: B}
    {6, 8, 10: B}
    {3, 5: A; 7, 9: B}
      {3: A}
      {5: A; 7, 9: B}, split on x at 0.4 (branches x < 0.4 and x > 0.4)
        {5: A; 7: B}
        {9: B}
(Numerals = data points, letters = class labels)
69. Decision Tree Classifier: Picture of How the 5th Split Looks
[Tree diagram]
Root: {1, 2, 3, 4, 5: A; 6, 7, 8, 9, 10: B}
  {1, 2, 4: A}
  {3, 5: A; 6, 7, 8, 9, 10: B}
    {6, 8, 10: B}
    {3, 5: A; 7, 9: B}
      {3: A}
      {5: A; 7, 9: B}
        {9: B}
        {5: A; 7: B}, split on y at 0.6 (branches y < 0.6 and y > 0.6)
          {7: B}
          {5: A}
(Numerals = data points, letters = class labels)
70. Decision Tree Classifier: Understanding Split Choices
[Figure: a parent node with Pr(Class 1) = 0.5, Pr(Class 2) = 0.5 split into a left and a right child]
- The four quantities of interest are: the portion of class 1 data going to the left child, the portion of class 2 data going to the left child, the portion of class 1 data going to the right child, and the portion of class 2 data going to the right child.
71. Decision Trees: Understanding Split Choices -- Pr(Class 1) = 0.5, Pr(Class 2) = 0.5: An Example
72. Decision Trees: Understanding Split Choices -- Pr(Class 1) = 0.5, Pr(Class 2) = 0.5: An Example
[Figure: parent node with Pr(Class 1) = 0.5, Pr(Class 2) = 0.5; one child receives Pr(Class 1) = 0.05 and Pr(Class 2) = 0.4, the other receives Pr(Class 1) = 0.45 and Pr(Class 2) = 0.1]
73. Decision Trees: Understanding Split Choices -- Pr(Class 1) = 0.5, Pr(Class 2) = 0.5: An Example
74. Decision Tree Classifier: Understanding Split Choices
[Figure: a parent node with Pr(Class 1) = 0.6, Pr(Class 2) = 0.4 split into a left and a right child]
- As before, the quantities of interest are the portions of class 1 and class 2 data going to the left child and to the right child.
75. Decision Trees: Understanding Split Choices -- Pr(Class 1) = 0.6, Pr(Class 2) = 0.4
76. Decision Tree Classifier: Understanding Split Choices
[Figure: a parent node with Pr(Class 1) = 0.7, Pr(Class 2) = 0.3 split into a left and a right child]
- As before, the quantities of interest are the portions of class 1 and class 2 data going to the left child and to the right child.
77. Decision Trees: Understanding Split Choices -- Pr(Class 1) = 0.7, Pr(Class 2) = 0.3
78. Decision Tree Classifier: Understanding Split Choices
[Figure: a parent node with Pr(Class 1) = 0.8, Pr(Class 2) = 0.2 split into a left and a right child]
- As before, the quantities of interest are the portions of class 1 and class 2 data going to the left child and to the right child.
79. Decision Trees: Understanding Split Choices -- Pr(Class 1) = 0.8, Pr(Class 2) = 0.2
80. Decision Tree Classifier: Understanding Split Choices
[Figure: a parent node with Pr(Class 1) = 0.9, Pr(Class 2) = 0.1 split into a left and a right child]
- As before, the quantities of interest are the portions of class 1 and class 2 data going to the left child and to the right child.
81. Decision Trees: Understanding Split Choices -- Pr(Class 1) = 0.9, Pr(Class 2) = 0.1
82. Decision Tree Classifier: Terminal Node Issue -- When Does the Tree Stop Growing?
- Criterion 1 (Stop Min Records)
  - The number of records in a node is below a minimum-number-of-records threshold.
  - The minimum number of records criterion is checked first.
[Tree diagram]
Root: {1, 2, 3, 4, 5: A; 6, 7, 8, 9, 10: B}
  {1, 2, 4: A}
  {3, 5: A; 6, 7, 8, 9, 10: B}
    {6, 8, 10: B}
    {3, 5: A; 7, 9: B}, split on y at 0.5
      {5: A; 7, 9: B}
      {3: A}
Settings: Stop Beta = 0.0, Stop Purity = 100%, Stop Min Records = 2
Stop reason shown in the figure: Reached Min Records
83. Decision Tree Classifier: Terminal Node Issue -- When Does the Tree Stop Growing?
- Criterion 2 (Stop Purity)
  - We have reached an acceptable purity level.
  - The purity level stop criterion is checked second.
[Tree diagram]
Root: {1, 2, 3, 4, 5: A; 6, 7, 8, 9, 10: B}, split on x at 0.35
  {1, 2, 4: A}
  {3, 5: A; 6, 7, 8, 9, 10: B}
Settings: Stop Beta = 0.0, Stop Purity = 100%, Stop Min Records = 2
Stop reason shown in the figure: Reached Purity Level
84. Decision Tree Classifier: Terminal Node Issue -- When Does the Tree Stop Growing?
- Criterion 3 (Stop Beta)
  - The maximum difference in impurity between parent and children is smaller than an allowable difference threshold.
  - The maximum difference in impurity stop criterion is checked third.
[Tree diagram]
Root: {1, 2, 3, 4, 5: A; 6, 7, 8, 9, 10: B}, split on x at 0.35
  {1, 2, 4: A}
  {3, 5: A; 6, 7, 8, 9, 10: B}
Settings: Stop Beta = 0.3, Stop Purity = 100%, Stop Min Records = 2
Stop reason shown in the figure: Reached threshold for Beta
(A sketch of these three checks, in the order they are applied, follows.)
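A hypothetical helper (the function name and the purity-as-a-fraction convention are my own) expressing the three stopping checks in the order the slides give: minimum records first, purity second, beta third.

```python
# Hypothetical stopping-criteria check, in the order described on slides 82-84.
def should_stop(n_records, purity, best_delta_i,
                stop_min_records=2, stop_purity=1.0, stop_beta=0.0):
    if n_records <= stop_min_records:
        return "Reached Min Records"
    if purity >= stop_purity:                 # purity = fraction of the majority class
        return "Reached Purity Level"
    if best_delta_i <= stop_beta:             # best achievable impurity decrease
        return "Reached threshold for Beta"
    return None                               # keep splitting

print(should_stop(n_records=2, purity=0.6, best_delta_i=0.2))    # Reached Min Records
print(should_stop(n_records=3, purity=1.0, best_delta_i=0.0))    # Reached Purity Level
print(should_stop(n_records=7, purity=0.71, best_delta_i=0.1, stop_beta=0.3))  # Beta
```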
85. Decision Tree Classifier: Class Node Assignments
- In the figure below, the class assignment for the right node of the tree is Class B, because the majority class there is Class B.
- The right node has 5 records from Class B and 2 records from Class A (see the sketch after the figure).
[Tree diagram]
Root: {1, 2, 3, 4, 5: A; 6, 7, 8, 9, 10: B}, split on x at 0.35
  Left child: {1, 2, 4: A} -- class assignment (majority class): Class A
  Right child: {3, 5: A; 6, 7, 8, 9, 10: B} -- class assignment (majority class): Class B
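A one-line sketch of the leaf-labeling rule described above: assign the majority class of the records that reach the node (helper name is my own).

```python
# Sketch of the majority-class leaf assignment rule on slide 85.
from collections import Counter

def leaf_class(labels):
    return Counter(labels).most_common(1)[0][0]

# Right node of the example tree: 2 records of class A, 5 of class B.
print(leaf_class(["A", "A", "B", "B", "B", "B", "B"]))   # -> 'B'
```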