Title: A Probabilistic Approach to Classify Incomplete Objects Using Decision Trees
1. A Probabilistic Approach to Classify Incomplete Objects Using Decision Trees
- DB Seminar
- 1st Feb, 2007
- Speaker: Tsang Pui Kwan (Smith)
- Supervisor: Dr. B.C.M. Kao
- Paper: DEXA 2004 (evaluation: DEXA 2006), by L. Hawarah, A. Simonet and M. Simonet
2. Introduction
- Background Knowledge
- Decision Tree Classifier
- Classifying Incomplete Objects
- Missing Values Handling
- Previous Works
- Ordered Attribute Trees (OAT)
- Proposed Approaches by Authors
- Probabilistic Ordered Attribute Trees (POAT)
- Probabilistic Attribute Trees (PAT)
- Potential Problems and Solutions
- Evaluations (DEXA 2006)
3. Classification
- An important problem in data mining and machine learning
- Predicts or classifies future objects/cases using previously known results
- Supervised learning: the user specifies the targets
[Figure: previously known results are used to predict new cases or objects, e.g. "I want to know new customers' credit risks!"]
4. Classification
- A two-step process:
- Model construction: a classification algorithm builds a classifier (model) from training data with class labels
- Model usage: the classifier answers queries about unseen cases/objects
[Figure: training data → classification algorithm → classifier (model), which then classifies unseen cases ("Yes!").]
5. Classification
- Applications
- Scientific experiments
- Medical diagnosis
- Fraud detection
- Credit approval
- Target marketing
- etc.
6. Classification Models
- Various models have been proposed:
- Decision Trees
- Classification Rules
- Bayesian Classifiers
- Neural Networks
- Support Vector Machines
7. Decision Tree Classifier
- One of the most popular classification models
- Simple, powerful, human-readable
[Figure: an example decision tree. Internal nodes (outlook?, windy?, humidity?) are tests; branches (sunny, overcast, rainy; TRUE, FALSE; high, normal) are the answers to the tests; leaf nodes (yes, no) are the decisions.]
8. Decision Tree Classifier
- Decision Tree Induction
- Traditional algorithms include Quinlan's ID3 and C4.5
- Top-down, recursive, divide-and-conquer
- Greedy search for the locally best partitioning
- Attribute selection: determines how the cases in a given node are to be split
[Figure: the top-down process. Select an attribute, partition the original training set by its possible values, then continue recursively on each reduced set using the other attributes.]
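As an illustration of this top-down induction, here is a minimal sketch in Python. It is not the authors' implementation: the dataset representation (a list of dicts plus a class key) and the entropy-based scorer are assumptions made for the example.

import math
from collections import Counter

def entropy(cases, class_key):
    """Entropy (impurity) of the class label distribution over a set of cases."""
    counts = Counter(c[class_key] for c in cases)
    total = len(cases)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(cases, attr, class_key):
    """Reduction in entropy obtained by partitioning the cases on `attr`."""
    total = len(cases)
    remainder = 0.0
    for value in {c[attr] for c in cases}:
        subset = [c for c in cases if c[attr] == value]
        remainder += (len(subset) / total) * entropy(subset, class_key)
    return entropy(cases, class_key) - remainder

def build_tree(cases, attrs, class_key):
    """Top-down, recursive, divide-and-conquer induction (ID3-style)."""
    classes = {c[class_key] for c in cases}
    if len(classes) == 1:                  # pure node: stop with a leaf
        return classes.pop()
    if not attrs:                          # attributes used up: majority class
        return Counter(c[class_key] for c in cases).most_common(1)[0][0]
    # greedy choice: the attribute giving the locally best partitioning
    best = max(attrs, key=lambda a: information_gain(cases, a, class_key))
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in {c[best] for c in cases}:
        subset = [c for c in cases if c[best] == value]
        branches[value] = (len(subset), build_tree(subset, rest, class_key))
    return (best, branches)               # internal node: (attr, {value: (count, subtree)})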
9. Decision Tree Classifier
- Evaluation functions
- Examples: Information Gain (ID3), Gain Ratio (C4.5)
- Based on entropy, a measure of impurity/randomness
- An attribute is selected if partitioning on it reduces the entropy of the original set the most
- Target of partitioning: a leaf node with cases ALL of the same class (pure)
- Not possible in most cases
- Other stopping criteria:
- All attributes used up
- No cases or too few cases left
[Figure: training cases that are e.g. all of the same class become a leaf node (yes).]
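C4.5's Gain Ratio can be sketched in the same way; it normalizes the information gain by the split information so that many-valued attributes are not unduly favoured. Self-contained and hedged as before (not the paper's code):

import math
from collections import Counter

def gain_ratio(cases, attr, class_key):
    """C4.5's evaluation function: information gain / split information."""
    def H(labels):                         # entropy of a list of labels
        counts = Counter(labels)
        n = len(labels)
        return -sum((k / n) * math.log2(k / n) for k in counts.values())
    total = len(cases)
    gain = H([c[class_key] for c in cases])
    split_info = 0.0
    for value, n in Counter(c[attr] for c in cases).items():
        gain -= (n / total) * H([c[class_key] for c in cases if c[attr] == value])
        split_info -= (n / total) * math.log2(n / total)
    return gain / split_info if split_info > 0 else 0.0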
10. Example Training Set
[Table: the weather ("play") training set: to play or not to play.]
11. Decision Tree Construction
[Figure: construction on the play example. Attribute outlook makes the best partitioning of the full set (branches sunny, overcast, rainy). In the sunny partition, attribute humidity makes the best partitioning (high → no, normal → yes); in the rainy partition, attribute windy makes the best partitioning (TRUE → no, FALSE → yes). In the overcast partition all cases are of the same class, so the process stops with a leaf (yes).]
12. Decision Tree Usage
- Classification is done by searching for the leaf node, starting from the root node and following the branches
[Figure: an unseen case with outlook = rainy and windy = FALSE goes down the rainy branch to windy?, then down the FALSE branch, reaching the leaf yes.]
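A minimal sketch of this leaf search, using the (attribute, branches) tuple representation assumed in the construction sketch above:

def classify(tree, case):
    """Follow branches from the root until a leaf (a class label) is reached."""
    while isinstance(tree, tuple):        # internal node: a test on one attribute
        attr, branches = tree
        _, tree = branches[case[attr]]    # take the branch matching the answer
    return tree

# e.g. classify(tree, {"outlook": "rainy", "windy": "FALSE"}) -> "yes"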
13. How About If Values Are Missing?
- How do we build the tree?
[Figure: the play example training set with some values replaced by "??", and only an empty root node.]
14. How About If Values Are Missing?
[Figure: the example decision tree again. An unseen case with outlook = ? arrives at the outlook? test: "Which BRANCH should I go to????"]
15. Sources of Missing Values
- Not entered due to misunderstanding
[Figure: during data entry: "What was written?"]
16. Sources of Missing Values
- Not available during data collection
[Figure: example conversation in a survey. "What is your income?" "Sorry, I'd prefer not to answer." "Do you have a car? If yes, what is your car type?" "I don't have a car."]
- Important: missing values may not be errors; they can be made intentionally!
17. Sources of Missing Values
- Equipment failures
- Inconsistent with other data values
- E.g. age vs. date of birth ("Today is 1-2-2007.")
18. Missing Values
- Could appear both in the training set and in unseen cases/objects
- Problem: how can classification be done on cases containing attributes with missing values?
19. Why Do We Need to Handle Missing Values Carefully?
- Accuracy!
- The cost of misclassification is high for some applications
- Example: cancer diagnosis
- False negative: a cancer patient wrongly classified as healthy
- Reduces the patient's chance of recovery
["Don't worry! Our diagnosis is highly accurate!!"]
20. Issue to Focus On
- Classifying incomplete objects (unseen cases with missing values)
- Only categorical attributes are considered
- Key step: estimating the missing values
- How is this done?
21. Directions
- Information available: the training set
- Popular strategy: use the training set to estimate the missing values
- An unseen case has a high chance of following the results of similar cases in the training set
- Problem: bias from the training cases
- Estimation should be more accurate if the set of training data is large
22. How to Handle Missing Values?
- Trial 1: replace a missing value with a value considered adequate
- Problem: what should the value be?
- E.g. a commonly known value
- Estimation is case-independent
SKIP
23. C4.5 Missing Value Handling
- When an internal node is encountered and the relevant attribute value is missing:
- Investigate all branches
- Estimate the probability of reaching each branch
- The class distribution is found by combining the classification results of the different branches
[Figure: the C4.5 decision tree built from the training set (outlook? with branches sunny → humidity? (normal → yes, high → no), overcast, rainy) and an unseen case whose outlook value is missing.]
24. C4.5 Missing Value Handling
- Probability estimation: by the sizes of the partitions
- Prob. of a branch = (# cases in that partition) / (# cases represented by that internal node)
- Works well if most of the attributes are independent
- A class distribution is returned as the result
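A minimal sketch of this weighted exploration, reusing the (attribute, {value: (count, subtree)}) node format assumed in the earlier sketches; the bookkeeping is an illustration of the idea, not C4.5's actual data structures:

from collections import Counter

def class_distribution(node, case):
    """C4.5-style descent: when the tested attribute is missing, explore every
    branch, weighted by its share of the training cases at this node."""
    if not isinstance(node, tuple):       # leaf: all weight on its class
        return Counter({node: 1.0})
    attr, branches = node
    if case.get(attr) is not None:        # value known: follow a single branch
        return class_distribution(branches[case[attr]][1], case)
    total = sum(n for n, _ in branches.values())
    dist = Counter()
    for n, subtree in branches.values():  # value missing: combine all branches
        for cls, p in class_distribution(subtree, case).items():
            dist[cls] += (n / total) * p
    return dist

On the play example of the next slide, the explored branches carry weights 2/5 = 40% and 3/5 = 60%, and the combined result is no: 0.4, yes: 0.6.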
25. C4.5 Missing Value Handling
[Figure: example of missing value handling in C4.5. The unseen case is missing its outlook value. The probability of each branch is estimated as the number of cases in that branch over the number of cases in the node: 2/5 = 40% for one explored branch and 3/5 = 60% for the other (normal 40%, high 60% at the humidity? node). Combining the branch results yields no = 1 × 0.4 = 0.4 and yes = 1 × 0.6 = 0.6.]
26. Using Decision Trees as Tools
- The Ordered Attribute Trees (OAT) method
- By Lobo and Numao (PAKDD 1999, JSAI 2000)
- Uses decision trees to estimate missing attribute values
- An inference-based approach
27. Ordered Attribute Trees (OAT)
- A decision tree is built for each attribute, using the corresponding attribute as if it were the class label
- Each attribute is described only by lower-ordered attributes (those with a weaker relation to the class) in the training set
[Figure: attributes A, B, C arranged in order.]
28. Ordered Attribute Trees (OAT)
- Mutual Information: the dependency measurement
- A symmetric function
- Measures the reduction in uncertainty about random variable X from learning a value of Y
- MI(X, Y) = Σx Σy P(y) · P(x|y) · log( P(x|y) / P(x) )
- x: a value of X, in the domain of X
- P(x): the probability of occurrence of value x
- P(x|y): the conditional probability of X having value x given that Y has value y
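A minimal sketch of this measure over the categorical, list-of-dicts dataset assumed in the earlier sketches (not the authors' code):

import math
from collections import Counter

def mutual_information(cases, x_key, y_key):
    """MI(X, Y) = sum over (x, y) of P(x, y) * log2( P(x, y) / (P(x) P(y)) )."""
    total = len(cases)
    px = Counter(c[x_key] for c in cases)
    py = Counter(c[y_key] for c in cases)
    pxy = Counter((c[x_key], c[y_key]) for c in cases)
    return sum((n / total) * math.log2(n * total / (px[x] * py[y]))
               for (x, y), n in pxy.items())

On the play training set this reproduces the values on the next slide, e.g. MI(outlook, play) = 0.2467.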
29. Ordered Attribute Trees (OAT)
- Play example:
- MI(outlook, play) = 0.2467
- MI(temperature, play) = 0.02922
- MI(humidity, play) = 0.1518
- MI(windy, play) = 0.04813
- Order (ascending): Temperature, Windy, Humidity, Outlook
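Putting the pieces together, a hedged sketch of OAT construction; build_tree and mutual_information are the assumed helpers from the sketches above, not the authors' code.

def build_oats(cases, attrs, class_key):
    """One tree per attribute. Attributes are ordered (ascending) by MI with
    the class label; each tree may use only strictly lower-ordered attributes."""
    order = sorted(attrs, key=lambda a: mutual_information(cases, a, class_key))
    oats = {}
    for i, attr in enumerate(order):
        # train on the cases where this attribute's value is known
        known = [c for c in cases if c.get(attr) is not None]
        oats[attr] = build_tree(known, order[:i], class_key=attr)
    return oats

With the ordering above, Temperature's tree receives no usable attributes and degenerates to its most probable value, matching the next slide.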
30. OAT Examples
- Temperature has the lowest order
- Its tree contains only the root node, with the most probable value: mild (6/14)
- The Windy OAT (using ID3) contains only Temperature
- But not Humidity or Outlook
[Figure: the Temperature OAT is a single node, mild 6/14. The Windy OAT has root Temperature (14 cases) with branches hot, mild, cool leading to leaves such as true 2/4, false 3/4 and true 3/6.]
31. OAT Examples
- Humidity OAT (using ID3)
- Similar to the Windy tree
[Figure: the Humidity OAT has root Temperature (14 cases) with branches hot, mild, cool; one branch leads to a Windy node (4 cases) with branches true and false; leaves include normal 4/4, high 4/6, high 2/3 and high 1/1.]
32. Usage of OAT
- Missing attribute values are filled in using the corresponding trees
- If a case contains two or more missing attributes, the lowest-ordered one is filled first
[Figure: a case with a missing humidity value is dropped through the Humidity OAT (root Temperature, with a Windy subtree) and the missing value is filled with the reached leaf's value, e.g. high.]
33. Problems with OAT
- Leaf nodes with a single value
- Issues with attribute dependency
[Figure: the Windy OAT (root Temperature with branches hot, mild, cool; leaves TRUE, FALSE, TRUE) and the training cases with temp = cool behind one leaf; an OAT of attribute A containing the lower-ordered attribute B.]
34. Leaf Nodes with a Single Value
- A leaf node is associated with ONE value only
- For example, the Temperature OAT: mild is chosen, but it is NOT dominant in the node
- Lack of representative power: a single value is inadequate
[Figure: mild 6/14. Mild is the most probable value, but more than half of the cases are not mild!]
35. Issues with Dependent Attributes
- The best estimation of missing values should rely on dependent attributes
- For example:
- Owns House vs. Owns Car
- Installed Cable TV vs. Watches Soccer Matches
36. Issues with Dependent Attributes
- OAT relies only on each attribute's relationship with the class (in the training set)
- It does not care about attribute dependency
- Example:
- The Humidity OAT contains a node testing Windy
- But MI(humidity, windy) = 0, i.e. they are independent
- Predictions made from independent attributes are less accurate
[Figure: the Humidity OAT tests windy, yet humidity is independent of windy.]
37. Probabilistic Approach
- Probabilistic OAT (POAT): an extended version of OAT
- Probabilistic Attribute Trees (PAT): a new approach
- Using probability makes for better and more complete results!
SKIP
38. Probabilistic OAT
- An improved version of OAT with probabilistic information
- A leaf node contains a probability distribution instead of a single most probable value
[Figure: an OAT leaf labelled high becomes a POAT leaf labelled high 67% / normal 33%.]
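A minimal sketch of the change, under the same assumed dataset representation as before: a leaf keeps the normalized value counts rather than the single most common value.

from collections import Counter

def leaf_distribution(cases, attr):
    """POAT-style leaf: a probability distribution over the attribute's values."""
    counts = Counter(c[attr] for c in cases)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

# e.g. 4 'high' and 2 'normal' cases -> high ~ 0.67, normal ~ 0.33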
39. POAT Examples
[Figure: the Humidity POAT: root Temperature (14 cases) with branches hot, mild, cool, and a Windy node (4 cases) with branches true and false; the leaves now carry distributions such as high 67% / normal 33%, normal 100%, and high 100%.]
40. Usage of POAT
- Similar to OAT
- The missing values are filled with probability distributions instead
- The final classification result is a class distribution
[Figure: a case with a missing humidity value is dropped through the Humidity POAT and the missing value is filled with the distribution high 67% / normal 33%.]
41. Probabilistic Attribute Trees
- Takes the dependency of attributes into account
- NO ordering imposed on the attributes
- A leaf node contains a probability distribution instead of a single most probable value (similar to POAT)
[Figure: attributes vs. class label: the class label is NOT cared about during tree construction.]
42. PAT Construction
- A PAT is constructed for every attribute that has dependent attributes
- Mutual Information is again used for the measurement
- Dependency between attributes is defined by a threshold
- Attribute A is said to depend on attribute B (and vice versa) if MI(A, B) > threshold
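A minimal sketch of this dependency test, reusing the mutual_information helper assumed earlier:

def dependent_attributes(cases, attrs, threshold):
    """Dep(A): all other attributes B with MI(A, B) > threshold (symmetric)."""
    return {a: [b for b in attrs
                if b != a and mutual_information(cases, a, b) > threshold]
            for a in attrs}

With threshold = 0.01 on the play set, this gives the dependent sets listed on the next slide.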
43. Play Example with PAT
- Settings: threshold = 0.01
- Dependent attribute sets:
- Dep(Humidity) = {Temperature, Outlook}
- Dep(Outlook) = {Temperature, Humidity}
- Dep(Temperature) = {Humidity, Outlook, Windy}
- Dep(Windy) = {Temperature}
44. Play Example with PAT
- Humidity PAT: contains only its dependent attributes, i.e. Temperature and Outlook
[Figure: the Humidity PAT: root Temperature (14 cases) with branches hot, mild, cool leading to Outlook nodes (branches overcast, rain, sunny) and to leaves such as normal 100%, normal 50% / high 50%, high 100%, normal 33% / high 66%, plus one indeterminate leaf marked "?".]
45. Play Example with PAT
- Outlook PAT: contains only its dependent attributes, i.e. Temperature and Humidity
46. Usage of PAT
[Figure: an unseen case with a missing humidity value is dropped through the Humidity PAT (root Temperature, then Outlook nodes) and the missing value is filled with the distribution high 50% / normal 50%.]
47. Problems with PAT
- The cycle problem
- The indeterminate leaves problem
[Figure: a node testing attribute A whose branch for value a contains 0 cases!]
48. The Cycle Problem
- Happens when two or more dependent attributes have missing values
[Figure: the Humidity PAT contains outlook, so humidity depends on outlook; the Outlook PAT contains humidity, so outlook depends on humidity: a cycle.]
49. A Possible Solution
- Use POAT to estimate the missing values of the lower-ordered attributes first
- Apply PAT again once no cycle exists
[Figure: the Humidity POAT gives high 67% / normal 33%. This distribution is fed through the Outlook PAT (a leaf of which holds sunny 25%, rainy 50%, overcast 25%), weighting each branch's leaf distribution by the branch probability (67% / 33%), which yields sunny 43.5% and overcast 54.7%.]
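A hedged sketch of this weighted propagation, continuing the assumed node format from the C4.5 sketch ((attribute, {value: (count, subtree)}) internal nodes, distribution-valued leaves); it illustrates the idea rather than the authors' algorithm.

from collections import Counter

def propagate(node, case, soft_values):
    """Descend a PAT. An attribute whose value is known only as a POAT
    distribution (in `soft_values`) splits the weight over its branches."""
    if not isinstance(node, tuple):            # leaf: a {value: prob} dict
        return Counter(node)
    attr, branches = node
    if case.get(attr) is not None:             # crisp value: one branch
        return propagate(branches[case[attr]][1], case, soft_values)
    result = Counter()
    for value, weight in soft_values[attr].items():
        if value in branches:                  # weight each reachable branch
            for v, p in propagate(branches[value][1], case, soft_values).items():
                result[v] += weight * p
    return result

# e.g. propagate(outlook_pat, case, {"humidity": {"high": 0.67, "normal": 0.33}})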
50. Another Solution
- Use a set of PATs instead of POAT
- Build PATs for every subset of the dependent attributes
- If some dependent attributes are missing:
- Use the tree built from the remaining subset instead
- Problem: efficiency and space overhead
[Figure: the set of PATs for Humidity: the full Humidity PAT and variants without Temp, without Outlook, and without Temp and Outlook; when Outlook is missing, use the Humidity PAT without Outlook.]
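A minimal sketch of the bookkeeping this implies (build_tree is the assumed helper from the induction sketch); the power-set loop makes the space overhead explicit:

from itertools import combinations

def build_pat_family(cases, attr, dep_attrs):
    """One PAT per subset of Dep(attr): exponentially many trees, which is
    the efficiency and space problem noted above."""
    known = [c for c in cases if c.get(attr) is not None]
    family = {}
    for r in range(len(dep_attrs) + 1):
        for subset in combinations(dep_attrs, r):
            family[frozenset(subset)] = build_tree(known, list(subset), class_key=attr)
    return family

# At classification time: look up family[frozenset(dep attrs actually present)].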
51. The Indeterminate Leaves Problem
- A leaf node of a PAT can contain no cases
- Happens when the attribute used for partitioning has three or more values
[Figure: in the Humidity PAT (root Temperature, then Outlook), an unseen case with missing values reaches an Outlook branch with no cases. What should the result be?]
52. A Possible Solution
SKIP
- Use POAT if a case with missing values reaches a leaf with no cases
[Figure: the case reaches the empty leaf of the Humidity PAT, so the Humidity POAT is used instead, giving high 67% / normal 33%.]
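A minimal sketch of this fallback, continuing the assumptions of the propagation sketch above (both trees in the same assumed node format, with an empty leaf yielding an empty distribution):

def estimate_with_fallback(pat, poat, case, soft_values):
    """Use the PAT; if the descent ends in an empty (indeterminate) leaf,
    fall back to the POAT estimate instead."""
    dist = propagate(pat, case, soft_values)
    return dist if dist else propagate(poat, case, soft_values)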
53. Evaluation of PAT
- Compare PAT with C4.5
- The "vote" database
- Classes: 2 (Democrat, Republican)
54. Evaluation of PAT
55. Evaluation
- Thresholds are set based on the average Mutual Information of all the attributes (≈ 0.26)
- The set of thresholds: 0.2, 0.3, 0.4, 0.5
56. Results
SKIP
- The accuracy of PAT is higher than that of C4.5
57. Evaluation
- The breast-cancer database
- 2 classes (no-recurrence-events, recurrence-events)
- Some attributes are multi-valued
58. Evaluation of PAT
- The set of thresholds: 0.02, 0.03, 0.04
- Estimated using Normalized Mutual Information
- (Plain) MI is biased toward multi-valued attributes
59. Results
SKIP
- The accuracy of PAT is higher than or equal to that of C4.5
60. Analysis
- Compare classification quality via the estimated class distribution
- Instance Analysis Algorithm:
- Measure the class distribution of the training cases that are similar to the case with missing values
- A constant "near": two cases are near if the distance between them is lower than this constant
[Figure: within the training set, the similar training cases around the query include cases of class A and cases of class B.]
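A minimal sketch of such an instance analysis. The distance function here (number of disagreeing categorical attributes) is an assumption for illustration; the paper's actual distance is not given in this summary.

from collections import Counter

def instance_analysis(training, case, attrs, class_key, near):
    """Class distribution over the training cases within distance `near`."""
    def distance(a, b):
        # assumed distance: number of attributes on which the cases disagree
        return sum(1 for k in attrs if a.get(k) != b.get(k))
    similar = [t for t in training if distance(t, case) < near]
    counts = Counter(t[class_key] for t in similar)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()} if total else {}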
61. Analysis
62. Analysis
- The class distributions of the "similar" cases generally match the probabilistic results of PAT better than those of C4.5
- PAT is closer to reality
63. Conclusion
- Missing values: obstacles to classification
- Missing value handling
- Ordered Attribute Trees (OAT)
- Probabilistic approaches
- Probabilistic OATs (POAT)
- Probabilistic Attribute Trees (PAT)
- Potential problems and possible solutions
64. Thank You!
65. The End!
66. Evaluation of PAT
- The threshold is set near the average Normalized Mutual Information of all the attributes (≈ 0.26)
- The set of thresholds: 0.2, 0.3, 0.4, 0.5
67. Trial X
- Ignore training cases that contain missing values
- Applies to the training set only
- Problem: what if there are many cases with missing values?
- The other attribute values of training cases with missing values may be useful and valuable
68. Possible Methods
- Using decision trees
- Shapiro's Method (1987)
- Uses the subset of the training set with known values of the target attribute
- The target attribute is treated as the class label
- The class is used as another attribute
- ONLY for the building phase
[Figure: a tree for outlook is built from the training subset without missing values, then used to fill in the missing values (e.g. youth) of the training cases that have them.]