A Probabilistic Approach to Classify Incomplete Objects Using Decision Trees


1
A Probabilistic Approach to Classify Incomplete
Objects Using Decision Trees
  • DB Seminar
  • 1st Feb, 2007
  • Speaker: Tsang Pui Kwan (Smith)
  • Supervisor: Dr. B.C.M. Kao

DEXA 2004 (DEXA 2006 -- Evaluation). Authors: L. Hawarah, A. Simonet and M. Simonet
2
Introduction
  • Background Knowledge
  • Decision Tree Classifier
  • Classifying Incomplete Objects
  • Missing Values Handling
  • Previous Works
  • Ordered Attribute Trees (OAT)
  • Proposed Approaches by Authors
  • Probabilistic Ordered Attribute Trees (POAT)
  • Probabilistic Attribute Trees (PAT)
  • Potential Problems and Solutions
  • Evaluations (DEXA 2006)

3
Classification
  • An important problem in Data Mining and Machine Learning
  • Predicts or classifies future objects/cases using previously known results
  • Supervised learning
  • The user specifies the targets

[Figure: "I want to know new customers' credit risks!" Previously known results are used to predict new cases or objects.]
4
Classification
  • Two Step Process
  • Model Construction
  • Model Usage

[Figure: two steps. Model construction: training data with class labels + classification algorithm → classifier (model). Model usage: the classifier assigns class labels to unseen cases/objects.]
5
Classification
  • Applications
  • Scientific experiments
  • Medical diagnosis
  • Fraud detection
  • Credit approval
  • Target marketing
  • etc.

6
Classification Models
  • Various models have been proposed
  • Decision Trees
  • Classification Rules
  • Bayesian Classifiers
  • Neural Networks
  • Support Vector Machines

7
Decision Tree Classifier
  • One of the most popular classification models
  • Simple, Powerful, Human readable

[Figure: example decision tree for the play data. Internal nodes are tests (outlook?, windy?, humidity?), branches are the answers to the tests (sunny/overcast/rainy, TRUE/FALSE, normal/high), and leaf nodes are the decisions (yes/no).]
8
Decision Tree Classifier
  • Decision Tree Induction
  • Traditional algorithms include Quinlan's ID3 and C4.5
  • Top-down, recursive, divide-and-conquer
  • Greedy search for the locally best partitioning
  • Attribute Selection
  • Determines how the cases in a given node are split

[Figure: top-down construction. Select an attribute, partition the original training set by its possible values, then continue recursively on each reduced set using the remaining attributes.]
9
Decision Tree Classifier
  • Evaluation functions
  • Examples: Information Gain (ID3), Gain Ratio (C4.5)
  • Based on entropy, a measure of impurity/randomness
  • An attribute is selected if partitioning on it reduces the entropy of the original set the most (see the sketch below)
  • Target of partitioning
  • A leaf node whose cases are ALL of the same class (pure)
  • Not possible in most cases
  • Other stopping criteria
  • All attributes used up
  • No cases or too few cases left

[Figure: a set of training cases that are all of the same class, e.g. all yes, becomes a leaf node.]
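To make the evaluation function concrete, here is a minimal Python sketch of entropy and Information Gain, assuming the training set is a list of dicts and the class label is named "play" (the representation is an editor's illustration, not the paper's code):

    import math
    from collections import Counter

    def entropy(cases, label="play"):
        # Impurity/randomness of the class label over a list of cases.
        total = len(cases)
        counts = Counter(c[label] for c in cases)
        return -sum((n / total) * math.log2(n / total) for n in counts.values())

    def information_gain(cases, attribute, label="play"):
        # ID3 criterion: how much partitioning on `attribute` reduces entropy.
        total = len(cases)
        remainder = 0.0
        for value in {c[attribute] for c in cases}:
            part = [c for c in cases if c[attribute] == value]
            remainder += (len(part) / total) * entropy(part, label)
        return entropy(cases, label) - remainder

At each node the attribute with the largest gain is selected; C4.5's Gain Ratio additionally divides the gain by the entropy of the partition proportions, penalising many-valued attributes.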
10
Example Training Set
[Table: the "To Play or Not to Play" weather training set.]
11
Decision Tree Construction
[Figure: construction on the play example. Attribute outlook makes the best partitioning at the root (sunny/overcast/rainy). In the sunny partition, humidity makes the best partitioning; in the rainy partition, windy does; in the overcast partition all cases are of the same class, so construction stops at a leaf (yes).]
12
Decision Tree Usage
  • Classification is done by searching for a leaf node, starting from the root and following the branches (sketched below)

[Figure: an unseen case with outlook = rainy and windy = FALSE descends outlook? → rainy → windy? → FALSE and reaches the leaf yes.]
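A minimal sketch of this descent, under a hypothetical encoding where an internal node is an (attribute, branches) pair and a leaf is a class label (illustrative, not the paper's code):

    # Hypothetical encoding of the example play tree.
    TREE = ("outlook", {
        "sunny":    ("humidity", {"high": "no", "normal": "yes"}),
        "overcast": "yes",
        "rainy":    ("windy", {"TRUE": "no", "FALSE": "yes"}),
    })

    def classify(tree, case):
        # Follow the branches from the root until a leaf (a class label).
        while isinstance(tree, tuple):
            attribute, branches = tree
            tree = branches[case[attribute]]
        return tree

    print(classify(TREE, {"outlook": "rainy", "windy": "FALSE"}))  # -> yes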
13
How about if values are missing?
How to build the tree?
[Figure: the play example training set with missing values (??), leaving it unclear how to choose the root node.]
14
How about if values are missing?
Example Decision Tree
[Figure: an unseen case with outlook missing reaches the outlook? test: "Which BRANCH should I go?"]
15
Source of Missing Values
  • Not entered due to misunderstanding

[Figure: illegible handwriting during data entry: "What was written?"]
16
Source of Missing Values
  • Not available during data collection

[Figure: example conversation from a survey. "What is your income?" "Sorry, I'd prefer not to answer." "Do you have a car? If yes, what is your car type?" "I don't have a car."]
Important: missing values may not be errors; they can be made intentionally!
17
Source of Missing Values
  • Equipment failures
  • Inconsistent with other data values
  • E.g. Age vs. date of birth

[Figure: a recorded age inconsistent with the date of birth, given that today is 1-2-2007.]
18
Missing Values
  • Could appear in both the training set and the unseen cases/objects
  • Problem: how can classification be done on cases containing attributes with missing values?

19
Why do we need to handle missing values carefully?
  • Accuracy!
  • The cost of misclassification is high for some applications
  • Example: cancer diagnosis
  • False negative: a cancer patient wrongly classified as healthy
  • Reduces the patient's chance of recovery

"Don't worry! Our diagnosis is highly accurate!!"
20
Issues to Focus On
  • Classifying Incomplete Objects (Unseen Cases with Missing Values)
  • Only categorical attributes are considered
  • Key step: estimate the missing values

How to do it?
21
Directions
  • Information available: the training set
  • Popular strategy: use the training set to estimate the missing values
  • An unseen case has a high chance of following the results of similar cases in the training set
  • Problem: bias from the training cases
  • Estimation should be more accurate if the set of training data is large

22
How to handle missing values?
  • Trial 1: replace a missing value with a value considered adequate
  • Problem: what should that value be?
  • A commonly known value?
  • Estimation is case-independent

SKIP
23
C4.5 Missing Value Handling
  • When an internal node is encountered and the relevant attribute value is missing:
  • Investigate all branches
  • Estimate the probability of reaching each branch
  • The class distribution is found by combining the classification results of the different branches

[Figure: C4.5 decision tree from the training set: an outlook? root with rainy/sunny/overcast branches, one of which leads to a humidity? node with branches normal and high ending in leaves no and yes; an unseen case with a missing value is shown alongside.]
24
C4.5 Missing Value Handling
  • Probability estimation
  • By the size of the partitions
  • Prob. of a branch = (# cases in that partition) / (# cases represented by that internal node)
  • Works well if most of the attributes are independent
  • A class distribution is returned as the result (sketched below)
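A hedged sketch of this strategy, extending the hypothetical node encoding used earlier with per-branch weights (the fraction of the node's training cases on each branch); this is an editor's illustration of the C4.5-style idea, not the authors' code:

    from collections import Counter

    def classify_c45(tree, case):
        # A leaf puts all probability mass on its label; an internal node is
        # (attribute, branches, weights), where weights maps each branch
        # value to its share of the node's training cases.
        if not isinstance(tree, tuple):
            return Counter({tree: 1.0})
        attribute, branches, weights = tree
        if case.get(attribute) is not None:
            return classify_c45(branches[case[attribute]], case)
        # Missing value: explore every branch and mix the resulting class
        # distributions, weighted by the probability of each branch.
        mixture = Counter()
        for value, subtree in branches.items():
            for label, p in classify_c45(subtree, case).items():
                mixture[label] += weights[value] * p
        return mixture

    # Mirrors the example on the next slide: humidity missing, weights 2/5, 3/5.
    node = ("humidity", {"normal": "no", "high": "yes"},
            {"normal": 0.4, "high": 0.6})
    print(classify_c45(node, {}))  # -> Counter({'yes': 0.6, 'no': 0.4})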

25
C4.5 Missing Value Handling

[Figure: example of missing value handling in C4.5. An unseen case with humidity missing descends the outlook? root and reaches the humidity? node. The probability of each branch is estimated as the number of cases in that branch over the number of cases in the node: normal = 2/5 = 40%, high = 3/5 = 60%. Combining the branch results gives no = 1 × 0.4 = 0.4 and yes = 1 × 0.6 = 0.6.]
26
Using Decision Trees as Tools
  • Ordered Attribute Trees (OAT) Method
  • By Lobo and Numao (PAKDD 1999, JSAI 2000)
  • Using Decision Trees for estimating missing
    attribute values
  • An Inference-based Approach

27
Ordered Attribute Trees (OAT)
  • A decision tree is built for each attribute, using that attribute as if it were the class label
  • It is described only by lower-ordered attributes (those with a weaker relation to the class) in the training set

[Figure: attributes A, B, C arranged in order of their relation to the class.]
28
Ordered Attribute Tree (OAT)
  • Mutual Information: a dependency measurement
  • Symmetric function
  • Measures the reduction in uncertainty about random variable X from learning a value of Y (see the sketch below):
    MI(X, Y) = Σ_x Σ_y P(x, y) · log( P(x|y) / P(x) )
  • x: a value of X in the domain of X
  • P(x): the probability of occurrence of value x
  • P(x|y): the conditional probability of X having value x given that Y has value y
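A small Python sketch of this measure for two categorical attributes, using the same list-of-dicts representation assumed in the earlier sketches:

    import math
    from collections import Counter

    def mutual_information(cases, x, y):
        # MI(X, Y) = sum over (x, y) of P(x, y) * log2(P(x|y) / P(x)).
        n = len(cases)
        count_x = Counter(c[x] for c in cases)
        count_y = Counter(c[y] for c in cases)
        count_xy = Counter((c[x], c[y]) for c in cases)
        mi = 0.0
        for (vx, vy), nxy in count_xy.items():
            p_joint = nxy / n                # P(x, y)
            p_x_given_y = nxy / count_y[vy]  # P(x | y)
            mi += p_joint * math.log2(p_x_given_y / (count_x[vx] / n))
        return mi

The OAT ordering on the next slide would then be, e.g., sorted(attributes, key=lambda a: mutual_information(cases, a, "play")).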

29
Ordered Attribute Trees (OAT)
  • Play example
  • MI(outlook, play) = 0.2467
  • MI(temp, play) = 0.02922
  • MI(humidity, play) = 0.1518
  • MI(windy, play) = 0.04813
  • Order (ascending): Temperature, Windy, Humidity, Outlook

30
OAT Examples
  • Temperature has the lowest order
  • Its tree contains only the root node, holding the most probable value, mild (6/14)
  • Windy OAT using ID3
  • Contains only Temperature
  • But not Humidity or Outlook

[Figure: the Temperature OAT is a single leaf, mild 6/14. The Windy OAT has root Temperature (14) with leaves true 2/4, false 3/4 and true 3/6 on its hot/mild/cool branches.]
31
OAT Examples
  • Humidity OAT using ID3
  • Similar to Windy Tree

[Figure: Humidity OAT: root Temperature (14); cool → normal 4/4; mild → high 4/6; hot → Windy (4): true → high 2/3, false → high 1/1.]
32
Usage of OAT
  • Missing attribute values are filled in using the corresponding trees (see the sketch below)
  • If a case contains two or more missing attributes, the lowest-ordered one is filled first

[Figure: a case with humidity missing is run through the Humidity OAT (root Temperature (14); cool → normal 4/4; mild → high 4/6; hot → Windy (4): true → high 2/3, false → high 1/1) and the missing value is filled with the leaf's value, high.]
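A sketch of this usage under the same hypothetical tree encoding as before. Because an OAT tests only lower-ordered attributes, filling in ascending order guarantees that every value a later tree needs is already present:

    def descend(tree, case):
        # Query an OAT like an ordinary decision tree; the leaf holds the
        # most probable value of the target attribute.
        while isinstance(tree, tuple):
            attribute, branches = tree
            tree = branches[case[attribute]]
        return tree

    def fill_missing(case, oats, order):
        # `oats` maps attribute -> its OAT; `order` lists attributes by
        # ascending MI with the class. The lowest-ordered attribute is
        # filled first, so a higher-ordered OAT never meets a missing value.
        for attribute in order:
            if case.get(attribute) is None:
                case[attribute] = descend(oats[attribute], case)
        return case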
33
Problem on OAT
  • Leaf Node with Single Value
  • Issues on Attribute dependency

[Figures: the Windy OAT, whose cool branch ends in the single value TRUE although the training cases with temp = cool are mixed; and an OAT for attribute A built over a lower-ordered attribute B, regardless of whether A actually depends on B.]
34
Leaf Node with Single Value
  • A leaf node is associated with ONE value only
  • For example, the Temperature OAT:
  • mild is chosen, but it is NOT dominant in the node
  • Lack of representative power
  • A single value is inadequate

[Figure: Temperature OAT leaf, mild 6/14. Mild is the most probable value, but more than half of the cases are not mild!]
35
Issues on Dependent Attributes
  • The best estimation of missing values should rely on dependent attributes
  • For Example
  • Owns House vs. Owns Car
  • Installed Cable TV vs. Watch Soccer Matches

36
Issues on Dependent Attributes
  • OAT relies only on an attribute's relationship with the class (in the training set)
  • It does not care about attribute dependency
  • Example
  • The Humidity OAT contains a node testing Windy
  • But MI(humidity, windy) = 0, i.e. they are independent
  • Predictions from independent attributes are less accurate

[Figure: the Humidity OAT tests windy, even though humidity and windy are independent.]
37
Probabilistic Approach
  • Probabilistic OAT (POAT)
  • Extended version of OAT
  • Probabilistic Attribute Trees (PAT)
  • New approach

Using probability makes for better and more complete results!
SKIP
38
Probabilistic OAT
  • Improved version of OAT with probabilistic
    information
  • Leaf node contains a probability distribution
    instead of a single most probable value

[Figure: a leaf node of an OAT holds the single value high; the corresponding leaf node of a POAT holds the distribution high 67%, normal 33%.]
39
POAT Examples
  • Humidity POAT using ID3

[Figure: Humidity POAT: root Temperature (14); cool → normal 100%; mild → high 67%, normal 33%; hot → Windy (4): true → high 67%, normal 33%; false → high 100%.]
40
Usage of POAT
  • Similar to OAT
  • The filled-in missing value is a probability distribution instead (see the sketch below)
  • The final classification result is a class distribution

[Figure: a case with humidity missing is run through the Humidity POAT and the missing value is filled with the distribution high 67%, normal 33%.]
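A sketch of the POAT variant: the descent is unchanged, but the leaf reached is a distribution (a value → probability dict). The Humidity POAT below is an assumed encoding of the tree on the earlier slide:

    def poat_estimate(tree, case):
        # Descend as usual; a POAT leaf is a probability distribution over
        # the target attribute's values, not a single value.
        while isinstance(tree, tuple):
            attribute, branches = tree
            tree = branches[case[attribute]]
        return tree

    HUMIDITY_POAT = ("temperature", {
        "cool": {"normal": 1.0},
        "mild": {"high": 0.67, "normal": 0.33},
        "hot": ("windy", {
            "true":  {"high": 0.67, "normal": 0.33},
            "false": {"high": 1.0},
        }),
    })

    print(poat_estimate(HUMIDITY_POAT, {"temperature": "mild"}))
    # -> {'high': 0.67, 'normal': 0.33}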
41
Probabilistic Attribute Trees
  • Takes attribute dependency into account
  • NO ordering is imposed on the attributes
  • A leaf node contains a probability distribution instead of a single most probable value
  • Similar to POAT

[Figure: tree construction does NOT care about the class label, only about the attributes.]
42
PAT Construction
  • A PAT is constructed for every attribute that has dependent attributes
  • Mutual Information is again used for the measurement
  • Dependency between attributes is defined by a threshold (see the sketch below)
  • Attribute A is said to depend on attribute B, and vice versa, if MI(A, B) > threshold
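A one-function sketch of computing these dependency sets; `mi` stands for any mutual-information routine, e.g. the mutual_information() sketched earlier:

    def dependent_sets(cases, attributes, threshold, mi):
        # Dep(A) = every other attribute B with mi(A, B) > threshold.
        # MI is symmetric, so B in Dep(A) implies A in Dep(B).
        return {a: {b for b in attributes
                    if b != a and mi(cases, a, b) > threshold}
                for a in attributes}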

43
Play Example on PAT
  • Settings
  • Threshold = 0.01
  • Dependent attribute sets
  • Dep(Humidity) = {Temp, Outlook}
  • Dep(Outlook) = {Temp, Humidity}
  • Dep(Temp) = {Humidity, Outlook, Windy}
  • Dep(Windy) = {Temp}

44
Play Example on PAT
  • Humidity PAT
  • Contains only its dependent attributes
  • i.e. Temperature and Outlook

[Figure: Humidity PAT: root Temperature (14); cool → normal 100%; hot and mild → Outlook subtrees (overcast/rain/sunny) whose leaves include ? (no cases), normal 50% high 50%, high 100%, high 100%, normal 33% high 66%, and normal 50% high 50%.]
45
Play Example on PAT
  • Outlook PAT
  • Contains only its dependent attributes
  • i.e. Temperature and Humidity

46
Usage of PAT
  • Similar to POAT

[Figure: an unseen case with humidity missing is run through the Humidity PAT and filled with the distribution high 50%, normal 50%.]
47
Problems on PAT
  • Cycle Problem
  • Indeterminate Leaves Problem

[Figure: a PAT leaf for attribute A, value a, containing 0 cases.]
48
Cycle Problem
  • Happens when two or more dependent attributes have missing values

[Figure: humidity depends on outlook, so the Humidity PAT contains outlook; outlook depends on humidity, so the Outlook PAT contains humidity: a cycle.]
49
Possible Solution
  • Use POAT to estimate the missing values of the lower-ordered attributes first (see the sketch below)
  • Use PAT again once no cycle exists

[Figure: the Humidity POAT gives high 67%, normal 33%. Feeding each value into the Outlook PAT (whose relevant leaf is sunny 25%, rainy 50%, overcast 25%) and weighting by 67%/33% yields sunny = 0.67 × 0.67 = 0.435 and overcast = 0.33 + 0.33 × 0.67 = 0.547.]
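A worked sketch of this combination step: the POAT's distribution over the missing dependent attribute weights the PAT's answer for each possible value (the leaf distributions below are illustrative, not taken from the paper):

    def mix_pat_with_poat(pat_answers, poat_distribution):
        # `poat_distribution`: value -> probability for the missing dependent
        # attribute; `pat_answers`: for each such value, the PAT leaf
        # distribution reached once it is filled in. Returns the mixture.
        mixture = {}
        for value, weight in poat_distribution.items():
            for outcome, p in pat_answers[value].items():
                mixture[outcome] = mixture.get(outcome, 0.0) + weight * p
        return mixture

    humidity = {"high": 0.67, "normal": 0.33}   # from the Humidity POAT
    outlook_pat = {                             # illustrative PAT leaves
        "high":   {"sunny": 0.25, "rainy": 0.50, "overcast": 0.25},
        "normal": {"sunny": 0.50, "overcast": 0.50},
    }
    print(mix_pat_with_poat(outlook_pat, humidity))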
50
Another Solution
  • Use a set of PATs instead of POAT
  • Build a PAT for every subset of the dependent attributes
  • If some dependent attributes are missing
  • Use the tree built from the remaining subset instead (see the sketch below)
  • Problem: efficiency and space overhead

[Figure: the set of PATs for Humidity: the full Humidity PAT plus versions without Temp, without Outlook, and without both. When Outlook is missing, use the Humidity PAT built without Outlook.]
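A sketch of selecting a tree from such a set, assuming the PATs are pre-built and keyed by the subset of dependent attributes they actually use:

    def select_pat(pats, dependents, case):
        # `pats` maps a frozenset of dependent attributes to the PAT built
        # from exactly that subset. Up to 2^|Dep| trees must be pre-built
        # per attribute, which is the space overhead noted above.
        available = frozenset(a for a in dependents
                              if case.get(a) is not None)
        return pats[available]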
51
Indeterminate Leaves Problem
  • A leaf node of a PAT can contain no cases
  • Happens when the partitioning attribute has three or more values

[Figure: in the Humidity PAT, an unseen case with missing values reaches an Outlook branch whose leaf contains no cases. What should the result be?]
52
Possible Solution
SKIP
  • Use POAT if a case with missing values reaches a leaf with no cases

[Figure: when the case reaches the empty leaf of the Humidity PAT, fall back to the Humidity POAT, which returns high 67%, normal 33%.]
53
Evaluation on PAT
  • Compare PAT with C4.5
  • The vote database
  • Classes: 2 (Democrat, Republican)

54
Evaluation on PAT
55
Evaluation
  • Thresholds are set based on the average Mutual Information of all the attributes (≈ 0.26)
  • The set of thresholds: 0.2, 0.3, 0.4, 0.5

56
Results
SKIP
  • The accuracy of PAT is higher than that of C4.5

57
Evaluation
  • Breast-cancer Database
  • 2 classes (no-recurrence-events,
    recurrence-events)
  • Some attributes are multi-valued

58
Evaluation on PAT
  • The set of thresholds: 0.02, 0.03, 0.04
  • Estimated using Normalized Mutual Information
  • MI is biased toward multi-valued attributes

59
Results
SKIP
  • The accuracy of PAT is higher than or equal to that of C4.5

60
Analysis
  • Compare classification quality via the estimated class distribution
  • Instance Analysis Algorithm (sketched below)
  • Measures the class distribution over training cases that are similar to the case with the missing value
  • Constant "near": two cases are near if the distance between them is lower than this constant

[Figure: within the training set, the similar training cases around the incomplete case include cases of class A and cases of class B.]
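A hedged sketch of such an instance analysis; the distance used here is a simple count of disagreements on the known attributes, which may differ from the authors' exact metric:

    from collections import Counter

    def instance_analysis(training_set, case, near, label="class"):
        # Class distribution over training cases within distance `near` of
        # the incomplete case; missing attributes are ignored.
        def distance(train_case):
            return sum(1 for a, v in case.items()
                       if v is not None and train_case.get(a) != v)
        neighbours = [t for t in training_set if distance(t) < near]
        counts = Counter(t[label] for t in neighbours)
        total = sum(counts.values())
        return {c: n / total for c, n in counts.items()} if total else {}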
61
Analysis
62
Analysis
  • The class distribution over the similar cases generally matches the probabilistic result of PAT better than that of C4.5
  • PAT is closer to reality

63
Conclusion
  • Missing values: obstacles for classification
  • Missing Value Handling
  • Ordered Attribute Trees (OAT)
  • Probabilistic Approaches
  • Probabilistic OATs (POAT)
  • Probabilistic Attribute Trees (PAT)
  • Potential Problems and Possible Solutions

64
Thank You!
  • Questions?

65
The End!
66
Evaluation on PAT
  • The threshold is set near the average Normalized Mutual Information of all the attributes (≈ 0.26)
  • The set of thresholds: 0.2, 0.3, 0.4, 0.5

67
Trial X
  • Ignore training cases containing missing values
  • For the training set only
  • Problem: what if there are many cases with missing values?
  • The other attribute values of training cases with missing values may be useful and valuable

68
Possible Methods
  • Using Decision Trees
  • Shapiro's Method (1987)
  • Uses the subset of the training set with known values on the target attribute
  • The target attribute is treated as the class label
  • The class is used as another attribute
  • ONLY for the building phase

[Figure: a tree for outlook, built from the training subset without missing values, is used to fill in the missing values of the remaining training cases.]