Title: Decision Tree
1 Decision Tree
2 Determine Mileage Per Gallon
3 A Decision Tree for Determining MPG
Attributes: mpg, cylinders, displacement, horsepower, weight, acceleration, modelyear, maker
Example record: cylinders = 4, displacement = low, horsepower = low, weight = low, acceleration = high, modelyear = 75to78, maker = asia → mpg = good
From slides of Andrew Moore
4 Decision Tree Learning
- Extremely popular method
- Credit risk assessment
- Medical diagnosis
- Market analysis
- Good at dealing with symbolic features
- Easy to comprehend
  - Compared to logistic regression models and support vector machines
5 Representational Power
- Q: Can trees represent arbitrary Boolean expressions?
- Q: How many Boolean functions are there over N binary attributes?
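For reference, the standard answers (added here; the slide leaves them implicit): yes, a tree with one root-to-leaf path per attribute assignment can represent any Boolean expression, and since each of the 2^N possible assignments of N binary attributes can independently be labeled true or false, the number of distinct Boolean functions is 2^(2^N), e.g., 16 for N = 2 and 256 for N = 3.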
6 How to Generate Trees from Training Data
7 A Simple Idea
- Enumerate all possible trees
- Check how well each tree matches the training data
- Pick the one that works best
Problems?
- Too many trees
- How to determine the quality of a decision tree?
8 Solution: A Greedy Approach
- Choose the most informative feature
- Split the data set
- Recurse until each data item is classified correctly (see the sketch below)
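A minimal sketch of this greedy procedure (an illustration added here, not code from the slides; it assumes each training example is a dict of attribute values with a "label" key and represents internal nodes as nested dicts):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def info_gain(examples, attr):
    """Reduction in label entropy obtained by splitting the examples on attr."""
    labels = [ex["label"] for ex in examples]
    remainder = 0.0
    for value in {ex[attr] for ex in examples}:
        subset = [ex["label"] for ex in examples if ex[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

def grow_tree(examples, attrs):
    """Greedy tree growing: pick the most informative attribute,
    partition the data on its values, and recurse on each partition."""
    labels = [ex["label"] for ex in examples]
    if len(set(labels)) == 1 or not attrs:           # pure node, or no attributes left
        return Counter(labels).most_common(1)[0][0]  # leaf = majority label
    best = max(attrs, key=lambda a: info_gain(examples, a))
    children = {}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        children[value] = grow_tree(subset, [a for a in attrs if a != best])
    return {"split_on": best, "children": children}
```

It would be called as, e.g., `grow_tree(training_examples, ["cylinders", "maker", "weight"])`, using attribute names from the MPG data.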
9 How to Determine the Best Feature?
- Which feature is more informative about MPG?
- What metric should be used?
Mutual Information!
From Andrew Moore's slides
10 Mutual Information for Selecting Best Features
From Andrew Moore's slides
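For reference, the standard form of this criterion (added here; often called information gain in the decision tree literature): the feature X chosen for a split is the one maximizing

$$I(X;Y) \;=\; H(Y) - H(Y \mid X) \;=\; H(Y) - \sum_{v} P(X = v)\, H(Y \mid X = v), \qquad H(Y) = -\sum_{y} P(y)\log_2 P(y),$$

where Y is the class label.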
11 Another Example: Playing Tennis
12 Example: Playing Tennis
Splitting the (9+, 5-) training examples on two candidate attributes:
Humidity: High → (3+, 4-), Normal → (6+, 1-)
Wind: Weak → (6+, 2-), Strong → (3+, 3-)
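A quick numeric check of these two candidate splits (a worked example added here; `binary_entropy` and `gain` are helper names introduced for the sketch, and the counts are the ones shown above):

```python
from math import log2

def binary_entropy(pos, neg):
    """Entropy (in bits) of a node with pos positive and neg negative examples."""
    total, result = pos + neg, 0.0
    for count in (pos, neg):
        if count:
            result -= (count / total) * log2(count / total)
    return result

def gain(parent, children):
    """Information gain of a split: parent entropy minus the
    size-weighted average entropy of the child nodes."""
    total = sum(p + n for p, n in children)
    return binary_entropy(*parent) - sum(
        (p + n) / total * binary_entropy(p, n) for p, n in children)

print(gain((9, 5), [(3, 4), (6, 1)]))   # Humidity: about 0.151 bits
print(gain((9, 5), [(6, 2), (3, 3)]))   # Wind:     about 0.048 bits
```

Humidity yields the larger gain, so it would be preferred over Wind for this split.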
13 Prediction for Nodes
What is the prediction for each node?
From Andrew Moore's slides
14 Prediction for Nodes
15 Recursively Growing Trees
Original Dataset
Partition it according to the value of the attribute we split on:
cylinders = 4, cylinders = 5, cylinders = 6, cylinders = 8
From Andrew Moore's slides
16 Recursively Growing Trees
From Andrew Moore's slides
17 A Two-Level Tree
18 When Should We Stop Growing Trees?
19 Base Cases
- Base Case One: If all records in the current data subset have the same output, then don't recurse.
- Base Case Two: If all records have exactly the same set of input attributes, then don't recurse.
20 Base Cases: An Idea
- Base Case One: If all records in the current data subset have the same output, then don't recurse.
- Base Case Two: If all records have exactly the same set of input attributes, then don't recurse.
- Proposed Base Case 3: If all attributes have zero information gain, then don't recurse.
Is this a good idea?
21 Old Topic: Overfitting
22 What Should We Do?
23 Pruning Decision Trees
- Stop growing the tree in time, or
- Build the full decision tree as before, but when you can grow it no more, start to prune:
  - Reduced error pruning
  - Rule post-pruning
24 Reduced Error Pruning
- Split the data into a training set and a validation set
- Build a full decision tree over the training set
- Keep removing the node whose removal maximally increases validation set accuracy (see the sketch below)
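A sketch of this pruning loop, under the same representation assumptions as the growing sketch earlier (nested-dict tree, examples as dicts with a "label" key); all helper names here are introduced for illustration:

```python
import copy
from collections import Counter

def classify(tree, example):
    """Follow splits until a leaf (a plain label) is reached.
    Assumes every attribute value seen here also appeared in training."""
    while isinstance(tree, dict):
        tree = tree["children"][example[tree["split_on"]]]
    return tree

def accuracy(tree, examples):
    return sum(classify(tree, ex) == ex["label"] for ex in examples) / len(examples)

def internal_nodes(tree, path=()):
    """Yield the path (sequence of branch values) to every internal node."""
    if isinstance(tree, dict):
        yield path
        for value, child in tree["children"].items():
            yield from internal_nodes(child, path + (value,))

def majority_at(tree, path, training):
    """Majority training label among the examples routed to the node at path."""
    node = tree
    for value in path:
        training = [ex for ex in training if ex[node["split_on"]] == value]
        node = node["children"][value]
    return Counter(ex["label"] for ex in training).most_common(1)[0][0]

def replaced(tree, path, leaf):
    """Copy of the tree with the subtree at path collapsed to a leaf."""
    if not path:
        return leaf
    new_tree = copy.deepcopy(tree)
    parent = new_tree
    for value in path[:-1]:
        parent = parent["children"][value]
    parent["children"][path[-1]] = leaf
    return new_tree

def reduced_error_prune(tree, training, validation):
    """Greedily collapse the internal node whose removal most increases
    validation accuracy; stop when no removal helps."""
    while isinstance(tree, dict):
        best_tree, best_acc = tree, accuracy(tree, validation)
        for path in internal_nodes(tree):
            candidate = replaced(tree, path, majority_at(tree, path, training))
            cand_acc = accuracy(candidate, validation)
            if cand_acc > best_acc:
                best_tree, best_acc = candidate, cand_acc
        if best_tree is tree:     # no pruning step improved validation accuracy
            return tree
        tree = best_tree
    return tree
```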
25 Original Decision Tree
26 Pruned Decision Tree
27 Reduced Error Pruning
28 Rule Post-Pruning
- Convert the tree into rules (one rule per root-to-leaf path)
- Prune each rule by removing preconditions (see the sketch below)
- Sort the final rules by their estimated accuracy
- Most widely used method (e.g., C4.5)
- Other methods: statistical significance tests (chi-square)
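A rough sketch of the first two steps, under the same assumptions as before; for simplicity the rule accuracy is estimated on held-out examples here, whereas C4.5 itself uses a pessimistic estimate computed from the training data:

```python
def tree_to_rules(tree, preconditions=()):
    """Turn every root-to-leaf path of the nested-dict tree into a rule:
    a (preconditions, predicted_label) pair."""
    if not isinstance(tree, dict):
        return [(preconditions, tree)]
    rules = []
    for value, child in tree["children"].items():
        rules += tree_to_rules(child, preconditions + ((tree["split_on"], value),))
    return rules

def prune_rule(preconditions, label, heldout):
    """Drop any precondition whose removal does not lower the rule's
    estimated accuracy on held-out examples."""
    def rule_accuracy(conds):
        covered = [ex for ex in heldout if all(ex[a] == v for a, v in conds)]
        if not covered:
            return 0.0
        return sum(ex["label"] == label for ex in covered) / len(covered)

    conds = list(preconditions)
    improved = True
    while improved:
        improved = False
        for cond in list(conds):
            shorter = [c for c in conds if c != cond]
            if rule_accuracy(shorter) >= rule_accuracy(conds):
                conds, improved = shorter, True
                break
    return conds, label
```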
29 Real-Valued Inputs
- What should we do to deal with real-valued inputs?
30 Information Gain
- x: a real-valued input
- t: a split value (threshold)
- Find the split value t such that the mutual information I(x, y : t) between x thresholded at t and the class label y is maximized (see the sketch below).
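A sketch of that threshold search (assuming the `entropy` helper and example format from the growing sketch above; candidate thresholds are placed midway between consecutive observed values):

```python
def best_split(examples, attr):
    """Find the threshold t on real-valued attribute attr that maximizes
    the information gain of the test x < t against the class label."""
    values = sorted({ex[attr] for ex in examples})
    labels = [ex["label"] for ex in examples]
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for low, high in zip(values, values[1:]):
        t = (low + high) / 2          # candidate threshold between adjacent observed values
        left = [ex["label"] for ex in examples if ex[attr] < t]
        right = [ex["label"] for ex in examples if ex[attr] >= t]
        g = base - (len(left) / len(labels)) * entropy(left) \
                 - (len(right) / len(labels)) * entropy(right)
        if g > best_gain:
            best_t, best_gain = t, g
    return best_t, best_gain
```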
31 Conclusions
- Decision trees are the single most popular data mining tool
  - Easy to understand
  - Easy to implement
  - Easy to use
  - Computationally cheap
- It's possible to get into trouble with overfitting
- They do classification: predict a categorical output from categorical and/or real-valued inputs
32 Software
- Most widely used decision tree software: C4.5 (or C5.0)
- http://www2.cs.uregina.ca/hamilton/courses/831/notes/ml/dtrees/c4.5/tutorial.html
  - Source code, tutorial
33 The End