Title: Chapter 3: Decision Tree Learning
1 Chapter 3: Decision Tree Learning
2 Decision Tree Learning
- Introduction
- Decision Tree Representation
- Appropriate Problems for Decision Tree Learning
- Basic Algorithm
- Hypothesis Space Search in Decision Tree Learning
- Inductive Bias in Decision Tree Learning
- Issues in Decision Tree Learning
- Summary
3 Introduction
- A method for approximating discrete-valued target functions
- The learned tree can easily be converted into if-then rules
- ID3, ASSISTANT, C4.5
- Preference bias toward smaller trees
- Searches a completely expressive hypothesis space
4 Decision Tree Representation
- An instance is classified by sorting it down the tree from the root to a leaf (see the sketch after this list)
- Each node tests an attribute of the instance
- Each branch corresponds to one of the attribute's values
- A learned tree represents a disjunction of conjunctions of constraints on the attribute values of instances
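As a rough illustration (the nested-dict representation and the helper name are my own, not from the slides), a learned tree such as the familiar PlayTennis tree can be stored as nested dictionaries, and an instance is classified by sorting it from the root down to a leaf:

```python
# A minimal sketch: an internal node is {"attribute": name, "branches": {value: subtree}},
# a leaf is simply a class label.  The tree below encodes part of the PlayTennis example.
tree = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {"attribute": "Humidity",
                  "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"attribute": "Wind",
                 "branches": {"Weak": "Yes", "Strong": "No"}},
    },
}

def classify(node, instance):
    """Sort an instance down the tree from the root to a leaf."""
    while isinstance(node, dict):               # internal node
        value = instance[node["attribute"]]     # test the node's attribute
        node = node["branches"][value]          # follow the matching branch
    return node                                 # leaf = class label

print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}))  # Yes
```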
6 Appropriate Problems for Decision Tree Learning
- Instances are represented by attribute-value pairs
- The target function has discrete output values
- Disjunctive descriptions may be required
- The training data may contain errors
- The training data may contain missing attribute values
7 Basic Algorithm
- Top-down, greedy search through the space of possible decision trees
- At each step, choose the attribute that best classifies the training examples
- Entropy and information gain are used to select that attribute (see the sketch after this list)
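A rough sketch of this top-down greedy loop, assuming the entropy and information_gain helpers sketched with the next slides (the representation of examples as dicts and all names are my own):

```python
from collections import Counter

def id3(examples, target, attributes):
    """Top-down, greedy construction of a decision tree (sketch of ID3).
    examples: list of dicts mapping attribute name -> value (including the target)."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                 # all examples in one class -> leaf
        return labels[0]
    if not attributes:                        # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    # greedily pick the attribute with the highest information gain
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    node = {"attribute": best, "branches": {}}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        remaining = [a for a in attributes if a != best]
        node["branches"][value] = id3(subset, target, remaining)
    return node
```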
9 Entropy
- The minimum number of bits of information needed to encode the classification of an arbitrary member of S
- Entropy = 0 if all members belong to the same class
- Entropy = 1 if there are equally many positive and negative examples
- (A sketch follows.)
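For reference, Entropy(S) = -p+ log2 p+ - p- log2 p- for a boolean classification, or -sum_i p_i log2 p_i in general. A minimal sketch in Python (the function name and example representation are my own):

```python
import math
from collections import Counter

def entropy(examples, target):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the classes of the target attribute.
    0 when every example has the same class; 1 when two classes are evenly split."""
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```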
11 Information Gain
- The expected reduction in entropy caused by partitioning the examples according to attribute A
- Measures how much knowing the value of attribute A reduces entropy (see the sketch after this list)
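For reference, Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|S_v|/|S|) * Entropy(S_v). A sketch building on the entropy helper above (names are my own):

```python
def information_gain(examples, attribute, target):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v|/|S|) * Entropy(S_v)."""
    total = len(examples)
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(examples, target) - remainder
```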
14 Which Attribute Is the Best Classifier? (1)
15 Which Attribute Is the Best Classifier? (2)
Classifying the examples by Humidity provides more information gain than classifying them by Wind (a worked computation follows).
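As a quick check, using the counts from the textbook's PlayTennis figure (S = [9+, 5-]; Humidity: High = [3+, 4-], Normal = [6+, 1-]; Wind: Weak = [6+, 2-], Strong = [3+, 3-]); the helper H is my own shorthand:

```python
import math

def H(p, n):
    """Entropy of a set with p positive and n negative examples."""
    total = p + n
    return -sum((c / total) * math.log2(c / total) for c in (p, n) if c)

e_s = H(9, 5)                                              # entropy of S, about 0.94
gain_humidity = e_s - (7/14) * H(3, 4) - (7/14) * H(6, 1)  # about 0.15
gain_wind     = e_s - (8/14) * H(6, 2) - (6/14) * H(3, 3)  # about 0.05
print(gain_humidity, gain_wind)   # Humidity is the better classifier
```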
17 Hypothesis Space Search in Decision Tree Learning (1)
- ID3 searches for a hypothesis that fits the training examples.
- ID3's hypothesis space
- the set of possible decision trees
- Simple-to-complex, hill-climbing search
- Information gain guides the hill-climbing search
19 Hypothesis Space Search in Decision Tree Learning (2)
- Searches a complete space of finite discrete-valued functions
- Maintains only a single current hypothesis
- No back-tracking
- Uses all training examples at each step, so search decisions are statistically based and less sensitive to errors in individual examples
20 Inductive Bias (1) - The Case of ID3
- Of the many decision trees consistent with the examples, which one does ID3 choose?
- Shorter trees are preferred over larger trees.
- Trees that place high information gain attributes close to the root are preferred.
21 Inductive Bias (2)
22 Inductive Bias (3)
- Occam's razor
- Prefer the simplest hypothesis that fits the data
- Major difficulty
- What counts as a "simple" hypothesis depends on the learner's internal representation.
23 Issues in Decision Tree Learning
- How deeply to grow the decision tree
- Handling continuous attributes
- Choosing an appropriate attribute selection measure
- Handling missing attribute values
24 Avoiding Overfitting the Data (1)
- Growing the tree just deeply enough to perfectly classify the training examples can be a poor strategy
- 1. when the data contains noise
- 2. when the number of training examples is too small
- A hypothesis h overfits the training data if there is an alternative hypothesis h' such that
- the error of h is smaller than the error of h' over the training examples,
- but the error of h is larger than the error of h' over the entire distribution of instances
26 Avoiding Overfitting the Data (2)
- Approaches
- 1. Separate the examples into a training set and a validation set.
- 2. Use all the data for training, but apply a statistical test to estimate whether expanding (or pruning) a particular node is likely to improve performance.
- 3. Use an explicit measure of the complexity of encoding the training examples and the decision tree (chapter 6).
- The first approach, the training and validation set approach
- the validation set guides the decision of whether to prune the hypothesis
27 Reduced Error Pruning
- Using the validation set, a node is pruned (its subtree replaced by a leaf labeled with the most common classification of the training examples at that node) only if the pruned tree performs no worse than the original tree over the validation set; the most beneficial node is pruned first, and pruning continues until it becomes harmful (see the sketch after this list).
- Leaf nodes added because of coincidental regularities in the training set tend to be removed,
- because the same coincidental regularities are unlikely to be present in the validation set.
- The data is split into a training set, a test set, and a validation set.
- Drawback: problematic when the amount of data is limited.
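A simplified, bottom-up sketch of the idea (not the exact iterative best-node-first procedure described above); it assumes the nested-dict tree and the classify helper from the representation sketch, and all names are my own:

```python
from collections import Counter

# classify() is the sort-down helper from the representation sketch above.
def majority_class(examples, target):
    return Counter(ex[target] for ex in examples).most_common(1)[0][0]

def accuracy(node, examples, target):
    return sum(classify(node, ex) == ex[target] for ex in examples) / len(examples)

def reduced_error_prune(node, train, validation, target):
    """Prune the children first, then replace this node by a majority-class leaf
    unless that lowers accuracy on the validation examples that reach it."""
    if not isinstance(node, dict) or not train:
        return node
    attr = node["attribute"]
    for value, child in list(node["branches"].items()):
        node["branches"][value] = reduced_error_prune(
            child,
            [ex for ex in train if ex[attr] == value],
            [ex for ex in validation if ex[attr] == value],
            target,
        )
    if not validation:
        return node
    leaf = majority_class(train, target)        # most common class of the training examples here
    keep = accuracy(node, validation, target)   # validation accuracy if the subtree is kept
    prune = sum(ex[target] == leaf for ex in validation) / len(validation)
    return leaf if prune >= keep else node      # prune only if it performs no worse
```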
29 Rule Post-Pruning (1)
- 1. Grow the decision tree (overfitting is allowed).
- 2. Convert the tree into rules, one for each path from the root to a leaf.
- 3. Prune each rule by removing any precondition whose removal improves the rule's estimated accuracy.
- 4. Sort the pruned rules by estimated accuracy, and consider them in this order when classifying subsequent instances. (A rule-extraction sketch follows.)
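A minimal sketch of step 2, extracting one rule per root-to-leaf path from the nested-dict tree used in the earlier sketches (names are my own):

```python
def extract_rules(node, preconditions=()):
    """One rule per root-to-leaf path: a list of (attribute, value) preconditions
    plus the class label predicted at the leaf."""
    if not isinstance(node, dict):                      # leaf: emit the finished rule
        return [(list(preconditions), node)]
    rules = []
    for value, child in node["branches"].items():
        rules += extract_rules(child, preconditions + ((node["attribute"], value),))
    return rules

# For the PlayTennis tree sketched earlier, one extracted rule is
#   IF Outlook = Sunny AND Humidity = High THEN No.
# Each rule is then pruned by dropping any precondition whose removal
# does not lower the rule's estimated accuracy.
```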
30 Rule Post-Pruning (2)
- Why convert the decision tree to rules before pruning?
- Each distinct context in which a decision node is used can be pruned independently.
- The distinction between attribute tests near the root and those near the leaves is removed.
31 Incorporating Continuous-Valued Attributes
- Choose a threshold that maximizes information gain
- Define the threshold as follows.
- Sort the examples by the attribute's value.
- Identify adjacent pairs of examples whose target classifications differ.
- Generate a candidate threshold midway between each such pair.
- Among these candidates, pick the one that yields the greatest information gain (see the sketch after this list).
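A sketch of the candidate-generation step, using the Temperature example from the textbook (names are my own):

```python
def candidate_thresholds(examples, attribute, target):
    """Sort the examples by the attribute's value and place a candidate threshold
    midway between every adjacent pair whose target classifications differ."""
    ordered = sorted(examples, key=lambda ex: ex[attribute])
    return [(a[attribute] + b[attribute]) / 2
            for a, b in zip(ordered, ordered[1:])
            if a[target] != b[target]]

data = [{"Temperature": t, "PlayTennis": c} for t, c in
        [(40, "No"), (48, "No"), (60, "Yes"), (72, "Yes"), (80, "Yes"), (90, "No")]]
print(candidate_thresholds(data, "Temperature", "PlayTennis"))   # [54.0, 85.0]
# The threshold actually used is the candidate that maximizes information gain
# for the derived boolean attribute (value > threshold).
```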
32 Alternative Measures for Selecting Attributes (1)
- The information gain measure favors attributes with many values.
- e.g. the attribute Date (such as March 4, 1979)
- It separates the training data perfectly and predicts the target attribute over the training set,
- but it is a very poor predictor over unseen instances.
33 Alternative Measures for Selecting Attributes (2)
- Split information: the entropy of S with respect to the values of attribute A; the gain ratio divides the information gain by this quantity (see the sketch after this list).
- If n examples are completely separated by n distinct values, SplitInformation = log2 n.
- If two values split the examples exactly in half, SplitInformation = 1.
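A sketch of the split information and gain ratio measures, reusing the information_gain helper from the earlier sketch (names are my own; note that SplitInformation is zero when the attribute has a single value):

```python
import math
from collections import Counter

def split_information(examples, attribute):
    """SplitInformation(S, A) = -sum_i (|S_i|/|S|) * log2(|S_i|/|S|):
    the entropy of S with respect to the values of attribute A."""
    counts = Counter(ex[attribute] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain_ratio(examples, attribute, target):
    """GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A); penalizes
    attributes (such as Date) that split the data into many tiny subsets."""
    return (information_gain(examples, attribute, target)
            / split_information(examples, attribute))
```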
34 Alternative Measures for Selecting Attributes (3)
35 Handling Training Examples with Missing Attribute Values
- When an example at node n is missing a value for attribute A, assign it the value that is most common among the examples at node n, or most common among the examples at node n with the same classification C(x).
- Alternatively, assign a probability to each possible value of A.
- Estimate these probabilities from the observed frequencies of the values of A among the examples at node n (a sketch follows).
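A sketch of the simplest strategy (names are my own; the fractional-example strategy used by C4.5 is noted in the comment):

```python
from collections import Counter

def fill_missing(examples, attribute, missing=None):
    """Give a missing value of the attribute the value that is most common among
    the examples at the current node.  A refinement restricts the count to examples
    with the same classification; a further refinement (used by C4.5) assigns
    fractional examples according to the observed value frequencies."""
    observed = [ex[attribute] for ex in examples if ex[attribute] != missing]
    most_common = Counter(observed).most_common(1)[0][0]
    return [dict(ex, **{attribute: most_common}) if ex[attribute] == missing else ex
            for ex in examples]
```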
36 Handling Attributes with Differing Costs
37 Summary
- The ID3 family grows trees downward from the root, greedily searching for the next best attribute.
- Complete hypothesis space
- Preference for smaller trees
- Overfitting avoidance by post-pruning