Decision Tree Learning (Presentation Transcript)
1
Decision Tree Learning
  • 2003. 8. 7
  • Lee, Ki-joong

2
Outline
  • Decision Tree Representation
  • Decision Tree Learning
  • Entropy, Information Gain
  • Overfitting

3
Definition of Decision Trees
• A decision tree is a tree in which each internal
    node is associated with an attribute and each
    branch is associated with a value of that
    attribute. Each path from the root to a leaf
    corresponds to a conjunction of attribute tests,
    and each leaf is labeled with a target value. A
    decision tree therefore represents a disjunction
    of conjunctions of constraints on the attribute
    values.
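As an illustration (hypothetical weather-style attributes, not taken from the slides), a tree whose positive leaves are (Outlook=Sunny AND Humidity=Normal), (Outlook=Overcast), and (Outlook=Rain AND Wind=Weak) represents exactly such a disjunction of conjunctions:

    # Hypothetical example: the positive leaves of a small weather tree,
    # written out as a disjunction (or) of conjunctions (and) of
    # attribute tests.
    def play_tennis(outlook, humidity, wind):
        return ((outlook == "Sunny" and humidity == "Normal")
                or outlook == "Overcast"
                or (outlook == "Rain" and wind == "Weak"))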

4
Computation in Decision Trees
• An instance is classified by starting at the root
    node of the decision tree, testing the attribute
    specified by that node, and moving down the branch
    corresponding to the value of the attribute in the
    given example; this is repeated for the subtree at
    the new node until a leaf is reached.
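A minimal Python sketch of this procedure (not from the slides), assuming the tree is stored as nested dicts of the form {attribute: {value: subtree_or_leaf}}:

    # Walk from the root: test the node's attribute, follow the branch
    # matching the example's value, repeat until a leaf (a plain label).
    def classify(tree, example):
        while isinstance(tree, dict):
            attribute, branches = next(iter(tree.items()))
            tree = branches[example[attribute]]
        return tree

    # Hypothetical weather tree and query:
    tree = {"Outlook": {"Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
                        "Overcast": "Yes",
                        "Rain": {"Wind": {"Weak": "Yes", "Strong": "No"}}}}
    print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # -> Yes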

5
Overview of Decision Tree Learning
  • How to find (search) a decision tree (hypothesis)
    that best fits a given set of training examples?
  • Construct a decision tree from a root node by a
    greedy search process
• At each node, select the attribute that best
    classifies the local training examples (sketched
    below).
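A sketch of this greedy construction (ID3-style, using the same nested-dict tree format as above); it assumes an information_gain(examples, attribute, target) helper such as the one sketched under Information Gain below:

    from collections import Counter

    def build_tree(examples, attributes, target):
        labels = [ex[target] for ex in examples]
        if len(set(labels)) == 1:          # pure node -> leaf with that label
            return labels[0]
        if not attributes:                 # no attributes left -> majority leaf
            return Counter(labels).most_common(1)[0][0]
        # Greedy step: pick the attribute that best classifies the local examples.
        best = max(attributes, key=lambda a: information_gain(examples, a, target))
        tree = {best: {}}
        for value in set(ex[best] for ex in examples):
            subset = [ex for ex in examples if ex[best] == value]
            remaining = [a for a in attributes if a != best]
            tree[best][value] = build_tree(subset, remaining, target)
        return tree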

6
Decision Tree Learning Algorithm
7
How to Select The Best Attribute?
8
Training Examples
9
Entropy - 1
• A measure of the (im)purity of an arbitrary
    collection of examples.

10
Entropy - 2
• Entropy specifies the expected minimum number of
    bits needed to encode an arbitrary message.
  • Entropy can therefore be used to measure the
    information content of an arbitrary message.
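In Python, the entropy of a collection with class proportions p_i is -sum_i p_i * log2(p_i):

    import math
    from collections import Counter

    def entropy(labels):
        total = len(labels)
        return -sum((count / total) * math.log2(count / total)
                    for count in Counter(labels).values())

    # A collection with 9 positive and 5 negative examples has entropy ~0.940.
    print(entropy(["Yes"] * 9 + ["No"] * 5))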

11
Change in Information
12
Information Gain
  • Average reduction in entropy caused by
    partitioning the examples according to an
    attribute
  • The information provided about the target
    function value by knowing the value of attribute a
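A sketch of the corresponding computation: Gain(S, A) is the entropy of the whole collection minus the weighted average entropy of the subsets produced by partitioning on attribute A (uses the entropy helper above):

    def information_gain(examples, attribute, target):
        labels = [ex[target] for ex in examples]
        remainder = 0.0
        for value in set(ex[attribute] for ex in examples):
            subset = [ex[target] for ex in examples if ex[attribute] == value]
            remainder += (len(subset) / len(examples)) * entropy(subset)
        return entropy(labels) - remainder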

13
Information Gain Examples
14
Training Examples
15
ID3 Trace 1
16
ID3 Trace 2
17
Review of ID3
• The hypothesis space is the set of all finite
    discrete-valued functions.
  • ID3 performs a simple-to-complex hill-climbing
    search through this hypothesis space.
  • ID3 is susceptible to converging to a locally
    optimal solution.

18
Inductive Bias of ID3
• BFS-ID3: shorter trees are preferred over longer
    trees.
  • ID3: shorter trees are likely to be preferred over
    longer trees, and trees that place high
    information gain attributes close to the root are
    preferred over those that do not.

19
ID3 vs Candidate-Elimination
• ID3
  • Complete hypothesis space
  • Incomplete search (may return a suboptimal
    hypothesis)
  • Inductive bias comes from the order in which
    hypotheses are searched (a preference bias, or
    search bias)
  • Candidate-Elimination
  • Incomplete hypothesis space
  • Complete search (maintains the version space)
  • Inductive bias comes from the restriction of the
    hypothesis space (a restriction bias, or language
    bias)
  • In general, a learner may have a preference bias,
    a restriction bias, or a hybrid of the two.

20
Why Shorter Trees?
• Occam's razor: prefer the simplest hypothesis that
    fits the data.
  • There are fewer short hypotheses → a short
    hypothesis that fits the data is less likely to be
    a coincidence.
  • A long hypothesis that fits the data might be a
    coincidence.
  • Counter-argument:
  • There are many ways to define small sets of
    hypotheses.
  • What is so special about small sets based on the
    size of the hypothesis?

21
Overfitting
22
Errors In The Training Examples
23
An Overfit Decision Tree
24
Insufficient Training Examples
25
An Overfit Decision Tree
26
Avoiding Overfitting
• Cross-validation
  • Split the data into a training set and a
    validation set.
  • Stop growing a tree when the error rate on the
    validation set increases.
  • Or overfit the data first, and then post-prune the
    tree (sketched below).
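A minimal sketch of the grow-then-prune strategy (reduced-error pruning), assuming the nested-dict tree format and the classify() helper sketched earlier; an internal node is replaced by the majority label of its training examples whenever that does not hurt accuracy on the validation set:

    from collections import Counter

    def accuracy(tree, examples, target):
        return sum(classify(tree, ex) == ex[target] for ex in examples) / len(examples)

    def prune(tree, train, validation, target):
        if not isinstance(tree, dict) or not train:
            return tree
        attribute, branches = next(iter(tree.items()))
        for value, subtree in branches.items():            # prune bottom-up
            subset = [ex for ex in train if ex[attribute] == value]
            branches[value] = prune(subtree, subset, validation, target)
        majority = Counter(ex[target] for ex in train).most_common(1)[0][0]
        if accuracy(majority, validation, target) >= accuracy(tree, validation, target):
            return majority                                 # simpler and no worse
        return tree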

27
Reduced-Error Pruning
28
Rule Post-Pruning
29
Rule Post-Pruning Examples
30
Reduced-Error Pruning vs Rule Post-Pruning
  • Since each distinct path through the decision tree
    produces a distinct rule, the pruning decision for
    an attribute test can be made differently for each
    rule in rule post-pruning.
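An illustrative sketch of the rule-extraction step (one IF-THEN rule per root-to-leaf path of a nested-dict tree), which is what allows an attribute test to be pruned from one rule yet kept in the rules for other paths:

    def extract_rules(tree, path=()):
        if not isinstance(tree, dict):         # leaf -> one complete rule
            return [(path, tree)]              # (preconditions, label)
        attribute, branches = next(iter(tree.items()))
        rules = []
        for value, subtree in branches.items():
            rules.extend(extract_rules(subtree, path + ((attribute, value),)))
        return rules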

31
Continuous-Valued Attributes
  • Dynamically define new discrete-valued attributes
    that partition the continuous attribute value
    into a discrete set of intervals
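One common way to do this (a sketch, not necessarily the slides' exact method): sort the examples by the continuous attribute and place a candidate threshold midway between adjacent values where the class label changes; each threshold t defines a Boolean test "value <= t".

    def candidate_thresholds(values, labels):
        pairs = sorted(zip(values, labels))
        return [(pairs[i][0] + pairs[i + 1][0]) / 2
                for i in range(len(pairs) - 1)
                if pairs[i][1] != pairs[i + 1][1]]

    # Hypothetical temperature data:
    print(candidate_thresholds([40, 48, 60, 72, 80, 90],
                               ["No", "No", "Yes", "Yes", "Yes", "No"]))  # [54.0, 85.0]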

32
Gain Ratio
  • Information gain is biased to favor attributes
    with many values
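The gain ratio compensates by dividing the information gain by the attribute's split information (the entropy of the collection with respect to the attribute's own values). A sketch, reusing information_gain() from above:

    import math
    from collections import Counter

    def split_information(examples, attribute):
        total = len(examples)
        counts = Counter(ex[attribute] for ex in examples)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def gain_ratio(examples, attribute, target):
        return (information_gain(examples, attribute, target)
                / split_information(examples, attribute))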

33
Missing Attribute Values
• Some attribute value of an example <x, c(x)> at a
    node is missing. Options:
  • Assign the most common value of the attribute
    among the training examples at the node.
  • Assign the most common value among the training
    examples at the node that share the target value
    c(x).
  • Distribute the example fractionally according to
    the estimated distribution of attribute values.
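A minimal sketch of the first strategy (fill a missing value with the attribute's most common value among the training examples at the node; restricting node_examples to those sharing the label c(x) gives the second strategy):

    from collections import Counter

    def fill_missing(example, attribute, node_examples):
        if example.get(attribute) is None:
            most_common = Counter(ex[attribute] for ex in node_examples
                                  if ex.get(attribute) is not None).most_common(1)[0][0]
            example = {**example, attribute: most_common}
        return example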

34
Attributes with Differing Costs
• Low-cost attributes can be preferred by dividing
    the information gain by the cost of the attribute.
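A one-line sketch of that criterion, reusing information_gain() from above (costs is an assumed mapping from each attribute to its measurement cost):

    def cost_weighted_gain(examples, attribute, target, costs):
        # Cheaper attributes get a larger effective score for the same gain.
        return information_gain(examples, attribute, target) / costs[attribute]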

35
Summary of Decision Tree Learning
• Capable of learning disjunctive expressions →
    expressive hypothesis space
  • Instances: nominal-valued vectors → can be
    extended to real-valued vectors
  • Target function: Boolean-valued output (binary
    classes) → can be extended to n-ary classes
  • ID3 uses all training examples at each step to
    compute statistical properties such as information
    gain → robust to noisy training data
    → Less sensitive to errors in training examples
    → Can handle errors in classifications (target
      values)
    → Can handle errors in attribute values (input
      vectors)
    → Can handle missing attribute values in training
      examples