Decision Trees

Transcript and Presenter's Notes
1
Decision Trees
  • Chapter 18
  • From Data to Knowledge

2
Concerns
  • Representational bias: hyperrectangles; does that match the domain?
  • Generalization accuracy: is the learned concept correct?
  • Comprehensibility: e.g., medical diagnosis
  • Efficiency of learning
  • Efficiency of the learned procedure

3
Simple Example Weather Data
  • Four features: windy, play, outlook (nominal); temperature (numeric)
  • outlook = sunny
    • humidity < 75: yes (2.0)
    • humidity > 75: no (3.0)
  • outlook = overcast: yes (4.0)
  • outlook = rainy
    • windy = TRUE: no (2.0)
    • windy = FALSE: yes (3.0)

4
Dumb DT Algorithm
  • Build tree (discrete features only)
  • If all entries below the node are homogeneous, stop.
  • Else pick a feature at random, create a node for that
    feature, and form subtrees for each of the values of the
    feature.
  • Recurse on each subtree.
  • Will this work?
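A minimal sketch of this dumb algorithm in Python, assuming each example is a (feature-dict, label) pair; the name build_dumb_tree and the data layout are illustrative, not from the slides.

    import random

    def build_dumb_tree(rows, features):
        """rows: list of (feature_dict, label); features: names still available to split on."""
        labels = [label for _, label in rows]
        # If all entries below this node are homogeneous (or no features remain), stop.
        if len(set(labels)) <= 1 or not features:
            return max(set(labels), key=labels.count)  # leaf: majority class
        # Else pick a feature at random and form a subtree for each of its values.
        f = random.choice(features)
        remaining = [g for g in features if g != f]
        children = {}
        for value in set(row[f] for row, _ in rows):
            subset = [(row, label) for row, label in rows if row[f] == value]
            children[value] = build_dumb_tree(subset, remaining)  # recurse on each subtree
        return (f, children)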

5
Properties of Dumb Algorithm
  • Complexity
  • Homogeneity check costs O(DataSize)
  • Splitting costs O(DataSize)
  • Times the number of nodes in the tree bounds the work
  • Accuracy on training set: perfect
  • Accuracy on test set: not great; almost random

6
Many DT models
  • Random selection worked
  • If N binary features, then the number of possible trees is
    roughly N · 2(N-1) · 2(N-2) · ... = O(2^N N!)  UGH!
  • Which trees are best?
  • Occam's razor: small ones (testable?)
  • Exhaustive search is impossible, so maybe heuristic
    search. But what heuristic?
  • Goal: replace random selection with heuristic selection

7
Heuristic DT algorithm
  • Entropy: for a set S with mixed classes c1, c2, ..., ck,
    Entropy(S) = -sum_i p_i lg(p_i), where p_i is the
    probability of class c_i.
  • Score a split by the sum of the weighted entropies of its
    subtrees, where each weight is the proportion of examples
    in that subtree.
  • This defines a quality measure on features.
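A sketch of this entropy-based split score in Python, assuming the same (feature-dict, label) row layout as the earlier sketch; entropy and split_score are illustrative names.

    from math import log2
    from collections import Counter

    def entropy(labels):
        """Entropy(S) = -sum_i p_i lg(p_i) over the classes present in S."""
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def split_score(rows, f):
        """Weighted sum of subtree entropies after splitting on feature f (lower is better)."""
        n = len(rows)
        score = 0.0
        for value in set(row[f] for row, _ in rows):
            subset = [label for row, label in rows if row[f] == value]
            score += (len(subset) / n) * entropy(subset)
        return score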

8
Heuristic score of a feature
  • Say a split on feature f yields (4+, 4-) and (1+, 3-)
  • quality of f
    = 8/12 E(4+, 4-) + 4/12 E(1+, 3-)
    = 8/12 · 1 + 4/12 (-1/4 lg(1/4) - 3/4 lg(3/4))
  • Do this for every feature!
  • J48 is roughly the dumb algorithm plus the entropy heuristic
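The arithmetic above can be checked with a few lines of Python; here entropy takes the positive and negative counts directly.

    from math import log2

    def entropy(pos, neg):
        """Entropy of a two-class set with pos positive and neg negative examples."""
        total = pos + neg
        return -sum((c / total) * log2(c / total) for c in (pos, neg) if c > 0)

    quality = 8/12 * entropy(4, 4) + 4/12 * entropy(1, 3)
    print(quality)  # 8/12 * 1.0 + 4/12 * 0.811 = 0.937 (approximately)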

9
Shannon Entropy
  • Entropy is the only function that
  • Is 0 when only 1 class is present
  • Is k if 2^k classes are equally present
  • Is additive, i.e.
    E(X,Y) = E(X) + E(Y) if X and Y are independent.
  • Entropy is sometimes called uncertainty and sometimes
    information.
  • Uncertainty is defined on a random variable whose draws
    are from the set of classes.
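A quick numeric check of the additivity property, assuming X and Y are independent so that each joint probability is the product of the marginals:

    from math import log2

    def H(ps):
        """Entropy of a distribution given as a list of probabilities."""
        return -sum(p * log2(p) for p in ps if p > 0)

    px = [0.5, 0.25, 0.25]
    py = [0.7, 0.3]
    joint = [p * q for p in px for q in py]  # independence: P(x, y) = P(x) * P(y)
    print(H(joint), H(px) + H(py))           # both print the same value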

10
Majority Function
  • Suppose 2n Boolean features.
  • The class is defined by: n or more features are on.
  • How big is the tree?
  • At least 2n-choose-n leaves.
  • Prototype functions ("at least k of n are true") are a
    common medical concept.
  • Concepts that are prototypical do not match the
    representational bias of DTs.
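To see how quickly the 2n-choose-n lower bound on the number of leaves grows, a short check using Python's math.comb:

    from math import comb

    for n in (2, 4, 8, 16):
        print(n, comb(2 * n, n))  # at least 6, 70, 12870, 601080390 leaves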

11
DTs with real-valued attributes
  • Idea: convert to a solved problem.
  • For each real-valued attribute f with sorted values v1,
    v2, ..., vn, add binary features
  • f1 = f < (v1+v2)/2
  • f2 = f < (v2+v3)/2, etc.
  • Other approaches possible.
  • E.g., use each value vj itself as a threshold (f < vj), so
    no sorting is needed.
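A sketch of the midpoint construction described above; threshold_features is an illustrative name, and each returned threshold t defines one binary feature f < t.

    def threshold_features(values):
        """Candidate thresholds (v_i + v_{i+1}) / 2 for one real-valued attribute."""
        vs = sorted(set(values))
        return [(a + b) / 2 for a, b in zip(vs, vs[1:])]

    print(threshold_features([64, 72, 70, 68, 75]))  # [66.0, 69.0, 71.0, 73.5]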

12
DTs -> Rules (Part)
  • For each leaf, we make a rule by collecting the tests on
    the path to the leaf.
  • Number of rules = number of leaves
  • Simplification: test each condition in a rule and see if
    dropping it harms accuracy.
  • Can we go from rules to DTs?
  • Not easily. Hint: no root.
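A sketch of reading rules off a tree, assuming the (feature, children)/leaf tuple representation from the earlier sketch; each root-to-leaf path becomes one rule, so the number of rules equals the number of leaves.

    def tree_to_rules(tree, conditions=()):
        """Collect the tests along each root-to-leaf path into (conditions, class) rules."""
        if not isinstance(tree, tuple):  # a leaf holds the predicted class
            return [(list(conditions), tree)]
        feature, children = tree
        rules = []
        for value, subtree in children.items():
            rules += tree_to_rules(subtree, conditions + ((feature, value),))
        return rules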

13
Summary
  • Comprehensible if the tree is not large.
  • Effective if a small number of features is sufficient
    (bias).
  • Handles multi-class problems naturally.
  • Easily generates rules (expert system)
  • And measures of confidence (counts)
  • Can be extended to regression.
  • Easy to implement, with low complexity.