Transcript and Presenter's Notes

Title: Learning decision trees


1
Learning decision trees
  • A concept can be represented as a decision tree,
    built from examples, as in this problem of
    estimating credit risk by considering four
    features of a credit applicant. Such data can
    be derived from the history of credit
    applications.

2
Learning decision trees (2)
  • At every level, the tree branches on the values
    of one feature.

3
Learning decision trees (3)
  • Usually many decision trees are possible, with
    varying average cost of classification. Not all
    features must be included.

4
Learning decision trees (4)
  • The ID3 algorithm
  • (its latest industrial-strength implementation is
    called C5.0)
  • If all examples are in the same class, build a
    leaf with this class. (If, for example, we have
    no historical data that record low or moderate
    risk, we can only learn that everything is
    high-risk.)
  • Otherwise, if no more features can be used, build
    a leaf with a disjunction of the classes of the
    examples. (We might have data that only allow us
    to distinguish low risk from high and moderate
    risk.)
  • Otherwise, select a feature for the root;
    partition the examples on this feature;
    recursively build the decision trees for all
    partitions; attach them to the root. (A sketch
    in Python follows below.)
  • (This is a greedy algorithm, a form of hill
    climbing.)
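
A minimal sketch of this recursion in Python. The dict-based example
format and the helper names are assumptions chosen for illustration;
the gain criterion used to pick the root feature is the one defined on
slides 6-10.

from collections import Counter
from math import log2

def information(labels):
    # I(C) = sum over classes of -p(c) * log2 p(c); see slide 6
    n = len(labels)
    return -sum(k / n * log2(k / n) for k in Counter(labels).values())

def id3(examples, features, target="risk"):
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:          # all in the same class: a leaf
        return labels[0]
    if not features:                   # no features left: a disjunction
        return set(labels)
    def gain(f):                       # greedy criterion, slides 6-10
        n = len(examples)
        parts = Counter(e[f] for e in examples).items()
        expected = sum(k / n * information([x[target] for x in examples
                                            if x[f] == v])
                       for v, k in parts)
        return information(labels) - expected
    best = max(features, key=gain)     # feature for the root
    rest = [f for f in features if f != best]
    return {best: {v: id3([e for e in examples if e[best] == v],
                          rest, target)
                   for v in {e[best] for e in examples}}}

On the 14 credit examples, a call like id3(examples, ["income",
"debt", "collateral", "credit history"]) would be expected to return a
nested dict rooted at the income test, matching the choice derived on
slide 10.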

5
Learning decision trees (5)
  • Two partially constructed decision trees.

6
Learning decision trees (6)
  • We saw that the same data can be turned into
    different trees. The question is which trees are
    better.
  • Essentially, the choice of the feature for the
    root is important. We want to select a feature
    that gives the most information.
  • Information in a set of disjoint classes
  • C = {c1, ..., cn}
  • is defined by this formula:
  • I(C) = Σi −p(ci) log2 p(ci)
  • where p(ci) is the probability that an example is
    in class ci.
  • The information is measured in bits.
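
A sketch of this formula in Python, with class counts standing in for
the probabilities p(ci):

from math import log2

def information(counts):
    # I(C) = sum over classes of -p(ci) * log2 p(ci),
    # with p(ci) estimated as a class count over the total
    total = sum(counts)
    return -sum(k / total * log2(k / total) for k in counts if k > 0)

print(information([7, 7]))   # 1.0 bit: a 50/50 split is maximally uncertain
print(information([14]))     # 0.0 bits: a single class carries no information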

7
Learning decision trees (7)
  • Let us consider our credit risk data. There are
    three class values (high, moderate, and low risk)
    among 14 examples.
  • 6 examples have high risk, 3 have moderate risk, 5
    have low risk. Treating each example as equally
    likely, their probabilities are p(high) = 6/14,
    p(moderate) = 3/14, p(low) = 5/14.
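  • Plugging these probabilities into the formula from
    slide 6 gives the information in the RISK
    classification (the value used on slide 10):
  • I(RISK) = −(6/14) log2(6/14) − (3/14) log2(3/14)
    − (5/14) log2(5/14) ≈ 1.531 bits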

8
Learning decision trees (8)
  • Let feature F be at the root, and let e1, ..., em
    be the partitions of the examples on this
    feature.
  • Information needed to build a tree for partition
    ei is I(ei).
  • Expected information needed to build the whole
    tree is a weighted average of I(ei).
  • Let |s| denote the cardinality of a set s.
  • Let e = e1 ∪ ... ∪ em be the set of all examples.
  • Expected information is defined by this formula:
  • E(F) = Σi (|ei| / |e|) I(ei)
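
A sketch of E(F) in Python, checked against the INCOME partitions
worked out on the next slide. The class labels inside e3 are an
assumption, chosen to match the stated I(e3):

from math import log2

def information(labels):
    # I(ei): information in one partition, as defined on slide 6
    n = len(labels)
    return -sum(labels.count(c) / n * log2(labels.count(c) / n)
                for c in set(labels))

def expected_information(partitions):
    # E(F) = sum over partitions of (|ei| / |e|) * I(ei)
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * information(p) for p in partitions)

# The three INCOME partitions from slide 9. The exact labels in e3
# are an assumption (a 1-to-5 split matches the stated I(e3) = 0.65).
e1 = ["high"] * 4
e2 = ["high"] * 2 + ["moderate"] * 2
e3 = ["moderate"] * 1 + ["low"] * 5
print(round(expected_information([e1, e2, e3]), 3))  # 0.564 bits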

9
Learning decision trees (9)
  • In our data, there are three partitions based on
    income:
  • e1 = {1, 4, 7, 11}, |e1| = 4, I(e1) = 0.0
  • (All examples have high risk, so I(e1) = −1 · log2 1
    = 0.)
  • e2 = {2, 3, 12, 14}, |e2| = 4, I(e2) = 1.0
  • (Two examples have high risk, two have moderate:
    I(e2) = −1/2 log2 1/2 − 1/2 log2 1/2 = 1.)
  • e3 = {5, 6, 8, 9, 10, 13}, |e3| = 6, I(e3) ≈ 0.65
  • (I(e3) = −1/6 log2 1/6 − 5/6 log2 5/6 ≈ 0.65.)
  • The expected information to complete the tree
    using income as the root feature is
  • E(INCOME) = 4/14 · 0.0 + 4/14 · 1.0 + 6/14 · 0.65
    ≈ 0.564 bits

10
Learning decision trees (10)
  • Now we define the information gain from selecting
    feature F for tree-building, given a set of
    classes C:
  • G(F) = I(C) − E(F)
  • For our sample data and for F = INCOME, we get
    this:
  • G(INCOME) = I(RISK) − E(INCOME)
    = 1.531 bits − 0.564 bits = 0.967 bits.
  • Our analysis will be complete, and our choice
    clear, after we have similarly considered the
    remaining three features. The values are as
    follows:
  • G(COLLATERAL) = 0.756 bits,
  • G(DEBT) = 0.581 bits,
  • G(CREDIT HISTORY) = 0.266 bits.
  • That is, we should choose INCOME as the criterion
    in the root of the best decision tree that we can
    construct.

11
Explanation-based learning
  • A target concept
  • The learning system finds an operational
    definition of this concept, expressed in terms of
    some primitives. The target concept is
    represented as a predicate.
  • A training example
  • This is an instance of the target concept. It
    takes the form of a set of simple facts, not all
    of them necessarily relevant to the theory.
  • A domain theory
  • This is a set of rules, usually in predicate
    logic, that can explain how the training example
    fits the target concept.
  • Operationality criteria
  • These are the predicates (features) that should
    appear in an effective definition of the target
    concept.

12
Explanation-based learning (2)
  • A classic example: a theory and an instance of a
    cup. A cup is a container for liquids that can be
    easily lifted. It has some typical parts, such as
    a handle and a bowl. Bowls, the actual
    containers, must be concave. Because a cup can be
    lifted, it should be light. And so on.
  • The target concept is cup(X).
  • The domain theory has five rules.
  • liftable( X ) ∧ holds_liquid( X ) → cup( X )
  • part( Z, W ) ∧ concave( W ) ∧ points_up( W )
    → holds_liquid( Z )
  • light( X ) ∧ part( X, handle ) → liftable( X )
  • small( A ) → light( A )
  • made_of( A, feathers ) → light( A )

13
Explanation-based learning (3)
  • The training example lists nine facts (some of
    them are not relevant):
  • cup( obj1 ), small( obj1 ),
  • part( obj1, handle ), owns( bob, obj1 ),
  • part( obj1, bottom ), part( obj1, bowl ),
  • points_up( bowl ), concave( bowl ),
  • color( obj1, red )
  • Operationality criteria require a definition in
    terms of structural properties of objects (part,
    points_up, small, concave).
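
As a sketch, the theory and the training example can be encoded and
the target concept re-derived mechanically. This uses a naive forward
chainer for brevity; EBL proper works backward from the goal, as Step
1 on the next slide shows. The tuple encoding, with "?"-prefixed
strings as variables, is an illustration, not the slides' notation.

facts = {
    ("cup", "obj1"), ("small", "obj1"),
    ("part", "obj1", "handle"), ("owns", "bob", "obj1"),
    ("part", "obj1", "bottom"), ("part", "obj1", "bowl"),
    ("points_up", "bowl"), ("concave", "bowl"),
    ("color", "obj1", "red"),
}

rules = [  # (body literals, head): the five domain-theory rules
    ([("liftable", "?x"), ("holds_liquid", "?x")], ("cup", "?x")),
    ([("part", "?z", "?w"), ("concave", "?w"), ("points_up", "?w")],
     ("holds_liquid", "?z")),
    ([("light", "?x"), ("part", "?x", "handle")], ("liftable", "?x")),
    ([("small", "?a")], ("light", "?a")),
    ([("made_of", "?a", "feathers")], ("light", "?a")),
]

def match(pattern, fact, bindings):
    # extend bindings so that pattern equals fact, or return None
    if len(pattern) != len(fact):
        return None
    b = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if b.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return b

def match_body(body, facts, bindings):
    # yield every variable binding that satisfies all body literals
    if not body:
        yield bindings
        return
    for fact in facts:
        b = match(body[0], fact, bindings)
        if b is not None:
            yield from match_body(body[1:], facts, b)

def forward_chain(facts, rules):
    # apply the rules to a fixpoint, collecting every derivable fact
    facts = set(facts)
    while True:
        new = {tuple(b.get(t, t) for t in head)
               for body, head in rules
               for b in match_body(body, facts, {})}
        if new <= facts:
            return facts
        facts |= new

# Hold out cup(obj1) itself: the theory must re-derive it, as in Step 1.
derived = forward_chain(facts - {("cup", "obj1")}, rules)
print(("cup", "obj1") in derived)  # True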

14
Explanation-based learning (4)
Step 1: prove the target concept using the
training example.
15
Explanation-based learning (5)
Step 2: generalize the proof. Constants from the
domain theory, for example handle, are not
generalized.
16
Explanation-based learning (6)
Step 3: take the definition off the tree, keeping
only the root and the leaves.
In our example, we get this rule:
small( X ) ∧ part( X, handle ) ∧ part( X, W )
∧ concave( W ) ∧ points_up( W ) → cup( X )
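
In the tuple encoding sketched after the training-example slide, this
learned rule would be one more entry for the rule list (again an
illustrative encoding, not the slides' notation):

# Step 3's operational definition: structural predicates only,
# with the domain-theory constant "handle" left in place
learned_rule = ([("small", "?x"), ("part", "?x", "handle"),
                 ("part", "?x", "?w"), ("concave", "?w"),
                 ("points_up", "?w")],
                ("cup", "?x"))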