Title: Learning decision trees
Learning decision trees
- A concept can be represented as a decision tree, built from examples, as in this problem of estimating credit risk by considering four features of a credit applicant. Such data can be derived from the history of credit applications.
Learning decision trees (2)
- At every level, one feature value is selected.
Learning decision trees (3)
- Usually many decision trees are possible, with varying average cost of classification. Not all features need to be included.
Learning decision trees (4)
- The ID3 algorithm (its latest industrial-strength implementation is called C5.0):
- If all examples are in the same class, build a leaf with this class. (If, for example, we have no historical data that record low or moderate risk, we can only learn that everything is high-risk.)
- Otherwise, if no more features can be used, build a leaf with a disjunction of the classes of the examples. (We might have data that only allow us to distinguish low risk from high and moderate risk.)
- Otherwise, select a feature for the root, partition the examples on this feature, recursively build the decision trees for all partitions, and attach them to the root.
- (This is a greedy algorithm, a form of hill climbing; a sketch in Python follows.)
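A minimal Python sketch of this recursion, under some assumptions not in the slides: examples are dicts mapping feature names (and the target) to values, and the helper names are illustrative. The greedy step minimizes expected information E(F), which is equivalent to maximizing the gain G(F) defined on a later slide.

```python
from collections import Counter
from math import log2

def info(labels):
    """Entropy I of a list of class labels, in bits (defined two slides below)."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def id3(examples, features, target="risk"):
    """ID3 sketch: examples are dicts from feature names (and target) to values."""
    classes = [e[target] for e in examples]
    if len(set(classes)) == 1:      # all in one class: leaf with that class
        return classes[0]
    if not features:                # no features left: disjunctive leaf
        return set(classes)
    # Greedy step: pick the feature with the lowest expected information E(F),
    # which maximizes the gain G(F) = I(C) - E(F).
    def expected_info(f):
        parts = {}
        for e in examples:
            parts.setdefault(e[f], []).append(e[target])
        return sum(len(p) / len(classes) * info(p) for p in parts.values())
    best = min(features, key=expected_info)
    rest = [f for f in features if f != best]
    return (best, {v: id3([e for e in examples if e[best] == v], rest, target)
                   for v in {e[best] for e in examples}})
```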
Learning decision trees (5)
- Two partially constructed decision trees.
Learning decision trees (6)
- We saw that the same data can be turned into different trees. The question is which trees are better.
- Essentially, the choice of the feature for the root is important. We want to select a feature that gives the most information.
- Information in a set of disjoint classes C = {c1, ..., cn} is defined by this formula:
- I(C) = Σ -p(ci) log2 p(ci)
- p(ci) is the probability that an example is in class ci. The information is measured in bits.
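This measure (class entropy) is easy to compute; here is a small helper, reused in the sketches that follow.

```python
from math import log2

def entropy(probabilities):
    """I(C) = Σ -p(ci) log2 p(ci), in bits.
    A term with p = 0 contributes 0 by convention."""
    return -sum(p * log2(p) for p in probabilities if p > 0)
```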
Learning decision trees (7)
- Let us consider our credit risk data. There are 14 examples in three classes: 6 examples have high risk, 3 have moderate risk, 5 have low risk.
- Assuming that every example is equally likely, the class probabilities are p(high) = 6/14, p(moderate) = 3/14, p(low) = 5/14, so that
- I(RISK) = -6/14 log2 6/14 - 3/14 log2 3/14 - 5/14 log2 5/14 ≈ 1.531 bits
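With the helper above, this value can be checked directly:

```python
print(round(entropy([6/14, 3/14, 5/14]), 3))   # 1.531 bits: I(RISK)
```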
Learning decision trees (8)
- Let feature F be at the root, and let e1, ..., em be the partitions of the examples on this feature.
- Information needed to build a tree for partition ei is I(ei).
- Expected information needed to build the whole tree is a weighted average of I(ei).
- Let |s| denote the cardinality of set s.
- Let e be the set of all examples (the union of the partitions ei).
- Expected information is defined by this formula:
- E(F) = Σ (|ei| / |e|) I(ei)
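A sketch of this weighted average, reusing the entropy helper above; each partition is represented simply as the list of class labels of its examples.

```python
from collections import Counter

def expected_information(partitions):
    """E(F) = Σ |ei|/|e| * I(ei), where each partition ei is
    a list of the class labels of its examples."""
    total = sum(len(ei) for ei in partitions)
    return sum(len(ei) / total *
               entropy([c / len(ei) for c in Counter(ei).values()])
               for ei in partitions)
```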
Learning decision trees (9)
- In our data, there are three partitions based on income:
- e1 = {1, 4, 7, 11}, |e1| = 4, I(e1) = 0.0
- All examples have high risk, so I(e1) = -1 log2 1 = 0.
- e2 = {2, 3, 12, 14}, |e2| = 4, I(e2) = 1.0
- Two examples have high risk, two have moderate risk: I(e2) = -1/2 log2 1/2 - 1/2 log2 1/2 = 1.
- e3 = {5, 6, 8, 9, 10, 13}, |e3| = 6, I(e3) ≈ 0.65
- I(e3) = -1/6 log2 1/6 - 5/6 log2 5/6 ≈ 0.65.
- The expected information to complete the tree using income as the root feature is this:
- E(INCOME) = 4/14 · 0.0 + 4/14 · 1.0 + 6/14 · 0.65 ≈ 0.564 bits
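With the helper above, this can be checked directly. Only the class counts in each partition matter; the slides give a 1:5 split for e3, so the particular labels chosen below are an illustrative assumption.

```python
e1 = ["high"] * 4                        # I(e1) = 0.0
e2 = ["high"] * 2 + ["moderate"] * 2     # I(e2) = 1.0
e3 = ["moderate"] * 1 + ["low"] * 5      # any 1:5 split gives I(e3) ≈ 0.65
print(round(expected_information([e1, e2, e3]), 3))   # 0.564
```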
Learning decision trees (10)
- Now we define the information gain from selecting feature F for tree-building, given a set of classes C:
- G(F) = I(C) - E(F)
- For our sample data and for F = income, we get this:
- G(INCOME) = I(RISK) - E(INCOME) = 1.531 bits - 0.564 bits = 0.967 bits.
- Our analysis will be complete, and our choice clear, after we have similarly considered the remaining three features. The values are as follows:
- G(COLLATERAL) = 0.756 bits,
- G(DEBT) = 0.581 bits,
- G(CREDIT HISTORY) = 0.266 bits.
- That is, we should choose INCOME as the criterion in the root of the best decision tree that we can construct.
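Putting the two helpers together gives the gain directly (the third decimal differs slightly from the slide, which subtracts the rounded values 1.531 and 0.564):

```python
def information_gain(class_labels, partitions):
    """G(F) = I(C) - E(F)."""
    n = len(class_labels)
    i_c = entropy([c / n for c in Counter(class_labels).values()])
    return i_c - expected_information(partitions)

risk = ["high"] * 6 + ["moderate"] * 3 + ["low"] * 5
print(round(information_gain(risk, [e1, e2, e3]), 3))   # 0.966 ≈ 0.967
```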
Explanation-based learning
- A target concept
- The learning system finds an operational definition of this concept, expressed in terms of some primitives. The target concept is represented as a predicate.
- A training example
- This is an instance of the target concept. It takes the form of a set of simple facts, not all of them necessarily relevant to the theory.
- A domain theory
- This is a set of rules, usually in predicate logic, that can explain how the training example fits the target concept.
- Operationality criteria
- These are the predicates (features) that should appear in an effective definition of the target concept.
Explanation-based learning (2)
- A classic example: a theory and an instance of a cup. A cup is a container for liquids that can be easily lifted. It has some typical parts, such as a handle and a bowl. Bowls, the actual containers, must be concave. Because a cup can be lifted, it should be light. And so on.
- The target concept is cup(X).
- The domain theory has five rules.
- liftable( X ) ∧ holds_liquid( X ) → cup( X )
- part( Z, W ) ∧ concave( W ) ∧ points_up( W ) → holds_liquid( Z )
- light( X ) ∧ part( X, handle ) → liftable( X )
- small( A ) → light( A )
- made_of( A, feathers ) → light( A )
Explanation-based learning (3)
- The training example lists nine facts (some of them are not relevant):
- cup( obj1 ), small( obj1 ), part( obj1, handle ), owns( bob, obj1 ), part( obj1, bottom ), part( obj1, bowl ), points_up( bowl ), concave( bowl ), color( obj1, red )
- Operationality criteria require a definition in terms of structural properties of objects (part, points_up, small, concave).
Explanation-based learning (4)
Step 1: prove the target concept using the training example.
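The proof can be reproduced mechanically. Below is a minimal sketch: the facts and the five rules are hand-encoded as Python tuples (an illustrative representation, not from the slides), and a naive forward chainer derives cup(obj1). The classification cup( obj1 ) itself is left out of the starting facts, since it is what the proof must explain.

```python
# The remaining facts from the training example above, as tuples.
facts = {
    ("small", "obj1"), ("part", "obj1", "handle"),
    ("owns", "bob", "obj1"), ("part", "obj1", "bottom"),
    ("part", "obj1", "bowl"), ("points_up", "bowl"),
    ("concave", "bowl"), ("color", "obj1", "red"),
}

def forward_chain(facts):
    """Apply the five domain-theory rules until nothing new is derived."""
    derived = set(facts)
    while True:
        new = set(derived)
        terms = {t for fact in derived for t in fact[1:]}
        for x in terms:
            if ("small", x) in derived:                 # small(A) -> light(A)
                new.add(("light", x))
            if ("made_of", x, "feathers") in derived:   # made_of -> light(A)
                new.add(("light", x))
            if {("light", x), ("part", x, "handle")} <= derived:
                new.add(("liftable", x))                # -> liftable(X)
            for w in terms:                             # -> holds_liquid(Z)
                if {("part", x, w), ("concave", w), ("points_up", w)} <= derived:
                    new.add(("holds_liquid", x))
            if {("liftable", x), ("holds_liquid", x)} <= derived:
                new.add(("cup", x))                     # -> cup(X)
        if new == derived:
            return derived
        derived = new

print(("cup", "obj1") in forward_chain(facts))   # True
```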
Explanation-based learning (5)
Step 2: generalize the proof. Constants from the domain theory, for example handle, are not generalized.
Explanation-based learning (6)
Step 3: take the definition off the tree, keeping only the root and the leaves.
In our example, we get this rule:
small( X ) ∧ part( X, handle ) ∧ part( X, W ) ∧ concave( W ) ∧ points_up( W ) → cup( X )
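The learned definition satisfies the operationality criteria: only part, points_up, small, and concave appear. As a final check, the rule can be run directly against the fact set from the sketch above (again an illustrative encoding):

```python
def is_cup(x, facts):
    """small(X) ∧ part(X, handle) ∧ part(X, W)
       ∧ concave(W) ∧ points_up(W) -> cup(X)"""
    return (("small", x) in facts
            and ("part", x, "handle") in facts
            and any(f[0] == "part" and f[1] == x
                    and ("concave", f[2]) in facts
                    and ("points_up", f[2]) in facts
                    for f in facts if len(f) == 3))

print(is_cup("obj1", facts))   # True
```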