Inductive Learning (1/2) Decision Tree Method - PowerPoint PPT Presentation

1
Inductive Learning (1/2): Decision Tree Method
  • Russell and Norvig: Chapter 18, Sections 18.1
    through 18.4
  • Chapter 18, Sections 18.1 through 18.3
  • CS121 Winter 2003

2
Quotes
  • "Our experience of the world is specific, yet we
    are able to formulate general theories that
    account for the past and predict the future."
    (Genesereth and Nilsson, Logical Foundations of
    AI, 1987)
  • "Entities are not to be multiplied without
    necessity." (Ockham, 1285-1349)

3
Learning Agent
[Diagram: learning-agent architecture; the Critic and the Learning element update the KB used by the Problem solver, which maps Percepts to Actions]
4
Contents
  • Introduction to inductive learning
  • Logic-based inductive learning
  • Decision tree method
  • Version space method
  • Function-based inductive learning
  • Neural nets

5
Contents
  • Introduction to inductive learning
  • Logic-based inductive learning
  • Decision tree method
  • Version space method
  • Why inductive learning works
  • Function-based inductive learning
  • Neural nets

6
Inductive Learning Frameworks
  1. Function-learning formulation
  2. Logic-inference formulation

7
Function-Learning Formulation
  • Goal function f
  • Training set: (xi, f(xi)), i = 1, …, n
  • Inductive inference: Find a function h that
    fits the points well

→ Neural nets
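As an illustrative sketch (not from the slides) of the function-learning formulation: given training pairs (xi, f(xi)), pick h from a restricted hypothesis class, here lines h(x) = a·x + b, by least squares.

```python
# Illustrative sketch: learn h from training pairs (x_i, f(x_i))
# by least-squares fitting over a restricted hypothesis class,
# here lines h(x) = a*x + b.
def fit_line(points):
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    b = (sy - a * sx) / n                          # intercept
    return lambda x: a * x + b

# Training pairs generated by f(x) = 2x + 1; h recovers f exactly.
h = fit_line([(0, 1), (1, 3), (2, 5), (3, 7)])
print(h(10))  # 21.0
```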
8
Logic-Inference Formulation
  • Background knowledge KB
  • Training set D (observed knowledge) such that
    KB ⊬ D and KB ∧ D is satisfiable
  • Inductive inference: Find h (inductive
    hypothesis) such that:
  • KB ∧ h is satisfiable
  • KB ∧ h ⊨ D

h ≡ D is a trivial, but uninteresting solution
(data caching)
9
Rewarded Card Example
  • Deck of cards, with each card designated by
    [r,s], its rank and suit, and some cards
    rewarded
  • Background knowledge KB:
    ((r=1) ∨ … ∨ (r=10)) ⇔ NUM(r)
    ((r=J) ∨ (r=Q) ∨ (r=K)) ⇔ FACE(r)
    ((s=S) ∨ (s=C)) ⇔ BLACK(s)
    ((s=D) ∨ (s=H)) ⇔ RED(s)
  • Training set D:
    REWARD([4,C]) ∧ REWARD([7,C]) ∧ REWARD([2,S]) ∧
    ¬REWARD([5,H]) ∧ ¬REWARD([J,S])

10
Rewarded Card Example
  • Background knowledge KB:
    ((r=1) ∨ … ∨ (r=10)) ⇔ NUM(r)
    ((r=J) ∨ (r=Q) ∨ (r=K)) ⇔ FACE(r)
    ((s=S) ∨ (s=C)) ⇔ BLACK(s)
    ((s=D) ∨ (s=H)) ⇔ RED(s)
  • Training set D:
    REWARD([4,C]) ∧ REWARD([7,C]) ∧ REWARD([2,S]) ∧
    ¬REWARD([5,H]) ∧ ¬REWARD([J,S])
  • Possible inductive hypothesis:
    h ≡ (NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s]))

There are several possible inductive hypotheses
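As a sketch (not part of the original slides), the hypothesis h above can be checked mechanically against the training set D:

```python
# Sketch: encode the rewarded-card example and check that
# h == (NUM(r) and BLACK(s) <=> REWARD(r,s)) agrees with
# every observation in the training set D.
NUM = set(range(1, 11))   # ranks 1..10
BLACK = {"S", "C"}        # Spades and Clubs are black suits

def h(r, s):
    return r in NUM and s in BLACK

# Training set D: (rank, suit) -> observed REWARD value
D = {(4, "C"): True, (7, "C"): True, (2, "S"): True,
     (5, "H"): False, ("J", "S"): False}

print(all(h(r, s) == reward for (r, s), reward in D.items()))  # True
```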
11
Learning a Predicate
  • Set E of objects (e.g., cards)
  • Goal predicate CONCEPT(x), where x is an object
    in E, that takes the value True or False (e.g.,
    REWARD)
  • Example: CONCEPT describes the precondition of
    an action, e.g., Unstack(C,A)
  • E is the set of states
  • CONCEPT(x) ⇔ HANDEMPTY ∈ x, BLOCK(C) ∈ x, BLOCK(A)
    ∈ x, CLEAR(C) ∈ x, ON(C,A) ∈ x
  • Learning CONCEPT is a step toward learning the
    action

12
Learning a Predicate
  • Set E of objects (e.g., cards)
  • Goal predicate CONCEPT(x), where x is an object
    in E, that takes the value True or False (e.g.,
    REWARD)
  • Observable predicates A(x), B(x), … (e.g., NUM,
    RED)
  • Training set: values of CONCEPT for some
    combinations of values of the observable
    predicates

13
A Possible Training Set
Ex. A B C D E CONCEPT
1 True True False True False False
2 True False False False False True
3 False False True True True False
4 True True True False True True
5 False True True False False False
6 True True False True True False
7 False False True False True False
8 True False True False True True
9 False False False True True False
10 True True True True False True
Note that the training set does not say whether
an observable predicate A, …, E is pertinent or
not
14
Learning a Predicate
  • Set E of objects (e.g., cards)
  • Goal predicate CONCEPT(x), where x is an object
    in E, that takes the value True or False (e.g.,
    REWARD)
  • Observable predicates A(x), B(x), … (e.g., NUM,
    RED)
  • Training set: values of CONCEPT for some
    combinations of values of the observable
    predicates
  • Find a representation of CONCEPT in the form
    CONCEPT(x) ⇔ S(A,B,…), where S(A,B,…) is a
    sentence built with the observable predicates,
    e.g.: CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x))
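As a sketch (not in the slides), the example sentence S above can be verified against the earlier ten-example training set, whose rows list A, B, C, D, E, CONCEPT:

```python
# Sketch: check the candidate sentence
# CONCEPT(x) <=> A(x) and (not B(x) or C(x))
# against the ten examples of the training-set slide.
T, F = True, False
examples = [  # (A, B, C, D, E, CONCEPT), examples 1..10
    (T, T, F, T, F, F), (T, F, F, F, F, T), (F, F, T, T, T, F),
    (T, T, T, F, T, T), (F, T, T, F, F, F), (T, T, F, T, T, F),
    (F, F, T, F, T, F), (T, F, T, F, T, T), (F, F, F, T, T, F),
    (T, T, T, T, F, T),
]

def s(a, b, c, d, e):
    return a and (not b or c)

print(all(s(*row[:5]) == row[5] for row in examples))  # True
```

D and E turn out to play no role in this sentence, consistent with the note that the training set does not say which predicates are pertinent.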

15
Learning the concept of an Arch
ARCH(x) ⇔ HAS-PART(x,b1) ∧ HAS-PART(x,b2) ∧
HAS-PART(x,b3) ∧ IS-A(b1,BRICK) ∧
IS-A(b2,BRICK) ∧ ¬MEET(b1,b2) ∧
(IS-A(b3,BRICK) ∨
IS-A(b3,WEDGE)) ∧
SUPPORTED(b3,b1) ∧ SUPPORTED(b3,b2)
16
Example set
  • An example consists of the values of CONCEPT and
    the observable predicates for some object x
  • An example is positive if CONCEPT is True, else
    it is negative
  • The set E of all examples is the example set
  • The training set is a subset of E

17
Hypothesis Space
  • An hypothesis is any sentence h of the form
    CONCEPT(x) ⇔ S(A,B,…), where S(A,B,…) is
    a sentence built with the observable predicates
  • The set of all hypotheses is called the
    hypothesis space H
  • An hypothesis h agrees with an example if it
    gives the correct value of CONCEPT

18
Inductive Learning Scheme
19
Size of Hypothesis Space
  • n observable predicates
  • 2^n entries in the truth table
  • In the absence of any restriction (bias), there
    are 2^(2^n) hypotheses to choose from
  • n = 6 → 2 × 10^19 hypotheses!
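A quick sketch of where the double exponential comes from: a hypothesis assigns True or False to each of the 2^n truth-table rows, so there are 2^(2^n) hypotheses.

```python
# Each hypothesis is one way of labeling the 2**n truth-table
# rows with True/False, hence |H| = 2**(2**n).
def hypothesis_space_size(n):
    return 2 ** (2 ** n)

# n = 6 gives 2**64, roughly 2 x 10**19, matching the slide.
print(hypothesis_space_size(6))  # 18446744073709551616
```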

20
Multiple Inductive Hypotheses
Need for a system of preferences, called a bias,
to compare possible hypotheses
h1 ≡ NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s])
h2 ≡ BLACK(s) ∧ ¬(r=J) ⇔ REWARD([r,s])
h3 ≡ ([r,s]=[4,C]) ∨ ([r,s]=[7,C]) ∨ ([r,s]=[2,S])
⇔ REWARD([r,s])
h4 ≡ ¬([r,s]=[5,H]) ∧ ¬([r,s]=[J,S]) ⇔ REWARD([r,s])
All of these agree with all the
examples in the training set
21
Keep-It-Simple (KIS) Bias
  • Motivation
  • If an hypothesis is too complex, it may not be
    worth learning it (data caching might do
    the job as well)
  • There are far fewer simple hypotheses than
    complex ones, hence the hypothesis space is
    smaller
  • Examples
  • Use far fewer observable predicates than
    suggested by the training set
  • Constrain the learnt predicate, e.g., to use only
    high-level observable predicates such as NUM,
    FACE, BLACK, and RED, and/or to have a simple
    syntax (e.g., a conjunction of literals)

If the bias allows only sentences S that
are conjunctions of k << n predicates picked
from the n observable predicates, then the size
of H is O(n^k)
22
Putting Things Together
23
Predicate-Learning Methods
  • Decision tree
  • Version space

24
Predicate as a Decision Tree
The predicate CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x))
can be represented by the following decision
tree:
  • Example: A mushroom is poisonous iff it is yellow
    and small, or yellow, big and spotted
  • x is a mushroom
  • CONCEPT = POISONOUS
  • A = YELLOW
  • B = BIG
  • C = SPOTTED

25
Predicate as a Decision Tree
The predicate CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x))
can be represented by the following decision
tree:
  • Example: A mushroom is poisonous iff it is yellow
    and small, or yellow, big and spotted
  • x is a mushroom
  • CONCEPT = POISONOUS
  • A = YELLOW
  • B = BIG
  • C = SPOTTED
  • D = FUNNEL-CAP
  • E = BULKY
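A decision tree for this predicate reads directly as nested conditionals; a sketch (not from the slides) for the mushroom interpretation:

```python
# Decision tree for CONCEPT(x) <=> A(x) and (not B(x) or C(x)),
# read as nested if/else with A = YELLOW, B = BIG, C = SPOTTED.
def poisonous(yellow, big, spotted):
    if not yellow:       # A = False: never poisonous
        return False
    if not big:          # A = True, B = False: yellow and small
        return True
    return spotted       # A = True, B = True: poisonous iff spotted

print(poisonous(yellow=True, big=False, spotted=False))  # True
print(poisonous(yellow=True, big=True, spotted=False))   # False
```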

26
Training Set
Ex. A B C D E CONCEPT
1 False False True False True False
2 False True False False False False
3 False True True True True False
4 False False True False False False
5 False False False True True False
6 True False True False False True
7 True False False True False True
8 True False True False True True
9 True True True False True True
10 True True True True True True
11 True True False False False False
12 True True False False True False
13 True False True True True True
27
Possible Decision Tree
28
Possible Decision Tree
CONCEPT ⇔ (D ∧ (¬E ∨ A)) ∨
(C ∧ (B ∨ ((E ∧ ¬A) ∨ A)))
KIS bias → build the smallest decision tree
Computationally intractable problem → greedy
algorithm
29
Getting Started
The distribution of the training set is:
True: 6, 7, 8, 9, 10, 13; False: 1, 2, 3, 4, 5, 11,
12
Ex. A B C D E CONCEPT
1 False False True False True False
2 False True False False False False
3 False True True True True False
4 False False True False False False
5 False False False True True False
6 True False True False False True
7 True False False True False True
8 True False True False True True
9 True True True False True True
10 True True True True True True
11 True True False False False False
12 True True False False True False
13 True False True True True True
30
Getting Started
The distribution of the training set is:
True: 6, 7, 8, 9, 10, 13; False: 1, 2, 3, 4, 5, 11,
12
Without testing any observable predicate,
we could report that CONCEPT is False (majority
rule) with an estimated probability of error
P(E) = 6/13
31
Getting Started
The distribution of the training set is:
True: 6, 7, 8, 9, 10, 13; False: 1, 2, 3, 4, 5, 11,
12
Without testing any observable predicate,
we could report that CONCEPT is False (majority
rule) with an estimated probability of error P(E)
= 6/13
Assuming that we will only include one observable
predicate in the decision tree, which predicate
should we test to minimize the
probability of error?
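The comparison the next five slides carry out can be sketched directly (not part of the original deck): split the 13 examples on each predicate, apply the majority rule in each branch, and count misclassifications.

```python
# Majority-rule error for each single-predicate split of the
# 13-example training set (rows: A, B, C, D, E, CONCEPT).
T, F = True, False
rows = [
    (F,F,T,F,T,F), (F,T,F,F,F,F), (F,T,T,T,T,F), (F,F,T,F,F,F),
    (F,F,F,T,T,F), (T,F,T,F,F,T), (T,F,F,T,F,T), (T,F,T,F,T,T),
    (T,T,T,F,T,T), (T,T,T,T,T,T), (T,T,F,F,F,F), (T,T,F,F,T,F),
    (T,F,T,T,T,T),
]

def errors_if_split(i):
    """Misclassified examples if we test predicate i and apply
    the majority rule in each of the two branches."""
    total = 0
    for value in (True, False):
        labels = [r[5] for r in rows if r[i] == value]
        total += min(labels.count(True), labels.count(False))
    return total

best = min(range(5), key=errors_if_split)
print("ABCDE"[best], errors_if_split(best))  # A 2
```

Testing A leaves only 2 of 13 examples misclassified, fewer than any other predicate, matching the slides' conclusion.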
32
Assume It's A
33
Assume It's B
34
Assume It's C
35
Assume It's D
36
Assume It's E
So, the best predicate to test is A
37
Choice of Second Predicate
[Tree so far: test A; if A = False → False; if A = True → test C]
The majority rule gives the probability of error
Pr(E|A) = 1/8 and Pr(E) = 1/13
38
Choice of Third Predicate
[Tree so far: test A; if A = False → False; if A = True → test C;
if C = True → True; if C = False → test B]
39
Final Tree
CONCEPT ⇔ A ∧ (C ∨ ¬B)
40
Learning a Decision Tree
  • DTL(D, Predicates)
  • If all examples in D are positive then return
    True
  • If all examples in D are negative then return
    False
  • If Predicates is empty then return failure
  • A ← most discriminating predicate in Predicates
  • Return the tree whose:
  • - root is A,
  • - left branch is DTL(D+A, Predicates − {A}),
  • - right branch is DTL(D−A, Predicates − {A}),
  • where D+A (resp. D−A) is the subset of D whose
    examples satisfy A (resp. ¬A)
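The pseudocode above can be made runnable; a sketch that uses majority-rule error as the "most discriminating" measure (the slides later refine this choice with information theory):

```python
# Runnable sketch of DTL on the 13-example training set.
T, F = True, False
D = [  # ((A, B, C, D, E), CONCEPT), examples 1..13
    ((F,F,T,F,T),F), ((F,T,F,F,F),F), ((F,T,T,T,T),F), ((F,F,T,F,F),F),
    ((F,F,F,T,T),F), ((T,F,T,F,F),T), ((T,F,F,T,F),T), ((T,F,T,F,T),T),
    ((T,T,T,F,T),T), ((T,T,T,T,T),T), ((T,T,F,F,F),F), ((T,T,F,F,T),F),
    ((T,F,T,T,T),T),
]

def split_errors(examples, p):
    """Majority-rule error of testing predicate p."""
    total = 0
    for v in (True, False):
        labels = [lab for x, lab in examples if x[p] == v]
        total += min(labels.count(True), labels.count(False))
    return total

def dtl(examples, predicates):
    labels = [lab for _, lab in examples]
    if all(labels): return True        # all positive
    if not any(labels): return False   # all negative
    if not predicates: return "failure"
    a = min(predicates, key=lambda p: split_errors(examples, p))
    rest = [p for p in predicates if p != a]
    return (a,
            dtl([e for e in examples if e[0][a]], rest),      # A true
            dtl([e for e in examples if not e[0][a]], rest))  # A false

def classify(tree, x):
    while isinstance(tree, tuple):
        a, true_branch, false_branch = tree
        tree = true_branch if x[a] else false_branch
    return tree

tree = dtl(D, list(range(5)))
print(tree)  # (0, (2, True, (1, False, True)), False): A, then C, then B
print(all(classify(tree, x) == lab for x, lab in D))  # True
```

The learned tree tests A, then C, then B, i.e., CONCEPT ⇔ A ∧ (C ∨ ¬B), and classifies all 13 training examples correctly.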

41
Using Information Theory
  • Rather than minimizing the probability of error,
    most existing learning procedures try to minimize
    the expected number of questions needed to decide
    if an object x satisfies CONCEPT
  • This minimization is based on a measure of the
    quantity of information that is contained in
    the truth value of an observable predicate
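A sketch of this measure (hedged: the exact form follows Russell and Norvig's Section 18.3, not a formula printed on this slide): a set with p positive and n negative examples carries I(p/(p+n), n/(p+n)) bits of information, and a predicate's quality is the expected reduction in that quantity.

```python
# Information (in bits) in the label distribution of a set with
# p positive and n negative examples:
#   I = -(p/t) log2(p/t) - (n/t) log2(n/t),  t = p + n.
from math import log2

def information(p, n):
    total = p + n
    bits = 0.0
    for k in (p, n):
        if k:  # 0 * log2(0) is taken as 0
            q = k / total
            bits -= q * log2(q)
    return bits

print(information(6, 6))   # 1.0  (a 50/50 split carries one full bit)
print(information(13, 0))  # 0.0  (a pure set carries no information)
```

The expected information gain of testing a predicate is the information before the test minus the weighted average information of the resulting branches.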

42
Miscellaneous Issues
  • Assessing performance
  • Training set and test set
  • Learning curve

43
Miscellaneous Issues
  • Assessing performance
  • Training set and test set
  • Learning curve
  • Overfitting
  • Tree pruning

44
Miscellaneous Issues
  • Assessing performance
  • Training set and test set
  • Learning curve
  • Overfitting
  • Tree pruning
  • Cross-validation
  • Missing data

45
Miscellaneous Issues
  • Assessing performance
  • Training set and test set
  • Learning curve
  • Overfitting
  • Tree pruning
  • Cross-validation
  • Missing data
  • Multi-valued and continuous attributes

These issues occur with virtually any learning
method
46
Multi-Valued Attributes
WillWait predicate (Russell and Norvig)
47
Applications of Decision Tree
  • Medical diagnosis / drug design
  • Evaluation of geological systems for assessing
    gas and oil basins
  • Early detection of problems (e.g., jamming)
    during oil drilling operations
  • Automatic generation of rules in expert systems

48
Summary
  • Inductive learning frameworks
  • Logic inference formulation
  • Hypothesis space and KIS bias
  • Inductive learning of decision trees
  • Assessing performance
  • Overfitting