Title: Machine Learning: Rule Learning
1Machine Learning: Rule Learning
- Intelligent Systems Lab.
- Soongsil University
Thanks to Raymond J. Mooney at the University of Texas at Austin
2Introduction
- Set of If-then rules
- The hypothesis is easy to interpret.
- Goal
- Look at a new method to learn rules
- Rules
- Propositional rules (rules without variables)
- First-order predicate rules (with variables)
3Introduction 2
- GOAL: Learning a target function as a set of IF-THEN rules
- BEFORE: Learning with decision trees
- Learn the decision tree
- Translate the tree into a set of IF-THEN rules (one rule for each leaf)
- OTHER POSSIBILITY: Learning with genetic algorithms
- Each set of rules is coded as a bit vector
- Several genetic operators are used on the hypothesis space
- TODAY AND HERE
- First: Learning rules in propositional form
- Second: Learning rules in first-order form (Horn clauses, which include variables)
- Sequential search for rules, one after the other
4Rule induction
- To learn a set of IF-THEN rules for classification
- Suitable when the target function can be represented by a set of IF-THEN rules
- Target function: h = {Rule_1, Rule_2, ..., Rule_m}
- Rule_j: IF (precond_j1 ∧ precond_j2 ∧ ... ∧ precond_jn) THEN postcond_j
- IF-THEN rules
- An expressive representation
- Most readable and understandable for humans
5Rule induction Example (1)
- Learning a set of propositional rules
- E.g., the target function (concept) Buy_Computer is represented by
- IF (Age=Old ∧ Student=No) THEN Buy_Computer=No
- IF (Student=Yes) THEN Buy_Computer=Yes
- IF (Age=Medium ∧ Income=High) THEN Buy_Computer=Yes
- Learning a set of first-order rules
- E.g., the target function (concept) Ancestor is represented by
- IF Parent(x,y) THEN Ancestor(x,y)
- IF Parent(x,y) ∧ Ancestor(y,z) THEN Ancestor(x,z)
- (Parent(x,y) is a predicate saying that y is the father/mother of x)
6Rule induction Example (2)
- Rule: IF (Age=Old ∧ Student=No) THEN Buy_Computer=No
- Which instances are correctly classified by the above rule?
7Learning Rules
- IF-THEN rules in logic are a standard representation of knowledge that has proven useful in expert systems and other AI systems
- In propositional logic, a set of rules for a concept is equivalent to DNF
- Methods for automatically inducing rules from data have been shown to build more accurate expert systems than human knowledge engineering for some applications.
- Rule-learning methods have been extended to first-order logic to handle relational (structural) representations.
- Inductive Logic Programming (ILP) for learning Prolog programs from I/O pairs.
- Allows moving beyond simple feature-vector representations of data.
8Rule Learning Approaches
- Translate decision trees into rules (C4.5)
- Sequential (set) covering algorithms
- General-to-specific (top-down) (RIPPER, CN2,
FOIL) - Specific-to-general (bottom-up) (GOLEM, CIGOL)
- Hybrid search (AQ, Chillin, Progol)
- Translate neural-nets into rules (TREPAN)
9Decision-Trees to Rules
- For each path in a decision tree from the root to
a leaf, create a rule with the conjunction of
tests along the path as an antecedent and the
leaf label as the consequent.
red ∧ circle → A
blue → B
red ∧ square → B
green → C
red ∧ triangle → C
(Decision tree figure: the root tests color; green → C, blue → B, red → test shape, with circle → A, square → B, triangle → C.)
10Post-Processing Decision-Tree Rules
- Resulting rules may contain unnecessary antecedents that are not needed to remove negative examples, which results in over-fitting.
- Rules are post-pruned by greedily removing antecedents or rules until performance on training data or a validation set is significantly harmed.
- Resulting rules may lead to competing, conflicting conclusions on some instances.
- Sort rules by training (validation) accuracy to create an ordered decision list. The first rule in the list that applies is used to classify a test instance.
red ∧ circle → A (97% train accuracy)
red ∧ big → B (95% train accuracy)
Test case <big, red, circle> assigned to class A
11Propositional Rule Induction: Training (Sequential Covering Algorithms)
- To learn the set of rules with a sequential (incremental) covering strategy
- Step 1. Learn one rule
- Step 2. Remove from the training set those instances correctly classified by the rule
- Repeat Steps 1 and 2 to learn another rule (using the remaining training set)
- The learning procedure
- Learns (i.e., covers) the rules sequentially (incrementally)
- Can be repeated as many times as desired to learn a set of rules that covers a desired portion of (or the full) training set
- The set of learned rules is sorted according to some performance measure (e.g., classification accuracy)
- The rules will be consulted in this order when classifying a future instance (see the sketch below)
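A minimal Python sketch of this covering loop (not the code of any particular system): it assumes a rule is a dict {"tests": {attribute: value, ...}, "class": label, "accuracy": float} and that learn_one_rule is supplied by the caller.

def covers(rule, instance):
    """True if every attribute test in the rule's IF part matches the instance."""
    return all(instance.get(attr) == val for attr, val in rule["tests"].items())

def sequential_covering(examples, learn_one_rule, min_accuracy=0.8):
    """Step 1: learn one rule; Step 2: remove covered instances; repeat on the rest."""
    rules, remaining = [], list(examples)
    while remaining:
        rule = learn_one_rule(remaining)
        if rule is None or rule["accuracy"] < min_accuracy:
            break                                    # no acceptable rule is left to learn
        rules.append(rule)
        remaining = [x for x in remaining if not covers(rule, x)]
    rules.sort(key=lambda r: r["accuracy"], reverse=True)   # order by performance measure
    return rules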
12Propositional rule induction Classification
- Given a test instance
- The learned rules are tried (tested) sequentially (i.e., rule by rule) in the order determined in the training phase
- The first encountered rule that covers the instance (i.e., the rule's preconditions in the IF clause match the instance) classifies it
- The instance is classified by the post-condition in the rule's THEN clause
- If no rule covers the instance, then the instance is classified by the default rule
- The instance is classified by the most frequent value of the target attribute in the training instances (see the sketch below)
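A matching sketch of the classification phase, reusing the covers helper and the rule representation assumed above:

def classify(rules, instance, default_class):
    """Try the learned rules in order; the first one that covers the instance classifies it."""
    for rule in rules:                     # rules are already sorted by performance
        if covers(rule, instance):
            return rule["class"]           # post-condition of the rule's THEN clause
    return default_class                   # default rule: most frequent class in the training set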
13Sequential Covering Algorithms
- Require that each rule have high accuracy, but not necessarily high coverage
- High accuracy → the prediction should be correct
- Accepting low coverage → the rule need not make a prediction for every training example
14Rule Coverage and Accuracy
- Coverage of a rule
- Fraction of records that satisfy the antecedent of the rule
- Accuracy of a rule
- Fraction of records that satisfy both the antecedent and the consequent, out of the records that satisfy the antecedent
Rule: IF (Status=Single) THEN No
Coverage = 40% (4/10), Accuracy = 50% (2/4)   (reproduced in the sketch below)
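For concreteness, a tiny sketch that reproduces the 4/10 and 2/4 numbers; the ten records below are made up for illustration and are not the table from the original slide.

records = [("Single", "No"), ("Single", "No"), ("Single", "Yes"), ("Single", "Yes"),
           ("Married", "No"), ("Married", "Yes"), ("Married", "Yes"),
           ("Divorced", "No"), ("Divorced", "Yes"), ("Married", "No")]

matched = [r for r in records if r[0] == "Single"]     # records satisfying the antecedent
correct = [r for r in matched if r[1] == "No"]         # ... that also satisfy the consequent

coverage = len(matched) / len(records)                 # 4/10 = 0.40
accuracy = len(correct) / len(matched)                 # 2/4  = 0.50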
15Sequential Covering Algorithms (cont.)
16Sequential Covering
- A set of rules is learned one at a time, each time finding a single rule that covers a large number of positive instances without covering any negatives, removing the positives that it covers, and learning additional rules to cover the rest.
- This is an instance of the greedy algorithm for minimum set covering and does not guarantee a minimum number of learned rules.
- Minimum set covering is an NP-hard problem, and the greedy algorithm is a standard approximation algorithm.
- Methods for learning individual rules vary.
17Set Covering Problem
- Let S = {1, 2, ..., n}, and suppose S_j ⊆ S for each j. We say that index set J is a cover of S if ∪_{j ∈ J} S_j = S.
- Set covering problem: find a minimum-cardinality set cover of S.
- Applications: locating 119 fire stations.
- Locating hospitals.
- Locating Starbucks, and many non-obvious applications.
18-24Greedy Sequential Covering Example
(Figures: greedy sequential covering shown step by step on positive and negative examples in the X-Y plane.)
25No-optimal Covering Example
(Figure: a covering of the same examples illustrating that the greedy result is not necessarily optimal.)
26-33Greedy Sequential Covering Example
(Figures: greedy sequential covering continued step by step.)
34Strategies for Learning a Single Rule
- Top Down (General to Specific)
- Start with the most-general (empty) rule.
- Repeatedly add antecedent constraints on features that eliminate negative examples while maintaining as many positives as possible.
- Stop when only positives are covered.
- Bottom Up (Specific to General)
- Start with a most-specific rule (e.g., the complete instance description of a random instance).
- Repeatedly remove antecedent constraints in order to cover more positives.
- Stop when further generalization results in covering negatives.
35Sequential Covering Algorithms (cont.)
- General to Specific Beam Search
- How do we learn each individual rule?
- Requirements for LEARN-ONE-RULE:
- High accuracy; high coverage is not required
- One approach is . . .
- To implement LEARN-ONE-RULE in a similar way to decision tree learning (ID3), but to follow only the most promising branch in the tree at each step.
- As illustrated in the figure, the search begins by considering the most general rule precondition possible (the empty test that matches every instance), and then greedily adds the attribute test that most improves rule performance over the training examples.
36Sequential Covering Algorithms (cont.)
- Specialising search
- Organises a hypothesis-space search in generally the same fashion as ID3, but follows only the most promising branch of the tree at each step
- 1. Begin with the most general rule (no/empty precondition)
- 2. Follow the most promising branch
- Greedily add the attribute test that most improves the measured performance of the rule over the training examples
- 3. Greedy depth-first search with no backtracking
- Danger of a sub-optimal choice
- Reduce the risk: Beam Search (CN2 algorithm), in which the algorithm maintains a list of the k best candidates
- In each search step, descendants are generated for each of these k best candidates; the resulting set is then reduced to the k most promising members
37Sequential Covering Algorithms (cont.)
- General to Specific Beam Search
38Sequential Covering Algorithms (cont.)
- Variations
- Learn only rules that cover positive examples
- Useful in the case that the fraction of positive examples is small
- In this case, we can modify the algorithm to learn only from those rare examples, and classify anything not covered by any rule as negative.
- Instead of entropy, use a measure that evaluates the fraction of positive examples covered by the hypothesis
39-43Top-Down Rule Learning Example
(Figures: a single rule is specialized step by step in the X-Y plane by adding the constraints C1<Y, then C2<X, then Y<C3, then X<C4, shrinking the covered region until it contains only positives.)
44-54Bottom-Up Rule Learning Example
(Figures: starting from a single seed example, a rule is generalized step by step in the X-Y plane to cover more positive examples, stopping before negatives are covered.)
55Sequential Covering Algorithms (cont.)
- General to Specific Beam Search
- Greedy search without backtracking
- → danger of a suboptimal choice at any step
- The algorithm can be extended using beam search (sketched below)
- Keep a list of the k best candidates at each step
- On each search step, descendants are generated for each of these k best candidates, and the resulting set is again reduced to the k best candidates.
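A minimal sketch of such a general-to-specific beam search, assuming categorical attributes, instances stored as dicts with a "class" key, and a simple accuracy-based performance function; this is an illustration, not the CN2 implementation.

def performance(tests, examples, target_class):
    """Accuracy of a candidate rule (a set of attribute=value tests) on the examples it covers."""
    covered = [x for x in examples if all(x[a] == v for a, v in tests)]
    if not covered:
        return 0.0
    return sum(x["class"] == target_class for x in covered) / len(covered)

def beam_search_rule(examples, attributes, target_class, k=2, max_len=3):
    """Grow rule preconditions greedily, keeping only the k best candidates at each step."""
    constraints = [(a, v) for a in attributes for v in {x[a] for x in examples}]
    beam = [frozenset()]                     # start from the most general (empty) precondition
    best = frozenset()
    for _ in range(max_len):
        candidates = {h | {c} for h in beam for c in constraints if c not in h}
        if not candidates:
            break
        ranked = sorted(candidates, reverse=True,
                        key=lambda h: performance(h, examples, target_class))
        beam = ranked[:k]                    # reduce to the k most promising members
        if performance(beam[0], examples, target_class) > performance(best, examples, target_class):
            best = beam[0]
    return best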
56Learning Rule Sets Summary
- Key design issues for learning sets of rules
- Sequential or Simultaneous?
- Sequential: learning one rule at a time, removing the covered examples and repeating the process on the remaining examples
- Simultaneous: learning the entire set of disjuncts simultaneously as part of a single search for an acceptable decision tree, as in ID3
- General-to-specific or Specific-to-general?
- G→S: Learn-One-Rule
- S→G: Find-S
- Generate-and-test or Example-driven?
- G&T: search through syntactically legal hypotheses
- E-D: Find-S, Candidate-Elimination
- Post-pruning of rules?
- Similar method to the one discussed in decision tree learning
57Measure performance of a rule (1)
- Relative frequency: nc / n
- D_trainR: the set of training instances that match the preconditions of rule R
- n: number of examples matched by the rule, i.e., the size of D_trainR
- nc: number of examples classified correctly by the rule
58Measure performance of a rule (2)
- m-estimate of accuracy: (nc + m·p) / (n + m)
- p: the prior probability that an instance, randomly drawn from the entire dataset, will have the classification assigned by rule R
- → p is the prior assumed accuracy (e.g., if 12 of 100 examples have that class, then p = 0.12)
- m: a weight that indicates how much the prior probability p influences the rule performance measure
- If m = 0, then the m-estimate becomes the relative frequency measure
- As m increases, a larger number of instances is needed to override the prior assumed accuracy p
59Measure performance of a rule (3)
- Entropy measure: Entropy(D_trainR) = - Σ_{i=1..c} p_i log2 p_i (used negated, so lower entropy means better rule performance)
- c: the number of possible values (i.e., classes) of the target attribute
- p_i: the proportion of instances in D_trainR for which the target attribute takes on the i-th value (i.e., class)
- (The three measures are sketched in code below.)
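A short sketch of the three measures, assuming the covered instances D_trainR are given simply as a list of their class labels; the function names are illustrative.

import math
from collections import Counter

def relative_frequency(covered_labels, predicted_class):
    """n_c / n over the instances matched by the rule."""
    n = len(covered_labels)
    n_c = sum(1 for y in covered_labels if y == predicted_class)
    return n_c / n if n else 0.0

def m_estimate(covered_labels, predicted_class, p, m):
    """(n_c + m*p) / (n + m): p is the prior accuracy, m weights the prior."""
    n = len(covered_labels)
    n_c = sum(1 for y in covered_labels if y == predicted_class)
    return (n_c + m * p) / (n + m)

def negative_entropy(covered_labels):
    """-Entropy of the class distribution over the covered instances (higher is better)."""
    n = len(covered_labels)
    if n == 0:
        return 0.0
    counts = Counter(covered_labels)
    return sum((c / n) * math.log2(c / n) for c in counts.values())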
60Sequential covering algorithms Issues
- Reduce the (more difficult) problem of learning a disjunctive set of rules to a sequence of (simpler) problems, each of which is to learn a single conjunctive rule
- After a rule is learned (i.e., found), the training instances covered (classified) by the rule are removed from the training set
- Hence each rule cannot be treated as independent of the other rules
- Perform a greedy search (for finding a sequence of rules) without backtracking
- Not guaranteed to find the smallest set of rules
- Not guaranteed to find the best set of rules
61Learning First-Order Rules
- From now on . . .
- We consider learning rules that contain variables (first-order rules)
- Inductive learning of first-order rules = inductive logic programming (ILP)
- Can be viewed as automatically inferring Prolog programs
- Two methods are considered
- FOIL
- Induction as inverted deduction
62Learning First-Order Rules (cont.)
- First-order rule
- Rules that contain variables
- Example
- Ancestor(x, y) ← Parent(x, y)
- Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)   (recursive)
- More expressive than propositional rules
- IF (Father1 = Bob) ∧ (Name2 = Bob) ∧ (Female1 = True) THEN Daughter1,2 = True
- IF Father(y, x) ∧ Female(y) THEN Daughter(x, y)
63Learning First-Order Rules (cont.)
- Formal definitions in first-order logic
- Constants: e.g., John, Kansas, 42
- Variables: e.g., Name, State, x
- Predicates: e.g., Male, as in Male(John)
- Functions: e.g., age, cosine, as in age(Gunho), cosine(x)
- Term: a constant, a variable, or function(term)
- Literal: a predicate (or its negation) applied to a set of terms, e.g., Greater_Than(age(John), 20), Male(x), etc.
- A first-order rule is a Horn clause
- H and Li (i = 1..n) are literals
- Clause: a disjunction of literals with implicit universal quantification
- Horn clause: at most one positive literal
- (H ∨ ¬L1 ∨ ¬L2 ∨ ... ∨ ¬Ln)
64Learning First-Order Rules (cont.)
- First-Order Horn Clauses
- Rules that have one or more preconditions and one single consequent. Predicates may have variables.
- The following forms of a Horn clause are equivalent:
- H ∨ ¬L1 ∨ ... ∨ ¬Ln
- H ← (L1 ∧ ... ∧ Ln)
- IF (L1 ∧ ... ∧ Ln) THEN H
65Learning Sets of First-Order Rules FOIL
- First-Order Inductive Learning (FOIL)
- Natural extension of sequential covering + Learn-One-Rule
- FOIL rules are similar to Horn clauses, with two exceptions
- Syntactic restriction: no functions
- More expressive than Horn clauses: negation is allowed in rule bodies
66Learning a Single Rule in FOIL
- Top-down approach originally applied to first-order logic (Quinlan, 1990).
- Basic algorithm for instances with discrete-valued features (a rough Python rendering follows):
- Let A = {} (the set of rule antecedents)
- Let N be the set of negative examples
- Let P be the current set of uncovered positive examples
- Until N is empty do
-   For every feature-value pair (literal) (Fi = Vij) calculate Gain(Fi = Vij, P, N)
-   Pick the literal, L, with the highest gain
-   Add L to A
-   Remove from N any examples that do not satisfy L
-   Remove from P any examples that do not satisfy L
- Return the rule: A1 ∧ A2 ∧ ... ∧ An → Positive
67Learning first-order rules FOIL alg.
68Sequential (set) covering algorithms
- CN2 Algorithm
- Start from an empty conjunct: {}
- Add conjuncts that minimize the entropy measure: {A}, {A,B}, ...
- Determine the rule consequent by taking the majority class of the instances covered by the rule
- RIPPER Algorithm
- Start from an empty rule: R0: {} => class (initial rule)
- Add conjuncts that maximize FOIL's information gain measure
- R1: {A} => class (rule after adding a conjunct)
- t: number of positive instances covered by both R0 and R1
- p0: number of positive instances covered by R0
- n0: number of negative instances covered by R0
- p1: number of positive instances covered by R1
- n1: number of negative instances covered by R1
69Foil Gain Metric
- Want to achieve two goals
- Decrease coverage of negative examples
- Measure the increase in the percentage of positives covered when a literal is added to the rule.
- Maintain coverage of as many positives as possible.
- Count the number of positives covered.
70Foil_Gain measure
- R0: {} => class (initial rule)
- R1: {A} => class (rule after adding a conjunct)
- FOIL_Gain(R0, R1) = t × ( log2( p1 / (p1 + n1) ) - log2( p0 / (p0 + n0) ) )
- t: number of positive instances covered by both R0 and R1
- p0: number of positive instances covered by R0
- n0: number of negative instances covered by R0
- p1: number of positive instances covered by R1
- n1: number of negative instances covered by R1
71Example Foil_Gain measure
- R0: {} => class (initial rule)
- R1: {A} => class (rule after adding a conjunct)
- R2: {B} => class (rule after adding a conjunct)
- R3: {C} => class (rule after adding a conjunct)
- Assume the initial rule is {} => class.
- This rule covers p0 = 100 positive examples and n0 = 400 negative examples.
- Choose one rule!
- R1 covers 4 positive examples and 1 negative example.
- R2 covers 30 positive examples and 10 negative examples.
- R3 covers 100 positive examples and 90 negative examples.
72Example Foil_Gain measure
- R0: {} => class (initial rule)
- This rule covers 100 positive examples and 400 negative examples.
- R1: {A} => class (rule after adding a conjunct)
- R1 covers 4 positive examples and 1 negative example.
- t = 4 (number of positive instances covered by both R0 and R1)
- p0 = 100
- n0 = 400
- p1 = 4
- n1 = 1
- FOIL_Gain(R0, R1) = 4 × ( log2(4/5) - log2(100/500) ) = 8.0
73Example Foil_Gain measure
- R0: {} => class (initial rule)
- This rule covers 100 positive examples and 400 negative examples.
- R2: {B} => class (rule after adding a conjunct)
- R2 covers 30 positive examples and 10 negative examples.
- t = 30 (number of positive instances covered by both R0 and R2)
- p0 = 100
- n0 = 400
- p1 = 30
- n1 = 10
- FOIL_Gain(R0, R2) = 30 × ( log2(30/40) - log2(100/500) ) ≈ 57.2
74Example Foil_Gain measure
- R0: {} => class (initial rule)
- This rule covers 100 positive examples and 400 negative examples.
- R3: {C} => class (rule after adding a conjunct)
- R3 covers 100 positive examples and 90 negative examples.
- t = 100 (number of positive instances covered by both R0 and R3)
- p0 = 100
- n0 = 400
- p1 = 100
- n1 = 90
- FOIL_Gain(R0, R3) = 100 × ( log2(100/190) - log2(100/500) ) ≈ 139.6, the largest of the three gains, so R3 is chosen (see the sketch below).
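A small sketch that reproduces the three computations above; the counts are taken from the worked example and the function follows the FOIL_Gain formula stated on the earlier slide.

import math

def foil_gain(t, p0, n0, p1, n1):
    """FOIL gain of extending rule R0 (covering p0/n0) to R1 (covering p1/n1); t = shared positives."""
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

print(foil_gain(t=4,   p0=100, n0=400, p1=4,   n1=1))    # R1:   8.0
print(foil_gain(t=30,  p0=100, n0=400, p1=30,  n1=10))   # R2:  ~57.2
print(foil_gain(t=100, p0=100, n0=400, p1=100, n1=90))   # R3: ~139.6 (highest gain)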
75Example
- Training data (assertions)
- GrandDaughter(Victor, Sharon)
- Female(Sharon), Father(Sharon, Bob)
- Father(Tom, Bob), Father(Bob, Victor)
- Use the closed world assumption: any literal involving the specified predicates and constants that is not listed is assumed to be false
- ¬GrandDaughter(Tom, Bob), ¬GrandDaughter(Tom, Tom)
- ¬GrandDaughter(Bob, Victor)
- ¬Female(Tom), etc.
76Possible Variable Bindings
- Initial rule
- GrandDaughter(x, y) ←
- Possible bindings from the training assertions (how many possible bindings of the 4 constants to the variables of the initial rule?)
- Positive binding: {x/Victor, y/Sharon}
- Negative bindings: {x/Victor, y/Victor},
- {x/Tom, y/Sharon}, etc.
- Positive bindings provide positive evidence for, and negative bindings provide negative evidence against, the rule under consideration.
77Rule Pruning in FOIL
- Prepruning method based on the minimum description length (MDL) principle.
- Postpruning to eliminate unnecessary complexity due to limitations of the greedy algorithm (sketched below):
- For each rule, R
-   For each antecedent, A, of the rule
-     If deleting A from R does not cause negatives to become covered
-     then delete A
- For each rule, R
-   If deleting R does not uncover any positives (since they are redundantly covered by other rules)
-   then delete R
78Sequential (set) covering algorithms
- CN2 Algorithm
- Start from an empty conjunct: {}
- Add conjuncts that minimize the entropy measure: {A}, {A,B}, ...
- Determine the rule consequent by taking the majority class of the instances covered by the rule
- RIPPER Algorithm
- Start from an empty rule: R0: {} => class (initial rule)
- Add conjuncts that maximize FOIL's information gain measure
- R1: {A} => class (rule after adding a conjunct)
- t: number of positive instances covered by both R0 and R1
- p0: number of positive instances covered by R0
- n0: number of negative instances covered by R0
- p1: number of positive instances covered by R1
- n1: number of negative instances covered by R1
79General to Specific Beam Search
- Learning with decision trees
80General to Specific Beam Search 4
The CN2 Algorithm

LearnOneRule( target_attribute, attributes, examples, k )
  Initialise best_hypothesis = Ø, the most general hypothesis
  Initialise candidate_hypotheses = { best_hypothesis }
  while ( candidate_hypotheses is not empty ) do
    1. Generate the next more-specific candidate_hypotheses
    2. Update best_hypothesis
    3. Update candidate_hypotheses
  return a rule of the form "IF best_hypothesis THEN prediction",
    where prediction is the most frequent value of target_attribute
    among those examples that match best_hypothesis.

Performance( h, examples, target_attribute )
  h_examples = the subset of examples that match h
  return -Entropy( h_examples ), where Entropy is computed with respect to target_attribute
81General to Specific Beam Search 5
- Generate the next more-specific candidate_hypotheses

all_constraints = the set of all constraints (a = v), where a ∈ attributes
  and v is a value of a occurring in the current set of examples
new_candidate_hypotheses = for each h in candidate_hypotheses,
    for each c in all_constraints:
      create a specialisation of h by adding the constraint c
Remove from new_candidate_hypotheses any hypotheses that are
  duplicate, inconsistent, or not maximally specific

- Update best_hypothesis

for all h in new_candidate_hypotheses do
  if ( h is statistically significant when tested on examples, and
       Performance( h, examples, target_attribute ) >
       Performance( best_hypothesis, examples, target_attribute ) )
  then best_hypothesis = h
82General to Specific Beam Search 6
- Update the candidate_hypotheses

candidate_hypotheses = the k best members of new_candidate_hypotheses,
  according to the Performance function

- The Performance function guides the search in Learn-One-Rule:
  Performance(s) = -Entropy(s) = Σ_{i=1..c} p_i log2 p_i
- s: the current set of training examples (those matching the hypothesis)
- c: the number of possible values of the target attribute
- p_i: the proportion of the examples that are classified with the i-th value
83Example for CN2-Algorithm
LearnOneRule( EnjoySport, {Sky, AirTemp, Humidity, Wind, Water, Forecast}, examples, 2 )

best_hypothesis = Ø
candidate_hypotheses = { Ø }
all_constraints = { Sky=Sunny, Sky=Rainy, AirTemp=Warm, AirTemp=Cold,
                    Humidity=Normal, Humidity=High, Wind=Strong,
                    Water=Warm, Water=Cool, Forecast=Same, Forecast=Change }
Performance = nc / n
  n: number of examples covered by the rule
  nc: number of examples covered by the rule whose classification is correct
84Example for CN2-Algorithm (2)
Pass 1: the Remove step delivers no result (nothing to remove)
candidate_hypotheses = { Sky=Sunny, AirTemp=Warm }
best_hypothesis is Sky=Sunny
85Example for CN2-Algorithm (3)
Pass 2: Remove (duplicate, inconsistent, or not maximally specific hypotheses)
candidate_hypotheses = { Sky=Sunny AND AirTemp=Warm, Sky=Sunny AND Humidity=High }
best_hypothesis remains Sky=Sunny
86Relational Learning and Inductive Logic Programming (ILP)
- Fixed feature vectors are a very limited representation of instances.
- Examples or the target concept may require a relational representation that includes multiple entities with relationships between them (e.g., a graph with labeled edges and nodes).
- First-order predicate logic is a more powerful representation for handling such relational descriptions.
- Horn clauses (i.e., if-then rules in predicate logic, Prolog programs) are a useful restriction on full first-order logic that allows decidable inference.
- Allows learning programs from sample I/O pairs.
87ILP Examples
- Learn definitions of family relationships given data for primitive types and relations.
- uncle(A,B) :- brother(A,C), parent(C,B).
- uncle(A,B) :- husband(A,C), sister(C,D), parent(D,B).
- Learn recursive list programs from I/O pairs.
- member(X, [X|Y]).
- member(X, [Y|Z]) :- member(X, Z).
- append([], L, L).
- append([X|L1], L2, [X|L12]) :- append(L1, L2, L12).
88ILP
- Goal is to induce a Horn-clause definition for some target predicate P, given definitions of a set of background predicates Q.
- Goal is to find a syntactically simple Horn-clause definition, D, for P, given background knowledge B defining the background predicates Q, such that
- For every positive example pi of P: D ∪ B ⊢ pi
- For every negative example ni of P: D ∪ B ⊬ ni
- Background definitions are provided either
- Extensionally: a list of ground tuples satisfying the predicate.
- Intensionally: Prolog definitions of the predicate.
89ILP Systems
- Top-Down
- FOIL (Quinlan, 1990)
- Bottom-Up
- CIGOL (Muggleton & Buntine, 1988)
- GOLEM (Muggleton, 1990)
- Hybrid
- CHILLIN (Mooney & Zelle, 1994)
- PROGOL (Muggleton, 1995)
- ALEPH (Srinivasan, 2000)
90FOIL: First-Order Inductive Logic
- Top-down sequential covering algorithm upgraded to learn Prolog clauses, but without logical functions.
- Background knowledge must be provided extensionally.
- Initialize the clause for target predicate P to
- P(X1, ..., XT) :- .
- Possible specializations of a clause include adding all possible literals:
- Qi(V1, ..., VTi)
- not(Qi(V1, ..., VTi))
- Xi = Xj
- not(Xi = Xj)
- where the Xs are bound variables already in the existing clause; at least one of V1, ..., VTi must be a bound variable, the others can be new.
- Allow recursive literals P(V1, ..., VT) if they do not cause an infinite regress.
- Handle alternative possible values of new intermediate variables by maintaining examples as tuples of all variable values.
91FOIL Training Data
- For learning a recursive definition, the positive set must consist of all tuples of constants that satisfy the target predicate, given some fixed universe of constants.
- Background knowledge consists of the complete set of tuples for each background predicate for this universe.
- Example: consider learning a definition for the target predicate path, for finding a path in a directed acyclic graph.
- path(X,Y) :- edge(X,Y).
- path(X,Y) :- edge(X,Z), path(Z,Y).
(The two clauses above are the Horn-clause definition, D, of the target predicate P.)
Background Knowledge, B:
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
92FOIL Negative Training Data
- Negative examples of the target predicate can be provided directly, or generated indirectly by making a closed world assumption.
- Every pair of constants <X,Y> not in the positive tuples for the path predicate (reproduced in the sketch below).
Negative path tuples (pairs <X,Y> with ¬path(X,Y)):
<1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,
<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,
<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,
<6,4>,<6,6>
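The closed-world generation of these negatives is easy to reproduce; a tiny sketch using the constants 1..6 and the positive path tuples from the slide:

from itertools import product

constants = range(1, 7)
positives = {(1,2), (1,3), (1,6), (1,5), (3,6), (3,5),
             (4,2), (4,6), (4,5), (6,5)}               # positive path tuples

# Closed world assumption: every pair not listed as positive is a negative example.
negatives = [pair for pair in product(constants, constants) if pair not in positives]
print(len(negatives))                                  # 26, matching the list above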
93Sample FOIL Induction
Pos: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Start with clause: path(X,Y) :- .
Possible literals to add: edge(X,X), edge(Y,Y), edge(X,Y), edge(Y,X), edge(X,Z), edge(Y,Z), edge(Z,X), edge(Z,Y), path(X,X), path(Y,Y), path(X,Y), path(Y,X), path(X,Z), path(Y,Z), path(Z,X), path(Z,Y), X=Y, plus negations of all of these.
94Sample FOIL Induction
Pos: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(Y,X).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 0 positive examples
Covers 6 negative examples
Not a good literal.
95Sample FOIL Induction
Pos: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Y).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 6 positive examples
Covers 0 negative examples
Chosen as best literal. Result is the base clause.
96Sample FOIL Induction
Pos: <1,6>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Y).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 6 positive examples
Covers 0 negative examples
Chosen as best literal. Result is the base clause.
Remove covered positive tuples!
97Sample FOIL Induction
Pos: <1,6>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Learned so far: path(X,Y) :- edge(X,Y).
Start new clause: path(X,Y) :- .
98Sample FOIL Induction
Pos: <1,6>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Y).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 0 positive examples
Covers 0 negative examples
Not a good literal.
99Sample FOIL Induction
Pos: <1,6>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal
100Sample FOIL Induction
Pos: <1,6>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal
Negatives still covered; remove the uncovered examples!
101Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal
Negatives still covered; remove the uncovered examples. Expand tuples to account for possible Z values.
<X,Y,Z>
102Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal
Negatives still covered; remove the uncovered examples. Expand tuples to account for possible Z values.
<X,Y,Z>
103Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5>
Neg: <1,1>,<1,4>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal
Negatives still covered; remove the uncovered examples. Expand tuples to account for possible Z values.
<X,Y,Z>
104Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5,2>,<4,5,6>
Neg: <1,1>,<1,4>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal
Negatives still covered; remove the uncovered examples. Expand tuples to account for possible Z values.
<X,Y,Z>
105Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5,2>,<4,5,6>
Neg: <1,1,2>,<1,1,3>,<1,4,2>,<1,4,3>,<3,1,6>,<3,2,6>,<3,3,6>,<3,4,6>,<4,1,2>,<4,1,6>,<4,3,2>,<4,3,6>,<4,4,2>,<4,4,6>,<6,1,5>,<6,2,5>,<6,3,5>,<6,4,5>,<6,6,5>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal
Negatives still covered; remove the uncovered examples. Expand tuples to account for possible Z values.
106Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5,2>,<4,5,6>
Neg: <1,1,2>,<1,1,3>,<1,4,2>,<1,4,3>,<3,1,6>,<3,2,6>,<3,3,6>,<3,4,6>,<4,1,2>,<4,1,6>,<4,3,2>,<4,3,6>,<4,4,2>,<4,4,6>,<6,1,5>,<6,2,5>,<6,3,5>,<6,4,5>,<6,6,5>
Continue specializing the clause: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
107Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5,2>,<4,5,6>
Neg: <1,1,2>,<1,1,3>,<1,4,2>,<1,4,3>,<3,1,6>,<3,2,6>,<3,3,6>,<3,4,6>,<4,1,2>,<4,1,6>,<4,3,2>,<4,3,6>,<4,4,2>,<4,4,6>,<6,1,5>,<6,2,5>,<6,3,5>,<6,4,5>,<6,6,5>
Test: path(X,Y) :- edge(X,Z), edge(Z,Y).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 3 positive examples
Covers 0 negative examples
108Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5,2>,<4,5,6>
Neg: <1,1,2>,<1,1,3>,<1,4,2>,<1,4,3>,<3,1,6>,<3,2,6>,<3,3,6>,<3,4,6>,<4,1,2>,<4,1,6>,<4,3,2>,<4,3,6>,<4,4,2>,<4,4,6>,<6,1,5>,<6,2,5>,<6,3,5>,<6,4,5>,<6,6,5>
Test: path(X,Y) :- edge(X,Z), path(Z,Y).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 4 positive examples
Covers 0 negative examples
Eventually chosen as best literal; this completes the clause.
Definition complete, since all original <X,Y> tuples are covered (by way of covering some <X,Y,Z> tuple).
109Logic Program Induction in FOIL
- FOIL has also learned
- append, given components and null
- reverse, given append, components, and null
- quicksort, given partition, append, components, and null
- Other programs from the first few chapters of a Prolog text.
- Learning recursive programs in FOIL requires a complete set of positive examples for some constrained universe of constants, so that a recursive call can always be evaluated extensionally.
- For lists: all lists of a limited length composed from a small set of constants (e.g., all lists up to length 3 using a, b, c).
- Size of the extensional background grows combinatorially.
- Negative examples are usually computed using a closed-world assumption.
- This grows combinatorially large for higher-arity target predicates.
- Can randomly sample negatives to make this tractable.
110More Realistic Applications
- Classifying chemical compounds as mutagenic (cancer causing) based on their graphical molecular structure and chemical background knowledge.
- Classifying web documents based on both the content of the page and its links to and from other pages with particular content.
- A web page is a university faculty home page if:
- It contains the words "Professor" and "University", and
- It is pointed to by a page with the word "faculty", and
- It points to a page with the words "course" and "exam".
111FOIL Limitations
- Search space of literals (branching factor) can become intractable.
- Use aspects of bottom-up search to limit the search.
- Requires large extensional background definitions.
- Use intensional background via Prolog inference.
- Hill-climbing search gets stuck at local optima and may not even find a consistent clause.
- Use limited backtracking (beam search).
- Include determinate literals with zero gain.
- Use relational pathfinding or relational clichés.
- Requires complete examples to learn recursive definitions.
- Use intensional interpretation of learned recursive clauses.
112Rule Learning and ILP Summary
- There are effective methods for learning symbolic rules from data using greedy sequential covering and top-down or bottom-up search.
- These methods have been extended to first-order logic to learn relational rules and recursive Prolog programs.
- Knowledge represented by rules is generally more interpretable by people, allowing human insight into what is learned and possible human approval and correction of learned knowledge.
113Induction as Inverted Deduction
- Induction is finding h such that
- (∀⟨xi, f(xi)⟩ ∈ D)  (B ∧ h ∧ xi) ⊢ f(xi)
- where
- xi is the i-th training instance
- f(xi) is the target function value for xi
- B is other background knowledge
- So let's design inductive algorithms by inverting operators for automated deduction
114Induction as Inverted Deduction
- D: pairs of people ⟨u,v⟩ such that the child of u is v
- f(xi): Child(Bob, Sharon)
- xi: Male(Bob), Female(Sharon), Father(Sharon, Bob)
- B: Parent(u,v) ← Father(u,v)
- What satisfies (∀⟨xi, f(xi)⟩ ∈ D)  (B ∧ h ∧ xi) ⊢ f(xi) ?
- h1: Child(u,v) ← Father(v,u)     (h1 ∧ xi ⊢ f(xi), with no need for B)
- h2: Child(u,v) ← Parent(v,u)     (B ∧ h2 ∧ xi ⊢ f(xi))
115Induction as Inverted Deduction
We have mechanical deductive operators F(A, B) = C, where A ∧ B ⊢ C.
We need inductive operators O(B, D) = h, where (∀⟨xi, f(xi)⟩ ∈ D)  (B ∧ h ∧ xi) ⊢ f(xi).
116Induction as Inverted Deduction
- Positives
- Subsumes the earlier idea of finding h that fits the training data
- Domain theory B helps define the meaning of "fit the data": (B ∧ h ∧ xi) ⊢ f(xi)
- Negatives
- Doesn't allow for noisy data. Consider (∀⟨xi, f(xi)⟩ ∈ D)  (B ∧ h ∧ xi) ⊢ f(xi)
- First-order logic gives a huge hypothesis space H
- → overfitting
- → intractability of calculating all acceptable h's
117Deduction Resolution Rule
- C1: P ∨ L
- C2: ¬L ∨ R
- Resolvent: P ∨ R
- 1. Given initial clauses C1 and C2, find a literal L from clause C1 such that ¬L occurs in clause C2.
- 2. Form the resolvent C by including all literals from C1 and C2, except for L and ¬L.
- More precisely, the set of literals occurring in the conclusion C is
- C = (C1 - {L}) ∪ (C2 - {¬L})
- where ∪ denotes set union and "-" set difference.
118Inverting Resolution
- C = (C1 - {L}) ∪ (C2 - {¬L})
- C1: PassExam ∨ ¬KnowMaterial     C2: KnowMaterial ∨ ¬Study
- C: PassExam ∨ ¬Study
119Inverted Resolution (Propositional)
- Given initial clauses C1 and C, find a literal L that occurs in clause C1 but not in clause C.
- Form the second clause C2 by including the following literals (sketched in code below):
- C2 = (C - (C1 - {L})) ∪ {¬L}
- C1: PassExam ∨ ¬KnowMaterial     C2: KnowMaterial ∨ ¬Study
C: PassExam ∨ ¬Study
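A tiny sketch of this propositional inverse-resolution operator, with clauses represented as Python sets of literal strings and "~" marking negation (an encoding chosen here only for illustration).

def negate(lit):
    """Flip the sign of a literal written as 'P' or '~P'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def inverse_resolve(C, C1):
    """Given resolvent C and one parent C1, build one possible second parent C2."""
    L = next(l for l in C1 if l not in C)              # a literal of C1 that is not in C
    return (C - (C1 - {L})) | {negate(L)}              # C2 = (C - (C1 - {L})) ∪ {¬L}

C1 = {"PassExam", "~KnowMaterial"}
C  = {"PassExam", "~Study"}
print(inverse_resolve(C, C1))                          # {'KnowMaterial', '~Study'}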
120Inverted Resolution
- First-Order Resolution
- Substitution θ: any mapping of variables to terms
- e.g., θ = {x/Bob, y/z}
- Unifying substitution
- For two literals L1 and L2, θ is a unifying substitution provided L1θ = L2θ
- e.g., θ = {x/Bill, z/y}
- L1 = Father(x, y), L2 = Father(Bill, z)
- L1θ = L2θ = Father(Bill, y)
121Inverted Resolution
- First-Order Resolution
- Resolution operator (first-order form)
- Find a literal L1 from clause C1, a literal L2 from clause C2, and a substitution θ such that L1θ = ¬L2θ.
- Form the resolvent C by including all literals from C1θ and C2θ, except for L1θ and ¬L2θ. More precisely, the set of literals occurring in the conclusion C is
- C = (C1 - {L1})θ ∪ (C2 - {L2})θ
122Inverted Resolution
- First-Order Resolution
- Example
- C1: White(x) ← Swan(x),   C2: Swan(Fred)
- C1 = White(x) ∨ ¬Swan(x)
- → L1 = ¬Swan(x), L2 = Swan(Fred)
- Unifying substitution θ = {x/Fred}
- then L1θ = ¬L2θ = ¬Swan(Fred)
- (C1 - {L1})θ = {White(Fred)}
- (C2 - {L2})θ = Ø
- → C = White(Fred)
C = (C1 - {L1})θ ∪ (C2 - {L2})θ
123First Order Resolution
- 1. Find a literal L1 from clause C1, a literal L2 from clause C2, and substitutions θ1, θ2 such that
-    L1θ1 = ¬L2θ2
- 2. Form the resolvent C by including all literals from C1θ1 and C2θ2, except for L1θ1 and ¬L2θ2. More precisely, the set of literals occurring in the conclusion is
-    C = (C1 - {L1})θ1 ∪ (C2 - {L2})θ2
- → C - (C1 - {L1})θ1 = (C2 - {L2})θ2
- → (C - (C1 - {L1})θ1) ∪ {L2θ2} = C2θ2
- Inverting:
- C2 = ( C - (C1 - {L1})θ1 ) θ2⁻¹ ∪ { ¬L1θ1θ2⁻¹ }
124First Order Resolution
- Inverse resolution: first-order case
- C = (C1 - {L1})θ1 ∪ (C2 - {L2})θ2
- (where θ = θ1θ2 (factorization))
- C - (C1 - {L1})θ1 = (C2 - {L2})θ2
- (where L2 = ¬L1θ1θ2⁻¹)
- → C2 = (C - (C1 - {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}
125Inverting Resolution (cont.)
- Inverse resolution: first-order case
- We wish to learn rules for the target predicate GrandChild(y,x)
- Given training data D = { GrandChild(Bob, Shannon) }
- Background info B = { Father(Shannon, Tom), Father(Tom, Bob) }
- C = GrandChild(Bob, Shannon)
- C1 = Father(Shannon, Tom)
- L1 = Father(Shannon, Tom)
- Suppose we choose the inverse substitutions
- θ1⁻¹ = {}, θ2⁻¹ = {Shannon/x}
- (C - (C1 - {L1})θ1)θ2⁻¹ = Cθ2⁻¹ = GrandChild(Bob, x)
- ¬L1θ1θ2⁻¹ = ¬Father(x, Tom)
- → C2 = GrandChild(Bob, x) ∨ ¬Father(x, Tom)
- or equivalently GrandChild(Bob, x) ← Father(x, Tom)
C2 = (C - (C1 - {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}
126Rule Learning Issues
- Which is better, rules or trees?
- Trees share structure between disjuncts.
- Rules allow completely independent features in each disjunct.
- Mapping some rule sets to decision trees results in an exponential increase in size.
A ∧ B → P
C ∧ D → P
What if we add the rule E ∧ F → P ?
127Rule Learning Issues
- Which is better, top-down or bottom-up search?
- Bottom-up is more subject to noise, e.g., the random seeds that are chosen may be noisy.