Title: Machine Learning: Rule Learning
1Machine Learning: Rule Learning
- Intelligent Systems Lab.
- Soongsil University
Thanks to Raymond J. Mooney at the University of Texas at Austin
2Introduction
- Set of If-then rules
- The hypothesis is easy to interpret.
- Goal
- Look at a new method to learn rules
- Rules
- Propositional rules (rules without variables)
- First-order predicate rules (with variables)
3Introduction 2
- GOAL: Learning a target function as a set of IF-THEN rules
- BEFORE: Learning with decision trees
- Learn the decision tree
- Translate the tree into a set of IF-THEN rules (one rule for each leaf)
- OTHER POSSIBILITY: Learning with genetic algorithms
- Each set of rules is coded as a bit vector
- Several genetic operators are used on the hypothesis space
- TODAY AND HERE
- First: Learning rules in propositional form
- Second: Learning rules in first-order form (Horn clauses, which include variables)
- Sequential search for rules, one after the other
4Rule induction
- To learn a set of IF-THEN rules for classification
- Suitable when the target function can be represented by a set of IF-THEN rules
- Target function: h = {Rule_1, Rule_2, ..., Rule_m}
- Rule_j: IF (precond_j1 ∧ precond_j2 ∧ ... ∧ precond_jn) THEN postcond_j
- IF-THEN rules
- An expressive representation
- Most readable and understandable for humans
5Rule induction Example (1)
- Learning a set of propositional rules
- E.g., the target function (concept) Buy_Computer is represented by
- IF (Age=Old ∧ Student=No) THEN Buy_Computer=No
- IF (Student=Yes) THEN Buy_Computer=Yes
- IF (Age=Medium ∧ Income=High) THEN Buy_Computer=Yes
- Learning a set of first-order rules
- E.g., the target function (concept) Ancestor is represented by
- IF Parent(x,y) THEN Ancestor(x,y)
- IF Parent(x,y) ∧ Ancestor(y,z) THEN Ancestor(x,z)
- (Parent(x,y) is a predicate saying that y is the father/mother of x)
6Rule induction Example (2)
- Rule: IF (Age=Old ∧ Student=No) THEN Buy_Computer=No
- Which instances are correctly classified by the above rule?
7Learning Rules
- IF-THEN rules in logic are a standard representation of knowledge that has proven useful in expert systems and other AI systems
- In propositional logic, a set of rules for a concept is equivalent to DNF
- Methods for automatically inducing rules from data have been shown to build more accurate expert systems than human knowledge engineering for some applications.
- Rule-learning methods have been extended to first-order logic to handle relational (structural) representations.
- Inductive Logic Programming (ILP) for learning Prolog programs from I/O pairs.
- Allows moving beyond simple feature-vector representations of data.
8Rule Learning Approaches
- Translate decision trees into rules (C4.5)
- Sequential (set) covering algorithms
- General-to-specific (top-down) (RIPPER, CN2,
FOIL) - Specific-to-general (bottom-up) (GOLEM, CIGOL)
- Hybrid search (AQ, Chillin, Progol)
- Translate neural-nets into rules (TREPAN)
9Decision-Trees to Rules
- For each path in a decision tree from the root to
a leaf, create a rule with the conjunction of
tests along the path as an antecedent and the
leaf label as the consequent.
red ∧ circle → A
blue → B
red ∧ square → B
green → C
red ∧ triangle → C
(Decision tree figure: the root tests color; green → C, blue → B, red → test shape, with circle → A, square → B, triangle → C.)
10Post-Processing Decision-Tree Rules
- Resulting rules may contain unnecessary antecedents that are not needed to remove negative examples, which results in over-fitting.
- Rules are post-pruned by greedily removing antecedents or rules until performance on training data or a validation set is significantly harmed.
- Resulting rules may lead to competing, conflicting conclusions on some instances.
- Sort rules by training (validation) accuracy to create an ordered decision list. The first rule in the list that applies is used to classify a test instance.
red ∧ circle → A (97% train accuracy)
red ∧ big → B (95% train accuracy)
Test case <big, red, circle> assigned to class A
11Propositional Rule Induction: Training (Sequential Covering Algorithms)
- To learn the set of rules with a sequential (incremental) covering strategy
- Step 1. Learn one rule
- Step 2. Remove from the training set those instances correctly classified by the rule
- Repeat Steps 1 and 2 to learn another rule (using the remaining training set)
- The learning procedure
- Learns (i.e., covers) the rules sequentially (incrementally)
- Can be repeated as many times as desired to learn a set of rules that covers a desired portion of (or the full) training set
- The set of learned rules is sorted according to some performance measure (e.g., classification accuracy)
- The rules will be consulted in this order when classifying a future instance (see the sketch below)
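A minimal Python sketch of this covering loop (not the code of any particular system): it assumes a rule is a dict {"tests": {attribute: value, ...}, "class": label, "accuracy": float} and that learn_one_rule is supplied by the caller.

def covers(rule, instance):
    """True if every attribute test in the rule's IF part matches the instance."""
    return all(instance.get(attr) == val for attr, val in rule["tests"].items())

def sequential_covering(examples, learn_one_rule, min_accuracy=0.8):
    """Step 1: learn one rule; Step 2: remove covered instances; repeat on the rest."""
    rules, remaining = [], list(examples)
    while remaining:
        rule = learn_one_rule(remaining)
        if rule is None or rule["accuracy"] < min_accuracy:
            break                                    # no acceptable rule is left to learn
        rules.append(rule)
        remaining = [x for x in remaining if not covers(rule, x)]
    rules.sort(key=lambda r: r["accuracy"], reverse=True)   # order by performance measure
    return rules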
12Propositional rule induction Classification
- Given a test instance
- The learned rules are tried (tested) sequentially (i.e., rule by rule) in the order determined in the training phase
- The first encountered rule that covers the instance (i.e., the rule's preconditions in the IF clause match the instance) classifies it
- The instance is classified by the post-condition in the rule's THEN clause
- If no rule covers the instance, then the instance is classified by the default rule
- The instance is classified by the most frequent value of the target attribute in the training instances (see the sketch below)
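A matching sketch of the classification phase, reusing the covers helper and the rule representation assumed above:

def classify(rules, instance, default_class):
    """Try the learned rules in order; the first one that covers the instance classifies it."""
    for rule in rules:                     # rules are already sorted by performance
        if covers(rule, instance):
            return rule["class"]           # post-condition of the rule's THEN clause
    return default_class                   # default rule: most frequent class in the training set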
13Sequential Covering Algorithms
- Require that each rule have high accuracy, but not necessarily high coverage
- High accuracy → the prediction should be correct
- Accepting low coverage → the rule need not make a prediction for every training example
14Rule Coverage and Accuracy
- Coverage of a rule
- Fraction of records that satisfy the antecedent of the rule
- Accuracy of a rule
- Fraction of records that satisfy both the antecedent and the consequent, out of the records that satisfy the antecedent
Rule: IF (Status=Single) THEN No
Coverage = 40% (4/10), Accuracy = 50% (2/4)   (reproduced in the sketch below)
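For concreteness, a tiny sketch that reproduces the 4/10 and 2/4 numbers; the ten records below are made up for illustration and are not the table from the original slide.

records = [("Single", "No"), ("Single", "No"), ("Single", "Yes"), ("Single", "Yes"),
           ("Married", "No"), ("Married", "Yes"), ("Married", "Yes"),
           ("Divorced", "No"), ("Divorced", "Yes"), ("Married", "No")]

matched = [r for r in records if r[0] == "Single"]     # records satisfying the antecedent
correct = [r for r in matched if r[1] == "No"]         # ... that also satisfy the consequent

coverage = len(matched) / len(records)                 # 4/10 = 0.40
accuracy = len(correct) / len(matched)                 # 2/4  = 0.50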
15Sequential Covering Algorithms (cont.)
16Sequential Covering
- A set of rules is learned one at a time, each time finding a single rule that covers a large number of positive instances without covering any negatives, removing the positives that it covers, and learning additional rules to cover the rest.
- This is an instance of the greedy algorithm for minimum set covering and does not guarantee a minimum number of learned rules.
- Minimum set covering is an NP-hard problem, and the greedy algorithm is a standard approximation algorithm.
- Methods for learning individual rules vary.
17Set Covering Problem
- Let S = {1, 2, ..., n}, and suppose S_j ⊆ S for each j. We say that index set J is a cover of S if ∪_{j ∈ J} S_j = S.
- Set covering problem: find a minimum-cardinality set cover of S.
- Applications: locating 119 fire stations.
- Locating hospitals.
- Locating Starbucks, and many non-obvious applications.
18-24Greedy Sequential Covering Example
(Figures: greedy sequential covering shown step by step on positive and negative examples in the X-Y plane.)
25No-optimal Covering Example
(Figure: a covering of the same examples illustrating that the greedy result is not necessarily optimal.)
26-33Greedy Sequential Covering Example
(Figures: greedy sequential covering continued step by step.)
34Strategies for Learning a Single Rule
- Top Down (General to Specific)
- Start with the most-general (empty) rule.
- Repeatedly add antecedent constraints on features that eliminate negative examples while maintaining as many positives as possible.
- Stop when only positives are covered.
- Bottom Up (Specific to General)
- Start with a most-specific rule (e.g., the complete instance description of a random instance).
- Repeatedly remove antecedent constraints in order to cover more positives.
- Stop when further generalization results in covering negatives.
35Sequential Covering Algorithms (cont.)
- General to Specific Beam Search
- How do we learn each individual rule?
- Requirements for LEARN-ONE-RULE:
- High accuracy; high coverage is not required
- One approach is . . .
- To implement LEARN-ONE-RULE in a similar way to decision tree learning (ID3), but to follow only the most promising branch in the tree at each step.
- As illustrated in the figure, the search begins by considering the most general rule precondition possible (the empty test that matches every instance), and then greedily adds the attribute test that most improves rule performance over the training examples.
36Sequential Covering Algorithms (cont.)
- Specialising search
- Organises a hypothesis-space search in generally the same fashion as ID3, but follows only the most promising branch of the tree at each step
- 1. Begin with the most general rule (no/empty precondition)
- 2. Follow the most promising branch
- Greedily add the attribute test that most improves the measured performance of the rule over the training examples
- 3. Greedy depth-first search with no backtracking
- Danger of a sub-optimal choice
- Reduce the risk: Beam Search (CN2 algorithm), in which the algorithm maintains a list of the k best candidates
- In each search step, descendants are generated for each of these k best candidates; the resulting set is then reduced to the k most promising members
37Sequential Covering Algorithms (cont.)
- General to Specific Beam Search
38Sequential Covering Algorithms (cont.)
- Variations
- Learn only rules that cover positive examples
- Useful in the case that the fraction of positive examples is small
- In this case, we can modify the algorithm to learn only from those rare examples, and classify anything not covered by any rule as negative.
- Instead of entropy, use a measure that evaluates the fraction of positive examples covered by the hypothesis
39-43Top-Down Rule Learning Example
(Figures: a single rule is specialized step by step in the X-Y plane by adding the constraints C1<Y, then C2<X, then Y<C3, then X<C4, shrinking the covered region until it contains only positives.)
44-54Bottom-Up Rule Learning Example
(Figures: starting from a single seed example, a rule is generalized step by step in the X-Y plane to cover more positive examples, stopping before negatives are covered.)
55Sequential Covering Algorithms (cont.)
- General to Specific Beam Search
- Greedy search without backtracking
- → danger of a suboptimal choice at any step
- The algorithm can be extended using beam search (sketched below)
- Keep a list of the k best candidates at each step
- On each search step, descendants are generated for each of these k best candidates, and the resulting set is again reduced to the k best candidates.
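A minimal sketch of such a general-to-specific beam search, assuming categorical attributes, instances stored as dicts with a "class" key, and a simple accuracy-based performance function; this is an illustration, not the CN2 implementation.

def performance(tests, examples, target_class):
    """Accuracy of a candidate rule (a set of attribute=value tests) on the examples it covers."""
    covered = [x for x in examples if all(x[a] == v for a, v in tests)]
    if not covered:
        return 0.0
    return sum(x["class"] == target_class for x in covered) / len(covered)

def beam_search_rule(examples, attributes, target_class, k=2, max_len=3):
    """Grow rule preconditions greedily, keeping only the k best candidates at each step."""
    constraints = [(a, v) for a in attributes for v in {x[a] for x in examples}]
    beam = [frozenset()]                     # start from the most general (empty) precondition
    best = frozenset()
    for _ in range(max_len):
        candidates = {h | {c} for h in beam for c in constraints if c not in h}
        if not candidates:
            break
        ranked = sorted(candidates, reverse=True,
                        key=lambda h: performance(h, examples, target_class))
        beam = ranked[:k]                    # reduce to the k most promising members
        if performance(beam[0], examples, target_class) > performance(best, examples, target_class):
            best = beam[0]
    return best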
56Learning Rule Sets Summary
- Key design issues for learning sets of rules
- Sequential or Simultaneous?
- Sequential: learning one rule at a time, removing the covered examples and repeating the process on the remaining examples
- Simultaneous: learning the entire set of disjuncts simultaneously as part of a single search for an acceptable decision tree, as in ID3
- General-to-specific or Specific-to-general?
- G→S: Learn-One-Rule
- S→G: Find-S
- Generate-and-test or Example-driven?
- G&T: search through syntactically legal hypotheses
- E-D: Find-S, Candidate-Elimination
- Post-pruning of rules?
- Similar method to the one discussed in decision tree learning
57Measure performance of a rule (1)
- Relative frequency: nc / n
- D_trainR: the set of training instances that match the preconditions of rule R
- n: number of examples matched by the rule, i.e., the size of D_trainR
- nc: number of examples classified correctly by the rule
58Measure performance of a rule (2)
- m-estimate of accuracy: (nc + m·p) / (n + m)
- p: the prior probability that an instance, randomly drawn from the entire dataset, will have the classification assigned by rule R
- → p is the prior assumed accuracy (e.g., if 12 of 100 examples have that class, then p = 0.12)
- m: a weight that indicates how much the prior probability p influences the rule performance measure
- If m = 0, then the m-estimate becomes the relative frequency measure
- As m increases, a larger number of instances is needed to override the prior assumed accuracy p
59Measure performance of a rule (3)
- Entropy measure: Entropy(D_trainR) = - Σ_{i=1..c} p_i log2 p_i (used negated, so lower entropy means better rule performance)
- c: the number of possible values (i.e., classes) of the target attribute
- p_i: the proportion of instances in D_trainR for which the target attribute takes on the i-th value (i.e., class)
- (The three measures are sketched in code below.)
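A short sketch of the three measures, assuming the covered instances D_trainR are given simply as a list of their class labels; the function names are illustrative.

import math
from collections import Counter

def relative_frequency(covered_labels, predicted_class):
    """n_c / n over the instances matched by the rule."""
    n = len(covered_labels)
    n_c = sum(1 for y in covered_labels if y == predicted_class)
    return n_c / n if n else 0.0

def m_estimate(covered_labels, predicted_class, p, m):
    """(n_c + m*p) / (n + m): p is the prior accuracy, m weights the prior."""
    n = len(covered_labels)
    n_c = sum(1 for y in covered_labels if y == predicted_class)
    return (n_c + m * p) / (n + m)

def negative_entropy(covered_labels):
    """-Entropy of the class distribution over the covered instances (higher is better)."""
    n = len(covered_labels)
    if n == 0:
        return 0.0
    counts = Counter(covered_labels)
    return sum((c / n) * math.log2(c / n) for c in counts.values())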
60Sequential covering algorithms Issues
- Reduce the (more difficult) problem of learning a disjunctive set of rules to a sequence of (simpler) problems, each of which is to learn a single conjunctive rule
- After a rule is learned (i.e., found), the training instances covered (classified) by the rule are removed from the training set
- Hence each rule cannot be treated as independent of the other rules
- Perform a greedy search (for finding a sequence of rules) without backtracking
- Not guaranteed to find the smallest set of rules
- Not guaranteed to find the best set of rules
61Learning First-Order Rules
- From now on . . .
- We consider learning rules that contain variables (first-order rules)
- Inductive learning of first-order rules = inductive logic programming (ILP)
- Can be viewed as automatically inferring Prolog programs
- Two methods are considered
- FOIL
- Induction as inverted deduction
62Learning First-Order Rules (cont.)
- First-order rule
- Rules that contain variables
- Example
- Ancestor(x, y) ← Parent(x, y)
- Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)   (recursive)
- More expressive than propositional rules
- IF (Father1 = Bob) ∧ (Name2 = Bob) ∧ (Female1 = True) THEN Daughter1,2 = True
- IF Father(y, x) ∧ Female(y) THEN Daughter(x, y)
63Learning First-Order Rules (cont.)
- Formal definitions in first-order logic
- Constants: e.g., John, Kansas, 42
- Variables: e.g., Name, State, x
- Predicates: e.g., Male, as in Male(John)
- Functions: e.g., age, cosine, as in age(Gunho), cosine(x)
- Term: a constant, a variable, or function(term)
- Literal: a predicate (or its negation) applied to a set of terms, e.g., Greater_Than(age(John), 20), Male(x), etc.
- A first-order rule is a Horn clause
- H and Li (i = 1..n) are literals
- Clause: a disjunction of literals with implicit universal quantification
- Horn clause: at most one positive literal
- (H ∨ ¬L1 ∨ ¬L2 ∨ ... ∨ ¬Ln)
64Learning First-Order Rules (cont.)
- First-Order Horn Clauses
- Rules that have one or more preconditions and one single consequent. Predicates may have variables.
- The following forms of a Horn clause are equivalent:
- H ∨ ¬L1 ∨ ... ∨ ¬Ln
- H ← (L1 ∧ ... ∧ Ln)
- IF (L1 ∧ ... ∧ Ln) THEN H
65Learning Sets of First-Order Rules FOIL
- First-Order Inductive Learning (FOIL)
- Natural extension of sequential covering + Learn-One-Rule
- FOIL rules are similar to Horn clauses, with two exceptions
- Syntactic restriction: no functions
- More expressive than Horn clauses: negation is allowed in rule bodies
66Learning a Single Rule in FOIL
- Top-down approach originally applied to first-order logic (Quinlan, 1990).
- Basic algorithm for instances with discrete-valued features (a rough Python rendering follows):
- Let A = {} (the set of rule antecedents)
- Let N be the set of negative examples
- Let P be the current set of uncovered positive examples
- Until N is empty do
-   For every feature-value pair (literal) (Fi = Vij) calculate Gain(Fi = Vij, P, N)
-   Pick the literal, L, with the highest gain
-   Add L to A
-   Remove from N any examples that do not satisfy L
-   Remove from P any examples that do not satisfy L
- Return the rule: A1 ∧ A2 ∧ ... ∧ An → Positive
67Learning first-order rules FOIL alg.
68Sequential (set) covering algorithms
- CN2 Algorithm
- Start from an empty conjunct: {}
- Add conjuncts that minimize the entropy measure: {A}, {A,B}, ...
- Determine the rule consequent by taking the majority class of the instances covered by the rule
- RIPPER Algorithm
- Start from an empty rule: R0: {} => class (initial rule)
- Add conjuncts that maximize FOIL's information gain measure
- R1: {A} => class (rule after adding a conjunct)
- t: number of positive instances covered by both R0 and R1
- p0: number of positive instances covered by R0
- n0: number of negative instances covered by R0
- p1: number of positive instances covered by R1
- n1: number of negative instances covered by R1
69Foil Gain Metric
- Want to achieve two goals
- Decrease coverage of negative examples
- Measure the increase in the percentage of positives covered when a literal is added to the rule.
- Maintain coverage of as many positives as possible.
- Count the number of positives covered.
70Foil_Gain measure
- R0: {} => class (initial rule)
- R1: {A} => class (rule after adding a conjunct)
- FOIL_Gain(R0, R1) = t × ( log2( p1 / (p1 + n1) ) - log2( p0 / (p0 + n0) ) )
- t: number of positive instances covered by both R0 and R1
- p0: number of positive instances covered by R0
- n0: number of negative instances covered by R0
- p1: number of positive instances covered by R1
- n1: number of negative instances covered by R1
71Example Foil_Gain measure
- R0: {} => class (initial rule)
- R1: {A} => class (rule after adding a conjunct)
- R2: {B} => class (rule after adding a conjunct)
- R3: {C} => class (rule after adding a conjunct)
- Assume the initial rule is {} => class.
- This rule covers p0 = 100 positive examples and n0 = 400 negative examples.
- Choose one rule!
- R1 covers 4 positive examples and 1 negative example.
- R2 covers 30 positive examples and 10 negative examples.
- R3 covers 100 positive examples and 90 negative examples.
72Example Foil_Gain measure
- R0: {} => class (initial rule)
- This rule covers 100 positive examples and 400 negative examples.
- R1: {A} => class (rule after adding a conjunct)
- R1 covers 4 positive examples and 1 negative example.
- t = 4 (number of positive instances covered by both R0 and R1)
- p0 = 100
- n0 = 400
- p1 = 4
- n1 = 1
- FOIL_Gain(R0, R1) = 4 × ( log2(4/5) - log2(100/500) ) = 8.0
73Example Foil_Gain measure
- R0: {} => class (initial rule)
- This rule covers 100 positive examples and 400 negative examples.
- R2: {B} => class (rule after adding a conjunct)
- R2 covers 30 positive examples and 10 negative examples.
- t = 30 (number of positive instances covered by both R0 and R2)
- p0 = 100
- n0 = 400
- p1 = 30
- n1 = 10
- FOIL_Gain(R0, R2) = 30 × ( log2(30/40) - log2(100/500) ) ≈ 57.2
74Example Foil_Gain measure
- R0: {} => class (initial rule)
- This rule covers 100 positive examples and 400 negative examples.
- R3: {C} => class (rule after adding a conjunct)
- R3 covers 100 positive examples and 90 negative examples.
- t = 100 (number of positive instances covered by both R0 and R3)
- p0 = 100
- n0 = 400
- p1 = 100
- n1 = 90
- FOIL_Gain(R0, R3) = 100 × ( log2(100/190) - log2(100/500) ) ≈ 139.6, the largest of the three gains, so R3 is chosen (see the sketch below).
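A small sketch that reproduces the three computations above; the counts are taken from the worked example and the function follows the FOIL_Gain formula stated on the earlier slide.

import math

def foil_gain(t, p0, n0, p1, n1):
    """FOIL gain of extending rule R0 (covering p0/n0) to R1 (covering p1/n1); t = shared positives."""
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

print(foil_gain(t=4,   p0=100, n0=400, p1=4,   n1=1))    # R1:   8.0
print(foil_gain(t=30,  p0=100, n0=400, p1=30,  n1=10))   # R2:  ~57.2
print(foil_gain(t=100, p0=100, n0=400, p1=100, n1=90))   # R3: ~139.6 (highest gain)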
75Example
- Training data (assertions)
- GrandDaughter(Victor, Sharon)
- Female(Sharon), Father(Sharon, Bob)
- Father(Tom, Bob), Father(Bob, Victor)
- Use the closed world assumption: any literal involving the specified predicates and constants that is not listed is assumed to be false
- ¬GrandDaughter(Tom, Bob), ¬GrandDaughter(Tom, Tom)
- ¬GrandDaughter(Bob, Victor)
- ¬Female(Tom), etc.
76Possible Variable Bindings
- Initial rule
- GrandDaughter(x, y) ←
- Possible bindings from the training assertions (how many possible bindings of the 4 constants to the variables of the initial rule?)
- Positive binding: {x/Victor, y/Sharon}
- Negative bindings: {x/Victor, y/Victor},
- {x/Tom, y/Sharon}, etc.
- Positive bindings provide positive evidence for, and negative bindings provide negative evidence against, the rule under consideration.
77Rule Pruning in FOIL
- Prepruning method based on the minimum description length (MDL) principle.
- Postpruning to eliminate unnecessary complexity due to limitations of the greedy algorithm (sketched below):
- For each rule, R
-   For each antecedent, A, of the rule
-     If deleting A from R does not cause negatives to become covered
-     then delete A
- For each rule, R
-   If deleting R does not uncover any positives (since they are redundantly covered by other rules)
-   then delete R
78Sequential (set) covering algorithms
- CN2 Algorithm
- Start from an empty conjunct: {}
- Add conjuncts that minimize the entropy measure: {A}, {A,B}, ...
- Determine the rule consequent by taking the majority class of the instances covered by the rule
- RIPPER Algorithm
- Start from an empty rule: R0: {} => class (initial rule)
- Add conjuncts that maximize FOIL's information gain measure
- R1: {A} => class (rule after adding a conjunct)
- t: number of positive instances covered by both R0 and R1
- p0: number of positive instances covered by R0
- n0: number of negative instances covered by R0
- p1: number of positive instances covered by R1
- n1: number of negative instances covered by R1
79General to Specific Beam Search
- Learning with decision trees
80General to Specific Beam Search 4
The CN2 Algorithm

LearnOneRule( target_attribute, attributes, examples, k )
  Initialise best_hypothesis = Ø, the most general hypothesis
  Initialise candidate_hypotheses = { best_hypothesis }
  while ( candidate_hypotheses is not empty ) do
    1. Generate the next more-specific candidate_hypotheses
    2. Update best_hypothesis
    3. Update candidate_hypotheses
  return a rule of the form "IF best_hypothesis THEN prediction",
    where prediction is the most frequent value of target_attribute
    among those examples that match best_hypothesis.

Performance( h, examples, target_attribute )
  h_examples = the subset of examples that match h
  return -Entropy( h_examples ), where Entropy is computed with respect to target_attribute
81General to Specific Beam Search 5
- Generate the next more-specific candidate_hypotheses

all_constraints = the set of all constraints (a = v), where a ∈ attributes
  and v is a value of a occurring in the current set of examples
new_candidate_hypotheses = for each h in candidate_hypotheses,
    for each c in all_constraints:
      create a specialisation of h by adding the constraint c
Remove from new_candidate_hypotheses any hypotheses that are
  duplicate, inconsistent, or not maximally specific

- Update best_hypothesis

for all h in new_candidate_hypotheses do
  if ( h is statistically significant when tested on examples, and
       Performance( h, examples, target_attribute ) >
       Performance( best_hypothesis, examples, target_attribute ) )
  then best_hypothesis = h
82General to Specific Beam Search 6
- Update the candidate_hypotheses

candidate_hypotheses = the k best members of new_candidate_hypotheses,
  according to the Performance function

- The Performance function guides the search in Learn-One-Rule:
  Performance(s) = -Entropy(s) = Σ_{i=1..c} p_i log2 p_i
- s: the current set of training examples (those matching the hypothesis)
- c: the number of possible values of the target attribute
- p_i: the proportion of the examples that are classified with the i-th value
83Example for CN2-Algorithm
LearnOneRule( EnjoySport, {Sky, AirTemp, Humidity, Wind, Water, Forecast}, examples, 2 )

best_hypothesis = Ø
candidate_hypotheses = { Ø }
all_constraints = { Sky=Sunny, Sky=Rainy, AirTemp=Warm, AirTemp=Cold,
                    Humidity=Normal, Humidity=High, Wind=Strong,
                    Water=Warm, Water=Cool, Forecast=Same, Forecast=Change }
Performance = nc / n
  n: number of examples covered by the rule
  nc: number of examples covered by the rule whose classification is correct
84Example for CN2-Algorithm (2)
Pass 1: the Remove step delivers no result (nothing to remove)
candidate_hypotheses = { Sky=Sunny, AirTemp=Warm }
best_hypothesis is Sky=Sunny
85Example for CN2-Algorithm (3)
Pass 2: Remove (duplicate, inconsistent, or not maximally specific hypotheses)
candidate_hypotheses = { Sky=Sunny AND AirTemp=Warm, Sky=Sunny AND Humidity=High }
best_hypothesis remains Sky=Sunny
86Relational Learning and Inductive Logic Programming (ILP)
- Fixed feature vectors are a very limited representation of instances.
- Examples or the target concept may require a relational representation that includes multiple entities with relationships between them (e.g., a graph with labeled edges and nodes).
- First-order predicate logic is a more powerful representation for handling such relational descriptions.
- Horn clauses (i.e., if-then rules in predicate logic, Prolog programs) are a useful restriction on full first-order logic that allows decidable inference.
- Allows learning programs from sample I/O pairs.
87ILP Examples
- Learn definitions of family relationships given data for primitive types and relations.
- uncle(A,B) :- brother(A,C), parent(C,B).
- uncle(A,B) :- husband(A,C), sister(C,D), parent(D,B).
- Learn recursive list programs from I/O pairs.
- member(X, [X|Y]).
- member(X, [Y|Z]) :- member(X, Z).
- append([], L, L).
- append([X|L1], L2, [X|L12]) :- append(L1, L2, L12).
88ILP
- Goal is to induce a Horn-clause definition for some target predicate P, given definitions of a set of background predicates Q.
- Goal is to find a syntactically simple Horn-clause definition, D, for P, given background knowledge B defining the background predicates Q, such that
- For every positive example pi of P: D ∪ B ⊢ pi
- For every negative example ni of P: D ∪ B ⊬ ni
- Background definitions are provided either
- Extensionally: a list of ground tuples satisfying the predicate.
- Intensionally: Prolog definitions of the predicate.
89ILP Systems
- Top-Down
- FOIL (Quinlan, 1990)
- Bottom-Up
- CIGOL (Muggleton & Buntine, 1988)
- GOLEM (Muggleton, 1990)
- Hybrid
- CHILLIN (Mooney & Zelle, 1994)
- PROGOL (Muggleton, 1995)
- ALEPH (Srinivasan, 2000)
90FOIL: First-Order Inductive Logic
- Top-down sequential covering algorithm upgraded to learn Prolog clauses, but without logical functions.
- Background knowledge must be provided extensionally.
- Initialize the clause for target predicate P to
- P(X1, ..., XT) :- .
- Possible specializations of a clause include adding all possible literals:
- Qi(V1, ..., VTi)
- not(Qi(V1, ..., VTi))
- Xi = Xj
- not(Xi = Xj)
- where the Xs are bound variables already in the existing clause; at least one of V1, ..., VTi must be a bound variable, the others can be new.
- Allow recursive literals P(V1, ..., VT) if they do not cause an infinite regress.
- Handle alternative possible values of new intermediate variables by maintaining examples as tuples of all variable values.
91FOIL Training Data
- For learning a recursive definition, the positive set must consist of all tuples of constants that satisfy the target predicate, given some fixed universe of constants.
- Background knowledge consists of the complete set of tuples for each background predicate for this universe.
- Example: consider learning a definition for the target predicate path, for finding a path in a directed acyclic graph.
- path(X,Y) :- edge(X,Y).
- path(X,Y) :- edge(X,Z), path(Z,Y).
(The two clauses above are the Horn-clause definition, D, of the target predicate P.)
Background Knowledge, B:
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
92FOIL Negative Training Data
- Negative examples of the target predicate can be provided directly, or generated indirectly by making a closed world assumption.
- Every pair of constants <X,Y> not in the positive tuples for the path predicate (reproduced in the sketch below).
Negative path tuples (pairs <X,Y> with ¬path(X,Y)):
<1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,
<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,
<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,
<6,4>,<6,6>
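The closed-world generation of these negatives is easy to reproduce; a tiny sketch using the constants 1..6 and the positive path tuples from the slide:

from itertools import product

constants = range(1, 7)
positives = {(1,2), (1,3), (1,6), (1,5), (3,6), (3,5),
             (4,2), (4,6), (4,5), (6,5)}               # positive path tuples

# Closed world assumption: every pair not listed as positive is a negative example.
negatives = [pair for pair in product(constants, constants) if pair not in positives]
print(len(negatives))                                  # 26, matching the list above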
93Sample FOIL Induction
Pos: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Start with clause: path(X,Y) :- .
Possible literals to add: edge(X,X), edge(Y,Y), edge(X,Y), edge(Y,X), edge(X,Z), edge(Y,Z), edge(Z,X), edge(Z,Y), path(X,X), path(Y,Y), path(X,Y), path(Y,X), path(X,Z), path(Y,Z), path(Z,X), path(Z,Y), X=Y, plus negations of all of these.
94Sample FOIL Induction
Pos: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(Y,X).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 0 positive examples
Covers 6 negative examples
Not a good literal.
95Sample FOIL Induction
Pos: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Y).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 6 positive examples
Covers 0 negative examples
Chosen as best literal. Result is the base clause.
96Sample FOIL Induction
Pos: <1,6>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Y).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 6 positive examples
Covers 0 negative examples
Chosen as best literal. Result is the base clause.
Remove covered positive tuples!
97Sample FOIL Induction
Pos: <1,6>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Learned so far: path(X,Y) :- edge(X,Y).
Start new clause: path(X,Y) :- .
98Sample FOIL Induction
Pos: <1,6>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Y).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 0 positive examples
Covers 0 negative examples
Not a good literal.
99Sample FOIL Induction
Pos: <1,6>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal
100Sample FOIL Induction
Pos: <1,6>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal
Negatives still covered; remove the uncovered examples!
101Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal
Negatives still covered; remove the uncovered examples. Expand tuples to account for possible Z values.
<X,Y,Z>
102Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal
Negatives still covered; remove the uncovered examples. Expand tuples to account for possible Z values.
<X,Y,Z>
103Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5>
Neg: <1,1>,<1,4>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal
Negatives still covered; remove the uncovered examples. Expand tuples to account for possible Z values.
<X,Y,Z>
104Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5,2>,<4,5,6>
Neg: <1,1>,<1,4>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal
Negatives still covered; remove the uncovered examples. Expand tuples to account for possible Z values.
<X,Y,Z>
105Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5,2>,<4,5,6>
Neg: <1,1,2>,<1,1,3>,<1,4,2>,<1,4,3>,<3,1,6>,<3,2,6>,<3,3,6>,<3,4,6>,<4,1,2>,<4,1,6>,<4,3,2>,<4,3,6>,<4,4,2>,<4,4,6>,<6,1,5>,<6,2,5>,<6,3,5>,<6,4,5>,<6,6,5>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal
Negatives still covered; remove the uncovered examples. Expand tuples to account for possible Z values.
106Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5,2>,<4,5,6>
Neg: <1,1,2>,<1,1,3>,<1,4,2>,<1,4,3>,<3,1,6>,<3,2,6>,<3,3,6>,<3,4,6>,<4,1,2>,<4,1,6>,<4,3,2>,<4,3,6>,<4,4,2>,<4,4,6>,<6,1,5>,<6,2,5>,<6,3,5>,<6,4,5>,<6,6,5>
Continue specializing the clause: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
107Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5,2>,<4,5,6>
Neg: <1,1,2>,<1,1,3>,<1,4,2>,<1,4,3>,<3,1,6>,<3,2,6>,<3,3,6>,<3,4,6>,<4,1,2>,<4,1,6>,<4,3,2>,<4,3,6>,<4,4,2>,<4,4,6>,<6,1,5>,<6,2,5>,<6,3,5>,<6,4,5>,<6,6,5>
Test: path(X,Y) :- edge(X,Z), edge(Z,Y).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 3 positive examples
Covers 0 negative examples
108Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5,2>,<4,5,6>
Neg: <1,1,2>,<1,1,3>,<1,4,2>,<1,4,3>,<3,1,6>,<3,2,6>,<3,3,6>,<3,4,6>,<4,1,2>,<4,1,6>,<4,3,2>,<4,3,6>,<4,4,2>,<4,4,6>,<6,1,5>,<6,2,5>,<6,3,5>,<6,4,5>,<6,6,5>
Test: path(X,Y) :- edge(X,Z), path(Z,Y).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 4 positive examples
Covers 0 negative examples
Eventually chosen as best literal; this completes the clause.
Definition complete, since all original <X,Y> tuples are covered (by way of covering some <X,Y,Z> tuple).
109Logic Program Induction in FOIL
- FOIL has also learned
- append, given components and null
- reverse, given append, components, and null
- quicksort, given partition, append, components, and null
- Other programs from the first few chapters of a Prolog text.
- Learning recursive programs in FOIL requires a complete set of positive examples for some constrained universe of constants, so that a recursive call can always be evaluated extensionally.
- For lists: all lists of a limited length composed from a small set of constants (e.g., all lists up to length 3 using a, b, c).
- Size of the extensional background grows combinatorially.
- Negative examples are usually computed using a closed-world assumption.
- This grows combinatorially large for higher-arity target predicates.
- Can randomly sample negatives to make this tractable.
110More Realistic Applications
- Classifying chemical compounds as mutagenic (cancer causing) based on their graphical molecular structure and chemical background knowledge.
- Classifying web documents based on both the content of the page and its links to and from other pages with particular content.
- A web page is a university faculty home page if:
- It contains the words "Professor" and "University", and
- It is pointed to by a page with the word "faculty", and
- It points to a page with the words "course" and "exam".
111FOIL Limitations
- Search space of literals (branching factor) can become intractable.
- Use aspects of bottom-up search to limit the search.
- Requires large extensional background definitions.
- Use intensional background via Prolog inference.
- Hill-climbing search gets stuck at local optima and may not even find a consistent clause.
- Use limited backtracking (beam search).
- Include determinate literals with zero gain.
- Use relational pathfinding or relational clichés.
- Requires complete examples to learn recursive definitions.
- Use intensional interpretation of learned recursive clauses.
112Rule Learning and ILP Summary
- There are effective methods for learning symbolic rules from data using greedy sequential covering and top-down or bottom-up search.
- These methods have been extended to first-order logic to learn relational rules and recursive Prolog programs.
- Knowledge represented by rules is generally more interpretable by people, allowing human insight into what is learned and possible human approval and correction of learned knowledge.
113Induction as Inverted Deduction
- Induction is finding h such that
- (∀⟨xi, f(xi)⟩ ∈ D)  (B ∧ h ∧ xi) ⊢ f(xi)
- where
- xi is the i-th training instance
- f(xi) is the target function value for xi
- B is other background knowledge
- So let's design inductive algorithms by inverting operators for automated deduction
114Induction as Inverted Deduction
- D: pairs of people ⟨u,v⟩ such that the child of u is v
- f(xi): Child(Bob, Sharon)
- xi: Male(Bob), Female(Sharon), Father(Sharon, Bob)
- B: Parent(u,v) ← Father(u,v)
- What satisfies (∀⟨xi, f(xi)⟩ ∈ D)  (B ∧ h ∧ xi) ⊢ f(xi) ?
- h1: Child(u,v) ← Father(v,u)     (h1 ∧ xi ⊢ f(xi), with no need for B)
- h2: Child(u,v) ← Parent(v,u)     (B ∧ h2 ∧ xi ⊢ f(xi))
115Induction as Inverted Deduction
We have mechanical deductive operators F(A, B) = C, where A ∧ B ⊢ C.
We need inductive operators O(B, D) = h, where (∀⟨xi, f(xi)⟩ ∈ D)  (B ∧ h ∧ xi) ⊢ f(xi).
116Induction as Inverted Deduction
- Positives
- Subsumes the earlier idea of finding h that fits the training data
- Domain theory B helps define the meaning of "fit the data": (B ∧ h ∧ xi) ⊢ f(xi)
- Negatives
- Doesn't allow for noisy data. Consider (∀⟨xi, f(xi)⟩ ∈ D)  (B ∧ h ∧ xi) ⊢ f(xi)
- First-order logic gives a huge hypothesis space H
- → overfitting
- → intractability of calculating all acceptable h's
117Deduction Resolution Rule
- C1: P ∨ L
- C2: ¬L ∨ R
- Resolvent: P ∨ R
- 1. Given initial clauses C1 and C2, find a literal L from clause C1 such that ¬L occurs in clause C2.
- 2. Form the resolvent C by including all literals from C1 and C2, except for L and ¬L.
- More precisely, the set of literals occurring in the conclusion C is
- C = (C1 - {L}) ∪ (C2 - {¬L})
- where ∪ denotes set union and "-" set difference.
118Inverting Resolution
- C = (C1 - {L}) ∪ (C2 - {¬L})
- C1: PassExam ∨ ¬KnowMaterial     C2: KnowMaterial ∨ ¬Study
- C: PassExam ∨ ¬Study
119Inverted Resolution (Propositional)
- Given initial clauses C1 and C, find a literal L that occurs in clause C1 but not in clause C.
- Form the second clause C2 by including the following literals (sketched in code below):
- C2 = (C - (C1 - {L})) ∪ {¬L}
- C1: PassExam ∨ ¬KnowMaterial     C2: KnowMaterial ∨ ¬Study
C: PassExam ∨ ¬Study
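A tiny sketch of this propositional inverse-resolution operator, with clauses represented as Python sets of literal strings and "~" marking negation (an encoding chosen here only for illustration).

def negate(lit):
    """Flip the sign of a literal written as 'P' or '~P'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def inverse_resolve(C, C1):
    """Given resolvent C and one parent C1, build one possible second parent C2."""
    L = next(l for l in C1 if l not in C)              # a literal of C1 that is not in C
    return (C - (C1 - {L})) | {negate(L)}              # C2 = (C - (C1 - {L})) ∪ {¬L}

C1 = {"PassExam", "~KnowMaterial"}
C  = {"PassExam", "~Study"}
print(inverse_resolve(C, C1))                          # {'KnowMaterial', '~Study'}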
120Inverted Resolution
- First-Order Resolution
- Substitution θ: any mapping of variables to terms
- e.g., θ = {x/Bob, y/z}
- Unifying substitution
- For two literals L1 and L2, θ is a unifying substitution provided L1θ = L2θ
- e.g., θ = {x/Bill, z/y}
- L1 = Father(x, y), L2 = Father(Bill, z)
- L1θ = L2θ = Father(Bill, y)
121Inverted Resolution
- First-Order Resolution
- Resolution operator (first-order form)
- Find a literal L1 from clause C1, a literal L2 from clause C2, and a substitution θ such that L1θ = ¬L2θ.
- Form the resolvent C by including all literals from C1θ and C2θ, except for L1θ and ¬L2θ. More precisely, the set of literals occurring in the conclusion C is
- C = (C1 - {L1})θ ∪ (C2 - {L2})θ
122Inverted Resolution
- First-Order Resolution
- Example
- C1: White(x) ← Swan(x),   C2: Swan(Fred)
- C1 = White(x) ∨ ¬Swan(x)
- → L1 = ¬Swan(x), L2 = Swan(Fred)
- Unifying substitution θ = {x/Fred}
- then L1θ = ¬L2θ = ¬Swan(Fred)
- (C1 - {L1})θ = {White(Fred)}
- (C2 - {L2})θ = Ø
- → C = White(Fred)
C = (C1 - {L1})θ ∪ (C2 - {L2})θ
123First Order Resolution
- 1. Find a literal L1 from clause C1, a literal L2 from clause C2, and substitutions θ1, θ2 such that
-    L1θ1 = ¬L2θ2
- 2. Form the resolvent C by including all literals from C1θ1 and C2θ2, except for L1θ1 and ¬L2θ2. More precisely, the set of literals occurring in the conclusion is
-    C = (C1 - {L1})θ1 ∪ (C2 - {L2})θ2
- → C - (C1 - {L1})θ1 = (C2 - {L2})θ2
- → (C - (C1 - {L1})θ1) ∪ {L2θ2} = C2θ2
- Inverting:
- C2 = ( C - (C1 - {L1})θ1 ) θ2⁻¹ ∪ { ¬L1θ1θ2⁻¹ }
124First Order Resolution
- Inverse resolution: first-order case
- C = (C1 - {L1})θ1 ∪ (C2 - {L2})θ2
- (where θ = θ1θ2 (factorization))
- C - (C1 - {L1})θ1 = (C2 - {L2})θ2
- (where L2 = ¬L1θ1θ2⁻¹)
- → C2 = (C - (C1 - {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}
125Inverting Resolution (cont.)
- Inverse resolution: first-order case
- We wish to learn rules for the target predicate GrandChild(y,x)
- Given training data D = { GrandChild(Bob, Shannon) }
- Background info B = { Father(Shannon, Tom), Father(Tom, Bob) }
- C = GrandChild(Bob, Shannon)
- C1 = Father(Shannon, Tom)
- L1 = Father(Shannon, Tom)
- Suppose we choose the inverse substitutions
- θ1⁻¹ = {}, θ2⁻¹ = {Shannon/x}
- (C - (C1 - {L1})θ1)θ2⁻¹ = Cθ2⁻¹ = GrandChild(Bob, x)
- ¬L1θ1θ2⁻¹ = ¬Father(x, Tom)
- → C2 = GrandChild(Bob, x) ∨ ¬Father(x, Tom)
- or equivalently GrandChild(Bob, x) ← Father(x, Tom)
C2 = (C - (C1 - {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}
126Rule Learning Issues
- Which is better, rules or trees?
- Trees share structure between disjuncts.
- Rules allow completely independent features in each disjunct.
- Mapping some rule sets to decision trees results in an exponential increase in size.
A ∧ B → P
C ∧ D → P
What if we add the rule E ∧ F → P ?
127Rule Learning Issues
- Which is better, top-down or bottom-up search?
- Bottom-up is more subject to noise, e.g., the random seeds that are chosen may be noisy.