Machine Learning: Rule Learning - PowerPoint PPT Presentation

1
Machine Learning: Rule Learning
  • Intelligent Systems Lab.
  • Soongsil University

Thanks to Raymond J. Mooney at the University of
Texas at Austin
2
Introduction
  • Set of if-then rules
  • The hypothesis is easy to interpret.
  • Goal: look at a new method to learn rules
  • Rules:
  • Propositional rules (rules without variables)
  • First-order predicate rules (with variables)

3
Introduction (2)
  • GOAL: Learning a target function as a set of
    IF-THEN rules
  • BEFORE: Learning with decision trees
  • Learn the decision tree
  • Translate the tree into a set of IF-THEN rules
    (one rule for each leaf)
  • OTHER POSSIBILITY: Learning with genetic
    algorithms
  • Each set of rules is coded as a bit vector
  • Several genetic operators are used on the
    hypothesis space
  • TODAY AND HERE:
  • First: Learning rules in propositional form
  • Second: Learning rules in first-order form (Horn
    clauses, which include variables)
  • Sequential search for rules, one after the other

4
Rule induction
  • To learn a set of IF-THEN rules for
    classification
  • Suitable when the target function can be
    represented by a set of IF-THEN rules
  • Target function h = {Rule1, Rule2, ..., Rulem}
  • Rulej: IF (precondj1 ∧ precondj2 ∧ ... ∧ precondjn)
    THEN postcondj
  • IF-THEN rules:
  • An expressive representation
  • Most readable and understandable for
    humans

5
Rule induction: Example (1)
  • Learning a set of propositional rules
  • E.g., the target function (concept)
    Buy_Computer is represented by:
  • IF (Age=Old ∧ Student=No) THEN Buy_Computer=No
  • IF (Student=Yes) THEN Buy_Computer=Yes
  • IF (Age=Medium ∧ Income=High) THEN Buy_Computer=Yes
  • Learning a set of first-order rules
  • E.g., the target function (concept) Ancestor
    is represented by:
  • IF Parent(x,y) THEN Ancestor(x,y)
  • IF Parent(x,y) ∧ Ancestor(y,z) THEN Ancestor(x,z)
  • (Parent(x,y) is a predicate saying that y is
    the father/mother of x)

6
Rule induction: Example (2)
  • Rule: IF (Age=Old ∧ Student=No) THEN Buy_Computer=No
  • Which instances are correctly classified by the
    above rule?
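Checking which instances a rule covers and classifies correctly is mechanical; here is a minimal Python sketch (the three instances are illustrative, not the slide's actual table):

```python
# A propositional rule: the antecedent is a set of attribute=value tests.
rule = {"antecedent": {"Age": "Old", "Student": "No"},
        "consequent": ("Buy_Computer", "No")}

def covers(rule, instance):
    """True if the instance satisfies every precondition of the rule."""
    return all(instance.get(a) == v for a, v in rule["antecedent"].items())

instances = [
    {"Age": "Old",    "Student": "No",  "Buy_Computer": "No"},   # covered
    {"Age": "Old",    "Student": "Yes", "Buy_Computer": "Yes"},  # not covered
    {"Age": "Medium", "Student": "No",  "Buy_Computer": "No"},   # not covered
]

covered = [x for x in instances if covers(rule, x)]
# An instance is *correctly classified* when it is covered and its class
# matches the rule's consequent.
attr, val = rule["consequent"]
correct = [x for x in covered if x[attr] == val]
```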

7
Learning Rules
  • If-then rules in logic are a standard
    representation of knowledge that has proven
    useful in expert systems and other AI systems
  • In propositional logic a set of rules for a
    concept is equivalent to DNF
  • Methods for automatically inducing rules from
    data have been shown to build more accurate
    expert systems than human knowledge engineering
    for some applications.
  • Rule-learning methods have been extended to
    first-order logic to handle relational
    (structural) representations.
  • Inductive Logic Programming (ILP) for learning
    Prolog programs from I/O pairs.
  • Allows moving beyond simple feature-vector
    representations of data.

8
Rule Learning Approaches
  • Translate decision trees into rules (C4.5)
  • Sequential (set) covering algorithms
  • General-to-specific (top-down) (RIPPER, CN2,
    FOIL)
  • Specific-to-general (bottom-up) (GOLEM, CIGOL)
  • Hybrid search (AQ, Chillin, Progol)
  • Translate neural-nets into rules (TREPAN)

9
Decision-Trees to Rules
  • For each path in a decision tree from the root to
    a leaf, create a rule with the conjunction of
    tests along the path as an antecedent and the
    leaf label as the consequent.
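The path-to-rule translation can be sketched in a few lines of Python; the nested-dict tree encoding below is an assumption for illustration, mirroring the color/shape example that follows:

```python
# A decision tree as nested structures: an internal node is
# (attribute, {value: subtree}); a leaf is a class label.
tree = ("color", {
    "red":   ("shape", {"circle": "A", "square": "B", "triangle": "C"}),
    "blue":  "B",
    "green": "C",
})

def tree_to_rules(node, path=()):
    """One rule per root-to-leaf path: (tuple of tests, leaf label)."""
    if isinstance(node, str):                      # leaf: emit a rule
        return [(path, node)]
    attr, branches = node
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, path + ((attr, value),))
    return rules

rules = tree_to_rules(tree)   # 5 rules, one per leaf
```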

red ∧ circle → A
blue → B
red ∧ square → B
green → C
red ∧ triangle → C

(Figure: decision tree with root test "color"; the green branch leads to
leaf C, the blue branch to leaf B, and the red branch to a "shape" test
with leaves A for circle, B for square, and C for triangle.)
10
Post-Processing Decision-Tree Rules
  • Resulting rules may contain unnecessary
    antecedents that are not needed to remove
    negative examples and result in over-fitting.
  • Rules are post-pruned by greedily removing
    antecedents or rules until performance on
    training data or validation set is significantly
    harmed.
  • Resulting rules may lead to competing conflicting
    conclusions on some instances.
  • Sort rules by training (validation) accuracy to
    create an ordered decision list. The first rule
    in the list that applies is used to classify a
    test instance.

red ∧ circle → A (97% train accuracy)
red ∧ big → B (95% train accuracy)
Test case <big, red, circle> assigned to class A
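An ordered decision list is easy to express directly; the sketch below uses the two rules above, plus a hypothetical default class that the slide does not specify:

```python
# Rules sorted by training accuracy form a decision list:
# the first matching rule fires.
decision_list = [
    ({"color": "red", "shape": "circle"}, "A"),   # 97% train accuracy
    ({"color": "red", "size": "big"},     "B"),   # 95% train accuracy
]
default_class = "C"   # illustrative default, not given on the slide

def classify(instance):
    for antecedent, label in decision_list:
        if all(instance.get(a) == v for a, v in antecedent.items()):
            return label
    return default_class

test_case = {"size": "big", "color": "red", "shape": "circle"}
# Both rules match this instance, but the more accurate rule
# comes first in the list, so the result is class A.
```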
11
Propositional rule induction: Training (Sequential Covering Algorithms)
  • To learn the set of rules in a sequential
    (incremental) covering strategy:
  • Step 1. Learn one rule
  • Step 2. Remove from the training set those
    instances correctly classified by the rule
  • Repeat Steps 1 and 2 to learn another
    rule (using the remaining training set)
  • The learning procedure
  • Learns (i.e., covers) the rules
    sequentially (incrementally)
  • Can be repeated as many times as wanted, to
    learn a set of rules that covers a desired portion
    of (or the full) training set
  • The set of learned rules is sorted according to
    some performance measure (e.g., classification
    accuracy)
  • The rules will be applied in this order
    when classifying a future instance

12
Propositional rule induction: Classification
  • Given a test instance
  • The learned rules are tried (tested) sequentially
    (i.e., rule by rule) in the order that resulted
    from the training phase
  • The first encountered rule that covers the
    instance (i.e., the rule's preconditions in the
    IF clause match the instance) classifies it
  • The instance is classified by the post-condition
    in the rule's THEN clause
  • If no rule covers the instance, then the instance
    is classified by the default rule
  • The instance is classified by the most frequent
    value of the target attribute in the training
    instances

13
Sequential Covering Algorithms
  • Require that each rule has high accuracy but
    (possibly) low coverage
  • High accuracy → the prediction is correct
  • Accepting low coverage → a prediction is NOT
    required for every training example

14
Rule Coverage and Accuracy
  • Coverage of a rule
  • Fraction of records that satisfy the antecedent
    of the rule
  • Accuracy of a rule
  • Fraction of the records satisfying the antecedent
    that also satisfy the consequent

Rule: IF (Status=Single) THEN No
Coverage = 40% (4/10), Accuracy = 50% (2/4)
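Since the slide's table is not reproduced in the transcript, the toy dataset below is constructed to give the same counts (4 of 10 records are Single, and 2 of those 4 have class No); the computation itself is a direct sketch of the two definitions:

```python
# Illustrative records matching the slide's counts, not its actual table.
records = (
    [{"Status": "Single",  "Class": "No"}]  * 2 +
    [{"Status": "Single",  "Class": "Yes"}] * 2 +
    [{"Status": "Married", "Class": "Yes"}] * 6
)

def coverage_and_accuracy(records, antecedent, consequent):
    """Coverage = |covered| / |records|;
    accuracy = |covered with matching class| / |covered|."""
    covered = [r for r in records
               if all(r[a] == v for a, v in antecedent.items())]
    correct = [r for r in covered if r["Class"] == consequent]
    return len(covered) / len(records), len(correct) / len(covered)

cov, acc = coverage_and_accuracy(records, {"Status": "Single"}, "No")
```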
15
Sequential Covering Algorithms (cont.)
16
Sequential Covering
  • A set of rules is learned one at a time, each
    time finding a single rule that covers a large
    number of positive instances without covering any
    negatives, removing the positives that it covers,
    and learning additional rules to cover the rest.
  • This is an instance of the greedy algorithm for
    minimum set covering and does not guarantee a
    minimum number of learned rules.
  • Minimum set covering is an NP-hard problem and
    the greedy algorithm is a standard approximation
    algorithm.
  • Methods for learning individual rules vary.
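The covering loop described above is independent of how each single rule is learned; a minimal Python sketch, with the single-rule learner left as a pluggable parameter:

```python
def sequential_covering(positives, negatives, learn_one_rule):
    """Greedy covering loop: learn one rule, remove the positives it
    covers, and repeat until all positives are covered (or no progress).

    learn_one_rule(P, N) is a placeholder for any single-rule learner;
    it must return (rule, covers) where covers(rule, x) tests instance x.
    """
    rules = []
    remaining = list(positives)
    while remaining:
        rule, covers = learn_one_rule(remaining, negatives)
        newly_covered = [x for x in remaining if covers(rule, x)]
        if not newly_covered:      # the learner made no progress: stop
            break
        rules.append(rule)
        remaining = [x for x in remaining if not covers(rule, x)]
    return rules

# Toy demonstration: instances are integers, and the "learner" proposes
# a rule that covers exactly one positive value.
demo = sequential_covering([1, 2, 3], [9],
                           lambda P, N: (P[0], lambda r, x: x == r))
```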

17
The Set Covering Problem
  • Let S = {1, 2, ..., n}, and suppose Sj ⊆ S for each
    j. We say that index set J is a cover of S if
    ∪j∈J Sj = S
  • Set covering problem: find a minimum-cardinality
    set cover of S.
  • Application: locating fire stations (the 119
    emergency service).
  • Locating hospitals.
  • Locating Starbucks stores, and many non-obvious
    applications.
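The greedy approximation mentioned on the previous slide can be sketched in a few lines; the instance at the bottom is a hypothetical example:

```python
def greedy_set_cover(universe, subsets):
    """Standard greedy approximation for minimum set cover: repeatedly
    pick the subset covering the most still-uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda j: len(subsets[j] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen

# Hypothetical instance: S = {1..5} and three candidate subsets.
S = {1, 2, 3, 4, 5}
subsets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}}
cover = greedy_set_cover(S, subsets)   # picks "A" (3 new), then "C" (2 new)
```

The greedy choice is optimal here, but in general it only guarantees a cover within a logarithmic factor of the minimum, which is exactly why sequential covering may learn more rules than necessary.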

18
Greedy Sequential Covering Example
(Figures, slides 18-24: an X-Y scatter plot of positive and negative
examples; each step learns a rectangular rule covering a group of
positives, removes the covered positives, and repeats until all
positives are covered.)
25
Non-optimal Covering Example
(Figure: a configuration of examples for which greedy covering needs
more rules than the minimum cover.)
26
Greedy Sequential Covering Example
(Figures, slides 26-33: the greedy covering steps applied to the
configuration above, illustrating a non-minimal final rule set.)
34
Strategies for Learning a Single Rule
  • Top Down (General to Specific):
  • Start with the most general (empty) rule.
  • Repeatedly add antecedent constraints on features
    that eliminate negative examples while
    maintaining as many positives as possible.
  • Stop when only positives are covered.
  • Bottom Up (Specific to General):
  • Start with a most specific rule (e.g., the complete
    description of a randomly chosen instance).
  • Repeatedly remove antecedent constraints in order
    to cover more positives.
  • Stop when further generalization results in
    covering negatives.

35
Sequential Covering Algorithms (cont.)
  • General-to-Specific Beam Search
  • How do we learn each individual rule?
  • Requirements for LEARN-ONE-RULE:
  • High accuracy; high coverage is not required
  • One approach is . . .
  • To implement LEARN-ONE-RULE in a similar way as in
    decision tree learning (ID3), but to follow only
    the most promising branch in the tree at each
    step.
  • As illustrated in the figure, the search begins
    by considering the most general rule precondition
    possible (the empty test that matches every
    instance), then greedily adds the attribute
    test that most improves rule performance over the
    training examples.

36
Sequential Covering Algorithms (cont.)
  • Specialising search
  • Organises the hypothesis-space search in generally
    the same fashion as ID3, but follows only
    the most promising branch of the tree at each
    step
  • 1. Begin with the most general rule (no/empty
    precondition)
  • 2. Follow the most promising branch:
  • Greedily add the attribute test that most
    improves the measured performance of the rule
    over the training examples
  • 3. Greedy depth-first search with no backtracking
  • Danger of a sub-optimal choice
  • To reduce the risk: beam search (CN2 algorithm).
    The algorithm maintains a list of the k best candidates
  • In each search step, descendants are generated
    for each of these k best candidates; the resulting
    set is then reduced to the k most promising
    members

37
Sequential Covering Algorithms (cont.)
  • General to Specific Beam Search

38
Sequential Covering Algorithms (cont.)
  • Variations
  • Learn only rules that cover positive examples
  • For the case in which the fraction of positive
    examples is small
  • In this case, we can modify the algorithm to
    learn only from those rare examples, and classify
    anything not covered by any rule as negative
  • Instead of entropy, use a measure that evaluates
    the fraction of positive examples covered by the
    hypothesis

39
Top-Down Rule Learning Example
(Figures, slides 39-43: an X-Y scatter plot of examples. The rule is
specialized step by step by adding the constraints C1<Y, then C2<X,
then Y<C3, then X<C4, shrinking the covered region until it contains
only positives.)
44
Bottom-Up Rule Learning Example
(Figures, slides 44-54: starting from a maximally specific rule around
a single positive example, the rule's region is generalized step by
step to cover more positives, stopping before negatives would be
covered.)
55
Sequential Covering Algorithms (cont.)
  • General-to-Specific Beam Search
  • Greedy search without backtracking
  • → danger of a suboptimal choice at any step
  • The algorithm can be extended using beam search:
  • Keep a list of the k best candidates at each step
  • On each search step, descendants are generated
    for each of these k best candidates, and the
    resulting set is again reduced to the k best
    candidates.

56
Learning Rule Sets: Summary
  • Key design issues for learning sets of rules
  • Sequential or Simultaneous?
  • Sequential: learning one rule at a time,
    removing the covered examples and repeating the
    process on the remaining examples
  • Simultaneous: learning the entire set of
    disjuncts simultaneously, as part of a single
    search for an acceptable decision tree, as in ID3
  • General-to-specific or Specific-to-general?
  • G→S: Learn-One-Rule
  • S→G: Find-S
  • Generate-and-test or Example-driven?
  • G&T: search through syntactically legal hypotheses
  • E-D: Find-S, Candidate-Elimination
  • Post-pruning of rules?
  • Similar method to the one discussed in decision
    tree learning

57
Measure performance of a rule (1)
  • Relative frequency: nc / n
  • D_trainR: the set of training instances that
    match the preconditions of rule R
  • n: number of examples matched by the rule, i.e.,
    the size of D_trainR
  • nc: number of examples classified correctly by the rule

58
Measure performance of a rule (2)
  • m-estimate of accuracy
  • p The prior probability that an instance,
    randomly drawn from the entire
    dataset, will have the classification assigned by
    rule R
  • ?p is the prior assumed accuracy (?, 100??
    example ? 12?? example? ????? p0.12)
  • m A weight that indicates how much the prior
    probability p
  • influences the rule performance measure
  • - If m0, then m-estimate becomes the
    relative frequency measure
  • - As m increases, a larger number of
    instances is needed to override
  • the prior assumed accuracy p

59
Measure performance of a rule (3)
  • Entropy measure
  • c The number of possible values (i.e., classes)
    of the target
  • attribute
  • pi The proportion of instances in D_trainR for
    which the target attribute takes on the i-th
    value (i.e., class)
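The three measures fit in a few lines of Python; the counts in the example calls are illustrative:

```python
import math

def relative_frequency(n_c, n):
    """Fraction of matched examples classified correctly: nc / n."""
    return n_c / n

def m_estimate(n_c, n, p, m):
    """(nc + m*p) / (n + m): m = 0 reduces to relative frequency;
    a large m pulls the estimate toward the prior accuracy p."""
    return (n_c + m * p) / (n + m)

def neg_entropy(class_counts):
    """-Entropy of the matched instances (higher is better)."""
    total = sum(class_counts)
    return sum((c / total) * math.log2(c / total)
               for c in class_counts if c > 0)

acc = relative_frequency(8, 10)           # 0.8
same = m_estimate(8, 10, p=0.12, m=0)     # equals relative frequency
```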

60
Sequential covering algorithms: Issues
  • Reduce the (more difficult) problem of learning a
    disjunctive set of rules to a sequence of
    (simpler) problems, each of which is to learn a single
    conjunctive rule
  • After a rule is learned (i.e., found), the
    training instances covered (classified) by the
    rule are removed from the training set
  • Each rule may not be treated as independent of
    the other rules
  • Performs a greedy search (for finding a sequence
    of rules) without backtracking
  • Not guaranteed to find the smallest set of rules
  • Not guaranteed to find the best set of rules

61
Learning First-Order Rules
  • From now on . . .
  • We consider learning rules that contain variables
    (first-order rules)
  • Inductive learning of first-order rules:
    inductive logic programming (ILP)
  • Can be viewed as automatically inferring Prolog
    programs
  • Two methods are considered:
  • FOIL
  • Induction as inverted deduction

62
Learning First-Order Rules (cont.)
  • First-order rule
  • Rules that contain variables
  • Example:
  • Ancestor(x, y) ← Parent(x, y).
  • Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y).   (recursive)
  • More expressive than propositional rules
  • IF (Father1=Bob) ∧ (Name2=Bob) ∧ (Female1=True)
    THEN Daughter1,2=True
  • IF Father(y,x) ∧ Female(y) THEN Daughter(x,y)

63
Learning First-Order Rules (cont.)
  • Formal definitions in first-order logic
  • Constants: e.g., John, Kansas, 42
  • Variables: e.g., Name, State, x
  • Predicates: e.g., Male, as in Male(John)
  • Functions: e.g., age, cosine, as in age(Gunho),
    cosine(x)
  • Term: a constant, variable, or function(term)
  • Literal: a predicate (or its negation) applied
    to a set of terms, e.g., Greater_Than(age(John), 20),
    Male(x), etc.
  • A first-order rule is a Horn clause
  • H and Li (i = 1..n) are literals
  • Clause: disjunction of literals with implicit
    universal quantification
  • Horn clause: at most one positive literal
  • (H ∨ ¬L1 ∨ ¬L2 ∨ ... ∨ ¬Ln)

64
Learning First-Order Rules (cont.)
  • First-Order Horn Clauses
  • Rules that have one or more preconditions and one
    single consequent. Predicates may have variables
  • The following forms of a Horn clause are equivalent:
  • H ∨ ¬L1 ∨ ... ∨ ¬Ln
  • H ← (L1 ∧ ... ∧ Ln)
  • IF (L1 ∧ ... ∧ Ln) THEN H

65
Learning Sets of First-Order Rules FOIL
  • First-Order Inductive Learning (FOIL)
  • Natural extension of Sequential-Covering and
    Learn-One-Rule
  • FOIL rules are similar to Horn clauses, with two
    exceptions:
  • Syntactic restriction: no functions
  • More expressive than Horn clauses:
    negation is allowed in rule bodies

66
Learning a Single Rule in FOIL
  • Top-down approach originally applied to
    first-order logic (Quinlan, 1990).
  • Basic algorithm for instances with
    discrete-valued features:
  • Let A = {} (the set of rule antecedents)
  • Let N be the set of negative examples
  • Let P be the current set of uncovered positive
    examples
  • Until N is empty do
  • For every feature-value pair (literal)
    (Fi=Vij) calculate Gain(Fi=Vij, P, N)
  • Pick the literal, L, with the highest gain.
  • Add L to A.
  • Remove from N any examples that do not
    satisfy L.
  • Remove from P any examples that do not
    satisfy L.
  • Return the rule: A1 ∧ A2 ∧ ... ∧ An → Positive
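The loop above can be sketched directly in Python. The slide defers the definition of Gain, so a FOIL-style gain (introduced later in the deck) is filled in here as an assumption:

```python
import math

def learn_one_rule(P, N, features):
    """Greedy top-down LEARN-ONE-RULE for discrete features.

    P, N: lists of instances (dicts) - uncovered positives and negatives.
    features: dict mapping each feature name to its possible values.
    Returns the learned antecedent A as a dict of feature=value tests.
    """
    A = {}
    P, N = list(P), list(N)
    while N and len(A) < len(features):
        def gain(test):
            f, v = test
            p1 = sum(1 for x in P if x[f] == v)   # positives kept by test
            n1 = sum(1 for x in N if x[f] == v)   # negatives kept by test
            if p1 == 0:
                return float("-inf")
            return p1 * (math.log2(p1 / (p1 + n1))
                         - math.log2(len(P) / (len(P) + len(N))))
        f, v = max(((f, v) for f in features if f not in A
                    for v in features[f]), key=gain)
        A[f] = v
        P = [x for x in P if x[f] == v]   # drop examples not satisfying L
        N = [x for x in N if x[f] == v]
        if not P:
            break                         # dead end on this greedy path
    return A
```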

67
Learning first-order rules FOIL alg.
68
Sequential (set) covering algorithms
  • CN2 Algorithm:
  • Start from an empty conjunct: {}
  • Add conjuncts that minimize the entropy measure:
    {A}, {A,B}, ...
  • Determine the rule consequent by taking the majority
    class of the instances covered by the rule
  • RIPPER Algorithm:
  • Start from an empty rule: R0: {} ⇒ class (initial
    rule)
  • Add conjuncts that maximize FOIL's information
    gain measure
  • R1: {A} ⇒ class (rule after adding a conjunct)
  • t: number of positive instances covered by both
    R0 and R1
  • p0: number of positive instances covered by R0
  • n0: number of negative instances covered by R0
  • p1: number of positive instances covered by R1
  • n1: number of negative instances covered by R1

69
Foil Gain Metric
  • Want to achieve two goals:
  • Decrease the coverage of negative examples
  • Measure the increase in the percentage of positives
    covered when the literal is added to the rule.
  • Maintain coverage of as many positives as
    possible.
  • Count the number of positives covered.

70
Foil_Gain measure
  • FOIL_Gain = t × ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )
  • R0: {} ⇒ class (initial rule)
  • R1: {A} ⇒ class (rule after adding a conjunct)
  • t: number of positive instances covered by both
    R0 and R1
  • p0: number of positive instances covered by R0
  • n0: number of negative instances covered by R0
  • p1: number of positive instances covered by R1
  • n1: number of negative instances covered by R1

71
Example Foil_Gain measure
  • R0 ? (initial rule)
  • R1 A ? class (rule after adding conjunct)
  • R2 B ? class (rule after adding conjunct)
  • R3 C ? class (rule after adding conjunct)
  • Assume the initial rule is gt.
  • This rule covers p0 100 positive examples and
    n0 400 negative examples.
  • Choose one rule !!
  • R1 covers 4 positive examples and 1 negative
    example.
  • R2 covers 30 positive examples and 10 negative
    example.
  • R3 covers 100 positive examples and 90 negative
    examples.

72
Example: Foil_Gain measure
  • R0: {} ⇒ class (initial rule)
  • This rule covers 100 positive examples and
    400 negative examples.
  • R1: {A} ⇒ class (rule after adding a conjunct)
  • R1 covers 4 positive examples and 1 negative
    example.
  • t = 4: number of positive instances covered by
    both R0 and R1
  • p0 = 100
  • n0 = 400
  • p1 = 4
  • n1 = 1

73
Example: Foil_Gain measure
  • R0: {} ⇒ class (initial rule)
  • This rule covers 100 positive examples and
    400 negative examples.
  • R2: {B} ⇒ class (rule after adding a conjunct)
  • R2 covers 30 positive examples and 10 negative
    examples.
  • t = 30: number of positive instances covered by
    both R0 and R2
  • p0 = 100
  • n0 = 400
  • p1 = 30
  • n1 = 10

74
Example: Foil_Gain measure
  • R0: {} ⇒ class (initial rule)
  • This rule covers 100 positive examples and
    400 negative examples.
  • R3: {C} ⇒ class (rule after adding a conjunct)
  • R3 covers 100 positive examples and 90 negative
    examples.
  • t = 100: number of positive instances covered by
    both R0 and R3
  • p0 = 100
  • n0 = 400
  • p1 = 100
  • n1 = 90
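The three candidates can be compared directly with the FOIL gain formula; a short sketch using the counts from the slides above:

```python
import math

def foil_gain(t, p0, n0, p1, n1):
    """FOIL_Gain = t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0)))."""
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# R0 covers p0 = 100 positives and n0 = 400 negatives.
g1 = foil_gain(t=4,   p0=100, n0=400, p1=4,   n1=1)    # R1
g2 = foil_gain(t=30,  p0=100, n0=400, p1=30,  n1=10)   # R2
g3 = foil_gain(t=100, p0=100, n0=400, p1=100, n1=90)   # R3
# R1 has the highest accuracy (4/5), but FOIL's gain prefers R3,
# because the gain is weighted by the positive coverage t.
```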

75
Example
  • Training data assertions:
  • GrandDaughter(Victor, Sharon)
  • Female(Sharon), Father(Sharon, Bob)
  • Father(Tom, Bob), Father(Bob, Victor)
  • Use the closed world assumption: any literal
    involving the specified predicates and constants
    that is not listed is assumed to be false
  • ¬GrandDaughter(Tom, Bob), ¬GrandDaughter(Tom, Tom)
  • ¬GrandDaughter(Bob, Victor)
  • ¬Female(Tom), etc.

76
Possible Variable Bindings
  • Initial rule:
  • GrandDaughter(x, y) ←
  • Possible bindings from training assertions (how
    many possible bindings of the 4 constants to the
    initial rule?)
  • Positive binding: {x/Victor, y/Sharon}
  • Negative bindings: {x/Victor, y/Victor},
    {x/Tom, y/Sharon}, etc.
  • Positive bindings provide positive evidence, and
    negative bindings provide negative evidence
    against the rule under consideration.

77
Rule Pruning in FOIL
  • Prepruning method based on the minimum description
    length (MDL) principle.
  • Postpruning to eliminate unnecessary complexity
    due to limitations of the greedy algorithm:

For each rule, R
    For each antecedent, A, of the rule
        If deleting A from R does not cause
           negatives to become covered
        then delete A
For each rule, R
    If deleting R does not uncover any positives
       (since they are redundantly covered by other rules)
    then delete R
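The two pruning passes can be sketched in Python; the rule representation (lists of attribute=value tests over dict instances) is an assumption for illustration:

```python
def covers(antecedent, x):
    """antecedent: list of (attribute, value) tests; x: an instance dict."""
    return all(x.get(a) == v for a, v in antecedent)

def prune_rules(rules, positives, negatives):
    # Pass 1: delete an antecedent if removing it covers no negatives.
    pruned = []
    for ants in rules:
        for a in list(ants):
            trial = [t for t in ants if t != a]
            if not any(covers(trial, x) for x in negatives):
                ants = trial
        pruned.append(ants)
    # Pass 2: delete a rule if every positive it covers is
    # redundantly covered by the remaining rules.
    kept = list(pruned)
    for r in list(kept):
        rest = [q for q in kept if q is not r]
        redundant = all(any(covers(q, x) for q in rest)
                        for x in positives if covers(r, x))
        if redundant:
            kept = rest
    return kept
```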

79
General to Specific Beam Search
  • Learning with decision trees

80
General to Specific Beam Search 4
The CN2 Algorithm

LearnOneRule( target_attribute, attributes, examples, k )
    Initialise best_hypothesis = Ø, the most general hypothesis
    Initialise candidate_hypotheses = { best_hypothesis }
    while candidate_hypotheses is not empty do
        1. Generate the next more-specific candidate_hypotheses
        2. Update best_hypothesis
        3. Update candidate_hypotheses
    return a rule of the form "IF best_hypothesis THEN prediction",
        where prediction is the most frequent value of target_attribute
        among those examples that match best_hypothesis

Performance( h, examples, target_attribute )
    h_examples = the subset of examples that match h
    return -Entropy( h_examples ), where Entropy is with
        respect to target_attribute
81
General to Specific Beam Search 5
  • Generate the next more-specific candidate_hypotheses

all_constraints = set of all constraints (a = v), where a ∈ attributes
    and v is a value of a occurring in the current set of examples
new_candidate_hypotheses =
    for each h in candidate_hypotheses,
        for each c in all_constraints:
            create a specialisation of h by adding the constraint c
Remove from new_candidate_hypotheses any hypotheses that are
    duplicate, inconsistent, or not maximally specific

  • Update best_hypothesis

for all h in new_candidate_hypotheses do
    if ( statistically significant when tested on examples ) and
       ( Performance( h, examples, target_attribute ) >
         Performance( best_hypothesis, examples, target_attribute ) )
    then best_hypothesis = h
82
General to Specific Beam Search 6
  • Update the candidate_hypotheses

candidate_hypotheses = the k best members of
    new_candidate_hypotheses, according to the
    Performance function

  • The Performance function guides the search in
    Learn-One-Rule
  • s: the current set of training examples
  • c: the number of possible values of the target
    attribute
  • pi: the fraction of the examples in s that are
    classified with the i-th value
  • Performance(s) = −Entropy(s) = Σi=1..c pi log2 pi

83
Example for CN2-Algorithm
LearnOneRule( EnjoySport, {Sky, AirTemp, Humidity,
    Wind, Water, Forecast}, examples, 2 )
best_hypothesis = Ø
candidate_hypotheses = { Ø }
all_constraints = { Sky=Sunny, Sky=Rainy,
    AirTemp=Warm, AirTemp=Cold,
    Humidity=Normal, Humidity=High,
    Wind=Strong,
    Water=Warm, Water=Cool,
    Forecast=Same, Forecast=Change }
Performance = nc / n
    n: number of examples covered by the rule
    nc: number of examples covered by the rule
        whose classification is correct
84
Example for CN2-Algorithm (2)
Pass 1: the Remove step deletes nothing
candidate_hypotheses = { Sky=Sunny, AirTemp=Warm }
best_hypothesis = Sky=Sunny
85
Example for CN2-Algorithm (3)
Pass 2: Remove (duplicate, inconsistent, or not
maximally specific hypotheses)
candidate_hypotheses = { Sky=Sunny AND AirTemp=Warm,
    Sky=Sunny AND Humidity=High }
best_hypothesis remains Sky=Sunny
86
Relational Learning and Inductive Logic
Programming (ILP)
  • Fixed feature vectors are a very limited
    representation of instances.
  • Examples or target concept may require relational
    representation that includes multiple entities
    with relationships between them (e.g. a graph
    with labeled edges and nodes).
  • First-order predicate logic is a more powerful
    representation for handling such relational
    descriptions.
  • Horn clauses (i.e. if-then rules in predicate
    logic, Prolog programs) are a useful restriction
    on full first-order logic that allows decidable
    inference.
  • Allows learning programs from sample I/O pairs.

87
ILP Examples
  • Learn definitions of family relationships given
    data for primitive types and relations.
  • uncle(A,B) :- brother(A,C), parent(C,B).
  • uncle(A,B) :- husband(A,C), sister(C,D),
    parent(D,B).
  • Learn recursive list programs from I/O pairs.
  • member(X, [X|Y]).
  • member(X, [Y|Z]) :- member(X,Z).
  • append([], L, L).
  • append([X|L1], L2, [X|L12]) :- append(L1, L2, L12).

88
ILP
  • Goal is to induce a Horn-clause definition for
    some target predicate P, given definitions of a
    set of background predicates Q.
  • Goal is to find a syntactically simple
    Horn-clause definition, D, for P given background
    knowledge B defining the background predicates Q.
  • For every positive example pi of P: D together
    with B should entail pi.
  • For every negative example ni of P: D together
    with B should not entail ni.
  • Background definitions are provided either
  • Extensionally: a list of ground tuples satisfying
    the predicate.
  • Intensionally: Prolog definitions of the
    predicate.

89
ILP Systems
  • Top-Down
  • FOIL (Quinlan, 1990)
  • Bottom-Up
  • CIGOL (Muggleton & Buntine, 1988)
  • GOLEM (Muggleton, 1990)
  • Hybrid
  • CHILLIN (Mooney & Zelle, 1994)
  • PROGOL (Muggleton, 1995)
  • ALEPH (Srinivasan, 2000)

90
FOIL: First-Order Inductive Logic
  • Top-down sequential covering algorithm upgraded
    to learn Prolog clauses, but without logical
    functions.
  • Background knowledge must be provided
    extensionally.
  • Initialize the clause for target predicate P to
  • P(X1, ..., XT) :- .
  • Possible specializations of a clause include
    adding all possible literals:
  • Qi(V1, ..., VTi)
  • not(Qi(V1, ..., VTi))
  • Xi = Xj
  • not(Xi = Xj)
  • where the Xs are bound variables already in
    the existing clause; at least one of V1, ..., VTi is
    a bound variable, the others can be new.
  • Allow recursive literals P(V1, ..., VT) if they do
    not cause an infinite regress.
  • Handle alternative possible values of new
    intermediate variables by maintaining examples as
    tuples of all variable values.

91
FOIL Training Data
  • For learning a recursive definition, the positive
    set must consist of all tuples of constants that
    satisfy the target predicate, given some fixed
    universe of constants.
  • Background knowledge consists of the complete set of
    tuples for each background predicate for this
    universe.
  • Example: Consider learning a definition for the
    target predicate path, for finding a path in a
    directed acyclic graph.
  • path(X,Y) :- edge(X,Y).
  • path(X,Y) :- edge(X,Z), path(Z,Y).

(Horn-clause definition D for the target predicate P = path.)
Background knowledge B:
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,
<4,2>,<4,6>,<4,5>,<6,5>
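The complete positive set is just the transitive closure of the edge relation; a small Python sketch that recomputes the listed path tuples:

```python
def path_closure(edges):
    """All <X,Y> pairs connected by a directed path: the least fixpoint
    of path(X,Y) :- edge(X,Y) and path(X,Y) :- edge(X,Z), path(Z,Y)."""
    paths = set(edges)
    changed = True
    while changed:
        changed = False
        for (x, z) in edges:
            for (z2, y) in list(paths):
                if z == z2 and (x, y) not in paths:
                    paths.add((x, y))   # compose edge(X,Z) with path(Z,Y)
                    changed = True
    return paths

edges = {(1, 2), (1, 3), (3, 6), (4, 2), (4, 6), (6, 5)}
paths = path_closure(edges)   # the 10 positive path tuples
```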
92
FOIL Negative Training Data
  • Negative examples of the target predicate can be
    provided directly, or generated indirectly by
    making a closed world assumption.
  • Every pair of constants <X,Y> not in the positive
    tuples for the path predicate.

Negative path tuples, ¬path(X,Y), <X,Y>:
<1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,
<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,
<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,
<6,4>,<6,6>
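Generating the negatives under the closed world assumption is a one-liner over the universe of constants; a minimal Python sketch:

```python
# Closed world assumption: over the universe of constants {1..6},
# every pair that is not a positive path tuple is a negative example.
universe = range(1, 7)
positives = {(1, 2), (1, 3), (1, 6), (1, 5), (3, 6), (3, 5),
             (4, 2), (4, 6), (4, 5), (6, 5)}
negatives = {(x, y) for x in universe for y in universe} - positives
# 36 pairs in total, minus 10 positives, leaves 26 negatives.
```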
93
Sample FOIL Induction
Pos: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,
<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,
<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Start with clause: path(X,Y) :- .
Possible literals to add:
edge(X,X), edge(Y,Y), edge(X,Y), edge(Y,X), edge(X,Z),
edge(Y,Z), edge(Z,X), edge(Z,Y), path(X,X), path(Y,Y),
path(X,Y), path(Y,X), path(X,Z), path(Y,Z), path(Z,X),
path(Z,Y), X=Y, plus the negations of all of these.
94
Sample FOIL Induction
Pos: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,
<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,
<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,X).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 0 positive examples
Covers 6 negative examples
Not a good literal.
95
Sample FOIL Induction
Pos: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,
<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,
<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Y).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 6 positive examples
Covers 0 negative examples
Chosen as best literal. Result is the base clause.
96
Sample FOIL Induction
Pos: <1,6>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,
<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,
<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Y).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 6 positive examples
Covers 0 negative examples
Chosen as best literal. Result is the base clause.
Remove covered positive tuples!
97
Sample FOIL Induction
Pos: <1,6>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,
<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,
<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Learned so far: path(X,Y) :- edge(X,Y).
Start new clause: path(X,Y) :- .
98
Sample FOIL Induction
Pos: <1,6>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,
<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,
<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Y).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 0 positive examples
Covers 0 negative examples
Not a good literal.
99
Sample FOIL Induction
Pos: <1,6>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal.
100
Sample FOIL Induction
Pos: <1,6>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal.
Negatives still covered; remove uncovered examples!
101
Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal.
Negatives still covered; remove uncovered examples. Expand tuples to account for possible Z values!
<X,Y,Z>
102
Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5>,<4,5>
Neg: <1,1>,<1,4>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal.
Negatives still covered; remove uncovered examples. Expand tuples to account for possible Z values!
<X,Y,Z>
103
Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5>
Neg: <1,1>,<1,4>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal.
Negatives still covered; remove uncovered examples. Expand tuples to account for possible Z values!
<X,Y,Z>
104
Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5,2>,<4,5,6>
Neg: <1,1>,<1,4>,<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<6,1>,<6,2>,<6,3>,<6,4>,<6,6>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal.
Negatives still covered; remove uncovered examples. Expand tuples to account for possible Z values!
<X,Y,Z>
105
Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5,2>,<4,5,6>
Neg: <1,1,2>,<1,1,3>,<1,4,2>,<1,4,3>,<3,1,6>,<3,2,6>,<3,3,6>,<3,4,6>,<4,1,2>,<4,1,6>,<4,3,2>,<4,3,6>,<4,4,2>,<4,4,6>,<6,1,5>,<6,2,5>,<6,3,5>,<6,4,5>,<6,6,5>
Test: path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal.
Negatives still covered; remove uncovered examples. Expand tuples to account for possible Z values.
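The tuple expansion shown across the last few slides can be sketched as follows (a simplified illustration of FOIL's bookkeeping; `expand` is a hypothetical helper, and the sets are copied from the slides):

```python
# Adding the literal edge(X,Z) introduces a new variable Z, so each
# surviving <X,Y> tuple is expanded into <X,Y,Z> tuples, one per
# binding of Z that satisfies edge(X,Z).
edges = {(1, 2), (1, 3), (3, 6), (4, 2), (4, 6), (6, 5)}
pos = {(1, 6), (1, 5), (3, 5), (4, 5)}
neg = {(1, 1), (1, 4), (3, 1), (3, 2), (3, 3), (3, 4), (4, 1), (4, 3),
       (4, 4), (6, 1), (6, 2), (6, 3), (6, 4), (6, 6)}

def expand(tuples):
    return {(x, y, z) for (x, y) in tuples for (x2, z) in edges if x2 == x}

pos3, neg3 = expand(pos), expand(neg)
print(len(pos3), len(neg3))   # 7 19
```

The 7 positive and 19 negative expanded tuples match the Pos and Neg lists on this slide.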
106
Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5,2>,<4,5,6>
Neg: <1,1,2>,<1,1,3>,<1,4,2>,<1,4,3>,<3,1,6>,<3,2,6>,<3,3,6>,<3,4,6>,<4,1,2>,<4,1,6>,<4,3,2>,<4,3,6>,<4,4,2>,<4,4,6>,<6,1,5>,<6,2,5>,<6,3,5>,<6,4,5>,<6,6,5>
Continue specializing the clause path(X,Y) :- edge(X,Z).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
107
Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5,2>,<4,5,6>
Neg: <1,1,2>,<1,1,3>,<1,4,2>,<1,4,3>,<3,1,6>,<3,2,6>,<3,3,6>,<3,4,6>,<4,1,2>,<4,1,6>,<4,3,2>,<4,3,6>,<4,4,2>,<4,4,6>,<6,1,5>,<6,2,5>,<6,3,5>,<6,4,5>,<6,6,5>
Test: path(X,Y) :- edge(X,Z), edge(Z,Y).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 3 positive examples
Covers 0 negative examples
108
Sample FOIL Induction
Pos: <1,6,2>,<1,6,3>,<1,5,2>,<1,5,3>,<3,5,6>,<4,5,2>,<4,5,6>
Neg: <1,1,2>,<1,1,3>,<1,4,2>,<1,4,3>,<3,1,6>,<3,2,6>,<3,3,6>,<3,4,6>,<4,1,2>,<4,1,6>,<4,3,2>,<4,3,6>,<4,4,2>,<4,4,6>,<6,1,5>,<6,2,5>,<6,3,5>,<6,4,5>,<6,6,5>
Test: path(X,Y) :- edge(X,Z), path(Z,Y).
edge: <1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>
path: <1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,<4,2>,<4,6>,<4,5>,<6,5>
Covers 4 positive examples
Covers 0 negative examples
Eventually chosen as best literal; completes the clause.
Definition complete, since all original <X,Y> tuples are covered (by way of covering some <X,Y,Z> tuple).
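The completed two-clause definition can be verified against all 36 <X,Y> pairs with a sketch like this (`derivable` is an illustrative name; the recursion terminates because the example graph is acyclic):

```python
edges = {(1, 2), (1, 3), (3, 6), (4, 2), (4, 6), (6, 5)}
pos = {(1, 2), (1, 3), (1, 6), (1, 5), (3, 6), (3, 5),
       (4, 2), (4, 6), (4, 5), (6, 5)}

def derivable(x, y):
    """path(X,Y) :- edge(X,Y).  path(X,Y) :- edge(X,Z), path(Z,Y)."""
    if (x, y) in edges:
        return True
    return any(derivable(z, y) for (x2, z) in edges if x2 == x)

learned = {(x, y) for x in range(1, 7) for y in range(1, 7) if derivable(x, y)}
print(learned == pos)   # True: covers every positive and no negative
```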
109
Logic Program Induction in FOIL
  • FOIL has also learned
  • append given components and null
  • reverse given append, components, and null
  • quicksort given partition, append, components,
    and null
  • Other programs from the first few chapters of a
    Prolog text.
  • Learning recursive programs in FOIL requires a
    complete set of positive examples for some
    constrained universe of constants, so that a
    recursive call can always be evaluated
    extensionally.
  • For lists, all lists of a limited length composed
    from a small set of constants (e.g. all lists up
    to length 3 using a,b,c).
  • Size of extensional background grows
    combinatorially.
  • Negative examples usually computed using a
    closed-world assumption.
  • Grows combinatorially large for higher arity
    target predicates.
  • Can randomly sample negatives to make tractable.
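The closed-world generation of negatives, and the sampling trick mentioned in the last bullet, might look like this sketch (using the path example's constants; the variable names are mine):

```python
from itertools import product
import random

constants = [1, 2, 3, 4, 5, 6]
pos = {(1, 2), (1, 3), (1, 6), (1, 5), (3, 6), (3, 5),
       (4, 2), (4, 6), (4, 5), (6, 5)}

# Closed-world assumption: every tuple not listed as positive is negative.
# This grows as |constants| ** arity, hence the combinatorial blow-up.
neg = set(product(constants, repeat=2)) - pos

# For higher-arity target predicates, sample a tractable subset instead.
random.seed(0)
neg_sample = random.sample(sorted(neg), k=10)
print(len(neg), len(neg_sample))   # 26 10
```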

110
More Realistic Applications
  • Classifying chemical compounds as mutagenic
    (cancer causing) based on their graphical
    molecular structure and chemical background
    knowledge.
  • Classifying web documents based on both the
    content of the page and its links to and from
    other pages with particular content.
  • A web page is a university faculty home page if:
  • It contains the words "Professor" and
    "University", and
  • It is pointed to by a page with the word
    "faculty", and
  • It points to a page with the words "course" and
    "exam"

111
FOIL Limitations
  • Search space of literals (branching factor) can
    become intractable.
  • Use aspects of bottom-up search to limit search.
  • Requires large extensional background
    definitions.
  • Use intensional background via Prolog inference.
  • Hill-climbing search gets stuck at local optima
    and may not even find a consistent clause.
  • Use limited backtracking (beam search)
  • Include determinate literals with zero gain.
  • Use relational pathfinding or relational clichés.
  • Requires complete examples to learn recursive
    definitions.
  • Use intensional interpretation of learned
    recursive clauses.

112
Rule Learning and ILP Summary
  • There are effective methods for learning symbolic
    rules from data using greedy sequential covering
    and top-down or bottom-up search.
  • These methods have been extended to first-order
    logic to learn relational rules and recursive
    Prolog programs.
  • Knowledge represented by rules is generally more
    interpretable by people, allowing human insight
    into what is learned and possible human approval
    and correction of learned knowledge.

113
Induction as Inverted Deduction
  • Induction is finding h such that
  • (∀<xi, f(xi)> ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)
  • where
  • xi is the ith training instance
  • f(xi) is the target function value for xi
  • B is other background knowledge
  • So let's design inductive algorithms by inverting
    operators for automated deduction

114
Induction as Inverted Deduction
  • D: pairs of people <u,v> such that "the child of u is v"
  • f(xi) = Child(Bob, Sharon)
  • xi = Male(Bob), Female(Sharon), Father(Sharon, Bob)
  • B = Parent(u,v) ← Father(u,v)
  • What satisfies (∀<xi, f(xi)> ∈ D) (B ∧ h ∧ xi) ⊢ f(xi) ?
  • h1 = Child(u,v) ← Father(v,u):
    h1 ∧ xi ⊢ f(xi), with no need for B
  • h2 = Child(u,v) ← Parent(v,u):
    B ∧ h2 ∧ xi ⊢ f(xi)
115
Induction as Inverted Deduction
We have mechanical deductive operators F(A,B) = C, where A ∧ B ⊢ C.
We need inductive operators O(B,D) = h, where
(∀<xi, f(xi)> ∈ D) (B ∧ h ∧ xi) ⊢ f(xi).
116
Induction as Inverted Deduction
  • Positives
  • Subsumes the earlier idea of finding h that fits
    the training data
  • Domain theory B helps define the meaning of "fit
    the data":
  • B ∧ h ∧ xi ⊢ f(xi)
  • Negatives:
  • Doesn't allow for noisy data. Consider:
  • (∀<xi, f(xi)> ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)
  • First-order logic gives a huge hypothesis space H:
  • overfitting
  • intractability of calculating all acceptable h's

117
Deduction Resolution Rule
  • C1: P ∨ L
  • C2: ¬L ∨ R
  • Resolvent: P ∨ R
  • 1. Given initial clauses C1 and C2, find a
    literal L from clause C1 such that ¬L occurs in
    clause C2.
  • 2. Form the resolvent C by including all literals
    from C1 and C2, except for L and ¬L.
  • More precisely, the set of literals occurring
    in the conclusion C is
  • C = (C1 - {L}) ∪ (C2 - {¬L})
  • where ∪ denotes set union, and "-" set
    difference.
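The set expression above translates almost directly into code. A minimal propositional sketch, with literals as strings and '~' marking negation (the function names are mine, not from the slides):

```python
def negate(lit):
    """~P <-> P."""
    return lit[1:] if lit.startswith('~') else '~' + lit

def resolve(c1, c2):
    """Return all resolvents C = (C1 - {L}) | (C2 - {~L})
    for each literal L in C1 whose negation occurs in C2."""
    return [(c1 - {l}) | (c2 - {negate(l)})
            for l in c1 if negate(l) in c2]

c1 = {'PassExam', '~KnowMaterial'}   # PassExam ∨ ¬KnowMaterial
c2 = {'KnowMaterial', '~Study'}      # KnowMaterial ∨ ¬Study
print(resolve(c1, c2))
# the single resolvent is {'PassExam', '~Study'}, i.e. PassExam ∨ ¬Study
```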

118
Inverting Resolution
  • C = (C1 - {L}) ∪ (C2 - {¬L})
  • C1 = PassExam ∨ ¬KnowMaterial
  • C2 = KnowMaterial ∨ ¬Study
  • C = PassExam ∨ ¬Study
119
Inverted Resolution (Propositional)
  • Given initial clauses C1 and C, find a literal L
    that occurs in clause C1, but not in clause C.
  • Form the second clause C2 by including the
    following literals:
  • C2 = (C - (C1 - {L})) ∪ {¬L}
  • C1 = PassExam ∨ ¬KnowMaterial
  • C2 = KnowMaterial ∨ ¬Study
  • C = PassExam ∨ ¬Study
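The inverted operator is the same set algebra run in reverse. A minimal sketch, assuming the resolved-upon literal L is supplied (inverse resolution is nondeterministic, so the choice of L is a choice point; names are mine):

```python
def negate(lit):
    return lit[1:] if lit.startswith('~') else '~' + lit

def inverse_resolve(c, c1, l):
    """Recover C2 = (C - (C1 - {L})) | {~L}, given the resolvent C,
    one parent C1, and the literal L of C1 that was resolved upon."""
    return (c - (c1 - {l})) | {negate(l)}

c = {'PassExam', '~Study'}           # the resolvent
c1 = {'PassExam', '~KnowMaterial'}   # the known parent
print(sorted(inverse_resolve(c, c1, '~KnowMaterial')))
# ['KnowMaterial', '~Study']  i.e. KnowMaterial ∨ ¬Study
```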
120
Inverted Resolution
  • First-Order Resolution
  • Substitution
  • any mapping of variables to terms
  • ex) θ = {x/Bob, y/z}
  • Unifying Substitution
  • θ is a unifying substitution for two literals L1
    and L2 provided L1θ = L2θ
  • ex) θ = {x/Bill, z/y}
  • L1 = Father(x, y), L2 = Father(Bill, z)
  • L1θ = L2θ = Father(Bill, y)
121
Inverted Resolution
  • First-Order Resolution
  • Resolution operator (first-order form)
  • 1. Find a literal L1 from clause C1, a literal L2
    from clause C2, and a substitution θ such that
    L1θ = ¬L2θ.
  • 2. Form the resolvent C by including all literals
    from C1θ and C2θ, except for L1θ and ¬L2θ. More
    precisely, the set of literals occurring in the
    conclusion C is
  • C = (C1 - {L1})θ ∪ (C2 - {L2})θ

122
Inverted Resolution
  • First-Order Resolution
  • Example
  • C1 = White(x) ∨ ¬Swan(x), C2 = Swan(Fred)
  • ⇒ L1 = ¬Swan(x), L2 = Swan(Fred)
  • unifying substitution θ = {x/Fred}
  • then L1θ = ¬L2θ = ¬Swan(Fred)
  • (C1 - {L1})θ = {White(Fred)}
  • (C2 - {L2})θ = Ø
  • ⇒ C = White(Fred)

C = (C1 - {L1})θ ∪ (C2 - {L2})θ
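The Swan example can be traced in code by representing literals as tuples and applying the substitution before the set operations (a sketch; the unifying substitution is given by hand rather than computed by a unification routine, and the encoding is my own):

```python
# A literal is (negated?, predicate, arg...); variables are lowercase names.
def subst(lit, theta):
    neg, pred, *args = lit
    return (neg, pred, *[theta.get(a, a) for a in args])

def resolve_fo(c1, l1, c2, l2, theta):
    """C = (C1 - {L1})θ ∪ (C2 - {L2})θ, assuming L1θ = ¬L2θ."""
    return ({subst(l, theta) for l in c1 - {l1}} |
            {subst(l, theta) for l in c2 - {l2}})

c1 = {(False, 'White', 'x'), (True, 'Swan', 'x')}   # White(x) ∨ ¬Swan(x)
c2 = {(False, 'Swan', 'Fred')}                      # Swan(Fred)
c = resolve_fo(c1, (True, 'Swan', 'x'),
               c2, (False, 'Swan', 'Fred'),
               {'x': 'Fred'})
print(c)   # {(False, 'White', 'Fred')}, i.e. White(Fred)
```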
123
First Order Resolution
  • 1. Find a literal L1 from clause C1, a literal
    L2 from clause C2, and substitutions θ1, θ2 such
    that
  • L1θ1 = ¬L2θ2
  • 2. Form the resolvent C by including all literals
    from C1θ1 and C2θ2, except for L1θ1 and ¬L2θ2.
    More precisely, the set of literals occurring in
    the conclusion is
  • C = (C1 - {L1})θ1 ∪ (C2 - {L2})θ2
  • ⇒ C - (C1 - {L1})θ1 = (C2 - {L2})θ2
  • ⇒ C - (C1 - {L1})θ1 = C2θ2 - {L2}θ2
  • Inverting:
  • C2 = (C - (C1 - {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}

124
First Order Resolution
  • Inverse Resolution: first-order case
  • C = (C1 - {L1})θ1 ∪ (C2 - {L2})θ2
  • (where θ = θ1θ2, the factorization)
  • C - (C1 - {L1})θ1 = (C2 - {L2})θ2
  • (where L2 = ¬L1θ1θ2⁻¹)
  • ⇒ C2 = (C - (C1 - {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}

125
Inverting Resolution (cont.)
  • Inverse Resolution: first-order case
  • We wish to learn rules for the target predicate
    GrandChild(y,x)
  • Given training data D = {GrandChild(Bob,Shannon)}
  • Background info B = {Father(Shannon,Tom),
    Father(Tom,Bob)}

  • C = GrandChild(Bob,Shannon)

  • C1 = Father(Shannon,Tom)

  • L1 = Father(Shannon,Tom)
  • Suppose we choose the inverse substitutions
  • θ1⁻¹ = {}, θ2⁻¹ = {Shannon/x}
  • (C - (C1 - {L1})θ1)θ2⁻¹ = Cθ2⁻¹ =
    GrandChild(Bob,x)
  • ¬L1θ1θ2⁻¹ = ¬Father(x,Tom)
  • ⇒ C2 = GrandChild(Bob,x) ∨ ¬Father(x,Tom)
  • or equivalently: GrandChild(Bob,x) ←
    Father(x,Tom)

C2 = (C - (C1 - {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}
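The GrandChild derivation above can be traced mechanically (a sketch; the tuple encoding of literals and the helper names are my own, and θ2⁻¹ = {Shannon/x} is just one of many admissible inverse substitutions):

```python
# A literal is (negated?, predicate, arg...).
def subst(lit, theta):
    neg, pred, *args = lit
    return (neg, pred, *[theta.get(a, a) for a in args])

def negate(lit):
    return (not lit[0],) + lit[1:]

def inverse_resolve_fo(c, c1, l1, theta1, inv_theta2):
    """C2 = (C - (C1 - {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}."""
    rest = {subst(l, theta1) for l in c1 - {l1}}            # (C1 - {L1})θ1
    return ({subst(l, inv_theta2) for l in c - rest} |      # (C - ...)θ2⁻¹
            {subst(negate(subst(l1, theta1)), inv_theta2)}) # {¬L1θ1θ2⁻¹}

c = {(False, 'GrandChild', 'Bob', 'Shannon')}
c1 = {(False, 'Father', 'Shannon', 'Tom')}
c2 = inverse_resolve_fo(c, c1, (False, 'Father', 'Shannon', 'Tom'),
                        {}, {'Shannon': 'x'})
# c2 is GrandChild(Bob,x) ∨ ¬Father(x,Tom),
# i.e. GrandChild(Bob,x) ← Father(x,Tom), as on the slide.
```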
126
Rule Learning Issues
  • Which is better: rules or trees?
  • Trees share structure between disjuncts.
  • Rules allow completely independent features in
    each disjunct.
  • Mapping some rule sets to decision trees results
    in an exponential increase in size.

A ∧ B → P
C ∧ D → P
What if we add the rule E ∧ F → P ?
127
Rule Learning Issues
  • Which is better: top-down or bottom-up search?
  • Bottom-up is more subject to noise, e.g. the
    random seed examples that are chosen may be noisy.