Title: Logic, Language and Learning
1Logic, Language and Learning
- Chapter 14 Generality
- Luc De Raedt
2Generality in logic
- Two ways of seeing generality
- Various frameworks for generality
- theta-subsumption and its variants
- relative subsumption
- (inverse) resolution
3Generality in logic
4Example
5G ⊨ S
- S follows deductively from G
- G follows inductively from S
- therefore induction is the inverse of deduction
- this is an operational point of view because there are many deductive operators ⊢ that implement ⊨
- take any deductive operator and invert it and one obtains an inductive operator
6Various frameworks for generality
- Depending on the form of G and S
- single clause
- clausal theory
- full first order theory
- Depending on the choice of ⊢ to invert
- theta subsumption (most popular !)
- implication
- resolution
7Subsumption in Propositional logic
- Clause g subsumes clause s
- if and only if g ⊨ s
- or, equivalently
- g ⊆ s
- pos :- p,q,r ⊨ pos :- p,q,r,s,t
- because
- {pos, ¬p, ¬q, ¬r} ⊆ {pos, ¬p, ¬q, ¬r, ¬s, ¬t}
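The subset test on this slide can be sketched in a few lines of Python. The representation (a clause as a frozenset of literal strings, with body atoms prefixed by "not ") and the helper names `clause` and `subsumes` are assumptions made for illustration; they do not come from the slides.

```python
def clause(head, body):
    """Represent `head :- body` as a set of literals; body atoms are negated."""
    return frozenset([head] + ["not " + atom for atom in body])


def subsumes(g, s):
    """Propositional subsumption: g subsumes s iff g is a subset of s."""
    return g <= s


g = clause("pos", ["p", "q", "r"])
s = clause("pos", ["p", "q", "r", "s", "t"])
print(subsumes(g, s))  # True
print(subsumes(s, g))  # False
```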
8Subsumption in propositional logic
pos
pos :- p    pos :- q    pos :- r
pos :- p,q    pos :- p,r    pos :- q,r
pos :- p,q,r
9Subsumption in propositional logic
- Perfect structure
- Complete lattice
- any two clauses have unique
- least upper bound (least general generalization)
- greatest lower bound
- No syntactic variants
- Easy specialization, generalization
10Operators
- Identical as for item-sets/monomials
- Specialization operator
- Generalization operator
11Ref. Operators for propositional clauses
- Specialization operator
- Generalization operator
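The operator definitions on these two slides did not survive in the transcript. As a hedged sketch, assuming (as for item-sets) that a minimal specialization adds one atom to the body and a minimal generalization deletes one, the two operators could look as follows. The function names and the fixed `vocabulary` of atoms are hypothetical.

```python
def specializations(body, vocabulary):
    """Minimal specializations: add one atom that is not yet in the body."""
    return [body | {atom} for atom in sorted(vocabulary - body)]


def generalizations(body):
    """Minimal generalizations: delete one atom from the body."""
    return [body - {atom} for atom in sorted(body)]


vocabulary = frozenset("pqr")
body = frozenset("pq")  # stands for the clause pos :- p,q
print(specializations(body, vocabulary))
print(generalizations(body))
```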
12Subsumption in logical atoms
- g subsumes s if and only if there is a substitution θ such that gθ = s
- e.g. p(X,Y,X) subsumes p(a,Y,a)
- e.g. p(f(X),Y) subsumes p(f(a),Y)
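Checking whether such a substitution exists is one-sided matching. The sketch below assumes a representation that is not in the slides: variables are strings starting with an uppercase letter, constants are other strings, and atoms or compound terms are tuples `(functor, arg1, ..., argn)`. Variables in s are treated as constants.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def match(g, s, theta=None):
    """Return a substitution theta with g.theta == s, or None if none exists."""
    theta = dict(theta or {})
    if is_var(g):
        if g in theta:
            return theta if theta[g] == s else None
        theta[g] = s
        return theta
    if isinstance(g, tuple) and isinstance(s, tuple):
        if g[0] != s[0] or len(g) != len(s):
            return None
        for g_arg, s_arg in zip(g[1:], s[1:]):
            theta = match(g_arg, s_arg, theta)
            if theta is None:
                return None
        return theta
    return theta if g == s else None


print(match(("p", "X", "Y", "X"), ("p", "a", "Y", "a")))  # {'X': 'a', 'Y': 'Y'}
print(match(("p", ("f", "X"), "Y"), ("p", ("f", "a"), "Y")))  # {'X': 'a', 'Y': 'Y'}
print(match(("p", "X", "X"), ("p", "a", "b")))  # None
```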
13Subsumption in simple logical atoms
P(X,Y,Z)
P(a,Y,Z) ... P(X,b,Z) ... P(X,Y,c)
P(a,b,Z) P(a,Y,c) ... P(X,b,c)
P(a,b,c)
14Subsumption in simple logical atoms
P(X,Y)
P(X,X) ... P(a,Y) P(b,Y) P(X,a) P(X,b)
P(a,a) P(a,b) ... P(b,b) ...
15Subsumption in logical atoms
P(X)
P(f(Y)) ... P(g(Y)) ... P(h(Y,Z)) ...
P(f(f(W))) P(f(g(W))) P(f(f(f(U))))
P(f(f(f(f(V))))) ...
16Subsumption in logical atoms
- g subsumes s if and only if there is a substitution θ such that gθ = s
- Still nice properties and complete lattice up to variable renaming
- p(X,a) and p(U,a)
- greatest lower bound = unification
- unification of p(X,a) and p(b,U) gives p(b,a)
- least upper bound = anti-unification = lgg
- lgg of p(X,a,b) and p(c,a,d) gives p(X,a,Y)
- lgg of p(X,f(X,c)) and p(a,f(a,Y)) gives p(U,f(U,T))
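The greatest lower bound of two atoms can be computed with a textbook unification routine. The sketch below is one standard formulation, not code from the course; it assumes variables are uppercase strings and atoms or terms are tuples `(functor, arg1, ..., argn)`, and all function names are illustrative.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def walk(t, theta):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while is_var(t) and t in theta:
        t = theta[t]
    return t


def occurs(v, t, theta):
    t = walk(t, theta)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, theta) for a in t[1:])


def unify(a, b, theta=None):
    """Return a most general unifier of a and b (as a dict), or None."""
    theta = {} if theta is None else theta
    a, b = walk(a, theta), walk(b, theta)
    if a == b:
        return theta
    if is_var(a):
        return None if occurs(a, b, theta) else {**theta, a: b}
    if is_var(b):
        return None if occurs(b, a, theta) else {**theta, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            theta = unify(x, y, theta)
            if theta is None:
                return None
        return theta
    return None


def substitute(t, theta):
    t = walk(t, theta)
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, theta) for a in t[1:])
    return t


theta = unify(("p", "X", "a"), ("p", "b", "U"))
print(substitute(("p", "X", "a"), theta))  # ('p', 'b', 'a')
```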
17Lgg of atoms
- lgg of terms
- lgg(t,t) = t
- lgg(f(s1,...,sn), f(t1,...,tn)) = f(lgg(s1,t1),...,lgg(sn,tn))
- lgg(f(s1,...,sn), g(t1,...,tm)) = V (the same variable V throughout for this pair of terms)
- lgg of atoms
- lgg(p(s1,...,sn), p(t1,...,tn)) = p(lgg(s1,t1),...,lgg(sn,tn))
- lgg(p(s1,...,sn), q(t1,...,tm)) undefined
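The recursive definition above translates almost directly into code. In this sketch the `table` dictionary implements the "same variable throughout" rule by remembering which variable was introduced for each pair of differing terms. The tuple representation and the generated names `V0`, `V1`, ... are assumptions, and the inputs are assumed not to use those names already.

```python
def lgg(s, t, table=None):
    """Least general generalization (anti-unification) of two terms."""
    table = {} if table is None else table
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:          # one fresh variable per pair of terms
        table[(s, t)] = "V%d" % len(table)
    return table[(s, t)]


def lgg_atoms(a1, a2):
    """lgg of two atoms; undefined (None) if predicate or arity differ."""
    if a1[0] != a2[0] or len(a1) != len(a2):
        return None
    return lgg(a1, a2, {})


print(lgg_atoms(("p", "X", "a", "b"), ("p", "c", "a", "d")))
# ('p', 'V0', 'a', 'V1')
print(lgg_atoms(("p", "X", ("f", "X", "c")), ("p", "a", ("f", "a", "Y"))))
# ('p', 'V0', ('f', 'V0', 'V1'))
```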
18Operators
- Ideal Specialization operator
- apply a substitution {X / Y} where X,Y already appear in atom
- apply a substitution {X / f(Y1,...,Yn)} where Yi new variables
- apply a substitution {X / c} where c is a constant
- Ideal Generalization operator
- apply an inverse substitution
- Inverse substitution substitutes terms at specified places by variables
- Invert one of the specialization steps above
- Replace some (but not all) occurrences of a variable X by a different variable Y
- Replace all terms f(Y1,...,Yn) where Yi are distinct by a new variable X
- Replace some occurrences of a constant by a new variable
19Ideal Specialization Operator
20Optimal Specialization Operator
22Inverting substitutions
24Operators
- Generalization
- turn term into variable
- p(a,f(b)) becomes p(X,f(b)) or p(a,f(X))
- Inv. substitutions {<1> / X} or {<2,1> / X}
- p(a,a) becomes p(X,X) or p(a,X) or p(X,a)
- Inv. substitutions {<1> / X}, {<2> / X}, {<2> / X, <1> / X}
- They all invert {X / a}
- replace two occurrences of variable X by X1 and X2
- p(X,X) becomes p(X1,X2)
25Theta-subsumption (Plotkin 70)
- Most important framework for inductive logic programming. Used by all major ILP systems.
- S and G are single clauses
- Combines propositional subsumption and subsumption on logical atoms
- c1 theta-subsumes c2 if and only if there is a substitution θ such that c1θ ⊆ c2
- c1 = father(X,Y) :- parent(X,Y), male(X)
- c2 = father(jef,paul) :- parent(jef,paul), parent(jef,an), male(jef), female(an)
- θ = {X / jef, Y / paul}
26
- d1 = p(X,Y) :- q(X,Y), q(Y,X)
- d2 = p(Z,Z) :- q(Z,Z)
- d3 = p(a,a) :- q(a,a)
- theta(1,2) = {X / Z, Y / Z}
- theta(2,3) = {Z / a}
- d1 is a generalization of d3
- Mapping several literals onto one leads (sometimes) to combinatorial problems
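A theta-subsumption test can be sketched as a backtracking search that maps each literal of c1 onto some literal of c2 under one shared substitution. The search over literal choices is where the combinatorial cost comes from. The representation is an assumption: a clause is a list of literals, a body literal is wrapped as `("not", atom)`, and variables are uppercase strings.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def match(g, s, theta):
    """Extend theta so that g.theta == s, or return None."""
    theta = dict(theta)
    if is_var(g):
        if g in theta:
            return theta if theta[g] == s else None
        theta[g] = s
        return theta
    if isinstance(g, tuple) and isinstance(s, tuple):
        if g[0] != s[0] or len(g) != len(s):
            return None
        for g_arg, s_arg in zip(g[1:], s[1:]):
            theta = match(g_arg, s_arg, theta)
            if theta is None:
                return None
        return theta
    return theta if g == s else None


def theta_subsumes(c1, c2, theta=None):
    """Return theta with c1.theta a subset of c2, or None (backtracking search)."""
    theta = {} if theta is None else theta
    if not c1:
        return theta
    for literal in c2:
        extended = match(c1[0], literal, theta)
        if extended is not None:
            result = theta_subsumes(c1[1:], c2, extended)
            if result is not None:
                return result
    return None


def neg(atom):
    return ("not", atom)


c1 = [("father", "X", "Y"), neg(("parent", "X", "Y")), neg(("male", "X"))]
c2 = [("father", "jef", "paul"), neg(("parent", "jef", "paul")),
      neg(("parent", "jef", "an")), neg(("male", "jef")), neg(("female", "an"))]
d1 = [("p", "X", "Y"), neg(("q", "X", "Y")), neg(("q", "Y", "X"))]
d2 = [("p", "Z", "Z"), neg(("q", "Z", "Z"))]

print(theta_subsumes(c1, c2))  # {'X': 'jef', 'Y': 'paul'}
print(theta_subsumes(d1, d2))  # {'X': 'Z', 'Y': 'Z'}
print(theta_subsumes(c2, c1))  # None
```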
27Properties
- Soundness: if c1 theta-subsumes c2 then c1 ⊨ c2
- Incompleteness (but only for self-recursive clauses) wrt logical entailment
- c1 = p(f(X)) :- p(X)
- c2 = p(f(f(Y))) :- p(Y)
- (here c1 ⊨ c2, but c1 does not theta-subsume c2)
- Decidable (but NP-complete)
- transitive and reflexive but not anti-symmetric
28Structure
p(X,Y) :- m(X,Y)    p(X,Y) :- m(X,Y), m(X,Z)    p(X,Y) :- m(X,Y), m(X,Z), m(X,U)    ...
lgg
p(X,Y) :- m(X,Y), s(X)    p(X,Y) :- m(X,Y), m(X,Z), s(X)    ...
p(X,Y) :- m(X,Y), r(X)    p(X,Y) :- m(X,Y), m(X,Z), r(X)    ...
p(X,Y) :- m(X,Y), s(X), r(X)    p(X,Y) :- m(X,Y), m(X,Z), s(X), r(X)    ...
glb
reduced
29Properties (2)
- Equivalence classes [c]
- parent(X,Y) :- mother(X,Y), mother(X,Z)
- parent(X,Y) :- mother(X,Y)
- c1 is a reduced clause of c2 iff c1 is a minimal subset of the literals of c2 that is equivalent with c2
- parent(X,Y) :- mother(X,Y), mother(X,Z)
- parent(X,Y) :- mother(X,Y) (reduced form)
- this gives an algorithm for reduction
- the reduced clause is the representative of its equivalence class, unique up to variable renaming
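The reduction algorithm hinted at above can be sketched as follows: try to drop each literal in turn, and keep the smaller clause whenever the current clause still theta-subsumes it (the smaller clause trivially subsumes the larger one, so the two are then equivalent). The clause representation (lists of literals, body literals wrapped as `("not", atom)`) and all function names are assumptions for illustration.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def match(g, s, theta):
    """Extend theta so that g.theta == s, or return None."""
    theta = dict(theta)
    if is_var(g):
        if g in theta:
            return theta if theta[g] == s else None
        theta[g] = s
        return theta
    if isinstance(g, tuple) and isinstance(s, tuple):
        if g[0] != s[0] or len(g) != len(s):
            return None
        for g_arg, s_arg in zip(g[1:], s[1:]):
            theta = match(g_arg, s_arg, theta)
            if theta is None:
                return None
        return theta
    return theta if g == s else None


def theta_subsumes(c1, c2, theta=None):
    """Return theta with c1.theta a subset of c2, or None."""
    theta = {} if theta is None else theta
    if not c1:
        return theta
    for literal in c2:
        extended = match(c1[0], literal, theta)
        if extended is not None:
            result = theta_subsumes(c1[1:], c2, extended)
            if result is not None:
                return result
    return None


def reduce_clause(clause):
    """Drop every literal whose removal leaves an equivalent clause."""
    clause = list(clause)
    i = 0
    while i < len(clause):
        smaller = clause[:i] + clause[i + 1:]
        if theta_subsumes(clause, smaller) is not None:
            clause = smaller      # literal i was redundant
        else:
            i += 1                # literal i is needed; try the next one
    return clause


c = [("parent", "X", "Y"),
     ("not", ("mother", "X", "Y")),
     ("not", ("mother", "X", "Z"))]
print(reduce_clause(c))
# [('parent', 'X', 'Y'), ('not', ('mother', 'X', 'Y'))]
```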
31Properties (3)
- Equivalence classes induce a lattice L
- any two equivalence classes have least upper bound (least general generalization - lgg)
- any two equivalence classes have greatest lower bound
- infinite descending and ascending chains exist, e.g.
- :- p(X1,X2), p(X2,X1)
- :- p(X1,X2), p(X2,X1), p(X1,X3), p(X3,X1), p(X2,X3), p(X3,X2)
- :- {p(Xi,Xj) for which i ≠ j and i and j between 1 and n}
- ...
- :- p(X1,X1)
32Lgg of clauses
- lgg of literals (atoms or negated atoms)
- lgg(atom1, atom2): see above
- lgg(not atom1, not atom2) = not lgg(atom1, atom2)
- lgg(not atom1, atom2) undefined
- lgg of clauses
- lgg({l1,...,lm}, {k1,...,kn}) = {lgg(li,kj) | lgg(li,kj) defined}
- f(t,a) :- p(t,a), m(t), f(a)
- f(j,p) :- p(j,p), m(j), m(p)
- lgg = f(X,Y) :- p(X,Y), m(X), m(Z)
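The clause-level lgg on this slide can be sketched by pairing every literal of one clause with every compatible literal of the other, sharing one variable table across all pairs. The representation is an assumption: clauses are lists of literals, body literals are wrapped as `("not", atom)`, and fresh variables are named `V0`, `V1`, ...; the slide's X, Y, Z correspond to V0, V1, V2 in the output.

```python
def lgg(s, t, table):
    """Anti-unify two terms; `table` maps term pairs to shared variables."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:
        table[(s, t)] = "V%d" % len(table)
    return table[(s, t)]


def atom_of(literal):
    return literal[1] if literal[0] == "not" else literal


def compatible(l1, l2):
    """lgg of two literals is defined only for same sign, predicate and arity."""
    same_sign = (l1[0] == "not") == (l2[0] == "not")
    a1, a2 = atom_of(l1), atom_of(l2)
    return same_sign and a1[0] == a2[0] and len(a1) == len(a2)


def lgg_clauses(c1, c2):
    table, result = {}, []
    for l1 in c1:
        for l2 in c2:
            if compatible(l1, l2):
                literal = lgg(l1, l2, table)
                if literal not in result:
                    result.append(literal)
    return result


def neg(atom):
    return ("not", atom)


c1 = [("f", "t", "a"), neg(("p", "t", "a")), neg(("m", "t")), neg(("f", "a"))]
c2 = [("f", "j", "p"), neg(("p", "j", "p")), neg(("m", "j")), neg(("m", "p"))]
for literal in lgg_clauses(c1, c2):
    print(literal)
# ('f', 'V0', 'V1')
# ('not', ('p', 'V0', 'V1'))
# ('not', ('m', 'V0'))
# ('not', ('m', 'V2'))
```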
33Refinement operators
- In general,
- optimal and ideal operators do not exist for theta-subsumption
- due to the infinite ascending chains, the result may not be finite (and can therefore not be computed)
34Generalization operator
- On single clause
- should return all proper minimal generalizations of given clause
- problematic for some clauses
- e.g. h(X,X) :- p(X,X)
- infinite clauses !
- Bound the size of clauses
- better to start from two clauses and apply lgg
35Generalization operator
- On single clause
- should return all proper minimal generalizations of given clause
- problematic for some clauses
- e.g. :- p(X,X)
- infinite clause !
- Ideal generalization operator does not exist.
- Similarly for specialization (but more complicated)
- Ideal operator does not exist
- Some clauses have an infinite number of proper minimal specializations
36Specialization Operators
- Pragmatic solution
- rho(c) = {c' | c' is a maximally general specialization of c} (theory)
- rho(c) ⊆ {c ∪ {l} | l is a literal} ∪ {cθ | θ is a substitution} (practice)
- rho(parent(X,Y)) includes
- parent(X,X)
- parent(X,Y) :- male(X)
- parent(X,Y) :- parent(Y,Z)
- ...
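A minimal sketch of such a pragmatic operator, under strong simplifying assumptions: substitutions are limited to unifying two variables already in the clause, and new literals come from a caller-supplied list. The name `rho` follows the slide; `candidate_literals` and the tuple representation are hypothetical choices made for this example.

```python
from itertools import combinations


def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def variables(term, seen):
    """Collect the variables of a term into `seen`, in order of appearance."""
    if isinstance(term, tuple):
        for arg in term[1:]:
            variables(arg, seen)
    elif is_var(term) and term not in seen:
        seen.append(term)
    return seen


def substitute(term, theta):
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(a, theta) for a in term[1:])
    return theta.get(term, term)


def rho(clause, candidate_literals):
    """Refinements: unify two variables, or add one literal to the clause."""
    refinements = []
    seen = []
    for literal in clause:
        variables(literal, seen)
    for x, y in combinations(seen, 2):
        refinements.append([substitute(l, {y: x}) for l in clause])
    for literal in candidate_literals:
        if literal not in clause:
            refinements.append(clause + [literal])
    return refinements


clause = [("parent", "X", "Y")]
candidates = [("not", ("male", "X")), ("not", ("parent", "Y", "Z"))]
for refinement in rho(clause, candidates):
    print(refinement)
# [('parent', 'X', 'X')]
# [('parent', 'X', 'Y'), ('not', ('male', 'X'))]
# [('parent', 'X', 'Y'), ('not', ('parent', 'Y', 'Z'))]
```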
37d = daughter, p = parent, f = female, m = male
d(X,Y)
d(X,Y) :- p(X,Z)
d(X,X)
d(X,Y) :- p(Y,X)
d(X,Y) :- f(X)
d(X,Y) :- f(X), p(X,Y)
d(X,Y) :- f(X), f(Y)
38Variants of theta-subsumption
- Inverting implication
- to resolve the incompleteness of theta-subsumption w.r.t. entailment
- OI subsumption
- to resolve the problems w.r.t. the syntactic variants, the non-existence of ideal operators
39OI subsumption
41Inverting implication
- Framework addresses incompleteness of theta-subsumption (Muggleton, AIJ to appear; Idestam-Almquist, JAIR)
- main issue: find an lgg under implication
- c = p(f(X)) :- p(X)
- d = p(f(f(X))) :- p(X)
- c does not theta-subsume d but c ⊨ d
- lgg(c,d) under implication not unique
- p(f(X)) :- p(X)
- p(f(f(X))) :- p(Y)
- Learning (recursive) clauses from few examples
- computationally expensive
42Relative generalization
- Using background theory in the generality relation
- B: a set of definite clauses
- g and s: single clauses
- again various frameworks exist due to choice of ⊢ to implement
43Basic ideas
- bottom clause
- the most specific clause covering a specific clause w.r.t. B
- least general generalization relative to the background theory, the rlgg
44Bottom clauses
45Algorithm
46Relative lgg
47Relative lgg (Plotkin 71)
- Relative to background theory B (here B is a set of ground facts, the model of the background theory)
- B may be computed from Program
- let e1 and e2 be two facts
- rlgg(e1,e2) = lgg(e1 :- B, e2 :- B)
- the basis of the Golem system (Muggleton and Feng, ALT90)
48Example RLGG
- Let
- e1 = fa(t,a)
- e2 = fa(j,p)
- B = {p(t,a), m(t), f(a), p(j,p), m(j), m(p)}, all true facts
- Then H = rlgg(e1,e2)
- = lgg((fa(t,a) :- p(t,a), m(t), f(a), p(j,p), m(j), m(p)), (fa(j,p) :- p(t,a), m(t), f(a), p(j,p), m(j), m(p)))
- = fa(Vtj,Vap) :- p(t,a), m(t), f(a), p(j,p), m(j), m(p), p(Vtj,Vap), m(Vtj), m(Vtp), p(Vjt,Vpa), m(Vjt), m(Vjp), m(Vpt), m(Vpj)
49Simplify RLGG
- Rlgg will be used relative to B to check coverage, i.e. H and B ⊨ e
- therefore B in H is redundant and can be deleted
- reduction with respect to Background B
- gives H =
- fa(Vtj,Vap) :- p(Vtj,Vap), m(Vtj), m(Vtp), p(Vjt,Vpa), m(Vjt), m(Vjp), m(Vpt), m(Vpj)
- further reduction gives (according to theta subsumption)
- fa(Vtj,Vap) :- p(Vtj,Vap), m(Vtj)
- what was wanted
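The first two steps of this example (computing the lgg of e1 :- B and e2 :- B, then deleting the literals of B) can be reproduced with a short sketch. The final theta-reduction step is not included here. The representation is an assumption: a clause is a pair `(head, body)` with atoms as tuples, and fresh variables are named `V0`, `V1`, ... rather than Vtj, Vap, ...

```python
def lgg(s, t, table):
    """Anti-unify two terms; `table` maps term pairs to shared variables."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:
        table[(s, t)] = "V%d" % len(table)
    return table[(s, t)]


def lgg_clause(c1, c2):
    """lgg of two definite clauses given as (head, body) pairs."""
    table = {}
    (head1, body1), (head2, body2) = c1, c2
    head = lgg(head1, head2, table)
    body = []
    for a1 in body1:
        for a2 in body2:
            if a1[0] == a2[0] and len(a1) == len(a2):
                atom = lgg(a1, a2, table)
                if atom not in body:
                    body.append(atom)
    return head, body


B = [("p", "t", "a"), ("m", "t"), ("f", "a"),
     ("p", "j", "p"), ("m", "j"), ("m", "p")]
e1, e2 = ("fa", "t", "a"), ("fa", "j", "p")

head, body = lgg_clause((e1, B), (e2, B))
without_B = [atom for atom in body if atom not in B]  # B in H is redundant

print(head)            # ('fa', 'V0', 'V1')
print(len(body))       # 14
print(without_B)
```

Here `V0` and `V1` play the role of Vtj and Vap, so `without_B` contains p(V0,V1) and m(V0), the two literals that survive the final reduction on the slide.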
50Simplify RLGG
- RLGG is often used in simplified form
- using BIAS
- Bias is anything which influences the learning process and which is not justified by the data
- Here: syntactic restrictions on clauses to be induced
- rlgg(e1,e2) then becomes
- lgg(e1 :- bias(e1,B), e2 :- bias(e2,B))
- where bias would compute relevant literals in B
- given the arguments of ei
- e.g. in previous example
- bias(fa(j,p),B) = p(j,p), m(j), m(p)
- bias would be a bias of the system
- used by Golem (Muggleton and Feng, ALT90)
51Inverting Resolution
- G and S are sets of clauses and ⊢ is resolution
- Absorption
- from q :- A and p :- A,B
- infer p :- q,B
- Identification
- from p :- A,B and p :- q,B
- infer q :- A
p :- q,B    q :- A
p :- A,B
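For propositional definite clauses, absorption and identification can be sketched directly from the schemas above. The representation (a clause as `(head, frozenset_of_body_atoms)`) and the helper `resolve`, used only to confirm that each operator inverts a resolution step, are assumptions for illustration.

```python
def resolve(parent, definition):
    """Resolve `p :- q,B` with `q :- A` on q, giving `p :- A,B`."""
    (p, body), (q, a) = parent, definition
    if q not in body:
        return None
    return (p, (body - {q}) | a)


def absorption(c1, c2):
    """From q :- A and p :- A,B infer p :- q,B."""
    (q, a), (p, ab) = c1, c2
    if not a <= ab:
        return None
    return (p, (ab - a) | {q})


def identification(c1, c2):
    """From p :- A,B and p :- q,B infer q :- A."""
    (p1, ab), (p2, qb) = c1, c2
    new_atoms = qb - ab              # must be exactly the single atom q
    if p1 != p2 or len(new_atoms) != 1:
        return None
    (q,) = new_atoms
    return (q, ab - qb)


q_def = ("q", frozenset({"a"}))            # q :- a
p_def = ("p", frozenset({"a", "b"}))       # p :- a,b

absorbed = absorption(q_def, p_def)        # p :- q,b
print(absorbed == ("p", frozenset({"q", "b"})))   # True
print(resolve(absorbed, q_def) == p_def)          # True
print(identification(p_def, absorbed) == q_def)   # True
```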
52Inverting Resolution
q :- B    p :- A,q    q :- C
- Intra construction
- from p :- A,B and p :- A,C
- infer q :- B and p :- A,q and q :- C
- Inter construction
- from p :- A,B and q :- A,C
- infer p :- r,B and r :- A and q :- r,C
- Invent new predicates
- apply intra construction on
- grandparent(X,Y) :- father(X,Z), father(Z,Y)
- grandparent(X,Y) :- father(X,Z), mother(Z,Y)
p :- A,B    p :- A,C
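A propositional sketch of intra construction follows. It only covers variable-free clauses, so it illustrates the schema rather than the first-order grandparent example. The name of the invented predicate is supplied by the caller; the `(head, frozenset_of_body_atoms)` representation and the `resolve` helper are assumptions.

```python
def resolve(parent, definition):
    """Resolve `p :- A,q` with `q :- B` on q, giving `p :- A,B`."""
    (p, body), (q, b) = parent, definition
    if q not in body:
        return None
    return (p, (body - {q}) | b)


def intra_construction(c1, c2, new_pred):
    """From p :- A,B and p :- A,C infer q :- B, p :- A,q and q :- C."""
    (p1, body1), (p2, body2) = c1, c2
    if p1 != p2:
        return None
    a = body1 & body2                 # shared part A
    b, c = body1 - a, body2 - a       # differing parts B and C
    return [(new_pred, b), (p1, a | {new_pred}), (new_pred, c)]


c1 = ("p", frozenset({"a", "b"}))     # p :- a,b
c2 = ("p", frozenset({"a", "c"}))     # p :- a,c
q_b, p_q, q_c = intra_construction(c1, c2, "q")

print(p_q == ("p", frozenset({"a", "q"})))   # True
print(resolve(p_q, q_b) == c1)               # True
print(resolve(p_q, q_c) == c2)               # True
```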
53Problems
- Results of inverse resolution are not unique
- father(j,p) :- male(j)
- parent(j,p)
- gives
- father(j,p) :- male(j), parent(j,p)
- or
- father(X,Y) :- male(X), parent(X,Y)
- by inverse resolution
54Example inverse resolution
m(j)
f(X,Y) :- p(X,Y), m(X)
f(j,Y) :- p(j,Y)
p(j,m)
f(j,m)
55
grandparent(X,Y) :- father(X,Z), parent(Z,Y)
father(X,Y) :- male(X), parent(X,Y)
grandparent(X,Y) :- male(X), parent(X,Z), parent(Z,Y)
male(jef)
grandparent(jef,Y) :- parent(jef,Z), parent(Z,Y)
parent(jef,an)
grandparent(jef,Y) :- parent(an,Y)
parent(an,paul)
grandparent(jef,paul)
56Inverse resolution and bottom clauses
- Production of the bottom clause can be regarded as repeated application of inverse resolution
- consider
- pos(X) :- red(X), square(X)
- polygon(X) :- square(X)
- inverse resolution could give
- pos(X) :- red(X), polygon(X) but also
- pos(X) :- red(X), polygon(X), square(X)
- applying the last choice systematically and repeatedly will result in the bottom clause.
57Conclusions
- Many frameworks exist; they have different purposes
- Most important
- theta-subsumption
58Not part of the exam
- Section 4.11 (working with borders)
- Sections marked with a * in Chapter 5 except for RLGG