Logic, Language and Learning

1
Logic, Language and Learning

Chapter 14 Generality
Luc De Raedt

2
Generality in logic

Two ways of seeing generality
Various frameworks for generality
theta-subsumption and its variants
relative subsumption
(inverse) resolution

3
Generality in logic
4
Example
5
G ⊨ S

S follows deductively from G
G follows inductively from S
therefore induction is the inverse of deduction
this is an operational point of view because there are many deductive operators ⊢ that implement ⊨
take any deductive operator, invert it, and one obtains an inductive operator

6
Various frameworks for generality

Depending on the form of G and S
single clause
clausal theory
full first order theory
Depending on the choice of ⊢ to invert
theta subsumption (most popular !)
implication
resolution

7
Subsumption in Propositional logic

Clause g subsumes clause s
if and only if g ⊨ s
or, equivalently,
g ⊆ s
pos :- p,q,r subsumes pos :- p,q,r,s,t
because
{pos, ¬p, ¬q, ¬r} ⊆ {pos, ¬p, ¬q, ¬r, ¬s, ¬t}
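
The subset test on this slide can be sketched in a few lines of Python. This is only an illustration: the helper names `clause` and `subsumes`, and the "~" prefix used to mark negated body atoms, are assumptions of the sketch and not notation from the slides.

```python
# A propositional clause "pos :- p, q, r" is modelled as a set of literals:
# the head as-is, and each body atom negated (marked with a leading "~").
def clause(head, body):
    return frozenset([head] + ["~" + atom for atom in body])

def subsumes(g, s):
    """g subsumes s iff every literal of g also occurs in s."""
    return g <= s

g = clause("pos", ["p", "q", "r"])
s = clause("pos", ["p", "q", "r", "s", "t"])
print(subsumes(g, s))  # True
print(subsumes(s, g))  # False
```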

8
Subsumption in propositional logic
pos
pos :- p    pos :- q    pos :- r
pos :- p,q    pos :- p,r    pos :- q,r
pos :- p,q,r
9
Subsumption in propositional logic

Perfect structure
Complete lattice
any two clauses have unique
least upper bound (least general generalization)
greatest lower bound
No syntactic variants
Easy specialization, generalization
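
With clauses viewed as sets of literals, the two lattice operations reduce to set operations. The sketch below assumes the same hypothetical set encoding ("~" marks a negated body atom); the function names `lgg` and `glb` are chosen for the illustration.

```python
def clause(head, body):
    # Head literal plus negated body atoms, as a set of literals.
    return frozenset([head] + ["~" + atom for atom in body])

def lgg(c1, c2):
    """Least upper bound: the literals shared by both clauses."""
    return c1 & c2

def glb(c1, c2):
    """Greatest lower bound: all literals of either clause."""
    return c1 | c2

c1 = clause("pos", ["p", "q"])
c2 = clause("pos", ["p", "r"])
print(sorted(lgg(c1, c2)))  # ['pos', '~p']              i.e. pos :- p
print(sorted(glb(c1, c2)))  # ['pos', '~p', '~q', '~r']  i.e. pos :- p,q,r
```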

10
Operators

Identical to those for item-sets/monomials
Specialization operator
Generalization operator

11
Refinement operators for propositional clauses

Specialization operator
Generalization operator

12
Subsumption in logical atoms

g subsumes s if and only if there is a
substitution θ such that gθ = s
e.g. p(X,Y,X) subsumes p(a,Y,a)
e.g. p(f(X),Y) subsumes p(f(a),Y)
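
A minimal sketch of this test is one-way matching. The term encoding is an assumption of the sketch: variables are strings starting with an upper-case letter, constants are other strings, and compound terms and atoms are tuples `(functor, arg1, ..., argn)`. The names `is_var` and `match` are hypothetical.

```python
def is_var(t):
    # Prolog convention: variables start with an upper-case letter.
    return isinstance(t, str) and t[:1].isupper()

def match(g, s, theta=None):
    """Return a substitution theta with g*theta == s, or None if none exists.
    Variables occurring in s are treated as fixed symbols."""
    theta = dict(theta or {})
    if is_var(g):
        if g in theta:
            return theta if theta[g] == s else None
        theta[g] = s
        return theta
    if isinstance(g, tuple) and isinstance(s, tuple):
        if g[0] != s[0] or len(g) != len(s):
            return None
        for g_arg, s_arg in zip(g[1:], s[1:]):
            theta = match(g_arg, s_arg, theta)
            if theta is None:
                return None
        return theta
    return theta if g == s else None

print(match(("p", "X", "Y", "X"), ("p", "a", "Y", "a")))  # {'X': 'a', 'Y': 'Y'}
print(match(("p", "a", "Y", "a"), ("p", "X", "Y", "X")))  # None
```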

13
Subsumption in simple logical atoms
P(X,Y,Z)
P(a,Y,Z) ... P(X,b,Z) ... P(X,Y,c)
P(a,b,Z) P(a,Y,c) ... P(X,b,c)
P(a,b,c)
14
Subsumption in simple logical atoms
P(X,Y)
P(X,X) ... P(a,Y) P(b,Y) P(X,a) P(X,b)
P(a,a) P(a,b) ... P(b,b) ...
15
Subsumption in logical atoms
P(X)
P(f(Y)) ... P(g(Y)) ... P(h(Y,Z)) ...
P(f(f(W))) P(f(g(W))) P(f(f(f(U))))
P(f(f(f(f(V))))) ...
16
Subsumption in logical atoms

g subsumes s if and only if there is a
substitution θ such that gθ = s
Still nice properties and a complete lattice up to variable renaming
p(X,a) and p(U,a) are variants
greatest lower bound = unification
unification of p(X,a) and p(b,U) gives p(b,a)
least upper bound = anti-unification = lgg
lgg of p(X,a,b) and p(c,a,d) gives p(X,a,Y)
lgg of p(X,f(X,c)) and p(a,f(a,Y)) gives p(U,f(U,T))
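
The greatest lower bound of two atoms can be sketched as textbook unification with an occurs check. The encoding (upper-case strings as variables, tuples as compound terms) and the helper names `walk`, `occurs`, `unify` and `apply` are assumptions made for this illustration.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, theta):
    # Follow variable bindings until an unbound variable or a non-variable.
    while is_var(t) and t in theta:
        t = theta[t]
    return t

def occurs(v, t, theta):
    t = walk(t, theta)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, theta) for a in t[1:])

def unify(s, t, theta=None):
    """Return a most general unifier of s and t as a dict, or None."""
    theta = {} if theta is None else theta
    s, t = walk(s, theta), walk(t, theta)
    if s == t:
        return theta
    if is_var(s):
        return None if occurs(s, t, theta) else {**theta, s: t}
    if is_var(t):
        return unify(t, s, theta)
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        for a, b in zip(s[1:], t[1:]):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None

def apply(t, theta):
    # Apply the substitution to a term, resolving chains of bindings.
    t = walk(t, theta)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply(a, theta) for a in t[1:])
    return t

theta = unify(("p", "X", "a"), ("p", "b", "U"))
print(apply(("p", "X", "a"), theta))  # ('p', 'b', 'a')
```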

17
Lgg of atoms

lgg of terms
lgg(t,t) = t
lgg(f(s1, ..., sn), f(t1, ..., tn)) = f(lgg(s1,t1), ..., lgg(sn,tn))
lgg(f(s1, ..., sn), g(t1, ..., tm)) = V (the same variable V throughout for this pair of terms)
lgg of atoms
lgg(p(s1, ..., sn), p(t1, ..., tn)) = p(lgg(s1,t1), ..., lgg(sn,tn))
lgg(p(s1, ..., sn), q(t1, ..., tm)) = undefined
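
The recursive definition above translates almost directly into code. In this sketch the dictionary `table` enforces the "same variable throughout" rule by remembering which variable stands for each pair of differing terms. The tuple encoding of terms and the generated names `V1`, `V2`, ... are assumptions of the sketch; the names are assumed not to clash with variables already present in the input.

```python
def lgg(s, t, table=None):
    """Anti-unification of two terms (tuples are compound terms)."""
    table = {} if table is None else table
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
    # Differing terms: reuse the variable already chosen for this pair.
    if (s, t) not in table:
        table[(s, t)] = "V%d" % (len(table) + 1)
    return table[(s, t)]

def lgg_atoms(a, b):
    """lgg of two atoms; None when the predicates differ (undefined)."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    return lgg(a, b)

print(lgg_atoms(("p", "X", "a", "b"), ("p", "c", "a", "d")))
# ('p', 'V1', 'a', 'V2')
print(lgg_atoms(("p", "X", ("f", "X", "c")), ("p", "a", ("f", "a", "Y"))))
# ('p', 'V1', ('f', 'V1', 'V2'))
```

Up to variable renaming these are the results p(X,a,Y) and p(U,f(U,T)) quoted on the previous slide.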

18
Operators

Ideal specialization operator
apply a substitution {X / Y} where X and Y already appear in the atom
apply a substitution {X / f(Y1, ..., Yn)} where the Yi are new variables
apply a substitution {X / c} where c is a constant
Ideal generalization operator
apply an inverse substitution
an inverse substitution replaces terms at specified places by variables
invert one of the specialization steps above:
replace some (but not all) occurrences of a variable X by a different variable Y
replace all terms f(Y1, ..., Yn), where the Yi are distinct, by a new variable X
replace some occurrences of a constant by a new variable
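
The three specialization steps can be enumerated mechanically for a given atom. The sketch below assumes a finite vocabulary passed in as `constants` and `functors` (pairs of name and arity); those parameters, the helper names, and the fresh variable names `New1`, `New2`, ... are hypothetical and assumed not to occur in the atom already.

```python
from itertools import combinations

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def variables(t):
    """Variables of a term, in order of first occurrence."""
    if is_var(t):
        return [t]
    found = []
    if isinstance(t, tuple):
        for arg in t[1:]:
            for v in variables(arg):
                if v not in found:
                    found.append(v)
    return found

def substitute(t, var, value):
    if t == var:
        return value
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, var, value) for a in t[1:])
    return t

def specializations(atom, constants, functors):
    vs = variables(atom)
    result = []
    for x, y in combinations(vs, 2):          # {X / Y}, both already present
        result.append(substitute(atom, x, y))
    for x in vs:
        for name, arity in functors:          # {X / f(Y1, ..., Yn)}, Yi new
            fresh = tuple("New%d" % i for i in range(1, arity + 1))
            result.append(substitute(atom, x, (name,) + fresh))
        for c in constants:                   # {X / c}
            result.append(substitute(atom, x, c))
    return result

for spec in specializations(("p", "X", "Y"), ["a"], [("f", 1)]):
    print(spec)
```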

19
Ideal Specialization Operator
20
Optimal Specialization Operator
21
(No Transcript)
22
Inverting substitutions
23
(No Transcript)
24
Operators

Generalization
turn term into variable
p(a,f(b)) becomes p(X,f(b)) or p(a,f(X))
Inverse substitutions {<1> / X} or {<2,1> / X}
p(a,a) becomes p(X,X) or p(a,X) or p(X,a)
Inverse substitutions {<1> / X, <2> / X}, {<2> / X}, {<1> / X}
They all invert {X / a}
replace two occurrences of variable X by X1 and X2
p(X,X) becomes p(X1,X2)
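
An inverse substitution names the places where a term is replaced by a variable. A small sketch, assuming places are tuples of 1-based argument indices (so `(2, 1)` is the first argument of the second argument) and that the listed places do not lie inside one another; `replace_at` and `inverse_substitute` are hypothetical helper names.

```python
def replace_at(term, position, var):
    """Replace the subterm at `position` (1-based argument indices) by `var`."""
    if not position:
        return var
    index = position[0]
    args = list(term[1:])
    args[index - 1] = replace_at(args[index - 1], position[1:], var)
    return (term[0],) + tuple(args)

def inverse_substitute(term, inverse_substitution):
    """Apply a list of (position, variable) pairs to a term."""
    for position, var in inverse_substitution:
        term = replace_at(term, position, var)
    return term

atom = ("p", "a", ("f", "b"))
print(inverse_substitute(atom, [((1,), "X")]))        # ('p', 'X', ('f', 'b'))
print(inverse_substitute(atom, [((2, 1), "X")]))      # ('p', 'a', ('f', 'X'))
print(inverse_substitute(("p", "a", "a"), [((1,), "X"), ((2,), "X")]))
# ('p', 'X', 'X')
```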

25
Theta-subsumption (Plotkin 70)

Most important framework for inductive logic
programming. Used by all major ILP systems.
S and G are single clauses
Combines propositional subsumption and
subsumption on logical atoms
c1 theta-subsumes c2 if and only if there is a
substitution θ such that c1θ ⊆ c2
c1 = father(X,Y) :- parent(X,Y), male(X)
c2 = father(jef,paul) :- parent(jef,paul), parent(jef,an), male(jef), female(an)
θ = {X / jef, Y / paul}

d1 = p(X,Y) :- q(X,Y), q(Y,X)
d2 = p(Z,Z) :- q(Z,Z)
d3 = p(a,a) :- q(a,a)
θ(1,2) = {X / Z, Y / Z}
θ(2,3) = {Z / a}
d1 is a generalization of d3
Mapping several literals onto one leads (sometimes) to combinatorial problems
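
A theta-subsumption test can be sketched as a backtracking search for one substitution that maps every literal of c1 onto some literal of c2. The encoding is an assumption of the sketch: a clause is a list of literals, a literal is a tuple `(predicate, arg1, ...)`, and a negated body literal carries a "~" in front of its predicate name. The search is exponential in the worst case, in line with the combinatorial problems mentioned above.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(g, s, theta):
    """Extend theta so that g*theta == s; return None on failure."""
    theta = dict(theta)
    if is_var(g):
        if g in theta:
            return theta if theta[g] == s else None
        theta[g] = s
        return theta
    if isinstance(g, tuple) and isinstance(s, tuple):
        if g[0] != s[0] or len(g) != len(s):
            return None
        for g_arg, s_arg in zip(g[1:], s[1:]):
            theta = match(g_arg, s_arg, theta)
            if theta is None:
                return None
        return theta
    return theta if g == s else None

def clause(head, body):
    # Head literal followed by negated body literals.
    return [head] + [("~" + b[0],) + b[1:] for b in body]

def theta_subsumes(c1, c2, theta=None):
    """Return theta with c1*theta a subset of c2, or None."""
    theta = {} if theta is None else theta
    if not c1:
        return theta
    first, rest = c1[0], c1[1:]
    for lit in c2:                      # try every literal of c2, backtrack on failure
        extended = match(first, lit, theta)
        if extended is not None:
            result = theta_subsumes(rest, c2, extended)
            if result is not None:
                return result
    return None

c1 = clause(("father", "X", "Y"), [("parent", "X", "Y"), ("male", "X")])
c2 = clause(("father", "jef", "paul"),
            [("parent", "jef", "paul"), ("parent", "jef", "an"),
             ("male", "jef"), ("female", "an")])
print(theta_subsumes(c1, c2))  # {'X': 'jef', 'Y': 'paul'}
print(theta_subsumes(c2, c1))  # None
```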

27
Properties

Soundness: if c1 theta-subsumes c2 then c1 ⊨ c2
Incompleteness (but only for self-recursive clauses) w.r.t. logical entailment:
c1 = p(f(X)) :- p(X)
c2 = p(f(f(Y))) :- p(Y)
c1 ⊨ c2 (resolve c1 with itself), but c1 does not theta-subsume c2
Decidable (but NP-complete)
Reflexive and transitive but not anti-symmetric

28
Structure
p(X,Y) :- m(X,Y)    p(X,Y) :- m(X,Y), m(X,Z)    p(X,Y) :- m(X,Y), m(X,Z), m(X,U)    ...
lgg
p(X,Y) :- m(X,Y), s(X)    p(X,Y) :- m(X,Y), m(X,Z), s(X)    ...
p(X,Y) :- m(X,Y), r(X)    p(X,Y) :- m(X,Y), m(X,Z), r(X)    ...
p(X,Y) :- m(X,Y), s(X), r(X)    p(X,Y) :- m(X,Y), m(X,Z), s(X), r(X)    ...
glb
reduced
29
Properties (2)

Equivalence classes [c]
parent(X,Y) :- mother(X,Y), mother(X,Z)
parent(X,Y) :- mother(X,Y)
c1 is a reduced clause of c2 iff c1 is a minimal subset of the literals of c2 that is equivalent to c2
parent(X,Y) :- mother(X,Y), mother(X,Z)
parent(X,Y) :- mother(X,Y)    (reduced form)
this gives an algorithm for reduction
the reduced clause is the representative of its equivalence class, unique up to variable renaming
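
One way to read the reduction algorithm hinted at above: repeatedly drop a literal whenever the clause still theta-subsumes what remains, since the two are then equivalent. The sketch below is an assumption about how this could be coded, not the procedure from the slides; it repeats a small matching-based subsumption test so that it runs on its own, and uses "~" to mark negated body literals.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(g, s, theta):
    theta = dict(theta)
    if is_var(g):
        if g in theta:
            return theta if theta[g] == s else None
        theta[g] = s
        return theta
    if isinstance(g, tuple) and isinstance(s, tuple):
        if g[0] != s[0] or len(g) != len(s):
            return None
        for g_arg, s_arg in zip(g[1:], s[1:]):
            theta = match(g_arg, s_arg, theta)
            if theta is None:
                return None
        return theta
    return theta if g == s else None

def theta_subsumes(c1, c2, theta=None):
    theta = {} if theta is None else theta
    if not c1:
        return theta
    for lit in c2:
        extended = match(c1[0], lit, theta)
        if extended is not None:
            result = theta_subsumes(c1[1:], c2, extended)
            if result is not None:
                return result
    return None

def reduce_clause(c):
    """Drop literals while the clause still theta-subsumes the remainder."""
    c = list(c)
    changed = True
    while changed:
        changed = False
        for lit in c:
            smaller = [l for l in c if l != lit]
            if theta_subsumes(c, smaller) is not None:
                c, changed = smaller, True
                break
    return c

c = [("parent", "X", "Y"), ("~mother", "X", "Y"), ("~mother", "X", "Z")]
print(reduce_clause(c))  # [('parent', 'X', 'Y'), ('~mother', 'X', 'Y')]
```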

30
(No Transcript)
31
Properties (3)

Equivalence classes induce a lattice L
any two equivalence classes have a least upper bound (least general generalization, lgg)
any two equivalence classes have a greatest lower bound
infinite descending and ascending chains exist, e.g.
:- p(X1,X2), p(X2,X1)
:- p(X1,X2), p(X2,X1), p(X1,X3), p(X3,X1), p(X2,X3), p(X3,X2)
...
:- p(Xi,Xj) for all i ≠ j with i and j between 1 and n
...
:- p(X1,X1)

32
Lgg of clauses

lgg of literals (= atoms or negated atoms)
lgg(atom1, atom2) = see above
lgg(not atom1, not atom2) = not lgg(atom1, atom2)
lgg(not atom1, atom2) = undefined
lgg of clauses
lgg({l1, ..., lm}, {k1, ..., kn}) = { lgg(li,kj) | lgg(li,kj) is defined }
f(t,a) :- p(t,a), m(t), f(a)
f(j,p) :- p(j,p), m(j), m(p)
lgg = f(X,Y) :- p(X,Y), m(X), m(Z)
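
The clause-level lgg pairs up every two compatible literals and anti-unifies them, sharing one variable table across all pairs. This is a sketch under assumptions: literals are tuples, negated body literals carry a "~" in front of the predicate name, and generated variables are named `V1`, `V2`, .... The result is not reduced.

```python
def lgg_term(s, t, table):
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg_term(a, b, table) for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:
        table[(s, t)] = "V%d" % (len(table) + 1)
    return table[(s, t)]

def lgg_clause(c1, c2):
    """lgg of two clauses given as lists of literals."""
    table = {}          # shared, so the same pair of terms gets the same variable
    result = []
    for l1 in c1:
        for l2 in c2:
            # Defined only for same sign, predicate and arity.
            if l1[0] == l2[0] and len(l1) == len(l2):
                lit = lgg_term(l1, l2, table)
                if lit not in result:
                    result.append(lit)
    return result

# f(t,a) :- p(t,a), m(t), f(a)    and    f(j,p) :- p(j,p), m(j), m(p)
c1 = [("f", "t", "a"), ("~p", "t", "a"), ("~m", "t"), ("~f", "a")]
c2 = [("f", "j", "p"), ("~p", "j", "p"), ("~m", "j"), ("~m", "p")]
print(lgg_clause(c1, c2))
# [('f', 'V1', 'V2'), ('~p', 'V1', 'V2'), ('~m', 'V1'), ('~m', 'V3')]
```

With V1, V2, V3 renamed to X, Y, Z this is the clause f(X,Y) :- p(X,Y), m(X), m(Z) from the slide.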

33
Refinement operators

In general,
optimal and ideal operators do not exist for
theta-subsumption
due to the infinite ascending chains, the result
may not be finite (and can therefore not be
computed)

34
Generalization operator

On single clause
should return all proper minimal generalizations
of given clause
problematic for some clauses
e.g. h(X,X) :- p(X,X)
infinite clauses !
Bound the size of clauses
better to start from two clauses and apply lgg

35
Generalization operator

On single clause
should return all proper minimal generalizations
of given clause
problematic for some clauses
e.g. :- p(X,X)
infinite clause !
Ideal generalization operator does not exist.
Similarly for specialization (but more
complicated)
Ideal operator does not exist
Some clauses have an infinite number of proper
minimal specializations

36
Specialization Operators

Pragmatic solution
rho(c) = { c' | c' is a maximally general specialization of c } (theory)
rho(c) ⊆ { c ∪ {l} | l is a literal } ∪ { cθ | θ is a substitution } (practice)
rho(parent(X,Y)) includes
parent(X,X)
parent(X,Y) :- male(X)
parent(X,Y) :- parent(Y,Z)
...

37
d = daughter, p = parent, f = female, m = male
d(X,Y)
d(X,Y) :- p(X,Z)
d(X,X)
d(X,Y) :- p(Y,X)
d(X,Y) :- f(X)
d(X,Y) :- f(X), p(X,Y)
d(X,Y) :- f(X), f(Y)
38
Variants of theta-subsumption

Inverting implication
to resolve the incompleteness of
theta-subsumption w.r.t. entailment
OI subsumption
to resolve the problems w.r.t. the syntactic
variants, the non-existence of ideal operators

39
OI subsumption
40
(No Transcript)
41
Inverting implication

Framework addresses incompleteness of theta-subsumption (Muggleton, AIJ, to appear; Idestam-Almquist, JAIR)
main issue: find an lgg under implication
c = p(f(X)) :- p(X)
d = p(f(f(X))) :- p(X)
c does not theta-subsume d, but c ⊨ d
lgg(c,d) under implication is not unique:
p(f(X)) :- p(X)
p(f(f(X))) :- p(Y)
Learning (recursive) clauses from few examples
computationally expensive

42
Relative generalization

Using background theory in the generality
relation
B a set of definite clauses
g and s single clauses

again various frameworks exist due to the choice of ⊢ to implement ⊨

43
Basic ideas

bottom clause
the most specific clause covering a specific
clause w.r.t. B
least general generalization relative to the
background theory, the rlgg

44
Bottom clauses
45
Algorithm
46
Relative lgg
47
Relative lgg (Plotkin 71)

Relative to background theory B (here B is a set
of ground facts, the model of the background
theory)
B may be computed from Program
let e1 and e2 be two facts
rlgg(e1,e2) = lgg(e1 :- B, e2 :- B)
the basis of the Golem system (Muggleton and
Feng, ALT90)

48
Example RLGG

Let
e1 = fa(t,a)
e2 = fa(j,p)
B = { p(t,a), m(t), f(a), p(j,p), m(j), m(p) }
all true facts
Then H = rlgg(e1,e2)
= lgg( fa(t,a) :- p(t,a), m(t), f(a), p(j,p), m(j), m(p) ,
       fa(j,p) :- p(t,a), m(t), f(a), p(j,p), m(j), m(p) )
= fa(Vtj,Vap) :- p(t,a), m(t), f(a), p(j,p), m(j), m(p),
                 p(Vtj,Vap), m(Vtj), m(Vtp), p(Vjt,Vpa),
                 m(Vjt), m(Vjp), m(Vpt), m(Vpj)

49
Simplify RLGG

Rlgg will be used relative to B to check coverage, i.e. H and B ⊨ e
therefore B in H is redundant and can be deleted
reduction with respect to background B gives H =
fa(Vtj,Vap) :- p(Vtj,Vap), m(Vtj), m(Vtp), p(Vjt,Vpa),
               m(Vjt), m(Vjp), m(Vpt), m(Vpj)
further reduction (according to theta-subsumption) gives
fa(Vtj,Vap) :- p(Vtj,Vap), m(Vtj)
what was wanted
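
The first two steps of this example (computing the rlgg and deleting the background facts from its body) can be reproduced with a short sketch. It assumes B is a list of ground facts and that both examples share the same predicate; the generated variables are numbered `V1`, `V2`, ... rather than named after the constants they replace (V1 corresponds to Vtj, V2 to Vap, V3 to Vtp). The final reduction under theta-subsumption is not included.

```python
def lgg_term(s, t, table):
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg_term(a, b, table) for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:
        table[(s, t)] = "V%d" % (len(table) + 1)
    return table[(s, t)]

def rlgg(e1, e2, background):
    """lgg(e1 :- B, e2 :- B), returned as (head, body)."""
    table = {}
    head = lgg_term(e1, e2, table)
    body = []
    for b1 in background:
        for b2 in background:
            if b1[0] == b2[0] and len(b1) == len(b2):
                lit = lgg_term(b1, b2, table)
                if lit not in body:
                    body.append(lit)
    return head, body

def drop_background(clause, background):
    """Delete body literals that are facts of B (redundant relative to B)."""
    head, body = clause
    return head, [lit for lit in body if lit not in background]

B = [("p", "t", "a"), ("m", "t"), ("f", "a"),
     ("p", "j", "p"), ("m", "j"), ("m", "p")]
H = rlgg(("fa", "t", "a"), ("fa", "j", "p"), B)
print(H[0], len(H[1]))              # ('fa', 'V1', 'V2') 14
head, body = drop_background(H, B)
print(body)
```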

50
Simplify RLGG

RLGG is often used in simplified form
using BIAS
Bias is anything which influences the learning
process and which is not justified by the data
Here syntactic restrictions on clauses to be
induced
rlgg(e1,e2) then becomes
lgg(e1 :- bias(e1,B), e2 :- bias(e2,B))
where bias would compute relevant literals in B given the arguments of ei
e.g. in previous example
bias(fa(j,p),B) = p(j,p), m(j), m(p)
bias would be a bias of the system
used by Golem (Muggleton and Feng, ALT90)

51
Inverting Resolution

G and S are sets of clauses and ⊢ is resolution
Absorption
from q :- A and p :- A,B
infer p :- q,B
Identification
from p :- A,B and p :- q,B
infer q :- A

p :- q,B    q :- A
p :- A,B
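
For propositional definite clauses the two operators can be sketched with set operations. The encoding of a clause as a pair `(head, frozenset_of_body_atoms)` and the function names are assumptions of this sketch; `resolve` is included only to check that resolution undoes the inverse step.

```python
def absorption(c1, c2):
    """From q :- A and p :- A,B infer p :- q,B."""
    q, a = c1
    p, body = c2
    if not a <= body:
        return None
    return (p, (body - a) | {q})

def identification(c1, c2):
    """From p :- A,B and p :- q,B infer q :- A."""
    p1, body1 = c1
    p2, body2 = c2
    extra = body2 - body1          # must be exactly the atom q
    if p1 != p2 or len(extra) != 1:
        return None
    (q,) = extra
    return (q, body1 - body2)

def resolve(c, d):
    """Resolve a body atom of c against the head of d."""
    head, body = c
    q, a = d
    if q not in body:
        return None
    return (head, (body - {q}) | a)

c_q = ("q", frozenset({"a"}))
c_p = ("p", frozenset({"a", "b"}))
g = absorption(c_q, c_p)
print(g[0], sorted(g[1]))          # p ['b', 'q']
print(resolve(g, c_q) == c_p)      # True
```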
52
Inverting Resolution
q :- B    p :- A,q    q :- C

Intra construction
from p :- A,B and p :- A,C
infer q :- B and p :- A,q and q :- C
Inter construction
from p :- A,B and q :- A,C
infer p :- r,B and r :- A and q :- r,C
Invent new predicates
apply intra construction on
grandparent(X,Y) :- father(X,Z), father(Z,Y)
grandparent(X,Y) :- father(X,Z), mother(Z,Y)

p :- A,B    p :- A,C
53
Problems

Results of inverse resolution are not unique
father(j,p) :- male(j)
parent(j,p)
gives
father(j,p) :- male(j), parent(j,p)
or
father(X,Y) :- male(X), parent(X,Y)
by inverse resolution

54
Example inverse resolution
m(j)
f(X,Y) :- p(X,Y), m(X)
f(j,Y) :- p(j,Y)
p(j,m)
f(j,m)
55
grandparent(X,Y) :- father(X,Z), parent(Z,Y)
father(X,Y) :- male(X), parent(X,Y)
grandparent(X,Y) :- male(X), parent(X,Z), parent(Z,Y)
male(jef)
grandparent(jef,Y) :- parent(jef,Z), parent(Z,Y)
parent(jef,an)
grandparent(jef,Y) :- parent(an,Y)
parent(an,paul)
grandparent(jef,paul)
56
Inverse resolution and bottom clauses

Production of the bottom clause can be regarded as repeated application of inverse resolution
consider
pos(X) :- red(X), square(X)
polygon(X) :- square(X)
inverse resolution could give
pos(X) :- red(X), polygon(X) but also
pos(X) :- red(X), polygon(X), square(X)
applying the last choice systematically and repeatedly will result in the bottom clause.
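
The "last choice" above, adding the head of a background clause while keeping the literals it was derived from, can be sketched as a saturation loop. The sketch is a propositional simplification of the example (the variable X is dropped) and the names `saturate` and `background` are hypothetical.

```python
def saturate(clause, background):
    """Repeatedly add the head of every background clause whose body
    is already contained in the clause body, keeping the old literals."""
    head, body = clause
    body = set(body)
    changed = True
    while changed:
        changed = False
        for b_head, b_body in background:
            if set(b_body) <= body and b_head not in body:
                body.add(b_head)
                changed = True
    return head, frozenset(body)

example = ("pos", {"red", "square"})
background = [("polygon", {"square"})]
head, body = saturate(example, background)
print(head, sorted(body))  # pos ['polygon', 'red', 'square']
```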

57
Conclusions