Bayesian Logic Programs - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Bayesian Logic Programs


1
Bayesian Logic Programs
Summer School on Relational Data Mining
17 and 18 August 2002, Helsinki, Finland
  • Kristian Kersting, Luc De Raedt
  • Albert-Ludwigs University
  • Freiburg, Germany

2
Context
Real-world applications
3
Outline
  • Bayesian Logic Programs
  • Examples and Language
  • Semantics and Support Networks
  • Learning Bayesian Logic Programs
  • Data Cases
  • Parameter Estimation
  • Structural Learning

4
Bayesian Logic Programs
  • Probabilistic models structured using logic
  • Extend Bayesian networks with notions of objects
    and relations
  • Probability density over (countably) infinitely
    many random variables
  • Flexible discrete-time stochastic processes
  • Generalize pure Prolog, Bayesian networks,
    dynamic Bayesian networks, dynamic Bayesian
    multinets, hidden Markov models,...

5
Bayesian Networks
  • One of the successes of AI
  • State-of-the-art for modeling uncertainty, in particular degrees of belief
  • Advantage [Russell, Norvig 96]
  • strict separation of qualitative and quantitative aspects of the world
  • Disadvantage [Breese, Ngo, Haddawy, Koller, ...]
  • propositional character, no notion of objects and relations among them

6
Stud farm (Jensen 96)
  • The colt John has been born recently on a stud farm.
  • John suffers from a life-threatening hereditary disease carried by a recessive gene. The disease is so serious that John is put down immediately, and because the stud farm wants the gene out of production, his parents are taken out of breeding.
  • What are the probabilities for the remaining horses to be carriers of the unwanted gene?

7
Bayesian networks [Pearl 88]
Based on the stud farm example [Jensen 96]
[Figure: Bayesian network over bt_ann, bt_brian, bt_cecily, bt_unknown1, bt_unknown2, bt_dorothy, bt_eric, bt_fred, bt_gwenn, bt_henry, bt_irene, bt_john]
8
Bayesian networks [Pearl 88]
Based on the stud farm example [Jensen 96]
[Figure: the same Bayesian network, annotated with its (conditional) probability distributions]
P(bt_cecily = aA | bt_john = aA) = 0.1499
P(bt_john = AA | bt_ann = aA) = 0.6906
P(bt_john = AA) = 0.9909

9
Bayesian networks (contd.)
  • acyclic graphs
  • probability distribution over a finite set of random variables

10
From Bayesian Networks to Bayesian Logic Programs
bt_ann
bt_brian
bt_cecily
bt_unknown2
bt_unknown1
bt_dorothy
bt_eric
bt_gwenn
bt_fred
bt_henry
bt_irene
bt_john
11
From Bayesian Networks to Bayesian Logic Programs
bt_ann.
bt_brian.
bt_cecily.
bt_unknown2.
bt_unknown1.
bt_dorothy
bt_eric
bt_gwenn
bt_fred
bt_henry
bt_irene
bt_john
12
From Bayesian Networks to Bayesian Logic Programs
bt_ann.
bt_brian.
bt_cecily.
bt_unknown2.
bt_unknown1.
bt_dorothy | bt_ann, bt_brian.
bt_eric | bt_brian, bt_cecily.
bt_gwenn | bt_ann, bt_unknown2.
bt_fred | bt_unknown1, bt_ann.
bt_henry
bt_irene
bt_john
13
From Bayesian Networks to Bayesian Logic Programs
bt_ann.
bt_brian.
bt_cecily.
bt_unknown2.
bt_unknown1.
bt_dorothy | bt_ann, bt_brian.
bt_eric | bt_brian, bt_cecily.
bt_gwenn | bt_ann, bt_unknown2.
bt_fred | bt_unknown1, bt_ann.
bt_henry | bt_fred, bt_dorothy.
bt_irene | bt_eric, bt_gwenn.
bt_john
14
From Bayesian Networks to Bayesian Logic Programs
bt_ann.
bt_brian.
bt_cecily.
bt_unknown2.
bt_unknown1.
bt_dorothy | bt_ann, bt_brian.
bt_eric | bt_brian, bt_cecily.
bt_gwenn | bt_ann, bt_unknown2.
bt_fred | bt_unknown1, bt_ann.
bt_henry | bt_fred, bt_dorothy.
bt_irene | bt_eric, bt_gwenn.
bt_john | bt_henry, bt_irene.
15
From Bayesian Networks to Bayesian Logic Programs
apriori nodes:
bt_ann. bt_brian. bt_cecily. bt_unknown1. bt_unknown2.
aposteriori nodes:
bt_henry | bt_fred, bt_dorothy.
bt_irene | bt_eric, bt_gwenn.
bt_fred | bt_unknown1, bt_ann.
bt_dorothy | bt_brian, bt_ann.
bt_eric | bt_brian, bt_cecily.
bt_gwenn | bt_unknown2, bt_ann.
bt_john | bt_henry, bt_irene.
16
From Bayesian Networks to Bayesian Logic Programs
apriori nodes:
bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2).
aposteriori nodes:
bt(henry) | bt(fred), bt(dorothy).
bt(irene) | bt(eric), bt(gwenn).
bt(fred) | bt(unknown1), bt(ann).
bt(dorothy) | bt(brian), bt(ann).
bt(eric) | bt(brian), bt(cecily).
bt(gwenn) | bt(unknown2), bt(ann).
bt(john) | bt(henry), bt(irene).
17
From Bayesian Networks to Bayesian Logic Programs
ground facts / apriori:
bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2).
father(unknown1,fred). mother(ann,fred).
father(brian,dorothy). mother(ann,dorothy).
father(brian,eric). mother(cecily,eric).
father(unknown2,gwenn). mother(ann,gwenn).
father(fred,henry). mother(dorothy,henry).
father(eric,irene). mother(gwenn,irene).
father(henry,john). mother(irene,john).
rules / aposteriori:
bt(X) | father(F,X), bt(F), mother(M,X), bt(M).
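The single rule, grounded over the father/mother facts, induces exactly the parent structure of the hand-written network. A minimal Python sketch (the dictionaries and the helper `bt_parents` are illustrative, not from the slides):

```python
# Ground the clause  bt(X) | father(F,X), bt(F), mother(M,X), bt(M)
# over the stud farm facts to recover each bt atom's parents.
father = {"fred": "unknown1", "dorothy": "brian", "eric": "brian",
          "gwenn": "unknown2", "henry": "fred", "irene": "eric",
          "john": "henry"}
mother = {"fred": "ann", "dorothy": "ann", "eric": "cecily",
          "gwenn": "ann", "henry": "dorothy", "irene": "gwenn",
          "john": "irene"}

def bt_parents(x):
    """Parents of bt(x) in the dependency graph: bt(F) and bt(M) whenever
    father(F,x) and mother(M,x) hold; apriori nodes (founders) have none."""
    if x in father and x in mother:
        return {f"bt({father[x]})", f"bt({mother[x]})"}
    return set()

print(bt_parents("john"))  # bt(henry) and bt(irene), as in the network
```

Grounding the one intensional clause this way reproduces the seven aposteriori clauses listed on the previous slide.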
18
Dependency graph Bayesian network
[Figure: dependency graph — a Bayesian network whose nodes are the ground atoms: the father/2 and mother/2 facts together with bt(ann), ..., bt(john)]
19
Dependency graph Bayesian network
[Figure: the induced Bayesian network over bt_ann, ..., bt_john — the original stud farm network]
20
Bayesian Logic Programs- a first definition
  • A BLP consists of
  • a finite set of Bayesian clauses.
  • To each clause a conditional probability distribution (CPD) is associated.
  • Proper random variables: LH(B)
  • graphical structure: the dependency graph
  • Quantitative information: CPDs

21
Bayesian Logic Programs- Examples
pure Prolog:
apriori nodes: nat(0).
aposteriori nodes: nat(s(X)) | nat(X).
MC / HMM:
apriori nodes: state(0).
aposteriori nodes: state(s(Time)) | state(Time). output(Time) | state(Time).
DBN:
apriori nodes: n1(0).
aposteriori nodes: n1(s(TimeSlice)) | n2(TimeSlice). n2(TimeSlice) | n1(TimeSlice). n3(TimeSlice) | n1(TimeSlice), n2(TimeSlice).
22
Associated CPDs
  • Associated CPDs represent generically the CPD for each ground instance of the corresponding Bayesian clause.
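For instance, a single table can serve every ground instance of the bt clause. A sketch using Mendelian genotype probabilities for states AA/aA/aa (the partial table and the helper `p_bt` are illustrative assumptions, not taken from the slides):

```python
# One CPD table is attached to the clause; every ground instance
# (bt(henry), bt(john), ...) looks its probabilities up in this same
# table.  Entries follow Mendelian inheritance; the table is shown
# only partially, for illustration.
cpd_bt = {
    # (parent1_state, parent2_state): {child_state: probability}
    ("AA", "AA"): {"AA": 1.00, "aA": 0.0, "aa": 0.00},
    ("AA", "aA"): {"AA": 0.50, "aA": 0.5, "aa": 0.00},
    ("aA", "aA"): {"AA": 0.25, "aA": 0.5, "aa": 0.25},
}

def p_bt(child, f_state, m_state):
    """CPD shared by all ground instances of the bt clause."""
    key = tuple(sorted((f_state, m_state)))  # the table is symmetric
    return cpd_bt[key][child]

print(p_bt("aa", "aA", "aA"))  # 0.25, whether queried for henry or john
```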

23
Combining Rules
  • Multiple ground instances of clauses having the
    same head atom?

ground facts: as before
rules:
bt(X) | father(F,X), bt(F).
bt(X) | mother(M,X), bt(M).
24
Combining Rules (contd.)
  • Any algorithm which
  • combines a set of CPDs
  • into one (combined) CPD
  • and which has an empty output if and only if the input is empty
  • E.g. noisy-or, regression, ...

CR: P(A|B) and P(A|C) are combined into P(A|B,C)
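The noisy-or rule mentioned above can be sketched as follows for Boolean variables (the probabilities are illustrative, not from the slides):

```python
def noisy_or(activation_probs):
    """Combine per-clause CPDs P(A=1|Bi=1) into P(A=1|B1,...,Bn) under
    the noisy-or assumption: each active parent independently turns A on."""
    p_all_fail = 1.0
    for p in activation_probs:
        p_all_fail *= 1.0 - p
    return 1.0 - p_all_fail

# the father clause contributes P(A=1|B=1)=0.8, the mother clause 0.5:
print(noisy_or([0.8, 0.5]))  # 0.9
```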
25
Bayesian Logic Programs- a definition
  • A BLP consists of
  • a finite set of Bayesian clauses.
  • To each clause a conditional probability distribution (CPD) is associated.
  • To each Bayesian predicate p a combining rule is associated, to combine the CPDs of multiple ground instances of clauses having the same head.
  • Proper random variables: LH(B)
  • graphical structure: the dependency graph
  • Quantitative information: CPDs and CRs

26
Outline
  • Bayesian Logic Programs
  • Examples and Language
  • Semantics and Support Networks
  • Learning Bayesian Logic Programs
  • Data Cases
  • Parameter Estimation
  • Structural Learning

27
Discrete-Time Stochastic Process
  • A family of random variables over a domain, indexed by a linearization of the partial order induced by the dependency graph
  • For each such linearization, a Bayesian logic program specifies a discrete-time stochastic process

28
Theorem of Kolmogorov
29
Consistency Conditions
  • The probability measure over each finite subset of LH(B) is represented by a finite Bayesian network which is a subnetwork of the dependency graph over LH(B): the support network
  • (Elimination order) All stochastic processes represented by a Bayesian logic program B specify the same probability measure over LH(B).

30
Support network
[Figure: the dependency graph of the stud farm program over the father/2, mother/2, and bt/1 ground atoms]
31
Support network
[Figure: the same dependency graph; the nodes needed to answer the query are highlighted]
32
Support network
[Figure: the support network induced by the query's random variables]
33
Support network
  • The support network of a random variable x in LH(B) is the induced subnetwork over x and its ancestors
  • The support network of a finite set of random variables is the union of their support networks
  • Computation utilizes And/Or trees

34
Queries using And/Or trees
  • A probabilistic query
  • ?- Q1, ..., Qn | E1 = e1, ..., Em = em.
  • asks for the distribution
  • P(Q1, ..., Qn | E1 = e1, ..., Em = em).
  • ?- bt(eric).
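A sketch of how such a query is answered over the support network: ?- bt(eric). only needs bt(brian) and bt(cecily), so the marginal is computed by summing them out. States and numbers here are illustrative assumptions, not the slides' CPDs:

```python
from itertools import product

prior = {"carrier": 0.1, "pure": 0.9}   # apriori bt nodes

def p_child(child, f, m):
    """CPD of the bt clause: the chance of being a carrier grows with
    the number of carrier parents (hypothetical values)."""
    n = (f == "carrier") + (m == "carrier")
    p_carrier = {0: 0.0, 1: 0.5, 2: 0.75}[n]
    return p_carrier if child == "carrier" else 1.0 - p_carrier

def query_bt_eric():
    """Answer ?- bt(eric). by summing out bt(brian) and bt(cecily)."""
    dist = {"carrier": 0.0, "pure": 0.0}
    for f, m in product(prior, repeat=2):
        weight = prior[f] * prior[m]
        for c in dist:
            dist[c] += weight * p_child(c, f, m)
    return dist

print(query_bt_eric())  # carrier: 0.0975, pure: 0.9025
```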

35
Consistency Condition (contd.)
  • the dependency graph is acyclic, and
  • every random variable is influenced by a finite
    set of random variables only

36
Relational Character
ground facts:
bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2).
father(unknown1,fred). mother(ann,fred).
father(brian,dorothy). mother(ann,dorothy).
father(brian,eric). mother(cecily,eric).
father(unknown2,gwenn). mother(ann,gwenn).
father(fred,henry). mother(dorothy,henry).
father(eric,irene). mother(gwenn,irene).
father(henry,john). mother(irene,john).
rules:
bt(X) | father(F,X), bt(F), mother(M,X), bt(M).
37
Bayesian Logic Programs- Summary
  • First-order logic extension of Bayesian networks
  • constants, relations, functors
  • discrete and continuous random variables
  • ground atoms = random variables
  • CPDs associated to clauses
  • Dependency graph: a (possibly) infinite Bayesian network
  • Generalize dynamic Bayesian networks and definite clause logic (range-restricted)

38
Applications
  • Probabilistic, logical
  • Description and prediction
  • Regression
  • Classification
  • Clustering
  • Computational Biology
  • APrIL IST-2001-33053
  • Web Mining
  • Query approximation
  • Planning, ...

39
Other frameworks
  • Probabilistic Horn Abduction Poole 93
  • Distributional Semantics (PRISM) Sato 95
  • Stochastic Logic Programs Muggleton 96 Cussens
    99
  • Relational Bayesian Nets Jaeger 97
  • Probabilistic Logic Programs Ngo, Haddawy 97
  • Object-Oriented Bayesian Nets Koller, Pfeffer
    97
  • Probabilistic Frame-Based Systems Koller,
    Pfeffer 98
  • Probabilistic Relational Models Koller 99

40
Outline
  • Bayesian Logic Programs
  • Examples and Language
  • Semantics and Support Networks
  • Learning Bayesian Logic Programs
  • Data Cases
  • Parameter Estimation
  • Structural Learning

41
Learning Bayesian Logic Programs
Data Background Knowledge
42
Why Learning Bayesian Logic Programs ?
Inductive Logic Programming
Learning within Bayesian networks
Learning within Bayesian Logic Programs
  • Of interest to different communities ?
  • scoring functions, pruning techniques,
    theoretical insights, ...

43
What is the data about ?
44
Learning Task
  • Given
  • a set of data cases
  • a Bayesian logic program B
  • Goal: for each clause of B, the parameters that best fit the given data

45
Parameter Estimation (contd.)
  • best fit = maximum-likelihood (ML) estimation
  • the hypothesis space is spanned by the product space over the possible values of the parameters

46
Parameter Estimation (contd.)
  • Assumption
  • D1, ..., DN are independently sampled from identical distributions (e.g. totally separated families)

47
Parameter Estimation (contd.)
48
Parameter Estimation (contd.)
  • Reduced to a problem within Bayesian networks
  • given structure,
  • partially observed random variables
  • EM [Dempster, Laird, Rubin 77; Lauritzen 91]
  • Gradient Ascent [Binder, Koller, Russell, Kanazawa 97; Jensen 99]

49
Decomposable CRs
  • Parameters are associated with the clauses, not with the support network.

Single ground instance of a Bayesian clause: use its CPD
Multiple ground instances of the same Bayesian clause: CPDs combined by the combining rule
50
Gradient Ascent
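As a minimal stand-in for the gradient computation, ascending the log-likelihood of a single clause parameter theta = P(head true | body true) from observed counts, with a constant step size; the function and numbers are illustrative assumptions, not the slides' derivation:

```python
def fit_theta(n_on, n_off, lr=0.0005, steps=2000):
    """Gradient ascent on the log-likelihood
    L(theta) = n_on*log(theta) + n_off*log(1 - theta)."""
    theta = 0.5
    for _ in range(steps):
        grad = n_on / theta - n_off / (1.0 - theta)  # dL/dtheta
        theta += lr * grad
        theta = min(max(theta, 1e-6), 1.0 - 1e-6)    # stay inside (0, 1)
    return theta

print(round(fit_theta(70, 30), 3))  # climbs to the ML estimate 0.7
```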
51
Gradient Ascent
52
Gradient Ascent
53
Algorithm
54
Expectation-Maximization
  • Initialize parameters
  • E-Step and M-Step, i.e.
  • compute expected counts for each clause and treat the expected counts as counts
  • If not converged, iterate to 2
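The loop above, sketched for a single clause parameter theta = P(carrier) when the carrier state itself is hidden and only a noisy test outcome per data case is observed. The test reliabilities are illustrative assumptions, not from the slides:

```python
def em_theta(ys, p_pos_if_carrier=0.9, p_pos_if_clear=0.2,
             theta=0.3, iters=100):
    """EM for theta = P(carrier) from noisy 0/1 observations ys."""
    for _ in range(iters):
        # E-step: expected "carrier" count contributed by each data case
        resp = []
        for y in ys:
            like1 = p_pos_if_carrier if y else 1.0 - p_pos_if_carrier
            like0 = p_pos_if_clear if y else 1.0 - p_pos_if_clear
            num = theta * like1
            resp.append(num / (num + (1.0 - theta) * like0))
        # M-step: treat the expected counts as counts
        theta = sum(resp) / len(resp)
    return theta

data = [1] * 55 + [0] * 45       # 55 positive tests out of 100
print(round(em_theta(data), 3))  # converges to 0.5
```

Since P(positive) = 0.2 + 0.7 * theta under these reliabilities, the observed rate 0.55 pins the ML estimate at theta = 0.5, which EM reaches.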

55
Experimental Evidence
  • [Koller, Pfeffer 97]
  • the support network is a good approximation
  • [Binder et al. 97]
  • equality constraints speed up learning
  • 100 data cases
  • constant step-size
  • Estimation of the means: 13 iterations
  • Estimation of the weights: constrained to sum to 1.0

56
Outline
  • Bayesian Logic Programs
  • Examples and Language
  • Semantics and Support Networks
  • Learning Bayesian Logic Programs
  • Data Cases
  • Parameter Estimation
  • Structural Learning

57
Structural Learning
  • Combination of Inductive Logic Programming and Bayesian network learning
  • Datalog fragment of Bayesian logic programs (no functors)
  • intensional Bayesian clauses

58
Idea - CLAUDIEN
  • learning from interpretations
  • all data cases are Herbrand interpretations
  • a hypothesis should reflect what is in the data

59
What is the data about ?
...
60
Claudien -Learning From Interpretations
  • a set of data cases
  • a set of all clauses that can be part of a hypothesis
  • a hypothesis is (logically) valid iff it holds in all data cases
  • a logical solution is a logically maximally general valid hypothesis
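The validity test can be sketched as follows; groundings are supplied explicitly to keep the example tiny, and the helper name and atoms are illustrative:

```python
def clause_valid(groundings, interpretations):
    """A clause is (logically) valid iff in every data case each grounding
    whose body atoms are all true also has a true head atom."""
    return all(
        head in interp
        for interp in interpretations
        for head, body in groundings
        if all(b in interp for b in body)
    )

# one grounding of  bt(X) | father(F,X), bt(F)
g = [("bt(john)", ["father(henry,john)", "bt(henry)"])]
cases = [
    {"father(henry,john)", "bt(henry)", "bt(john)"},  # body and head true
    {"father(henry,john)"},                           # body false: no constraint
]
print(clause_valid(g, cases))  # True
```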

61
Learning Task
  • Given
  • a set of data cases
  • a set of Bayesian logic programs
  • a scoring function
  • Goal: the probabilistic solution that matches the data best according to the scoring function

62
Algorithm
63
Example
64
Example
65
Example
66
Example
mc(ann)
mc(eric)
pc(ann)
pc(eric)
mc(john)
pc(john)
m(ann,john)
f(eric,john)
bc(john)
67
Example
[Figure: a refined candidate structure over the same ground atoms mc/pc/m/f/bc]
68
Example
[Figure: a further refinement of the candidate structure over the same ground atoms]
69
Example
[Figure: another refinement step of the candidate structure over the same ground atoms]
...
70
Properties
  • All relevant random variables are known
  • First-order equivalent of the Bayesian network setting
  • Hypothesis postulates true regularities in the data
  • Logical solutions as initial hypotheses
  • Highlights background knowledge

71
Example Experiments
mc(X) | m(M,X), mc(M), pc(M).
pc(X) | f(F,X), mc(F), pc(F).
bt(X) | mc(X), pc(X).
Data: sampled from 2 families, 1000 samples each
Score: log-likelihood
Goal: learn the definition of bt
72
Conclusion
  • EM-based and gradient-based methods for ML parameter estimation
  • Link between ILP and learning Bayesian networks
  • CLAUDIEN setting used to define and to traverse
    the search space
  • Bayesian network scores used to evaluate
    hypotheses

73
Thanks !