Bayesian Logic Programs - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Bayesian Logic Programs


1
Bayesian Logic Programs
Summer School on Relational Data Mining
17 and 18 August 2002, Helsinki, Finland
  • Kristian Kersting, Luc De Raedt
  • Albert-Ludwigs University
  • Freiburg, Germany

2
Context
Real-world applications
3
Outline
  • Bayesian Logic Programs
  • Examples and Language
  • Semantics and Support Networks
  • Learning Bayesian Logic Programs
  • Data Cases
  • Parameter Estimation
  • Structural Learning

4
Bayesian Logic Programs
  • Probabilistic models structured using logic
  • Extend Bayesian networks with notions of objects
    and relations
  • Probability density over (countably) infinitely
    many random variables
  • Flexible discrete-time stochastic processes
  • Generalize pure Prolog, Bayesian networks,
    dynamic Bayesian networks, dynamic Bayesian
    multinets, hidden Markov models,...

5
Bayesian Networks
  • One of the successes of AI
  • State-of-the-art for modeling uncertainty, in particular degrees of belief
  • Advantage [Russell, Norvig 96]
  • strict separation of qualitative and quantitative aspects of the world
  • Disadvantage [Breese, Ngo, Haddawy, Koller, ...]
  • propositional character, no notion of objects and relations among them

6
Stud farm (Jensen 96)
  • The colt John has been born recently on a stud farm.
  • John suffers from a life-threatening hereditary disease carried by a recessive gene. The disease is so serious that John is put down immediately, and because the stud farm wants the gene out of production, his parents are taken out of breeding.
  • What are the probabilities for the remaining horses to be carriers of the unwanted gene?

7
Bayesian networks [Pearl 88]
Based on the stud farm example [Jensen 96]
[Figure: Bayesian network over bt_ann, bt_brian, bt_cecily, bt_unknown1, bt_unknown2, bt_dorothy, bt_eric, bt_fred, bt_gwenn, bt_henry, bt_irene, bt_john]
8
Bayesian networks [Pearl 88]
Based on the stud farm example [Jensen 96]
[Figure: the same Bayesian network, annotated with its (conditional) probability distributions]
P(bt_cecily = aA | bt_john = aA) = 0.1499
P(bt_john = AA | bt_ann = aA) = 0.6906
P(bt_john = AA) = 0.9909

9
Bayesian networks (contd.)
  • acyclic graphs
  • probability distribution over a finite set of random variables

10
From Bayesian Networks to Bayesian Logic Programs
bt_ann
bt_brian
bt_cecily
bt_unknown2
bt_unknown1
bt_dorothy
bt_eric
bt_gwenn
bt_fred
bt_henry
bt_irene
bt_john
11
From Bayesian Networks to Bayesian Logic Programs
bt_ann.
bt_brian.
bt_cecily.
bt_unknown2.
bt_unknown1.
bt_dorothy
bt_eric
bt_gwenn
bt_fred
bt_henry
bt_irene
bt_john
12
From Bayesian Networks to Bayesian Logic Programs
bt_ann.
bt_brian.
bt_cecily.
bt_unknown2.
bt_unknown1.
bt_dorothy | bt_ann, bt_brian.
bt_eric | bt_brian, bt_cecily.
bt_gwenn | bt_ann, bt_unknown2.
bt_fred | bt_unknown1, bt_ann.
bt_henry
bt_irene
bt_john
13
From Bayesian Networks to Bayesian Logic Programs
bt_ann.
bt_brian.
bt_cecily.
bt_unknown2.
bt_unknown1.
bt_dorothy | bt_ann, bt_brian.
bt_eric | bt_brian, bt_cecily.
bt_gwenn | bt_ann, bt_unknown2.
bt_fred | bt_unknown1, bt_ann.
bt_henry | bt_fred, bt_dorothy.
bt_irene | bt_eric, bt_gwenn.
bt_john
14
From Bayesian Networks to Bayesian Logic Programs
bt_ann.
bt_brian.
bt_cecily.
bt_unknown2.
bt_unknown1.
bt_dorothy | bt_ann, bt_brian.
bt_eric | bt_brian, bt_cecily.
bt_gwenn | bt_ann, bt_unknown2.
bt_fred | bt_unknown1, bt_ann.
bt_henry | bt_fred, bt_dorothy.
bt_irene | bt_eric, bt_gwenn.
bt_john | bt_henry, bt_irene.
15
From Bayesian Networks to Bayesian Logic Programs
apriori nodes:
bt_ann. bt_brian. bt_cecily. bt_unknown1. bt_unknown2.
aposteriori nodes:
bt_henry | bt_fred, bt_dorothy.
bt_irene | bt_eric, bt_gwenn.
bt_fred | bt_unknown1, bt_ann.
bt_dorothy | bt_brian, bt_ann.
bt_eric | bt_brian, bt_cecily.
bt_gwenn | bt_unknown2, bt_ann.
bt_john | bt_henry, bt_irene.
16
From Bayesian Networks to Bayesian Logic Programs
apriori nodes:
bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2).
aposteriori nodes:
bt(henry) | bt(fred), bt(dorothy).
bt(irene) | bt(eric), bt(gwenn).
bt(fred) | bt(unknown1), bt(ann).
bt(dorothy) | bt(brian), bt(ann).
bt(eric) | bt(brian), bt(cecily).
bt(gwenn) | bt(unknown2), bt(ann).
bt(john) | bt(henry), bt(irene).
17
From Bayesian Networks to Bayesian Logic Programs
ground facts / apriori:
bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2).
father(unknown1,fred). mother(ann,fred).
father(brian,dorothy). mother(ann,dorothy).
father(brian,eric). mother(cecily,eric).
father(unknown2,gwenn). mother(ann,gwenn).
father(fred,henry). mother(dorothy,henry).
father(eric,irene). mother(gwenn,irene).
father(henry,john). mother(irene,john).
rules / aposteriori:
bt(X) | father(F,X), bt(F), mother(M,X), bt(M).
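The single rule, grounded over the father/mother facts, induces exactly the parent structure of the hand-written network. A minimal Python sketch (the dictionaries and the helper `bt_parents` are illustrative, not from the slides):

```python
# Ground the clause  bt(X) | father(F,X), bt(F), mother(M,X), bt(M)
# over the stud farm facts to recover each bt atom's parents.
father = {"fred": "unknown1", "dorothy": "brian", "eric": "brian",
          "gwenn": "unknown2", "henry": "fred", "irene": "eric",
          "john": "henry"}
mother = {"fred": "ann", "dorothy": "ann", "eric": "cecily",
          "gwenn": "ann", "henry": "dorothy", "irene": "gwenn",
          "john": "irene"}

def bt_parents(x):
    """Parents of bt(x) in the dependency graph: bt(F) and bt(M) whenever
    father(F,x) and mother(M,x) hold; apriori nodes (founders) have none."""
    if x in father and x in mother:
        return {f"bt({father[x]})", f"bt({mother[x]})"}
    return set()

print(bt_parents("john"))  # bt(henry) and bt(irene), as in the network
```

Grounding the one intensional clause this way reproduces the seven aposteriori clauses listed on the previous slide.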
18
Dependency graph Bayesian network
[Figure: dependency graph — a Bayesian network whose nodes are the ground atoms: the father/2 and mother/2 facts together with bt(ann), ..., bt(john)]
19
Dependency graph Bayesian network
[Figure: the induced Bayesian network over bt_ann, ..., bt_john — the original stud farm network]
20
Bayesian Logic Programs- a first definition
  • A BLP consists of
  • a finite set of Bayesian clauses.
  • To each clause a conditional probability distribution (CPD) is associated.
  • Proper random variables: LH(B)
  • graphical structure: the dependency graph
  • Quantitative information: CPDs

21
Bayesian Logic Programs- Examples
pure Prolog:
apriori nodes: nat(0).
aposteriori nodes: nat(s(X)) | nat(X).
MC / HMM:
apriori nodes: state(0).
aposteriori nodes: state(s(Time)) | state(Time). output(Time) | state(Time).
DBN:
apriori nodes: n1(0).
aposteriori nodes: n1(s(TimeSlice)) | n2(TimeSlice). n2(TimeSlice) | n1(TimeSlice). n3(TimeSlice) | n1(TimeSlice), n2(TimeSlice).
22
Associated CPDs
  • Associated CPDs represent generically the CPD for each ground instance of the corresponding Bayesian clause.
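For instance, a single table can serve every ground instance of the bt clause. A sketch using Mendelian genotype probabilities for states AA/aA/aa (the partial table and the helper `p_bt` are illustrative assumptions, not taken from the slides):

```python
# One CPD table is attached to the clause; every ground instance
# (bt(henry), bt(john), ...) looks its probabilities up in this same
# table.  Entries follow Mendelian inheritance; the table is shown
# only partially, for illustration.
cpd_bt = {
    # (parent1_state, parent2_state): {child_state: probability}
    ("AA", "AA"): {"AA": 1.00, "aA": 0.0, "aa": 0.00},
    ("AA", "aA"): {"AA": 0.50, "aA": 0.5, "aa": 0.00},
    ("aA", "aA"): {"AA": 0.25, "aA": 0.5, "aa": 0.25},
}

def p_bt(child, f_state, m_state):
    """CPD shared by all ground instances of the bt clause."""
    key = tuple(sorted((f_state, m_state)))  # the table is symmetric
    return cpd_bt[key][child]

print(p_bt("aa", "aA", "aA"))  # 0.25, whether queried for henry or john
```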

23
Combining Rules
  • Multiple ground instances of clauses having the
    same head atom?

ground facts: as before
rules:
bt(X) | father(F,X), bt(F).
bt(X) | mother(M,X), bt(M).
24
Combining Rules (contd.)
  • Any algorithm which
  • combines a set of CPDs
  • into one (combined) CPD
  • and which has an empty output if and only if the input is empty
  • E.g. noisy-or, regression, ...

CR: P(A|B) and P(A|C) are combined into P(A|B,C)
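The noisy-or rule mentioned above can be sketched as follows for Boolean variables (the probabilities are illustrative, not from the slides):

```python
def noisy_or(activation_probs):
    """Combine per-clause CPDs P(A=1|Bi=1) into P(A=1|B1,...,Bn) under
    the noisy-or assumption: each active parent independently turns A on."""
    p_all_fail = 1.0
    for p in activation_probs:
        p_all_fail *= 1.0 - p
    return 1.0 - p_all_fail

# the father clause contributes P(A=1|B=1)=0.8, the mother clause 0.5:
print(noisy_or([0.8, 0.5]))  # 0.9
```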
25
Bayesian Logic Programs- a definition
  • A BLP consists of
  • a finite set of Bayesian clauses.
  • To each clause a conditional probability distribution (CPD) is associated.
  • To each Bayesian predicate p a combining rule is associated, to combine the CPDs of multiple ground instances of clauses having the same head.
  • Proper random variables: LH(B)
  • graphical structure: the dependency graph
  • Quantitative information: CPDs and CRs

26
Outline
  • Bayesian Logic Programs
  • Examples and Language
  • Semantics and Support Networks
  • Learning Bayesian Logic Programs
  • Data Cases
  • Parameter Estimation
  • Structural Learning

27
Discrete-Time Stochastic Process
  • A family of random variables over a domain, indexed by a linearization of the partial order induced by the dependency graph
  • For each such linearization, a Bayesian logic program specifies a discrete-time stochastic process

28
Theorem of Kolmogorov
29
Consistency Conditions
  • The probability measure over each finite subset of LH(B) is represented by a finite Bayesian network which is a subnetwork of the dependency graph over LH(B): the support network
  • (Elimination order) All stochastic processes represented by a Bayesian logic program B specify the same probability measure over LH(B).

30
Support network
[Figure: the dependency graph of the stud farm program over the father/2, mother/2, and bt/1 ground atoms]
31
Support network
[Figure: the same dependency graph; the nodes needed to answer the query are highlighted]
32
Support network
[Figure: the support network induced by the query's random variables]
33
Support network
  • The support network of a random variable x in LH(B) is the induced subnetwork over x and its ancestors
  • The support network of a finite set of random variables is the union of their support networks
  • Computation utilizes And/Or trees

34
Queries using And/Or trees
  • A probabilistic query
  • ?- Q1, ..., Qn | E1 = e1, ..., Em = em.
  • asks for the distribution
  • P(Q1, ..., Qn | E1 = e1, ..., Em = em).
  • ?- bt(eric).
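A sketch of how such a query is answered over the support network: ?- bt(eric). only needs bt(brian) and bt(cecily), so the marginal is computed by summing them out. States and numbers here are illustrative assumptions, not the slides' CPDs:

```python
from itertools import product

prior = {"carrier": 0.1, "pure": 0.9}   # apriori bt nodes

def p_child(child, f, m):
    """CPD of the bt clause: the chance of being a carrier grows with
    the number of carrier parents (hypothetical values)."""
    n = (f == "carrier") + (m == "carrier")
    p_carrier = {0: 0.0, 1: 0.5, 2: 0.75}[n]
    return p_carrier if child == "carrier" else 1.0 - p_carrier

def query_bt_eric():
    """Answer ?- bt(eric). by summing out bt(brian) and bt(cecily)."""
    dist = {"carrier": 0.0, "pure": 0.0}
    for f, m in product(prior, repeat=2):
        weight = prior[f] * prior[m]
        for c in dist:
            dist[c] += weight * p_child(c, f, m)
    return dist

print(query_bt_eric())  # carrier: 0.0975, pure: 0.9025
```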

35
Consistency Condition (contd.)
  • the dependency graph is acyclic, and
  • every random variable is influenced by a finite
    set of random variables only

36
Relational Character
ground facts:
bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2).
father(unknown1,fred). mother(ann,fred).
father(brian,dorothy). mother(ann,dorothy).
father(brian,eric). mother(cecily,eric).
father(unknown2,gwenn). mother(ann,gwenn).
father(fred,henry). mother(dorothy,henry).
father(eric,irene). mother(gwenn,irene).
father(henry,john). mother(irene,john).
rules:
bt(X) | father(F,X), bt(F), mother(M,X), bt(M).
37
Bayesian Logic Programs- Summary
  • First-order logic extension of Bayesian networks
  • constants, relations, functors
  • discrete and continuous random variables
  • ground atoms = random variables
  • CPDs associated to clauses
  • Dependency graph: a (possibly) infinite Bayesian network
  • Generalize dynamic Bayesian networks and definite clause logic (range-restricted)

38
Applications
  • Probabilistic, logical
  • Description and prediction
  • Regression
  • Classification
  • Clustering
  • Computational Biology
  • APrIL IST-2001-33053
  • Web Mining
  • Query approximation
  • Planning, ...

39
Other frameworks
  • Probabilistic Horn Abduction Poole 93
  • Distributional Semantics (PRISM) Sato 95
  • Stochastic Logic Programs Muggleton 96 Cussens
    99
  • Relational Bayesian Nets Jaeger 97
  • Probabilistic Logic Programs Ngo, Haddawy 97
  • Object-Oriented Bayesian Nets Koller, Pfeffer
    97
  • Probabilistic Frame-Based Systems Koller,
    Pfeffer 98
  • Probabilistic Relational Models Koller 99

40
Outline
  • Bayesian Logic Programs
  • Examples and Language
  • Semantics and Support Networks
  • Learning Bayesian Logic Programs
  • Data Cases
  • Parameter Estimation
  • Structural Learning

41
Learning Bayesian Logic Programs
Data Background Knowledge
42
Why Learning Bayesian Logic Programs ?
Inductive Logic Programming
Learning within Bayesian networks
Learning within Bayesian Logic Programs
  • Of interest to different communities ?
  • scoring functions, pruning techniques,
    theoretical insights, ...

43
What is the data about ?
44
Learning Task
  • Given
  • a set of data cases
  • a Bayesian logic program B
  • Goal: for each clause of B, the parameters that best fit the given data

45
Parameter Estimation (contd.)
  • best fit = maximum-likelihood (ML) estimation
  • the hypothesis space is spanned by the product space over the possible values of the parameters

46
Parameter Estimation (contd.)
  • Assumption
  • D1, ..., DN are independently sampled from identical distributions (e.g. totally separated families)

47
Parameter Estimation (contd.)
48
Parameter Estimation (contd.)
  • Reduced to a problem within Bayesian networks
  • given structure,
  • partially observed random variables
  • EM [Dempster, Laird, Rubin 77; Lauritzen 91]
  • Gradient Ascent [Binder, Koller, Russell, Kanazawa 97; Jensen 99]

49
Decomposable CRs
  • Parameters are associated with the clauses, not with the support network.

Single ground instance of a Bayesian clause: use its CPD
Multiple ground instances of the same Bayesian clause: CPDs combined by the combining rule
50
Gradient Ascent
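As a minimal stand-in for the gradient computation, ascending the log-likelihood of a single clause parameter theta = P(head true | body true) from observed counts, with a constant step size; the function and numbers are illustrative assumptions, not the slides' derivation:

```python
def fit_theta(n_on, n_off, lr=0.0005, steps=2000):
    """Gradient ascent on the log-likelihood
    L(theta) = n_on*log(theta) + n_off*log(1 - theta)."""
    theta = 0.5
    for _ in range(steps):
        grad = n_on / theta - n_off / (1.0 - theta)  # dL/dtheta
        theta += lr * grad
        theta = min(max(theta, 1e-6), 1.0 - 1e-6)    # stay inside (0, 1)
    return theta

print(round(fit_theta(70, 30), 3))  # climbs to the ML estimate 0.7
```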
51
Gradient Ascent
52
Gradient Ascent
53
Algorithm
54
Expectation-Maximization
  • Initialize parameters
  • E-Step and M-Step, i.e.
  • compute expected counts for each clause and treat the expected counts as counts
  • If not converged, iterate to 2
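The loop above, sketched for a single clause parameter theta = P(carrier) when the carrier state itself is hidden and only a noisy test outcome per data case is observed. The test reliabilities are illustrative assumptions, not from the slides:

```python
def em_theta(ys, p_pos_if_carrier=0.9, p_pos_if_clear=0.2,
             theta=0.3, iters=100):
    """EM for theta = P(carrier) from noisy 0/1 observations ys."""
    for _ in range(iters):
        # E-step: expected "carrier" count contributed by each data case
        resp = []
        for y in ys:
            like1 = p_pos_if_carrier if y else 1.0 - p_pos_if_carrier
            like0 = p_pos_if_clear if y else 1.0 - p_pos_if_clear
            num = theta * like1
            resp.append(num / (num + (1.0 - theta) * like0))
        # M-step: treat the expected counts as counts
        theta = sum(resp) / len(resp)
    return theta

data = [1] * 55 + [0] * 45       # 55 positive tests out of 100
print(round(em_theta(data), 3))  # converges to 0.5
```

Since P(positive) = 0.2 + 0.7 * theta under these reliabilities, the observed rate 0.55 pins the ML estimate at theta = 0.5, which EM reaches.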

55
Experimental Evidence
  • [Koller, Pfeffer 97]
  • the support network is a good approximation
  • [Binder et al. 97]
  • equality constraints speed up learning
  • 100 data cases
  • constant step-size
  • Estimation of the means: 13 iterations
  • Estimation of the weights: constrained to sum to 1.0

56
Outline
  • Bayesian Logic Programs
  • Examples and Language
  • Semantics and Support Networks
  • Learning Bayesian Logic Programs
  • Data Cases
  • Parameter Estimation
  • Structural Learning

57
Structural Learning
  • Combination of Inductive Logic Programming and Bayesian network learning
  • Datalog fragment of Bayesian logic programs (no functors)
  • intensional Bayesian clauses

58
Idea - CLAUDIEN
  • learning from interpretations
  • all data cases are Herbrand interpretations
  • a hypothesis should reflect what is in the data

59
What is the data about ?
...
60
Claudien -Learning From Interpretations
  • a set of data cases
  • a set of all clauses that can be part of a hypothesis
  • a hypothesis is (logically) valid iff it holds in all data cases
  • a logical solution is a logically maximally general valid hypothesis
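The validity test can be sketched as follows; groundings are supplied explicitly to keep the example tiny, and the helper name and atoms are illustrative:

```python
def clause_valid(groundings, interpretations):
    """A clause is (logically) valid iff in every data case each grounding
    whose body atoms are all true also has a true head atom."""
    return all(
        head in interp
        for interp in interpretations
        for head, body in groundings
        if all(b in interp for b in body)
    )

# one grounding of  bt(X) | father(F,X), bt(F)
g = [("bt(john)", ["father(henry,john)", "bt(henry)"])]
cases = [
    {"father(henry,john)", "bt(henry)", "bt(john)"},  # body and head true
    {"father(henry,john)"},                           # body false: no constraint
]
print(clause_valid(g, cases))  # True
```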

61
Learning Task
  • Given
  • a set of data cases
  • a set of Bayesian logic programs
  • a scoring function
  • Goal: the probabilistic solution that matches the data best according to the scoring function

62
Algorithm
63
Example
64
Example
65
Example
66
Example
mc(ann)
mc(eric)
pc(ann)
pc(eric)
mc(john)
pc(john)
m(ann,john)
f(eric,john)
bc(john)
67
Example
[Figure: a refined candidate structure over the same ground atoms mc/pc/m/f/bc]
68
Example
[Figure: a further refinement of the candidate structure over the same ground atoms]
69
Example
[Figure: another refinement step of the candidate structure over the same ground atoms]
...
70
Properties
  • All relevant random variables are known
  • First-order equivalent of the Bayesian network setting
  • Hypothesis postulates true regularities in the data
  • Logical solutions as initial hypotheses
  • Highlights background knowledge

71
Example Experiments
mc(X) | m(M,X), mc(M), pc(M).
pc(X) | f(F,X), mc(F), pc(F).
bt(X) | mc(X), pc(X).
Data: sampled from 2 families, 1000 samples each
Score: log-likelihood
Goal: learn the definition of bt
72
Conclusion
  • EM-based and gradient-based methods for ML parameter estimation
  • Link between ILP and learning Bayesian networks
  • CLAUDIEN setting used to define and to traverse
    the search space
  • Bayesian network scores used to evaluate
    hypotheses

73
Thanks !