Title: Representation, Inference and Learning in Relational Probabilistic Languages
1. Representation, Inference and Learning in Relational Probabilistic Languages
- Lise Getoor, University of Maryland, College Park
- Avi Pfeffer, Harvard University
IJCAI 2005 Tutorial
2. Introduction
- Probability is good
- First-order representations are good
- Variety of approaches that combine them
- We won't cover all of them in detail
- apologies if we leave out your favorite
- We will cover three broad classes of approaches, and present exemplars of each approach
- We will highlight common issues, themes, and techniques that recur in different approaches
3. Running Example
- There are papers, researchers, citations, reviews
- Papers have a quality, and may or may not be accepted
- Authors may be smart and good writers
- Papers have topics, and cite other papers which may or may not be on the same topic
- Papers are reviewed by reviewers, who have moods that are influenced by the quality of the writing
4. Some Queries
- What is the probability that a researcher is famous, given that one of her papers was accepted despite the fact that a reviewer was in a bad mood?
- What is the probability that a paper is accepted, given that another paper by the same author is accepted?
- What is the probability that a paper is an AI paper, given that it is cited by an AI paper?
- What is the probability that a student of a famous advisor has seven high-quality papers?
5. Sample Domains
- Web Pages and Link Analysis
- Battlespace Awareness
- Epidemiological Studies
- Citation Networks
- Communication Networks (Cellphone Fraud Detection)
- Intelligence Analysis (Terrorist Networks)
- Financial Transactions (Money Laundering)
- Computational Biology
- Object Recognition and Scene Analysis
- Natural Language Processing (e.g. Information Extraction and Semantic Parsing)
6. Roadmap
- Motivation
- Background: Bayesian network inference and learning
- Rule-based Approaches
- Frame-based Approaches
- Undirected Relational Approaches
- Programming Language Approaches
7. Bayesian Networks [Pearl 87]
[Figure: BN with nodes Smart, Good Writer, Reviewer Mood, Quality, Accepted, Review Length]
- nodes = domain variables; edges = direct causal influence
- Network structure encodes conditional independencies: I(Review-Length, Good-Writer | Reviewer-Mood)
8. BN Semantics
- conditional independencies in BN structure + local CPTs = full joint distribution over domain:
  P(X_1, ..., X_n) = Π_i P(X_i | Parents(X_i))
- Compact, natural representation:
- nodes have ≤ k parents ⇒ O(2^k · n) params vs. O(2^n) params
- natural parameters
9. Reasoning in BNs
- Full joint distribution specifies the answer to any query: P(event | evidence)
- Allows combination of different types of reasoning:
- Causal: P(Reviewer-Mood | Good-Writer)
- Evidential: P(Reviewer-Mood | not Accepted)
- Intercausal: P(Reviewer-Mood | not Accepted, Quality)
10. Variable Elimination [Zhang & Poole 96; Dechter 98]
- Works with factors
- A factor is a function from values of variables to positive real numbers
11-16. Variable Elimination (walk-through)
[Figures: the elimination steps: sum out l, producing a new factor; multiply factors together, then sum out w, producing another new factor; repeat until only the query variable remains]
(a small code sketch of these factor operations follows below)
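To make the factor operations concrete, here is a minimal Python sketch of multiply-and-sum-out; the two toy factors (a prior on w and a CPT for l given w) are illustrative stand-ins, not values from the tutorial.

```python
from itertools import product

def multiply(f1, vars1, f2, vars2):
    """Pointwise product of two factors given as dicts over value tuples."""
    out_vars = list(dict.fromkeys(vars1 + vars2))
    out = {}
    for vals in product([False, True], repeat=len(out_vars)):
        assign = dict(zip(out_vars, vals))
        v1 = f1[tuple(assign[v] for v in vars1)]
        v2 = f2[tuple(assign[v] for v in vars2)]
        out[vals] = v1 * v2
    return out, out_vars

def sum_out(f, vars_, var):
    """Eliminate `var` from factor `f` by summing over its values."""
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    out = {}
    for vals, p in f.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return out, out_vars

# Toy factors (made up): P(w) and P(l | w); eliminate l, read off P(w).
f_w = {(False,): 0.2, (True,): 0.8}
f_l_given_w = {(False, False): 0.9, (False, True): 0.1,
               (True, False): 0.3, (True, True): 0.7}
joint, jvars = multiply(f_w, ['w'], f_l_given_w, ['w', 'l'])
marg, mvars = sum_out(joint, jvars, 'l')
print(mvars, marg)  # ['w'] {(False,): 0.2, (True,): 0.8}
```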
17. Some Other Inference Algorithms
- Exact
- Junction Tree [Lauritzen & Spiegelhalter 88]
- Cutset Conditioning [Pearl 87]
- Approximate
- Loopy Belief Propagation [McEliece et al 98]
- Likelihood Weighting [Shwe & Cooper 91]
- Markov Chain Monte Carlo, e.g. [MacKay 98]
- Gibbs Sampling [Geman & Geman 84]
- Metropolis-Hastings [Metropolis et al 53; Hastings 70]
- Variational Methods [Jordan et al 98]
- etc.
18. Learning in BNs

                           Complete Data      Incomplete Data
Parameters only            Easy: counting     EM [Dempster et al 77] or gradient descent [Russell et al 95]
Structure and Parameters   Structure search   Structural EM [Friedman 97]

See [Heckerman 98] for a general introduction
19. Parameter Estimation in BNs
- Assume known dependency structure G
- Goal: estimate BN parameters θ
- entries in local probability models
- θ is good if it is likely to generate the observed data
- MLE Principle: choose θ so as to maximize the likelihood
- Alternative: incorporate a prior
20. Learning With Complete Data
- Fully observed data: data consists of a set of instances, each with a value for all BN variables
- With fully observed data, we can compute N(x, u) = the number of instances with X = x and Parents(X) = u, and similarly for other counts
- We then estimate θ_{x|u} = N(x, u) / N(u) (see the sketch below)
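A minimal sketch of the counting estimate, assuming Boolean variables; the instance list is made up for illustration.

```python
from collections import Counter

instances = [
    {"Quality": True,  "Mood": True,  "Accepted": True},
    {"Quality": True,  "Mood": False, "Accepted": True},
    {"Quality": True,  "Mood": False, "Accepted": False},
    {"Quality": False, "Mood": True,  "Accepted": False},
]

# N(x, u): instances with X = x and Parents(X) = u; theta = N(x,u) / N(u).
n_xu = Counter((d["Accepted"], (d["Quality"], d["Mood"])) for d in instances)
n_u = Counter((d["Quality"], d["Mood"]) for d in instances)
for (x, u), n in sorted(n_xu.items()):
    print(f"P(Accepted={x} | Q,M={u}) = {n / n_u[u]:.2f}")
```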
21. Learning with Missing Data: Expectation-Maximization (EM)
- Can't compute the counts directly
- But:
- given parameter values, we can compute expected counts (this requires BN inference)
- given expected counts, we can estimate parameters
- Begin with arbitrary parameter values
- Iterate these two steps (see the sketch below)
- Converges to a local maximum of the likelihood
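A minimal sketch of the two-step loop for a single Boolean variable with missing entries (marked None); in a real BN the expected counts would come from inference, but the E-step/M-step alternation is the same. The data values are made up.

```python
data = [True, True, False, None, None, True, None]

theta = 0.5  # arbitrary starting value for P(X = True)
for step in range(100):
    # E-step: expected count of True; each missing entry contributes theta,
    # the current posterior probability that the hidden value is True.
    expected_true = sum(theta if x is None else float(x) for x in data)
    # M-step: re-estimate the parameter from the expected counts.
    theta_new = expected_true / len(data)
    if abs(theta_new - theta) < 1e-9:
        break
    theta = theta_new
print(f"theta = {theta:.3f}")  # converges to 0.75, the observed frequency
```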
22. Structure Search
- Begin with an empty network
- Consider all neighbors reached by a search operator that are acyclic:
- add an edge
- remove an edge
- reverse an edge
- For each neighbor:
- compute ML parameter values
- compute score(s)
- Choose the neighbor with the highest score
- Continue until we reach a local maximum (see the sketch below)
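A minimal sketch of the greedy search over three variables; score() is a stub standing in for a real scoring function such as BIC, and the variable names are illustrative.

```python
import itertools, random

VARS = ["Smart", "Quality", "Accepted"]

def is_acyclic(edges):
    # Kahn-style check: repeatedly remove nodes with no incoming edges.
    nodes, rem = set(VARS), set(edges)
    while True:
        src = {n for n in nodes if not any(e[1] == n for e in rem)}
        if not src:
            return not rem
        nodes -= src
        rem = {e for e in rem if e[0] not in src}

def neighbors(edges):
    for e in edges:                        # remove or reverse an edge
        yield edges - {e}
        yield (edges - {e}) | {(e[1], e[0])}
    for e in itertools.permutations(VARS, 2):
        if e not in edges:                 # add an edge
            yield edges | {e}

def score(edges):                          # stub: replace with a real score
    random.seed(hash(frozenset(edges)))
    return random.random()

current = frozenset()
while True:
    best = max((frozenset(n) for n in neighbors(current) if is_acyclic(n)),
               key=score, default=None)
    if best is None or score(best) <= score(current):
        break                              # local maximum reached
    current = best
print(sorted(current))
```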
23. Limitations of BNs
- Inability to generalize across collections of individuals within a domain
- if you want to talk about multiple individuals in a domain, you have to talk about each one explicitly, with its own local probability model
- Domains have fixed structure, e.g. one author, one paper and one reviewer
- if you want to talk about domains with multiple inter-related individuals, you have to create a special-purpose network for the domain
- For learning, all instances have to have the same set of entities
24. First-Order Approaches
- Advantages of first-order probabilistic models:
- represent the world in terms of individuals and relationships between them
- ability to generalize about many instances in the same domain
- allow compact parameterization
- support reasoning about general classes of individuals rather than the individuals themselves
- allow representation of high-level structure, in which objects interact weakly with each other
25. Three Different Approaches
- Rule-based approaches focus on facts
- what is true in the world?
- what facts do other facts depend on?
- Frame-based approaches focus on objects and relationships
- what types of objects are there, and how are they related to each other?
- how does a property of an object depend on other properties (of the same or other objects)?
- Programming language approaches focus on processes
- how is the world generated?
- how does one event influence another event?
26. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Basic Approach
- Knowledge-Based Model Construction
- Issues
- First-Order Variable Elimination
- Learning
- Frame-based Approaches
- Undirected Relational Approaches
- Programming Language Approaches
27. Flavors
- [Goldman & Charniak 93]
- [Breese 92]
- Probabilistic Horn Abduction [Poole 93]
- Probabilistic Logic Programming [Ngo & Haddawy 96]
- Relational Bayesian Networks [Jaeger 97]
- Bayesian Logic Programs [Kersting & de Raedt 00]
- Stochastic Logic Programs [Muggleton 96]
- PRISM [Sato & Kameya 97]
- CLP(BN) [Costa et al 03]
- etc.
28. Intuitive Approach
- In logic programming,
- accepted(P) :- author(P,A), famous(A).
- means
- For all P, A: if A is the author of P and A is famous, then P is accepted
- This is a categorical inference
- But this will not be true in many cases
29. Fudge Factors
- Use
- accepted(P) :- author(P,A), famous(A). (0.6)
- This means
- For all P, A: if A is the author of P and A is famous, then P is accepted with probability 0.6
- But what does this mean when there are other possible causes of a paper being accepted?
- e.g. accepted(P) :- high_quality(P). (0.8)
30. Intuitive Meaning
- accepted(P) :- author(P,A), famous(A). (0.6)
- means
- For all P, A: if A is the author of P and A is famous, then P is accepted with probability 0.6, provided no other possible cause of the paper being accepted holds
- If more than one possible cause holds, a combining rule is needed to combine the probabilities
31. Meaning of Disjunction
- In logic programming,
- accepted(P) :- author(P,A), famous(A).
- accepted(P) :- high_quality(P).
- means
- For all P, A: if A is the author of P and A is famous, or if P is high quality, then P is accepted
32. Intuitive Meaning of Probabilistic Disjunction
- For us,
- accepted(P) :- author(P,A), famous(A). (0.6)
- accepted(P) :- high_quality(P). (0.8)
- means
- For all P, A: if (A is the author of P and A is famous, and this successfully causes P to be accepted) or (P is high quality, and this successfully causes P to be accepted), then P is accepted
- If A is the author of P and A is famous, they successfully cause P to be accepted with probability 0.6
- If P is high quality, it successfully causes P to be accepted with probability 0.8
33. Noisy-Or
- Multiple possible causes of an effect
- Each cause, if it is true, successfully causes the effect with a given probability
- Effect is true if any of the possible causes is true and successfully causes it
- All causes act independently to produce the effect (causal independence)
- Note: accepted(P) :- author(P,A), famous(A). (0.6) may produce multiple possible causes for different values of A
- Leak probability: effect may happen with no cause
- e.g. accepted(P). (0.1)
34. Noisy-Or
[Figure: noisy-or network: author(p1,alice) and famous(alice) cause accepted(p1) with probability 0.6; author(p1,bob) and famous(bob) with probability 0.6; high_quality(p1) with probability 0.8]
35. Computing Noisy-Or Probabilities
- What is P(accepted(p1)), given that Alice is an author and Alice is famous, and that the paper is high quality, but no other possible cause is true?
- The paper is rejected only if both active causes and the leak all fail:
  P(accepted(p1)) = 1 - (1 - 0.6)(1 - 0.8)(1 - 0.1) = 1 - 0.072 = 0.928
(the computation is spelled out in code below)
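The same arithmetic in a few lines of Python; the three numbers are the two active cause probabilities (0.6, 0.8) and the leak (0.1) from the slide.

```python
def noisy_or(active_cause_probs, leak=0.0):
    # Effect fails only if every active cause and the leak fail independently.
    p_fail = 1.0 - leak
    for p in active_cause_probs:
        p_fail *= 1.0 - p
    return 1.0 - p_fail

print(noisy_or([0.6, 0.8], leak=0.1))  # 1 - 0.4*0.2*0.9 = 0.928
```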
36. Combination Rules
- Other combination rules are possible
- E.g. max
- In our case, P(accepted(p1)) = max{0.6, 0.8, 0.1} = 0.8
- Harder to interpret in terms of a logic program
37. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Basic Approach
- Knowledge-Based Model Construction
- Issues
- First-Order Variable Elimination
- Learning
- Frame-based Approaches
- Undirected Relational Approaches
- Programming Language Approaches
38. Knowledge-Based Model Construction (KBMC)
- Construct a Bayesian network, given a query Q and evidence E
- query and evidence are sets of ground atoms, i.e., predicates with no variable symbols
- e.g. author(p1,alice)
- Construct the network by searching for possible proofs of the query and evidence variables
- Use standard BN inference techniques on the constructed network
39. KBMC Example
- smart(alice). (0.8)
- smart(bob). (0.9)
- author(p1,alice). (0.7)
- author(p1,bob). (0.3)
- high_quality(P) :- author(P,A), smart(A). (0.5)
- high_quality(P). (0.1)
- accepted(P) :- high_quality(P). (0.9)
- Query is accepted(p1)
- Evidence is smart(bob)
40. Backward Chaining
- Start with evidence variable smart(bob)
[Network: smart(bob)]
41. Backward Chaining
- Rule for smart(bob) has no antecedents: stop backward chaining
[Network: smart(bob)]
42. Backward Chaining
- Begin with query variable accepted(p1)
[Network: smart(bob), accepted(p1)]
43. Backward Chaining
- Rule for accepted(p1) has antecedent high_quality(p1): add high_quality(p1) to the network, and make it a parent of accepted(p1)
[Network: smart(bob); high_quality(p1) → accepted(p1)]
44. Backward Chaining
- All of accepted(p1)'s parents have been found: create its conditional probability table (CPT)
[Network: smart(bob); high_quality(p1) → accepted(p1), with the CPT P(accepted(p1) | high_quality(p1)) attached]
45. Backward Chaining
- high_quality(p1) :- author(p1,A), smart(A) has two groundings: A = alice and A = bob
[Network: smart(bob); high_quality(p1) → accepted(p1)]
46. Backward Chaining
- For grounding A = alice, add author(p1,alice) and smart(alice) to the network, and make them parents of high_quality(p1)
[Network: smart(bob); smart(alice), author(p1,alice) → high_quality(p1) → accepted(p1)]
47. Backward Chaining
- For grounding A = bob, add author(p1,bob) to the network. smart(bob) is already in the network. Make both parents of high_quality(p1)
[Network: smart(alice), author(p1,alice), smart(bob), author(p1,bob) → high_quality(p1) → accepted(p1)]
48. Backward Chaining
- Create the CPT for high_quality(p1): make it a noisy-or, and don't forget the leak probability
[Network: as above]
49. Backward Chaining
- author(p1,alice), smart(alice) and author(p1,bob) have no antecedents: stop backward chaining
[Network: as above]
50. Backward Chaining
- Assert evidence smart(bob) = true, and compute P(accepted(p1) | smart(bob) = true)
[Network: as above, with smart(bob) = true]
(a code sketch of the whole construction follows below)
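A minimal sketch of the KBMC construction loop on the ground rules of this example; it recovers the parent structure built up on the preceding slides, leaving out noisy-or CPT construction.

```python
RULES = {  # head -> list of (body, probability), grounded by hand
    "smart(alice)":     [([], 0.8)],
    "smart(bob)":       [([], 0.9)],
    "author(p1,alice)": [([], 0.7)],
    "author(p1,bob)":   [([], 0.3)],
    "high_quality(p1)": [(["author(p1,alice)", "smart(alice)"], 0.5),
                         (["author(p1,bob)", "smart(bob)"], 0.5),
                         ([], 0.1)],
    "accepted(p1)":     [(["high_quality(p1)"], 0.9)],
}

def build_network(roots):
    parents, agenda = {}, list(roots)
    while agenda:
        node = agenda.pop()
        if node in parents:
            continue                    # already expanded
        parents[node] = set()
        for body, _ in RULES.get(node, []):
            parents[node].update(body)  # each grounding adds parents
            agenda.extend(body)         # keep chaining backwards
    return parents

net = build_network(["accepted(p1)", "smart(bob)"])  # query + evidence
for node, pars in sorted(net.items()):
    print(node, "<-", sorted(pars))
```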
51. Backward Chaining on Both Query and Evidence
- Necessary, if query and evidence have a common ancestor
- Sufficient: P(Query | Evidence) can be computed using only ancestors of query and evidence nodes
- unobserved descendants are irrelevant
[Figure: a common Ancestor with paths down to both Query and Evidence]
52. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Basic Approach
- Knowledge-Based Model Construction
- Issues
- First-Order Variable Elimination
- Learning
- Frame-based Approaches
- Undirected Relational Approaches
- Programming Language Approaches
53. The Role of Context
- Context is deterministic knowledge, known prior to the network being constructed
- May be defined by its own logic program
- Is not a random variable in the BN
- Used to determine the structure of the constructed BN
- If a context predicate P appears in the body of a rule R, only backward chain on R if P is true
54. Context Example
- Suppose author(P,A) is a context predicate, author(p1,bob) is true, and author(p1,alice) cannot be proven from the deterministic KB (and is therefore false by assumption)
- Network is:
[Network: smart(bob) → high_quality(p1) → accepted(p1). There is no author(p1,bob) node because it is a context predicate, and no smart(alice) node because author(p1,alice) is false]
55. Basic Assumptions
- No cycles in the resulting BN
- If there are cycles, we cannot interpret the BN as a definition of a joint probability distribution
- Model construction process terminates
- in particular, no function symbols. Consider
- famous(X) :- famous(advisor(X)).
- this creates an infinite backwards chain:
  famous(X) ← famous(advisor(X)) ← famous(advisor(advisor(X))) ← ...
56. Resolving Cycles [Glesner & Koller 95]
- Deal with cycles by introducing time
- E.g.
- famous(X) :- coauthor(X,Y), famous(Y).
- becomes
- famous(X,T) :- coauthor(X,Y,T-1), famous(Y,T-1).
- Defines a dynamic Bayesian network
57. Resolving Infinite Networks [Pfeffer & Koller 00]
- How to deal with famous(X) :- famous(advisor(X)). ?
- KBMC will construct an infinite network, on which we can't do inference
- To compute P(famous(alice)):
- Expand the network backwards n generations
- Assume a uniform distribution at the roots
- Compute P_n(famous(alice))
- The series P_1(famous(alice)), P_2(famous(alice)), ... converges as n → ∞
- The limit is defined to be P(famous(alice))
58. Reusing Inference
- Iteration 1: compute P_1(famous(X))
- Iteration 2: compute P_2(famous(X))
- requires P_1(famous(advisor(X))) = P_1(famous(X))
- already computed: reuse result
- Iteration 3: compute P_3(famous(X))
- requires P_2(famous(advisor(X))) = P_2(famous(X))
- already computed: reuse result
- Constant amount of work per iteration (see the sketch below)
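A minimal numerical sketch of the fixed-point iteration; the CPT coefficients (0.1 base probability, 0.6 dependence on the advisor) are made up for illustration, and only the constant-work convergence pattern matters.

```python
p = 0.5                       # P_0: uniform distribution at the roots
for n in range(1, 1000):
    p_new = 0.1 + 0.6 * p     # P_n from P_{n-1}; one CPT application
    if abs(p_new - p) < 1e-9:
        break
    p = p_new
print(f"P(famous(alice)) -> {p:.4f} after {n} iterations")  # -> 0.25
```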
59. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Basic Approach
- Knowledge-Based Model Construction
- Issues
- First-Order Variable Elimination
- Learning
- Frame-based Approaches
- Undirected Relational Approaches
- Programming Language Approaches
60. Semantics
- Assuming the BN construction process terminates, the conditional probability of any query given any evidence is defined by the BN
- Somewhat unsatisfying, because:
- meaning of the program is query dependent (depends on the constructed BN)
- meaning is not stated declaratively in terms of the program, but in terms of the constructed network instead
61. Disadvantage of Intuitive Approach
- Up until now, ground logical atoms have been random variables ranging over {T, F}
- cumbersome to have a different random variable for lead_author(p1,alice), lead_author(p1,bob) and all possible values of lead_author(p1,A)
- worse, since lead_author(p1,alice) and lead_author(p1,bob) are different random variables, it is possible for both to be true at the same time
62. Bayesian Logic Programs [Kersting & de Raedt 00, following Ngo & Haddawy 95]
- Now, ground atoms are random variables with any range (not necessarily Boolean)
- now quality is a random variable, with values high, medium, low
- Any probabilistic relationship is allowed
- expressed in a CPT
- Semantics of the program given once and for all
- not query dependent
63. Meaning of Rules in BLPs
- accepted(P) :- quality(P).
- means
- For all P, if quality(P) is a random variable, then accepted(P) is a random variable
- Associated with this rule is a conditional probability table (CPT) that specifies the probability distribution over accepted(P) for any possible value of quality(P)
64. Combining Rules for BLPs
- accepted(P) :- quality(P).
- accepted(P) :- author(P,A), fame(A).
- Before, combining rules combined individual probabilities with each other
- noisy-or and max rules made sense
- Now, combining rules combine entire CPTs
- can still define noisy-or and max
- harder to interpret
65. Semantics of BLPs
- Random variables are all ground atoms that have finite proofs in the logic program
- assumes acyclicity
- assumes no function symbols
- Can construct a BN over all random variables
- parents derived from rules
- CPTs derived using combining rules
- Semantics of a BLP: joint probability distribution over all random variables
- does not depend on the query
- Inference in BLPs by KBMC
66. An Issue
- How to specify uncertainty over single-valued relations?
- Approach 1: make lead_author(P) a random variable taking values bob, alice, etc.
- we can't say accepted(P) :- lead_author(P), famous(A), because A does not appear in the rule head or in a previous term in the body
- Approach 2: make lead_author(P,A) a random variable with values true, false
- we run into the same problems as with the intuitive approach (may have zero or many lead authors)
- Approach 3: make lead_author a function symbol
- say accepted(P) :- famous(lead_author(P))
- BLPs need to specify how to deal with function symbols and uncertainty over them
67. First-Order Variable Elimination [Poole 03; see also Braz et al 05]
- Generalization of variable elimination to first-order domains
- Reasons directly about first-order variables, instead of at the ground level
- Assumes that the size of the population for each type of entity is known
68. FOVE Example
- famous(X:Person) :- coauthor(X,Y). (0.2)
- coauthor(X:Person, Y:Person) :- knows(X,Y). (0.3)
- knows(X:Person, Y:Person). (0.01)
- |Person| = 1000
- Evidence: knows(alice,bob)
- Query: famous(alice)
69. What KBMC Will Produce
[Figure: famous(alice) with 1000 parents coauthor(a,b), coauthor(a,c), coauthor(a,d), ..., each with its own parent knows(a,b), knows(a,c), knows(a,d), ...]
70. Better Idea
- Instead of grounding out all variables, reason about some of them at the lifted level
- Eliminate entire relations at a time, instead of individual ground terms
- Use parameterized variables, e.g. reason directly about coauthor(X,Y)
- Use the known population size to quantify over populations
71. Parameterized Factors, or Parfactors
- Functions from parameterized variables to positive real numbers (cf. factors in VE)
- Plus constraints on parameters, e.g. X = alice

knows(X,Y)  coauthor(X,Y)  value
f           f              1
f           t              0
t           f              0.7
t           t              0.3
72. Splitting
Splitting the parfactor

knows(X,Y)  coauthor(X,Y)  value
f           f              1
f           t              0
t           f              0.7
t           t              0.3

on Y = bob produces two parfactors with the same table: one with constraint Y = bob, and the residual with constraint Y ≠ bob.
73. Conditioning on Evidence
Conditioning on knows(alice,bob) leaves the residual parfactor [X ≠ alice or Y ≠ bob] unchanged, and produces, for X = alice, Y = bob, a reduced factor over coauthor(X,Y) alone:

coauthor(X,Y)  value
f              0.7
t              0.3

In reality, constraints are conjunctive. Three parfactors, with constraints (X = alice, Y = bob), (X ≠ alice), and (X = alice, Y ≠ bob), will be produced.
74. Eliminating knows(X,Y)
Multiplying the parfactor [X ≠ alice or Y ≠ bob]

knows(X,Y)  coauthor(X,Y)  value
f           f              1
f           t              0
t           f              0.7
t           t              0.3

by the prior factor on knows(X,Y) (f: 0.99, t: 0.01) produces

knows(X,Y)  coauthor(X,Y)  value
f           f              0.99
f           t              0
t           f              0.007
t           t              0.003
75. Eliminating knows(X,Y)
Summing out knows(X,Y) in [X ≠ alice or Y ≠ bob]

knows(X,Y)  coauthor(X,Y)  value
f           f              0.99
f           t              0
t           f              0.007
t           t              0.003

produces

coauthor(X,Y)  value
f              0.997
t              0.003

(the arithmetic is checked in code below)
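The arithmetic of the last two slides, checked in a few lines of Python: multiply the coauthor-given-knows table by the prior knows(X,Y). (0.01), then sum out knows.

```python
coauthor_given_knows = {(False, False): 1.0, (False, True): 0.0,
                        (True, False): 0.7, (True, True): 0.3}
prior_knows = {False: 0.99, True: 0.01}

# Multiply in the prior on knows, entry by entry.
product = {(k, c): v * prior_knows[k]
           for (k, c), v in coauthor_given_knows.items()}
# Sum out knows, leaving a factor over coauthor alone.
summed = {}
for (k, c), v in product.items():
    summed[c] = summed.get(c, 0.0) + v
print(product)  # (f,f): 0.99, (f,t): 0.0, (t,f): 0.007, (t,t): 0.003
print(summed)   # {False: 0.997, True: 0.003}
```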
76. Eliminating coauthor(X,Y): Multiplying Multiple Parfactors
- Use unification to decide which factors to multiply, and what their constraints will be

[X = alice]
famous(X)  coauthor(X,Y)  value
f          f              1
f          t              0.8
t          f              0
t          t              0.2

[X = alice, Y = bob]        [X ≠ alice or Y ≠ bob]
coauthor(X,Y)  value        coauthor(X,Y)  value
f              0.7          f              0.997
t              0.3          t              0.003
77. Multiplying Multiple Parfactors
- Multiply each pair of factors that unify, to produce

[X = alice, Y = bob]               [X = alice, Y ≠ bob]
famous(X)  coauthor(X,Y)  value    famous(X)  coauthor(X,Y)  value
f          f              0.7      f          f              0.997
f          t              0.24     f          t              0.0024
t          f              0        t          f              0
t          t              0.06     t          t              0.0006
78. Aggregating Over Populations
The parfactor [X = alice, Y ≠ bob]

famous(X)  coauthor(X,Y)  value
f          f              0.997
f          t              0.0024
t          f              0
t          t              0.0006

represents a ground factor for each person in the population other than bob. These factors combine via noisy-or: the contribution is raised to the power (population size - 1) and combined with the factor from the X = alice, Y = bob parfactor. (See the arithmetic sketch below.)
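A sketch of the aggregation arithmetic, assuming the famous/coauthor table entries shown above: summing out coauthor gives one "fame not caused" number per person, and the Y ≠ bob number is raised to the power |Person| - 1 instead of being multiplied out 999 times.

```python
# Sum out coauthor within each parfactor's famous = false rows.
not_caused_other = 0.997 * 1.0 + 0.003 * 0.8   # Y != bob parfactor
not_caused_bob = 0.7 * 1.0 + 0.3 * 0.8         # Y = bob parfactor
population = 1000

# One exponentiation replaces 999 identical ground multiplications.
p_not_famous = not_caused_bob * not_caused_other ** (population - 1)
print(f"P(famous(alice)) ~ {1 - p_not_famous:.4f}")
```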
79. Detail: Determining Variables in Product
[Figure: multiplying the prior parfactor over k(X1,Y1) (f: 0.99, t: 0.01) by the parfactor over k(X2,Y2), f(X2,Y2) produces a parfactor over all three variables k(X1,Y1), k(X2,Y2), f(X2,Y2) with constraint X1 ≠ X2 or Y1 ≠ Y2, plus a separate parfactor over k(X2,Y2), f(X2,Y2) for the case where X1 = X2 and Y1 = Y2]
80. Other details
- When multiplying two parfactors, compute their
most general unifier (mgu) - Split the parfactors on the mgu
- Keep the residuals
- Multiply the non-residuals together
- See Poole 03 for more details
- See Braz, Amir and Roth 05 at IJCAI!
81. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Basic Approach
- Knowledge-Based Model Construction
- Issues
- First-Order Variable Elimination
- Learning
- Frame-based Approaches
- Undirected Relational Approaches
- Programming Language Approaches
82. Learning Rule Parameters [Koller & Pfeffer 97; Sato & Kameya 01]
- Problem definition:
- Given a skeleton rule base, consisting of rules without uncertainty parameters
- and a set of instances, each with
- a set of context predicates
- observations about some random variables
- Goal: learn parameter values for the rules that maximize the likelihood of the data
83. Basic Approach
- Construct a network BN_i for each instance i using KBMC, backward chaining on all the observed variables
- Expectation-Maximization (EM)
- exploit parameter sharing
84. Parameter Sharing
- In BNs, all random variables have distinct CPTs
- only share parameters between different instances, not different random variables
- In logical approaches, an instance may contain many objects of the same kind
- multiple papers, multiple authors, multiple citations
- Parameters are shared within instances
- same parameters used across different papers, authors, citations
- Parameter sharing allows faster learning, and learning from a single instance
85. Rule Parameters and CPT Entries
- In principle, combining rules produce a complicated relationship between model parameters and CPT entries
- With a decomposable combining rule, each node is derived from a single rule
- Most natural combining rules are decomposable
- E.g. noisy-or decomposes into a set of ANDs followed by an OR
86. Parameters and Counts
- Each time a node is derived from a rule r, it provides one experiment to learn about the parameters associated with r
- Each such node should therefore make a separate contribution to the count for those parameters
- θ_{r,x,u}: the parameter associated with P(X = x | Parents(X) = u) when rule r applies
- N_{r,x,u}: the number of times a node has value x and its parents have value u when rule r applies
87. EM With Parameter Sharing
- Given parameter values, compute expected counts:
  E[N_{r,x,u}] = Σ_i Σ_X P(X = x, Parents(X) = u | evidence_i)
  where the inner sum is over all nodes X derived from rule r in BN_i
- Given expected counts, estimate θ_{r,x,u} = E[N_{r,x,u}] / Σ_{x'} E[N_{r,x',u}]
- Iterate these two steps (see the sketch below)
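A minimal sketch of the shared-count bookkeeping; posterior() is a stand-in for BN inference on each instance network, and the toy network in the demo is made up for illustration.

```python
from collections import defaultdict

def e_step(instance_networks, posterior):
    expected = defaultdict(float)   # (rule, x, u) -> expected count
    for bn in instance_networks:
        for node in bn["nodes"]:
            r = node["rule"]        # the rule this node was derived from
            for x, u, p in posterior(bn, node):
                expected[(r, x, u)] += p   # pooled across nodes and instances
    return expected

def m_step(expected):
    totals = defaultdict(float)
    for (r, x, u), n in expected.items():
        totals[(r, u)] += n
    return {(r, x, u): n / totals[(r, u)]
            for (r, x, u), n in expected.items()}

# Toy demo: one network, two nodes derived from the same rule r1.
def fake_posterior(bn, node):
    # stand-in for P(X = x, Parents = u | evidence) from BN inference
    return [(True, (True,), 0.6), (False, (True,), 0.4)]

bn = {"nodes": [{"rule": "r1"}, {"rule": "r1"}]}
print(m_step(e_step([bn], fake_posterior)))  # theta: 0.6 / 0.4
```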
88. Learning Rule Structure [Kersting & De Raedt 02]
- Problem definition:
- Given a set of instances, each with
- context predicates
- observations about some random variables
- Goal: learn
- a skeleton rule base consisting of rules without uncertainty parameters
- and parameter values for the rules
- Generalizes BN structure learning:
- define legal models
- scoring function same as for BNs
- define search operators
89. Legal Models
- Hypothesis space consists of all rule sets using the given predicates, together with parameter values
- A legal hypothesis:
- is a logically valid rule set (does not draw false conclusions for any data cases)
- yields an acyclic constructed BN for every instance
90. Search Operators
- Add a constant-free atom to the body of a single clause
- Remove a constant-free atom from the body of a single clause
- E.g. accepted(P) :- author(P,A).    accepted(P) :- quality(P).
91. Summary: Rule-Based Approaches
- Provide an intuitive way to describe how one fact depends on other facts
- Incorporate relationships between entities
- Generalize to many different situations
- Constructed BN for a domain depends on which objects exist and what the known relationships between them are (context)
- Inference at the ground level via KBMC
- or lifted inference via FOVE
- Both parameters and structure are learnable
92. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Frame-based Approaches
- Undirected Relational Approaches
- Programming Language Approaches
93. Frame-based Approaches
- Probabilistic Relational Models (PRMs)
- Representation & Inference [Koller & Pfeffer 98; Pfeffer, Koller, Milch & Takusagawa 99; Pfeffer 00]
- Learning [Friedman et al 99; Getoor, Friedman, Koller & Taskar 01, 02; Getoor 01]
- Probabilistic Entity-Relation Models (PERs)
- Representation [Heckerman, Meek & Koller 04]
94. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Frame-based Approaches
- PRMs w/ Attribute Uncertainty
- Inference in PRMs
- Learning in PRMs
- PRMs w/ Structural Uncertainty
- PRMs w/ Class Hierarchies
- Undirected Relational Models
- Programming Language Approaches
95. Probabilistic Relational Models
- Combine advantages of relational logic and Bayesian networks:
- natural domain modeling: objects, properties, relations
- generalization over a variety of situations
- compact, natural probability models
- Integrate uncertainty with the relational model:
- properties of domain entities can depend on properties of related entities
- uncertainty over relational structure of the domain
96. Relational Schema
[Figure: schema with classes Author (Good Writer, Smart), Review (Mood, Length), Paper (Quality, Accepted), and relations Author-of (Author to Paper) and Has-Review (Paper to Review)]
- Describes the types of objects and relations in the database
97-98. Probabilistic Relational Model
[Figures: PRM dependency structure over Author.Smart, Author.Good Writer, Review.Mood, Review.Length, Paper.Quality, Paper.Accepted, built up edge by edge]
99. Probabilistic Relational Model
[Figure: the same PRM, with the CPT for Paper.Accepted attached]

P(A | Q, M):
Q  M  P(A=t)  P(A=f)
f  f  0.1     0.9
f  t  0.2     0.8
t  f  0.6     0.4
t  t  0.7     0.3
100. Relational Skeleton
[Figure: skeleton: Paper P1 (Author A1, Review R1); Paper P2 (Author A1, Review R2); Paper P3 (Author A2, Review R2)]
- Fixed relational skeleton σ:
- set of objects in each class
- relations between them
101. PRM w/ Attribute Uncertainty
[Figure: the skeleton from the previous slide, with attribute slots for each object]
- The PRM defines a distribution over instantiations of attributes
102-105. A Portion of the BN
[Figures: fragment of the unrolled BN containing P2.Accepted and P3.Accepted; both nodes share the same CPT P(A | Q, M) shown above, illustrating parameter sharing across objects of the same class]
106. PRM: Aggregate Dependencies
[Figure: Paper (Quality, Accepted) and Review (Mood, Length); a paper may have many reviews]
107. PRM: Aggregate Dependencies
[Figure: Paper.Accepted depends on Paper.Quality and on an aggregate (mode) of its reviews' Review.Mood]

P(A | Q, mode(M)):
Q  M  P(A=t)  P(A=f)
f  f  0.1     0.9
f  t  0.2     0.8
t  f  0.6     0.4
t  t  0.7     0.3

Possible aggregates: sum, min, max, avg, mode, count (a small example follows below)
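A minimal sketch of an aggregate dependency, assuming mode as the aggregator; the helper and its arguments are illustrative, not part of any PRM implementation.

```python
from statistics import mode

def accepted_parent_values(paper_quality, review_moods):
    # Aggregate the multiset of review moods into a single parent value,
    # so papers with any number of reviews share one CPT.
    return paper_quality, mode(review_moods)

print(accepted_parent_values(True, ["good", "bad", "good"]))  # (True, 'good')
```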
108. PRM with AU Semantics
[Figure: PRM (Author, Paper, Review classes) + relational skeleton σ (A1, A2, P1-P3, R1-R3) ⇒ probability distribution over completions I]
109. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Frame-based Models
- PRMs w/ Attribute Uncertainty
- Inference in PRMs
- Learning in PRMs
- PRMs w/ Structural Uncertainty
- PRMs w/ Class Hierarchies
- Undirected Relational Models
- Programming Language Approaches
110. PRM Inference
- Simple idea: enumerate all attributes of all objects
- Construct a Bayesian network over all the attributes
111. Inference Example
[Figure: skeleton: Author A1 with Papers P1 and P2, and Reviews R1-R4]
- Query is P(A1.good-writer); evidence is P1.accepted = T, P2.accepted = T
112. PRM Inference: Constructed BN
[Figure: the full unrolled BN, including A1.Smart and A1.Good Writer]
113. PRM Inference
- Problems with this approach:
- constructed BN may be very large
- doesn't exploit object structure
- Better approach:
- reason about objects themselves
- reason about whole classes of objects
- In particular, exploit:
- reuse of inference
- encapsulation of objects
114. PRM Inference: Interfaces
[Figure: variables pertaining to R2: its inputs and internal attributes, alongside A1.Smart, A1.Good Writer, P1.Quality, P1.Accepted]
115. PRM Inference: Interfaces
[Figure: interface = imported and exported attributes: A1.Good Writer and R2.Mood relative to A1.Smart, P1.Quality, R2.Length, P1.Accepted]
116. PRM Inference: Encapsulation
[Figure: R1 and R2 are encapsulated inside P1]
117. PRM Inference: Reuse
[Figure: identical paper/review substructures share one computation; only A1.Smart and A1.Good Writer cross the interface]
118-134. Structured Variable Elimination (walk-through)
[Figures: SVE recurses through the object structure: Author-1 calls Paper-1 and Paper-2; Paper-1 calls Review-1 and Review-2; each call eliminates the object's internal variables (e.g. R2.Length, then R2.Mood; P1.Quality with evidence P1.Accepted = true) and returns a factor over its interface (A1.Smart, A1.Good Writer); Paper-2 reuses the computation done for Paper-1, leaving a final factor over A1.Good Writer]
135. Benefits of SVE
- Structured inference leads to good elimination orderings for VE
- interfaces are separators
- finding good separators for large BNs is very hard
- therefore cheaper BN inference
- Reuses computation wherever possible
136. Limitations of SVE
- Does not work when encapsulation breaks down
- But when we don't have specific information about the connections between objects, we assume that encapsulation holds
- i.e., if we know P1 has two reviewers R1 and R2 but they are not named instances, we assume R1 and R2 are encapsulated
- Cannot reuse computation when different objects have different evidence
[Figure: R3 is not encapsulated inside P2]
137. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Frame-based Approaches
- PRMs w/ Attribute Uncertainty
- Inference in PRMs
- Learning in PRMs
- PRMs w/ Structural Uncertainty
- PRMs w/ Class Hierarchies
- Undirected Relational Models
- Programming Language Approaches
138. Learning PRMs w/ AU
[Figure: Database (Author, Paper, Review tables) + Relational Schema ⇒ learned PRM]
139-140. ML Parameter Estimation
[Figures: PRM over Review.Mood, Review.Length, Paper.Quality, Paper.Accepted; the parameter vector θ for P(Accepted | Quality, Mood) is estimated from counts over all papers and reviews in the database]
141. Structure Selection
- Idea:
- define scoring function
- do local search over legal structures
- Key Components:
- legal models
- scoring models
- searching model space
142. Structure Selection
- Idea:
- define scoring function
- do local search over legal structures
- Key Components:
- legal models
- scoring models
- searching model space
143. Legal Models
- A PRM defines a coherent probability model over a skeleton σ if the dependencies between object attributes are acyclic
[Figure: Researcher Prof. Gump (Reputation = high) is the author of Papers P1 and P2 (Accepted = yes); Reputation depends on the sum of Accepted values, which in turn depend on Reputation, a potential cycle]
- How do we guarantee that a PRM is acyclic for every skeleton?
144. Attribute Stratification
- PRM dependency structure S ⇒ class-level dependency graph: edge Paper.Accepted → Researcher.Reputation if Researcher.Reputation depends directly on Paper.Accepted
- The algorithm is more flexible: it allows certain cycles along guaranteed acyclic relations
145. Structure Selection
- Idea:
- define scoring function
- do local search over legal structures
- Key Components:
- legal models
- scoring models: same as BN
- searching model space
146. Structure Selection
- Idea:
- define scoring function
- do local search over legal structures
- Key Components:
- legal models
- scoring models
- searching model space
147. Searching Model Space
Phase 0: consider only dependencies within a class
[Figure: candidate intra-class edges among Author, Review, and Paper attributes]
148. Phased Structure Search
Phase 1: consider dependencies from neighboring classes, via schema relations
[Figure: candidate operator "Add P.A → R.M" applied to the Author/Review/Paper model, evaluated by the change in score (Δscore)]
149. Phased Structure Search
Phase 2: consider dependencies from further classes, via relation chains
[Figure: candidate operator "Add R.M → A.W", evaluated by Δscore]
150. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Frame-based Approaches
- PRMs w/ Attribute Uncertainty
- Inference in PRMs
- Learning in PRMs
- PRMs w/ Structural Uncertainty
- PRMs w/ Class Hierarchies
- Undirected Relational Models
- Programming Language Approaches
151. Reminder: PRM with AU Semantics
[Figure: PRM (Author, Paper, Review classes) + relational skeleton σ (A1, A2, P1-P3, R1-R3) ⇒ probability distribution over completions I]
152. Kinds of Structural Uncertainty
- How many objects does an object relate to?
- how many Authors does Paper1 have?
- Which object is an object related to?
- does Paper1 cite Paper2 or Paper3?
- Which class does an object belong to?
- is Paper1 a JournalArticle or a ConferencePaper?
- Does an object actually exist?
- Are two objects identical?
153. Structural Uncertainty
- Motivation: PRM w/ AU is only well-defined when the skeleton structure is known
- May be uncertain about the relational structure itself
- Construct probabilistic models of relational structure that capture structural uncertainty
- Mechanisms:
- Reference uncertainty
- Existence uncertainty
- Number uncertainty
- Type uncertainty
- Identity uncertainty
154. Citation Relational Schema
[Figure: schema: Author (Institution, Research Area), Paper (Topic, Word1 ... WordN), with relations Wrote (Author to Paper) and Cites (Citing Paper to Cited Paper)]
155. Attribute Uncertainty
[Figure: Author.Institution with CPT P(Institution | Research Area); Paper.Topic with CPT P(Topic | Paper.Author.Research Area); each word with CPT P(WordN | Topic)]
156. Reference Uncertainty
[Figure: a scientific paper's bibliography entries point into a document collection: which paper does each citation refer to?]
157. PRM w/ Reference Uncertainty
[Figure: Cites relation with Citing and Cited slots between papers (Topic, Words)]
Dependency model for foreign keys:
- Naïve approach: multinomial over the primary key
- noncompact
- limits ability to generalize
158. Reference Uncertainty Example
[Figure: papers partitioned by topic: C1 = papers with Paper.Topic = AI, C2 = papers with Paper.Topic = Theory; the Cited slot of Cites first selects a partition, then a paper within it]
159. Reference Uncertainty Example
[Figure: as above, plus a CPT giving the distribution over partitions (C1, C2) as a function of the citing paper's Topic (AI or Theory)]
160. Introduce Selector RVs
[Figure: Cites1.Selector and Cites2.Selector each choose between partitions; Cites1.Cited and Cites2.Cited depend on P1.Topic ... P6.Topic and on the selector]
- Introduce a Selector RV, whose domain is {C1, C2}
- The distribution over Cited depends on all of the topics, and the selector (a generative sketch follows below)
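A minimal generative sketch of the selector mechanism; the partition contents follow the example above, while the selector CPT entries (0.9/0.1) and the uniform within-partition choice are made-up illustrations.

```python
import random

partitions = {"C1": ["P3", "P4", "P5"],      # Topic = AI
              "C2": ["P1", "P2", "P6"]}      # Topic = Theory
selector_dist = {"C1": 0.9, "C2": 0.1}       # illustrative CPT entries

def sample_cited(rng=random):
    # First pick a partition from the selector's distribution, then a
    # paper uniformly within that partition.
    c = rng.choices(list(selector_dist), weights=selector_dist.values())[0]
    return rng.choice(partitions[c])

print(sample_cited())
```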
161. PRMs w/ RU Semantics
[Figure: PRM-RU (Cites relation with selector) + entity skeleton σ ⇒ probability distribution over full instantiations I]
162. Learning PRMs w/ RU
- Idea: just like in PRMs w/ AU
- define scoring function
- do greedy local structure search
- Issues:
- expanded search space
- construct partitions
- new operators
163. Learning PRMs w/ RU
- Idea:
- define scoring function
- do phased local search over legal structures
- Key Components:
- legal models: model new dependencies
- scoring models: unchanged
- searching model space: new operators
164. Legal Models
[Figure: Cites relation (Citing, Cited) between Papers with Important and Accepted attributes, plus Review.Mood]
165. Legal Models
[Figure: Cites1.Selector, Cites1.Cited, P2.Important, P3.Important, P4.Important, R1.Mood, P1.Accepted]
- When a node's parent is defined using an uncertain relation, the reference RV must be a parent of the node as well
166. Structure Search
[Figure: starting point: the Cited slot of the Cites relation, with Citing and Author.Institution attributes available for refinement]
167. Structure Search: New Operators
[Figure: operator "Refine on Topic" splits the set of candidate cited papers by Topic; evaluated by Δscore]
168. Structure Search: New Operators
[Figure: after "Refine on Topic", a further "Refine on Author.Institution" splits the partitions again; evaluated by Δscore]
169. PRMs w/ RU Summary
- Define semantics for uncertainty over which entities are related to each other
- Search now includes operators Refine and Abstract for constructing the foreign-key dependency model
- Provides one simple mechanism for link uncertainty
170. Existence Uncertainty
[Figure: two document collections: which citation links exist between them?]
171. PRM w/ Exists Uncertainty
[Figure: Cites relation between papers (Topic, Words), with an Exists attribute]
Dependency model for existence of relationship
172. Exists Uncertainty Example
[Figure: the Cites.Exists attribute has a CPT over {False, True}, conditioned on Citer.Topic and Cited.Topic]
173-174. Introduce Exists RVs
[Figures: ground network for Authors 1-2 (Inst, Area) and Papers 1-3 (Topic, Word1 ... WordN), with one Exists RV per potential citation pair: Exists 1-2, 2-1, 1-3, 3-1, 2-3, 3-2]
175. PRMs w/ EU Semantics
[Figure: PRM-EU (Cites with Exists attribute) + object skeleton ⇒ probability distribution over full instantiations I]
176. Learning PRMs w/ EU
- Idea: just like in PRMs w/ AU
- define scoring function
- do greedy local structure search
- Issues:
- efficiency
- computation of sufficient statistics for the Exists attribute
- do not explicitly consider relations that do not exist
177. Structure Selection: PRMs w/ EU
- Idea:
- define scoring function
- do phased local search over legal structures
- Key Components:
- legal models: model new dependencies
- scoring models: unchanged
- searching model space: unchanged
178. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Frame-based Approaches
- PRMs w/ Attribute Uncertainty
- Inference in PRMs
- Learning in PRMs
- PRMs w/ Structural Uncertainty
- PRMs w/ Class Hierarchies
- Undirected Relational Models
- Programming Language Approaches
179. PRMs with Classes
- Relations organized in a class hierarchy
- Subclasses inherit their probability model from superclasses
- Instances are a special case of subclasses of size 1
- As you descend through the class hierarchy, you can have richer dependency models
- e.g. cannot say Accepted(P1) <- Accepted(P2) (cyclic)
- but can say Accepted(Journal P1) <- Accepted(Conf P2)
[Figure: class hierarchy: Venue with subclasses Journal and Conference]
180. Type Uncertainty
- Is 1st-Venue a Journal or a Conference?
- Create 1st-Journal and 1st-Conference objects
- Introduce a Type(1st-Venue) variable with possible values Journal and Conference
- Make 1st-Venue equal to 1st-Journal or 1st-Conference according to the value of Type(1st-Venue)
181. Learning PRM-CHs
[Figure: Database (Vote, Person, TVProgram tables; instance I) + Relational Schema ⇒ learned PRM-CH]
182. Structure Selection: PRMs w/ CH
- Idea:
- define scoring function
- do phased local search over legal structures
- Key Components:
- legal models: model new dependencies
- scoring models: unchanged
- searching model space: new operators
183. Guaranteeing Acyclicity with Subclasses
[Figure]
184. Learning PRM-CH
- Scenario 1: class hierarchy is provided
- New operators: Specialize/Inherit
[Figure: the Accepted CPD for Paper specializes into CPDs for Journal, Conference, and Workshop papers]
185. Learning Class Hierarchy
- Issue: partially observable data set
- Construct a decision tree for class, defined over attributes observed in the training set
186. PRMs w/ Class Hierarchies
- Allow us to:
- refine a heterogeneous class into more coherent subclasses
- refine the probabilistic model along the class hierarchy
- specialize/inherit CPDs
- construct new dependencies that would otherwise be cyclic
- Provide a bridge from class-based model to instance-based model
187. PRM-CH Summary
- PRMs with class hierarchies are a natural extension of PRMs
- Specialization/inheritance of CPDs
- Allows new dependency structures
- Provide a bridge from class-based to instance-based models
- Learning techniques proposed
- Need: efficient heuristics, empirical validation on real-world domains
188. Summary: Frame-based Approaches
- Focus on objects and relationships
- what types of objects are there, and how are they related to each other?
- how does a property of an object depend on other properties (of the same or other objects)?
- Representation supports:
- attribute uncertainty
- structural uncertainty
- class hierarchies
- Efficient inference and learning algorithms
189. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Frame-based Approaches
- Undirected Relational Approaches
- Background Markov Networks
- Frame-based Approaches
- Rule-based Approaches
- Programming Language Approaches
190. Markov Networks
[Figure: four-node loop: Author1.Fame, Author2.Fame, Author3.Fame, Author4.Fame]
- nodes = domain variables; edges = mutual influence
- Network structure encodes conditional independencies: I(A1.Fame, A4.Fame | A2.Fame, A3.Fame)
191. Markov Network Semantics
[Figure: the F1-F2-F3-F4 loop]
- conditional independencies in MN structure + local clique potentials = full joint distribution over domain:
  P(f1, f2, f3, f4) = (1/Z) φ(f1,f2) φ(f2,f3) φ(f3,f4) φ(f4,f1)
  where Z is a normalizing factor that ensures that the probabilities sum to 1
- Good news: no acyclicity constraints. Bad news: global normalization (1/Z)
(a small numerical sketch follows below)
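A minimal sketch of this semantics on the four-author loop; the potential entries (2.0 for agreement, 0.5 for disagreement) are illustrative, not from the tutorial.

```python
from itertools import product

# phi(f_i, f_j) for each edge of the loop 1-2, 2-3, 3-4, 4-1: linked
# authors tend to agree on fame (made-up numbers).
phi = {(False, False): 2.0, (False, True): 0.5,
       (True, False): 0.5, (True, True): 2.0}
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

def unnormalized(f):
    w = 1.0
    for i, j in edges:
        w *= phi[(f[i], f[j])]
    return w

# Global normalization: Z sums the potential product over all assignments.
Z = sum(unnormalized(f) for f in product([False, True], repeat=4))
f = (True, True, True, False)
print(unnormalized(f) / Z)  # P(F1..F4) = (1/Z) * product of clique potentials
```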
192. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Frame-based Approaches
- Undirected Relational Approaches
- Background Markov Networks
- Frame-based Approaches
- Markov Relational Networks [Taskar, Segal & Koller 01; Taskar, Abbeel & Koller 02; Taskar, Guestrin & Koller 04]
- Rule-based Approaches
- Programming Language Approaches
193. Advantages of Undirected Models
- Symmetric, non-causal interactions
- Web: categories of linked pages are correlated
- Social nets: an individual is correlated with peers
- Cannot introduce directed edges because of cycles
- Patterns involving multiple entities
- Web: triangle patterns
- Social nets: transitive relations
194. Relational Markov Networks
- Locality
- local probabilistic dependencies given by relational links
- Universals
- same dependencies hold for all objects linked in a particular pattern
[Figure: template potential over a linked Author1 and Paper1]