Title: Representation, Inference and Learning in Relational Probabilistic Languages
1. Representation, Inference and Learning in Relational Probabilistic Languages
- Lise Getoor, University of Maryland, College Park
- Avi Pfeffer, Harvard University
IJCAI 2005 Tutorial
2. Introduction
- Probability is good
- First-order representations are good
- Variety of approaches that combine them
- We won't cover all of them in detail
- apologies if we leave out your favorite
- We will cover three broad classes of approaches, and present exemplars of each approach
- We will highlight common issues, themes, and techniques that recur in different approaches
3. Running Example
- There are papers, researchers, citations, reviews
- Papers have a quality, and may or may not be accepted
- Authors may be smart and good writers
- Papers have topics, and cite other papers which may or may not be on the same topic
- Papers are reviewed by reviewers, who have moods that are influenced by the quality of the writing
4. Some Queries
- What is the probability that a researcher is famous, given that one of her papers was accepted despite the fact that a reviewer was in a bad mood?
- What is the probability that a paper is accepted, given that another paper by the same author is accepted?
- What is the probability that a paper is an AI paper, given that it is cited by an AI paper?
- What is the probability that a student of a famous advisor has seven high-quality papers?
5. Sample Domains
- Web Pages and Link Analysis
- Battlespace Awareness
- Epidemiological Studies
- Citation Networks
- Communication Networks (Cellphone Fraud Detection)
- Intelligence Analysis (Terrorist Networks)
- Financial Transactions (Money Laundering)
- Computational Biology
- Object Recognition and Scene Analysis
- Natural Language Processing (e.g. Information Extraction and Semantic Parsing)
6. Roadmap
- Motivation
- Background: Bayesian network inference and learning
- Rule-based Approaches
- Frame-based Approaches
- Undirected Relational Approaches
- Programming Language Approaches
7. Bayesian Networks [Pearl 87]
[Figure: BN with nodes Smart, Good Writer, Reviewer Mood, Quality, Accepted, Review Length]
- nodes = domain variables; edges = direct causal influence
- Network structure encodes conditional independencies: I(Review-Length, Good-Writer | Reviewer-Mood)
8. BN Semantics
- conditional independencies in BN structure + local CPTs = full joint distribution over domain:
  P(X_1, ..., X_n) = Π_i P(X_i | Parents(X_i))
- Compact, natural representation:
- nodes have ≤ k parents ⇒ O(2^k · n) params vs. O(2^n) params
- natural parameters
9. Reasoning in BNs
- Full joint distribution specifies the answer to any query: P(event | evidence)
- Allows combination of different types of reasoning:
- Causal: P(Reviewer-Mood | Good-Writer)
- Evidential: P(Reviewer-Mood | not Accepted)
- Intercausal: P(Reviewer-Mood | not Accepted, Quality)
10. Variable Elimination [Zhang & Poole 96; Dechter 98]
- Works with factors
- A factor is a function from values of variables to positive real numbers
11-16. Variable Elimination (walk-through)
[Figures: the elimination steps: sum out l, producing a new factor; multiply factors together, then sum out w, producing another new factor; repeat until only the query variable remains]
(a small code sketch of these factor operations follows below)
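To make the factor operations concrete, here is a minimal Python sketch of multiply-and-sum-out; the two toy factors (a prior on w and a CPT for l given w) are illustrative stand-ins, not values from the tutorial.

```python
from itertools import product

def multiply(f1, vars1, f2, vars2):
    """Pointwise product of two factors given as dicts over value tuples."""
    out_vars = list(dict.fromkeys(vars1 + vars2))
    out = {}
    for vals in product([False, True], repeat=len(out_vars)):
        assign = dict(zip(out_vars, vals))
        v1 = f1[tuple(assign[v] for v in vars1)]
        v2 = f2[tuple(assign[v] for v in vars2)]
        out[vals] = v1 * v2
    return out, out_vars

def sum_out(f, vars_, var):
    """Eliminate `var` from factor `f` by summing over its values."""
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    out = {}
    for vals, p in f.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return out, out_vars

# Toy factors (made up): P(w) and P(l | w); eliminate l, read off P(w).
f_w = {(False,): 0.2, (True,): 0.8}
f_l_given_w = {(False, False): 0.9, (False, True): 0.1,
               (True, False): 0.3, (True, True): 0.7}
joint, jvars = multiply(f_w, ['w'], f_l_given_w, ['w', 'l'])
marg, mvars = sum_out(joint, jvars, 'l')
print(mvars, marg)  # ['w'] {(False,): 0.2, (True,): 0.8}
```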
17. Some Other Inference Algorithms
- Exact
- Junction Tree [Lauritzen & Spiegelhalter 88]
- Cutset Conditioning [Pearl 87]
- Approximate
- Loopy Belief Propagation [McEliece et al 98]
- Likelihood Weighting [Shwe & Cooper 91]
- Markov Chain Monte Carlo, e.g. [MacKay 98]
- Gibbs Sampling [Geman & Geman 84]
- Metropolis-Hastings [Metropolis et al 53; Hastings 70]
- Variational Methods [Jordan et al 98]
- etc.
18. Learning in BNs

                           Complete Data      Incomplete Data
Parameters only            Easy: counting     EM [Dempster et al 77] or gradient descent [Russell et al 95]
Structure and Parameters   Structure search   Structural EM [Friedman 97]

See [Heckerman 98] for a general introduction
19. Parameter Estimation in BNs
- Assume known dependency structure G
- Goal: estimate BN parameters θ
- entries in local probability models
- θ is good if it is likely to generate the observed data
- MLE Principle: choose θ so as to maximize the likelihood
- Alternative: incorporate a prior
20. Learning With Complete Data
- Fully observed data: data consists of a set of instances, each with a value for all BN variables
- With fully observed data, we can compute N(x, u) = the number of instances with X = x and Parents(X) = u, and similarly for other counts
- We then estimate θ_{x|u} = N(x, u) / N(u) (see the sketch below)
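A minimal sketch of the counting estimate, assuming Boolean variables; the instance list is made up for illustration.

```python
from collections import Counter

instances = [
    {"Quality": True,  "Mood": True,  "Accepted": True},
    {"Quality": True,  "Mood": False, "Accepted": True},
    {"Quality": True,  "Mood": False, "Accepted": False},
    {"Quality": False, "Mood": True,  "Accepted": False},
]

# N(x, u): instances with X = x and Parents(X) = u; theta = N(x,u) / N(u).
n_xu = Counter((d["Accepted"], (d["Quality"], d["Mood"])) for d in instances)
n_u = Counter((d["Quality"], d["Mood"]) for d in instances)
for (x, u), n in sorted(n_xu.items()):
    print(f"P(Accepted={x} | Q,M={u}) = {n / n_u[u]:.2f}")
```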
21. Learning with Missing Data: Expectation-Maximization (EM)
- Can't compute the counts directly
- But:
- given parameter values, we can compute expected counts (this requires BN inference)
- given expected counts, we can estimate parameters
- Begin with arbitrary parameter values
- Iterate these two steps (see the sketch below)
- Converges to a local maximum of the likelihood
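A minimal sketch of the two-step loop for a single Boolean variable with missing entries (marked None); in a real BN the expected counts would come from inference, but the E-step/M-step alternation is the same. The data values are made up.

```python
data = [True, True, False, None, None, True, None]

theta = 0.5  # arbitrary starting value for P(X = True)
for step in range(100):
    # E-step: expected count of True; each missing entry contributes theta,
    # the current posterior probability that the hidden value is True.
    expected_true = sum(theta if x is None else float(x) for x in data)
    # M-step: re-estimate the parameter from the expected counts.
    theta_new = expected_true / len(data)
    if abs(theta_new - theta) < 1e-9:
        break
    theta = theta_new
print(f"theta = {theta:.3f}")  # converges to 0.75, the observed frequency
```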
22. Structure Search
- Begin with an empty network
- Consider all neighbors reached by a search operator that are acyclic:
- add an edge
- remove an edge
- reverse an edge
- For each neighbor:
- compute ML parameter values
- compute score(s)
- Choose the neighbor with the highest score
- Continue until we reach a local maximum (see the sketch below)
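A minimal sketch of the greedy search over three variables; score() is a stub standing in for a real scoring function such as BIC, and the variable names are illustrative.

```python
import itertools, random

VARS = ["Smart", "Quality", "Accepted"]

def is_acyclic(edges):
    # Kahn-style check: repeatedly remove nodes with no incoming edges.
    nodes, rem = set(VARS), set(edges)
    while True:
        src = {n for n in nodes if not any(e[1] == n for e in rem)}
        if not src:
            return not rem
        nodes -= src
        rem = {e for e in rem if e[0] not in src}

def neighbors(edges):
    for e in edges:                        # remove or reverse an edge
        yield edges - {e}
        yield (edges - {e}) | {(e[1], e[0])}
    for e in itertools.permutations(VARS, 2):
        if e not in edges:                 # add an edge
            yield edges | {e}

def score(edges):                          # stub: replace with a real score
    random.seed(hash(frozenset(edges)))
    return random.random()

current = frozenset()
while True:
    best = max((frozenset(n) for n in neighbors(current) if is_acyclic(n)),
               key=score, default=None)
    if best is None or score(best) <= score(current):
        break                              # local maximum reached
    current = best
print(sorted(current))
```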
23. Limitations of BNs
- Inability to generalize across collections of individuals within a domain
- if you want to talk about multiple individuals in a domain, you have to talk about each one explicitly, with its own local probability model
- Domains have fixed structure, e.g. one author, one paper and one reviewer
- if you want to talk about domains with multiple inter-related individuals, you have to create a special-purpose network for the domain
- For learning, all instances have to have the same set of entities
24. First-Order Approaches
- Advantages of first-order probabilistic models:
- represent the world in terms of individuals and relationships between them
- ability to generalize about many instances in the same domain
- allow compact parameterization
- support reasoning about general classes of individuals rather than the individuals themselves
- allow representation of high-level structure, in which objects interact weakly with each other
25. Three Different Approaches
- Rule-based approaches focus on facts
- what is true in the world?
- what facts do other facts depend on?
- Frame-based approaches focus on objects and relationships
- what types of objects are there, and how are they related to each other?
- how does a property of an object depend on other properties (of the same or other objects)?
- Programming language approaches focus on processes
- how is the world generated?
- how does one event influence another event?
26. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Basic Approach
- Knowledge-Based Model Construction
- Issues
- First-Order Variable Elimination
- Learning
- Frame-based Approaches
- Undirected Relational Approaches
- Programming Language Approaches
27. Flavors
- [Goldman & Charniak 93]
- [Breese 92]
- Probabilistic Horn Abduction [Poole 93]
- Probabilistic Logic Programming [Ngo & Haddawy 96]
- Relational Bayesian Networks [Jaeger 97]
- Bayesian Logic Programs [Kersting & de Raedt 00]
- Stochastic Logic Programs [Muggleton 96]
- PRISM [Sato & Kameya 97]
- CLP(BN) [Costa et al 03]
- etc.
28. Intuitive Approach
- In logic programming,
- accepted(P) :- author(P,A), famous(A).
- means
- For all P, A: if A is the author of P and A is famous, then P is accepted
- This is a categorical inference
- But this will not be true in many cases
29. Fudge Factors
- Use
- accepted(P) :- author(P,A), famous(A). (0.6)
- This means
- For all P, A: if A is the author of P and A is famous, then P is accepted with probability 0.6
- But what does this mean when there are other possible causes of a paper being accepted?
- e.g. accepted(P) :- high_quality(P). (0.8)
30. Intuitive Meaning
- accepted(P) :- author(P,A), famous(A). (0.6)
- means
- For all P, A: if A is the author of P and A is famous, then P is accepted with probability 0.6, provided no other possible cause of the paper being accepted holds
- If more than one possible cause holds, a combining rule is needed to combine the probabilities
31. Meaning of Disjunction
- In logic programming,
- accepted(P) :- author(P,A), famous(A).
- accepted(P) :- high_quality(P).
- means
- For all P, A: if A is the author of P and A is famous, or if P is high quality, then P is accepted
32. Intuitive Meaning of Probabilistic Disjunction
- For us,
- accepted(P) :- author(P,A), famous(A). (0.6)
- accepted(P) :- high_quality(P). (0.8)
- means
- For all P, A: if (A is the author of P and A is famous, and this successfully causes P to be accepted) or (P is high quality, and this successfully causes P to be accepted), then P is accepted
- If A is the author of P and A is famous, they successfully cause P to be accepted with probability 0.6
- If P is high quality, it successfully causes P to be accepted with probability 0.8
33. Noisy-Or
- Multiple possible causes of an effect
- Each cause, if it is true, successfully causes the effect with a given probability
- Effect is true if any of the possible causes is true and successfully causes it
- All causes act independently to produce the effect (causal independence)
- Note: accepted(P) :- author(P,A), famous(A). (0.6) may produce multiple possible causes for different values of A
- Leak probability: effect may happen with no cause
- e.g. accepted(P). (0.1)
34. Noisy-Or
[Figure: noisy-or network: author(p1,alice) and famous(alice) cause accepted(p1) with probability 0.6; author(p1,bob) and famous(bob) with probability 0.6; high_quality(p1) with probability 0.8]
35. Computing Noisy-Or Probabilities
- What is P(accepted(p1)), given that Alice is an author and Alice is famous, and that the paper is high quality, but no other possible cause is true?
- The paper is rejected only if both active causes and the leak all fail:
  P(accepted(p1)) = 1 - (1 - 0.6)(1 - 0.8)(1 - 0.1) = 1 - 0.072 = 0.928
(the computation is spelled out in code below)
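The same arithmetic in a few lines of Python; the three numbers are the two active cause probabilities (0.6, 0.8) and the leak (0.1) from the slide.

```python
def noisy_or(active_cause_probs, leak=0.0):
    # Effect fails only if every active cause and the leak fail independently.
    p_fail = 1.0 - leak
    for p in active_cause_probs:
        p_fail *= 1.0 - p
    return 1.0 - p_fail

print(noisy_or([0.6, 0.8], leak=0.1))  # 1 - 0.4*0.2*0.9 = 0.928
```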
36. Combination Rules
- Other combination rules are possible
- E.g. max
- In our case, P(accepted(p1)) = max{0.6, 0.8, 0.1} = 0.8
- Harder to interpret in terms of a logic program
37. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Basic Approach
- Knowledge-Based Model Construction
- Issues
- First-Order Variable Elimination
- Learning
- Frame-based Approaches
- Undirected Relational Approaches
- Programming Language Approaches
38. Knowledge-Based Model Construction (KBMC)
- Construct a Bayesian network, given a query Q and evidence E
- query and evidence are sets of ground atoms, i.e., predicates with no variable symbols
- e.g. author(p1,alice)
- Construct the network by searching for possible proofs of the query and evidence variables
- Use standard BN inference techniques on the constructed network
39. KBMC Example
- smart(alice). (0.8)
- smart(bob). (0.9)
- author(p1,alice). (0.7)
- author(p1,bob). (0.3)
- high_quality(P) :- author(P,A), smart(A). (0.5)
- high_quality(P). (0.1)
- accepted(P) :- high_quality(P). (0.9)
- Query is accepted(p1)
- Evidence is smart(bob)
40. Backward Chaining
- Start with evidence variable smart(bob)
[Network: smart(bob)]
41. Backward Chaining
- Rule for smart(bob) has no antecedents: stop backward chaining
[Network: smart(bob)]
42. Backward Chaining
- Begin with query variable accepted(p1)
[Network: smart(bob), accepted(p1)]
43. Backward Chaining
- Rule for accepted(p1) has antecedent high_quality(p1): add high_quality(p1) to the network, and make it a parent of accepted(p1)
[Network: smart(bob); high_quality(p1) → accepted(p1)]
44. Backward Chaining
- All of accepted(p1)'s parents have been found: create its conditional probability table (CPT)
[Network: smart(bob); high_quality(p1) → accepted(p1), with the CPT P(accepted(p1) | high_quality(p1)) attached]
45. Backward Chaining
- high_quality(p1) :- author(p1,A), smart(A) has two groundings: A = alice and A = bob
[Network: smart(bob); high_quality(p1) → accepted(p1)]
46. Backward Chaining
- For grounding A = alice, add author(p1,alice) and smart(alice) to the network, and make them parents of high_quality(p1)
[Network: smart(bob); smart(alice), author(p1,alice) → high_quality(p1) → accepted(p1)]
47. Backward Chaining
- For grounding A = bob, add author(p1,bob) to the network. smart(bob) is already in the network. Make both parents of high_quality(p1)
[Network: smart(alice), author(p1,alice), smart(bob), author(p1,bob) → high_quality(p1) → accepted(p1)]
48. Backward Chaining
- Create the CPT for high_quality(p1): make it a noisy-or, and don't forget the leak probability
[Network: as above]
49. Backward Chaining
- author(p1,alice), smart(alice) and author(p1,bob) have no antecedents: stop backward chaining
[Network: as above]
50. Backward Chaining
- Assert evidence smart(bob) = true, and compute P(accepted(p1) | smart(bob) = true)
[Network: as above, with smart(bob) = true]
(a code sketch of the whole construction follows below)
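A minimal sketch of the KBMC construction loop on the ground rules of this example; it recovers the parent structure built up on the preceding slides, leaving out noisy-or CPT construction.

```python
RULES = {  # head -> list of (body, probability), grounded by hand
    "smart(alice)":     [([], 0.8)],
    "smart(bob)":       [([], 0.9)],
    "author(p1,alice)": [([], 0.7)],
    "author(p1,bob)":   [([], 0.3)],
    "high_quality(p1)": [(["author(p1,alice)", "smart(alice)"], 0.5),
                         (["author(p1,bob)", "smart(bob)"], 0.5),
                         ([], 0.1)],
    "accepted(p1)":     [(["high_quality(p1)"], 0.9)],
}

def build_network(roots):
    parents, agenda = {}, list(roots)
    while agenda:
        node = agenda.pop()
        if node in parents:
            continue                    # already expanded
        parents[node] = set()
        for body, _ in RULES.get(node, []):
            parents[node].update(body)  # each grounding adds parents
            agenda.extend(body)         # keep chaining backwards
    return parents

net = build_network(["accepted(p1)", "smart(bob)"])  # query + evidence
for node, pars in sorted(net.items()):
    print(node, "<-", sorted(pars))
```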
51. Backward Chaining on Both Query and Evidence
- Necessary, if query and evidence have a common ancestor
- Sufficient: P(Query | Evidence) can be computed using only ancestors of query and evidence nodes
- unobserved descendants are irrelevant
[Figure: a common Ancestor with paths down to both Query and Evidence]
52. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Basic Approach
- Knowledge-Based Model Construction
- Issues
- First-Order Variable Elimination
- Learning
- Frame-based Approaches
- Undirected Relational Approaches
- Programming Language Approaches
53. The Role of Context
- Context is deterministic knowledge, known prior to the network being constructed
- May be defined by its own logic program
- Is not a random variable in the BN
- Used to determine the structure of the constructed BN
- If a context predicate P appears in the body of a rule R, only backward chain on R if P is true
54. Context Example
- Suppose author(P,A) is a context predicate, author(p1,bob) is true, and author(p1,alice) cannot be proven from the deterministic KB (and is therefore false by assumption)
- Network is:
[Network: smart(bob) → high_quality(p1) → accepted(p1). There is no author(p1,bob) node because it is a context predicate, and no smart(alice) node because author(p1,alice) is false]
55. Basic Assumptions
- No cycles in the resulting BN
- If there are cycles, we cannot interpret the BN as a definition of a joint probability distribution
- Model construction process terminates
- in particular, no function symbols. Consider
- famous(X) :- famous(advisor(X)).
- this creates an infinite backwards chain:
  famous(X) ← famous(advisor(X)) ← famous(advisor(advisor(X))) ← ...
56. Resolving Cycles [Glesner & Koller 95]
- Deal with cycles by introducing time
- E.g.
- famous(X) :- coauthor(X,Y), famous(Y).
- becomes
- famous(X,T) :- coauthor(X,Y,T-1), famous(Y,T-1).
- Defines a dynamic Bayesian network
57. Resolving Infinite Networks [Pfeffer & Koller 00]
- How to deal with famous(X) :- famous(advisor(X)). ?
- KBMC will construct an infinite network, on which we can't do inference
- To compute P(famous(alice)):
- Expand the network backwards n generations
- Assume a uniform distribution at the roots
- Compute P_n(famous(alice))
- The series P_1(famous(alice)), P_2(famous(alice)), ... converges as n → ∞
- The limit is defined to be P(famous(alice))
58. Reusing Inference
- Iteration 1: compute P_1(famous(X))
- Iteration 2: compute P_2(famous(X))
- requires P_1(famous(advisor(X))) = P_1(famous(X))
- already computed: reuse result
- Iteration 3: compute P_3(famous(X))
- requires P_2(famous(advisor(X))) = P_2(famous(X))
- already computed: reuse result
- Constant amount of work per iteration (see the sketch below)
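A minimal numerical sketch of the fixed-point iteration; the CPT coefficients (0.1 base probability, 0.6 dependence on the advisor) are made up for illustration, and only the constant-work convergence pattern matters.

```python
p = 0.5                       # P_0: uniform distribution at the roots
for n in range(1, 1000):
    p_new = 0.1 + 0.6 * p     # P_n from P_{n-1}; one CPT application
    if abs(p_new - p) < 1e-9:
        break
    p = p_new
print(f"P(famous(alice)) -> {p:.4f} after {n} iterations")  # -> 0.25
```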
59. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Basic Approach
- Knowledge-Based Model Construction
- Issues
- First-Order Variable Elimination
- Learning
- Frame-based Approaches
- Undirected Relational Approaches
- Programming Language Approaches
60. Semantics
- Assuming the BN construction process terminates, the conditional probability of any query given any evidence is defined by the BN
- Somewhat unsatisfying, because:
- meaning of the program is query dependent (depends on the constructed BN)
- meaning is not stated declaratively in terms of the program, but in terms of the constructed network instead
61. Disadvantage of Intuitive Approach
- Up until now, ground logical atoms have been random variables ranging over {T, F}
- cumbersome to have a different random variable for lead_author(p1,alice), lead_author(p1,bob) and all possible values of lead_author(p1,A)
- worse, since lead_author(p1,alice) and lead_author(p1,bob) are different random variables, it is possible for both to be true at the same time
62. Bayesian Logic Programs [Kersting & de Raedt 00, following Ngo & Haddawy 95]
- Now, ground atoms are random variables with any range (not necessarily Boolean)
- now quality is a random variable, with values high, medium, low
- Any probabilistic relationship is allowed
- expressed in a CPT
- Semantics of the program given once and for all
- not query dependent
63. Meaning of Rules in BLPs
- accepted(P) :- quality(P).
- means
- For all P, if quality(P) is a random variable, then accepted(P) is a random variable
- Associated with this rule is a conditional probability table (CPT) that specifies the probability distribution over accepted(P) for any possible value of quality(P)
64. Combining Rules for BLPs
- accepted(P) :- quality(P).
- accepted(P) :- author(P,A), fame(A).
- Before, combining rules combined individual probabilities with each other
- noisy-or and max rules made sense
- Now, combining rules combine entire CPTs
- can still define noisy-or and max
- harder to interpret
65. Semantics of BLPs
- Random variables are all ground atoms that have finite proofs in the logic program
- assumes acyclicity
- assumes no function symbols
- Can construct a BN over all random variables
- parents derived from rules
- CPTs derived using combining rules
- Semantics of a BLP: joint probability distribution over all random variables
- does not depend on the query
- Inference in BLPs by KBMC
66. An Issue
- How to specify uncertainty over single-valued relations?
- Approach 1: make lead_author(P) a random variable taking values bob, alice, etc.
- we can't say accepted(P) :- lead_author(P), famous(A), because A does not appear in the rule head or in a previous term in the body
- Approach 2: make lead_author(P,A) a random variable with values true, false
- we run into the same problems as with the intuitive approach (may have zero or many lead authors)
- Approach 3: make lead_author a function symbol
- say accepted(P) :- famous(lead_author(P))
- BLPs need to specify how to deal with function symbols and uncertainty over them
67. First-Order Variable Elimination [Poole 03; see also Braz et al 05]
- Generalization of variable elimination to first-order domains
- Reasons directly about first-order variables, instead of at the ground level
- Assumes that the size of the population for each type of entity is known
68. FOVE Example
- famous(X:Person) :- coauthor(X,Y). (0.2)
- coauthor(X:Person, Y:Person) :- knows(X,Y). (0.3)
- knows(X:Person, Y:Person). (0.01)
- |Person| = 1000
- Evidence: knows(alice,bob)
- Query: famous(alice)
69. What KBMC Will Produce
[Figure: famous(alice) with 1000 parents coauthor(a,b), coauthor(a,c), coauthor(a,d), ..., each with its own parent knows(a,b), knows(a,c), knows(a,d), ...]
70. Better Idea
- Instead of grounding out all variables, reason about some of them at the lifted level
- Eliminate entire relations at a time, instead of individual ground terms
- Use parameterized variables, e.g. reason directly about coauthor(X,Y)
- Use the known population size to quantify over populations
71. Parameterized Factors, or Parfactors
- Functions from parameterized variables to positive real numbers (cf. factors in VE)
- Plus constraints on parameters, e.g. X = alice

knows(X,Y)  coauthor(X,Y)  value
f           f              1
f           t              0
t           f              0.7
t           t              0.3
72. Splitting
Splitting the parfactor

knows(X,Y)  coauthor(X,Y)  value
f           f              1
f           t              0
t           f              0.7
t           t              0.3

on Y = bob produces two parfactors with the same table: one with constraint Y = bob, and the residual with constraint Y ≠ bob.
73. Conditioning on Evidence
Conditioning on knows(alice,bob) leaves the residual parfactor [X ≠ alice or Y ≠ bob] unchanged, and produces, for X = alice, Y = bob, a reduced factor over coauthor(X,Y) alone:

coauthor(X,Y)  value
f              0.7
t              0.3

In reality, constraints are conjunctive. Three parfactors, with constraints (X = alice, Y = bob), (X ≠ alice), and (X = alice, Y ≠ bob), will be produced.
74. Eliminating knows(X,Y)
Multiplying the parfactor [X ≠ alice or Y ≠ bob]

knows(X,Y)  coauthor(X,Y)  value
f           f              1
f           t              0
t           f              0.7
t           t              0.3

by the prior factor on knows(X,Y) (f: 0.99, t: 0.01) produces

knows(X,Y)  coauthor(X,Y)  value
f           f              0.99
f           t              0
t           f              0.007
t           t              0.003
75. Eliminating knows(X,Y)
Summing out knows(X,Y) in [X ≠ alice or Y ≠ bob]

knows(X,Y)  coauthor(X,Y)  value
f           f              0.99
f           t              0
t           f              0.007
t           t              0.003

produces

coauthor(X,Y)  value
f              0.997
t              0.003

(the arithmetic is checked in code below)
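The arithmetic of the last two slides, checked in a few lines of Python: multiply the coauthor-given-knows table by the prior knows(X,Y). (0.01), then sum out knows.

```python
coauthor_given_knows = {(False, False): 1.0, (False, True): 0.0,
                        (True, False): 0.7, (True, True): 0.3}
prior_knows = {False: 0.99, True: 0.01}

# Multiply in the prior on knows, entry by entry.
product = {(k, c): v * prior_knows[k]
           for (k, c), v in coauthor_given_knows.items()}
# Sum out knows, leaving a factor over coauthor alone.
summed = {}
for (k, c), v in product.items():
    summed[c] = summed.get(c, 0.0) + v
print(product)  # (f,f): 0.99, (f,t): 0.0, (t,f): 0.007, (t,t): 0.003
print(summed)   # {False: 0.997, True: 0.003}
```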
76. Eliminating coauthor(X,Y): Multiplying Multiple Parfactors
- Use unification to decide which factors to multiply, and what their constraints will be

[X = alice]
famous(X)  coauthor(X,Y)  value
f          f              1
f          t              0.8
t          f              0
t          t              0.2

[X = alice, Y = bob]        [X ≠ alice or Y ≠ bob]
coauthor(X,Y)  value        coauthor(X,Y)  value
f              0.7          f              0.997
t              0.3          t              0.003
77. Multiplying Multiple Parfactors
- Multiply each pair of factors that unify, to produce

[X = alice, Y = bob]               [X = alice, Y ≠ bob]
famous(X)  coauthor(X,Y)  value    famous(X)  coauthor(X,Y)  value
f          f              0.7      f          f              0.997
f          t              0.24     f          t              0.0024
t          f              0        t          f              0
t          t              0.06     t          t              0.0006
78. Aggregating Over Populations
The parfactor [X = alice, Y ≠ bob]

famous(X)  coauthor(X,Y)  value
f          f              0.997
f          t              0.0024
t          f              0
t          t              0.0006

represents a ground factor for each person in the population other than bob. These factors combine via noisy-or: the contribution is raised to the power (population size - 1) and combined with the factor from the X = alice, Y = bob parfactor. (See the arithmetic sketch below.)
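A sketch of the aggregation arithmetic, assuming the famous/coauthor table entries shown above: summing out coauthor gives one "fame not caused" number per person, and the Y ≠ bob number is raised to the power |Person| - 1 instead of being multiplied out 999 times.

```python
# Sum out coauthor within each parfactor's famous = false rows.
not_caused_other = 0.997 * 1.0 + 0.003 * 0.8   # Y != bob parfactor
not_caused_bob = 0.7 * 1.0 + 0.3 * 0.8         # Y = bob parfactor
population = 1000

# One exponentiation replaces 999 identical ground multiplications.
p_not_famous = not_caused_bob * not_caused_other ** (population - 1)
print(f"P(famous(alice)) ~ {1 - p_not_famous:.4f}")
```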
79. Detail: Determining Variables in Product
[Figure: multiplying the prior parfactor over k(X1,Y1) (f: 0.99, t: 0.01) by the parfactor over k(X2,Y2), f(X2,Y2) produces a parfactor over all three variables k(X1,Y1), k(X2,Y2), f(X2,Y2) with constraint X1 ≠ X2 or Y1 ≠ Y2, plus a separate parfactor over k(X2,Y2), f(X2,Y2) for the case where X1 = X2 and Y1 = Y2]
80. Other details
- When multiplying two parfactors, compute their
most general unifier (mgu) - Split the parfactors on the mgu
- Keep the residuals
- Multiply the non-residuals together
- See Poole 03 for more details
- See Braz, Amir and Roth 05 at IJCAI!
81. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Basic Approach
- Knowledge-Based Model Construction
- Issues
- First-Order Variable Elimination
- Learning
- Frame-based Approaches
- Undirected Relational Approaches
- Programming Language Approaches
82. Learning Rule Parameters [Koller & Pfeffer 97; Sato & Kameya 01]
- Problem definition:
- Given a skeleton rule base, consisting of rules without uncertainty parameters
- and a set of instances, each with
- a set of context predicates
- observations about some random variables
- Goal: learn parameter values for the rules that maximize the likelihood of the data
83. Basic Approach
- Construct a network BN_i for each instance i using KBMC, backward chaining on all the observed variables
- Expectation-Maximization (EM)
- exploit parameter sharing
84. Parameter Sharing
- In BNs, all random variables have distinct CPTs
- only share parameters between different instances, not different random variables
- In logical approaches, an instance may contain many objects of the same kind
- multiple papers, multiple authors, multiple citations
- Parameters are shared within instances
- same parameters used across different papers, authors, citations
- Parameter sharing allows faster learning, and learning from a single instance
85. Rule Parameters and CPT Entries
- In principle, combining rules produce a complicated relationship between model parameters and CPT entries
- With a decomposable combining rule, each node is derived from a single rule
- Most natural combining rules are decomposable
- E.g. noisy-or decomposes into a set of ANDs followed by an OR
86. Parameters and Counts
- Each time a node is derived from a rule r, it provides one experiment to learn about the parameters associated with r
- Each such node should therefore make a separate contribution to the count for those parameters
- θ_{r,x,u}: the parameter associated with P(X = x | Parents(X) = u) when rule r applies
- N_{r,x,u}: the number of times a node has value x and its parents have value u when rule r applies
87. EM With Parameter Sharing
- Given parameter values, compute expected counts:
  E[N_{r,x,u}] = Σ_i Σ_X P(X = x, Parents(X) = u | evidence_i)
  where the inner sum is over all nodes X derived from rule r in BN_i
- Given expected counts, estimate θ_{r,x,u} = E[N_{r,x,u}] / Σ_{x'} E[N_{r,x',u}]
- Iterate these two steps (see the sketch below)
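A minimal sketch of the shared-count bookkeeping; posterior() is a stand-in for BN inference on each instance network, and the toy network in the demo is made up for illustration.

```python
from collections import defaultdict

def e_step(instance_networks, posterior):
    expected = defaultdict(float)   # (rule, x, u) -> expected count
    for bn in instance_networks:
        for node in bn["nodes"]:
            r = node["rule"]        # the rule this node was derived from
            for x, u, p in posterior(bn, node):
                expected[(r, x, u)] += p   # pooled across nodes and instances
    return expected

def m_step(expected):
    totals = defaultdict(float)
    for (r, x, u), n in expected.items():
        totals[(r, u)] += n
    return {(r, x, u): n / totals[(r, u)]
            for (r, x, u), n in expected.items()}

# Toy demo: one network, two nodes derived from the same rule r1.
def fake_posterior(bn, node):
    # stand-in for P(X = x, Parents = u | evidence) from BN inference
    return [(True, (True,), 0.6), (False, (True,), 0.4)]

bn = {"nodes": [{"rule": "r1"}, {"rule": "r1"}]}
print(m_step(e_step([bn], fake_posterior)))  # theta: 0.6 / 0.4
```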
88. Learning Rule Structure [Kersting & De Raedt 02]
- Problem definition:
- Given a set of instances, each with
- context predicates
- observations about some random variables
- Goal: learn
- a skeleton rule base consisting of rules without uncertainty parameters
- and parameter values for the rules
- Generalizes BN structure learning:
- define legal models
- scoring function same as for BNs
- define search operators
89. Legal Models
- Hypothesis space consists of all rule sets using the given predicates, together with parameter values
- A legal hypothesis:
- is a logically valid rule set (does not draw false conclusions for any data cases)
- yields an acyclic constructed BN for every instance
90. Search Operators
- Add a constant-free atom to the body of a single clause
- Remove a constant-free atom from the body of a single clause
- E.g. accepted(P) :- author(P,A).    accepted(P) :- quality(P).
91. Summary: Rule-Based Approaches
- Provide an intuitive way to describe how one fact depends on other facts
- Incorporate relationships between entities
- Generalize to many different situations
- Constructed BN for a domain depends on which objects exist and what the known relationships between them are (context)
- Inference at the ground level via KBMC
- or lifted inference via FOVE
- Both parameters and structure are learnable
92. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Frame-based Approaches
- Undirected Relational Approaches
- Programming Language Approaches
93. Frame-based Approaches
- Probabilistic Relational Models (PRMs)
- Representation & Inference [Koller & Pfeffer 98; Pfeffer, Koller, Milch & Takusagawa 99; Pfeffer 00]
- Learning [Friedman et al 99; Getoor, Friedman, Koller & Taskar 01, 02; Getoor 01]
- Probabilistic Entity-Relation Models (PERs)
- Representation [Heckerman, Meek & Koller 04]
94. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Frame-based Approaches
- PRMs w/ Attribute Uncertainty
- Inference in PRMs
- Learning in PRMs
- PRMs w/ Structural Uncertainty
- PRMs w/ Class Hierarchies
- Undirected Relational Models
- Programming Language Approaches
95. Probabilistic Relational Models
- Combine advantages of relational logic and Bayesian networks:
- natural domain modeling: objects, properties, relations
- generalization over a variety of situations
- compact, natural probability models
- Integrate uncertainty with the relational model:
- properties of domain entities can depend on properties of related entities
- uncertainty over relational structure of the domain
96. Relational Schema
[Figure: schema with classes Author (Good Writer, Smart), Review (Mood, Length), Paper (Quality, Accepted), and relations Author-of (Author to Paper) and Has-Review (Paper to Review)]
- Describes the types of objects and relations in the database
97-98. Probabilistic Relational Model
[Figures: PRM dependency structure over Author.Smart, Author.Good Writer, Review.Mood, Review.Length, Paper.Quality, Paper.Accepted, built up edge by edge]
99. Probabilistic Relational Model
[Figure: the same PRM, with the CPT for Paper.Accepted attached]

P(A | Q, M):
Q  M  P(A=t)  P(A=f)
f  f  0.1     0.9
f  t  0.2     0.8
t  f  0.6     0.4
t  t  0.7     0.3
100. Relational Skeleton
[Figure: skeleton: Paper P1 (Author A1, Review R1); Paper P2 (Author A1, Review R2); Paper P3 (Author A2, Review R2)]
- Fixed relational skeleton σ:
- set of objects in each class
- relations between them
101. PRM w/ Attribute Uncertainty
[Figure: the skeleton from the previous slide, with attribute slots for each object]
- The PRM defines a distribution over instantiations of attributes
102-105. A Portion of the BN
[Figures: fragment of the unrolled BN containing P2.Accepted and P3.Accepted; both nodes share the same CPT P(A | Q, M) shown above, illustrating parameter sharing across objects of the same class]
106. PRM: Aggregate Dependencies
[Figure: Paper (Quality, Accepted) and Review (Mood, Length); a paper may have many reviews]
107. PRM: Aggregate Dependencies
[Figure: Paper.Accepted depends on Paper.Quality and on an aggregate (mode) of its reviews' Review.Mood]

P(A | Q, mode(M)):
Q  M  P(A=t)  P(A=f)
f  f  0.1     0.9
f  t  0.2     0.8
t  f  0.6     0.4
t  t  0.7     0.3

Possible aggregates: sum, min, max, avg, mode, count (a small example follows below)
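A minimal sketch of an aggregate dependency, assuming mode as the aggregator; the helper and its arguments are illustrative, not part of any PRM implementation.

```python
from statistics import mode

def accepted_parent_values(paper_quality, review_moods):
    # Aggregate the multiset of review moods into a single parent value,
    # so papers with any number of reviews share one CPT.
    return paper_quality, mode(review_moods)

print(accepted_parent_values(True, ["good", "bad", "good"]))  # (True, 'good')
```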
108. PRM with AU Semantics
[Figure: PRM (Author, Paper, Review classes) + relational skeleton σ (A1, A2, P1-P3, R1-R3) ⇒ probability distribution over completions I]
109. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Frame-based Models
- PRMs w/ Attribute Uncertainty
- Inference in PRMs
- Learning in PRMs
- PRMs w/ Structural Uncertainty
- PRMs w/ Class Hierarchies
- Undirected Relational Models
- Programming Language Approaches
110. PRM Inference
- Simple idea: enumerate all attributes of all objects
- Construct a Bayesian network over all the attributes
111. Inference Example
[Figure: skeleton: Author A1 with Papers P1 and P2, and Reviews R1-R4]
- Query is P(A1.good-writer); evidence is P1.accepted = T, P2.accepted = T
112. PRM Inference: Constructed BN
[Figure: the full unrolled BN, including A1.Smart and A1.Good Writer]
113. PRM Inference
- Problems with this approach:
- constructed BN may be very large
- doesn't exploit object structure
- Better approach:
- reason about objects themselves
- reason about whole classes of objects
- In particular, exploit:
- reuse of inference
- encapsulation of objects
114. PRM Inference: Interfaces
[Figure: variables pertaining to R2: its inputs and internal attributes, alongside A1.Smart, A1.Good Writer, P1.Quality, P1.Accepted]
115. PRM Inference: Interfaces
[Figure: interface = imported and exported attributes: A1.Good Writer and R2.Mood relative to A1.Smart, P1.Quality, R2.Length, P1.Accepted]
116. PRM Inference: Encapsulation
[Figure: R1 and R2 are encapsulated inside P1]
117. PRM Inference: Reuse
[Figure: identical paper/review substructures share one computation; only A1.Smart and A1.Good Writer cross the interface]
118-134. Structured Variable Elimination (walk-through)
[Figures: SVE recurses through the object structure: Author-1 calls Paper-1 and Paper-2; Paper-1 calls Review-1 and Review-2; each call eliminates the object's internal variables (e.g. R2.Length, then R2.Mood; P1.Quality with evidence P1.Accepted = true) and returns a factor over its interface (A1.Smart, A1.Good Writer); Paper-2 reuses the computation done for Paper-1, leaving a final factor over A1.Good Writer]
135. Benefits of SVE
- Structured inference leads to good elimination orderings for VE
- interfaces are separators
- finding good separators for large BNs is very hard
- therefore cheaper BN inference
- Reuses computation wherever possible
136. Limitations of SVE
- Does not work when encapsulation breaks down
- But when we don't have specific information about the connections between objects, we assume that encapsulation holds
- i.e., if we know P1 has two reviewers R1 and R2 but they are not named instances, we assume R1 and R2 are encapsulated
- Cannot reuse computation when different objects have different evidence
[Figure: R3 is not encapsulated inside P2]
137. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Frame-based Approaches
- PRMs w/ Attribute Uncertainty
- Inference in PRMs
- Learning in PRMs
- PRMs w/ Structural Uncertainty
- PRMs w/ Class Hierarchies
- Undirected Relational Models
- Programming Language Approaches
138. Learning PRMs w/ AU
[Figure: Database (Author, Paper, Review tables) + Relational Schema ⇒ learned PRM]
139-140. ML Parameter Estimation
[Figures: PRM over Review.Mood, Review.Length, Paper.Quality, Paper.Accepted; the parameter vector θ for P(Accepted | Quality, Mood) is estimated from counts over all papers and reviews in the database]
141. Structure Selection
- Idea:
- define scoring function
- do local search over legal structures
- Key Components:
- legal models
- scoring models
- searching model space
142. Structure Selection
- Idea:
- define scoring function
- do local search over legal structures
- Key Components:
- legal models
- scoring models
- searching model space
143. Legal Models
- A PRM defines a coherent probability model over a skeleton σ if the dependencies between object attributes are acyclic
[Figure: Researcher Prof. Gump (Reputation = high) is the author of Papers P1 and P2 (Accepted = yes); Reputation depends on the sum of Accepted values, which in turn depend on Reputation, a potential cycle]
- How do we guarantee that a PRM is acyclic for every skeleton?
144. Attribute Stratification
- PRM dependency structure S ⇒ class-level dependency graph: edge Paper.Accepted → Researcher.Reputation if Researcher.Reputation depends directly on Paper.Accepted
- The algorithm is more flexible: it allows certain cycles along guaranteed acyclic relations
145. Structure Selection
- Idea:
- define scoring function
- do local search over legal structures
- Key Components:
- legal models
- scoring models: same as BN
- searching model space
146. Structure Selection
- Idea:
- define scoring function
- do local search over legal structures
- Key Components:
- legal models
- scoring models
- searching model space
147. Searching Model Space
Phase 0: consider only dependencies within a class
[Figure: candidate intra-class edges among Author, Review, and Paper attributes]
148. Phased Structure Search
Phase 1: consider dependencies from neighboring classes, via schema relations
[Figure: candidate operator "Add P.A → R.M" applied to the Author/Review/Paper model, evaluated by the change in score (Δscore)]
149. Phased Structure Search
Phase 2: consider dependencies from further classes, via relation chains
[Figure: candidate operator "Add R.M → A.W", evaluated by Δscore]
150. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Frame-based Approaches
- PRMs w/ Attribute Uncertainty
- Inference in PRMs
- Learning in PRMs
- PRMs w/ Structural Uncertainty
- PRMs w/ Class Hierarchies
- Undirected Relational Models
- Programming Language Approaches
151. Reminder: PRM with AU Semantics
[Figure: PRM (Author, Paper, Review classes) + relational skeleton σ (A1, A2, P1-P3, R1-R3) ⇒ probability distribution over completions I]
152. Kinds of Structural Uncertainty
- How many objects does an object relate to?
- how many Authors does Paper1 have?
- Which object is an object related to?
- does Paper1 cite Paper2 or Paper3?
- Which class does an object belong to?
- is Paper1 a JournalArticle or a ConferencePaper?
- Does an object actually exist?
- Are two objects identical?
153. Structural Uncertainty
- Motivation: PRM w/ AU is only well-defined when the skeleton structure is known
- May be uncertain about the relational structure itself
- Construct probabilistic models of relational structure that capture structural uncertainty
- Mechanisms:
- Reference uncertainty
- Existence uncertainty
- Number uncertainty
- Type uncertainty
- Identity uncertainty
154. Citation Relational Schema
[Figure: schema: Author (Institution, Research Area), Paper (Topic, Word1 ... WordN), with relations Wrote (Author to Paper) and Cites (Citing Paper to Cited Paper)]
155. Attribute Uncertainty
[Figure: Author.Institution with CPT P(Institution | Research Area); Paper.Topic with CPT P(Topic | Paper.Author.Research Area); each word with CPT P(WordN | Topic)]
156. Reference Uncertainty
[Figure: a scientific paper's bibliography entries point into a document collection: which paper does each citation refer to?]
157. PRM w/ Reference Uncertainty
[Figure: Cites relation with Citing and Cited slots between papers (Topic, Words)]
Dependency model for foreign keys:
- Naïve approach: multinomial over the primary key
- noncompact
- limits ability to generalize
158. Reference Uncertainty Example
[Figure: papers partitioned by topic: C1 = papers with Paper.Topic = AI, C2 = papers with Paper.Topic = Theory; the Cited slot of Cites first selects a partition, then a paper within it]
159. Reference Uncertainty Example
[Figure: as above, plus a CPT giving the distribution over partitions (C1, C2) as a function of the citing paper's Topic (AI or Theory)]
160. Introduce Selector RVs
[Figure: Cites1.Selector and Cites2.Selector each choose between partitions; Cites1.Cited and Cites2.Cited depend on P1.Topic ... P6.Topic and on the selector]
- Introduce a Selector RV, whose domain is {C1, C2}
- The distribution over Cited depends on all of the topics, and the selector (a generative sketch follows below)
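A minimal generative sketch of the selector mechanism; the partition contents follow the example above, while the selector CPT entries (0.9/0.1) and the uniform within-partition choice are made-up illustrations.

```python
import random

partitions = {"C1": ["P3", "P4", "P5"],      # Topic = AI
              "C2": ["P1", "P2", "P6"]}      # Topic = Theory
selector_dist = {"C1": 0.9, "C2": 0.1}       # illustrative CPT entries

def sample_cited(rng=random):
    # First pick a partition from the selector's distribution, then a
    # paper uniformly within that partition.
    c = rng.choices(list(selector_dist), weights=selector_dist.values())[0]
    return rng.choice(partitions[c])

print(sample_cited())
```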
161. PRMs w/ RU Semantics
[Figure: PRM-RU (Cites relation with selector) + entity skeleton σ ⇒ probability distribution over full instantiations I]
162. Learning PRMs w/ RU
- Idea: just like in PRMs w/ AU
- define scoring function
- do greedy local structure search
- Issues:
- expanded search space
- construct partitions
- new operators
163. Learning PRMs w/ RU
- Idea:
- define scoring function
- do phased local search over legal structures
- Key Components:
- legal models: model new dependencies
- scoring models: unchanged
- searching model space: new operators
164. Legal Models
[Figure: Cites relation (Citing, Cited) between Papers with Important and Accepted attributes, plus Review.Mood]
165. Legal Models
[Figure: Cites1.Selector, Cites1.Cited, P2.Important, P3.Important, P4.Important, R1.Mood, P1.Accepted]
- When a node's parent is defined using an uncertain relation, the reference RV must be a parent of the node as well
166. Structure Search
[Figure: starting point: the Cited slot of the Cites relation, with Citing and Author.Institution attributes available for refinement]
167. Structure Search: New Operators
[Figure: operator "Refine on Topic" splits the set of candidate cited papers by Topic; evaluated by Δscore]
168. Structure Search: New Operators
[Figure: after "Refine on Topic", a further "Refine on Author.Institution" splits the partitions again; evaluated by Δscore]
169. PRMs w/ RU Summary
- Define semantics for uncertainty over which entities are related to each other
- Search now includes operators Refine and Abstract for constructing the foreign-key dependency model
- Provides one simple mechanism for link uncertainty
170. Existence Uncertainty
[Figure: two document collections: which citation links exist between them?]
171. PRM w/ Exists Uncertainty
[Figure: Cites relation between papers (Topic, Words), with an Exists attribute]
Dependency model for existence of relationship
172. Exists Uncertainty Example
[Figure: the Cites.Exists attribute has a CPT over {False, True}, conditioned on Citer.Topic and Cited.Topic]
173-174. Introduce Exists RVs
[Figures: ground network for Authors 1-2 (Inst, Area) and Papers 1-3 (Topic, Word1 ... WordN), with one Exists RV per potential citation pair: Exists 1-2, 2-1, 1-3, 3-1, 2-3, 3-2]
175. PRMs w/ EU Semantics
[Figure: PRM-EU (Cites with Exists attribute) + object skeleton ⇒ probability distribution over full instantiations I]
176. Learning PRMs w/ EU
- Idea: just like in PRMs w/ AU
- define scoring function
- do greedy local structure search
- Issues:
- efficiency
- computation of sufficient statistics for the Exists attribute
- do not explicitly consider relations that do not exist
177. Structure Selection: PRMs w/ EU
- Idea:
- define scoring function
- do phased local search over legal structures
- Key Components:
- legal models: model new dependencies
- scoring models: unchanged
- searching model space: unchanged
178. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Frame-based Approaches
- PRMs w/ Attribute Uncertainty
- Inference in PRMs
- Learning in PRMs
- PRMs w/ Structural Uncertainty
- PRMs w/ Class Hierarchies
- Undirected Relational Models
- Programming Language Approaches
179. PRMs with Classes
- Relations organized in a class hierarchy
- Subclasses inherit their probability model from superclasses
- Instances are a special case of subclasses of size 1
- As you descend through the class hierarchy, you can have richer dependency models
- e.g. cannot say Accepted(P1) <- Accepted(P2) (cyclic)
- but can say Accepted(Journal P1) <- Accepted(Conf P2)
[Figure: class hierarchy: Venue with subclasses Journal and Conference]
180. Type Uncertainty
- Is 1st-Venue a Journal or a Conference?
- Create 1st-Journal and 1st-Conference objects
- Introduce a Type(1st-Venue) variable with possible values Journal and Conference
- Make 1st-Venue equal to 1st-Journal or 1st-Conference according to the value of Type(1st-Venue)
181. Learning PRM-CHs
[Figure: Database (Vote, Person, TVProgram tables; instance I) + Relational Schema ⇒ learned PRM-CH]
182. Structure Selection: PRMs w/ CH
- Idea:
- define scoring function
- do phased local search over legal structures
- Key Components:
- legal models: model new dependencies
- scoring models: unchanged
- searching model space: new operators
183. Guaranteeing Acyclicity with Subclasses
[Figure]
184. Learning PRM-CH
- Scenario 1: class hierarchy is provided
- New operators: Specialize/Inherit
[Figure: the Accepted CPD for Paper specializes into CPDs for Journal, Conference, and Workshop papers]
185. Learning Class Hierarchy
- Issue: partially observable data set
- Construct a decision tree for class, defined over attributes observed in the training set
186. PRMs w/ Class Hierarchies
- Allow us to:
- refine a heterogeneous class into more coherent subclasses
- refine the probabilistic model along the class hierarchy
- specialize/inherit CPDs
- construct new dependencies that would otherwise be cyclic
- Provide a bridge from class-based model to instance-based model
187. PRM-CH Summary
- PRMs with class hierarchies are a natural extension of PRMs
- Specialization/inheritance of CPDs
- Allows new dependency structures
- Provide a bridge from class-based to instance-based models
- Learning techniques proposed
- Need: efficient heuristics, empirical validation on real-world domains
188. Summary: Frame-based Approaches
- Focus on objects and relationships
- what types of objects are there, and how are they related to each other?
- how does a property of an object depend on other properties (of the same or other objects)?
- Representation supports:
- attribute uncertainty
- structural uncertainty
- class hierarchies
- Efficient inference and learning algorithms
189. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Frame-based Approaches
- Undirected Relational Approaches
- Background Markov Networks
- Frame-based Approaches
- Rule-based Approaches
- Programming Language Approaches
190. Markov Networks
[Figure: four-node loop: Author1.Fame, Author2.Fame, Author3.Fame, Author4.Fame]
- nodes = domain variables; edges = mutual influence
- Network structure encodes conditional independencies: I(A1.Fame, A4.Fame | A2.Fame, A3.Fame)
191. Markov Network Semantics
[Figure: the F1-F2-F3-F4 loop]
- conditional independencies in MN structure + local clique potentials = full joint distribution over domain:
  P(f1, f2, f3, f4) = (1/Z) φ(f1,f2) φ(f2,f3) φ(f3,f4) φ(f4,f1)
  where Z is a normalizing factor that ensures that the probabilities sum to 1
- Good news: no acyclicity constraints. Bad news: global normalization (1/Z)
(a small numerical sketch follows below)
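A minimal sketch of this semantics on the four-author loop; the potential entries (2.0 for agreement, 0.5 for disagreement) are illustrative, not from the tutorial.

```python
from itertools import product

# phi(f_i, f_j) for each edge of the loop 1-2, 2-3, 3-4, 4-1: linked
# authors tend to agree on fame (made-up numbers).
phi = {(False, False): 2.0, (False, True): 0.5,
       (True, False): 0.5, (True, True): 2.0}
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

def unnormalized(f):
    w = 1.0
    for i, j in edges:
        w *= phi[(f[i], f[j])]
    return w

# Global normalization: Z sums the potential product over all assignments.
Z = sum(unnormalized(f) for f in product([False, True], repeat=4))
f = (True, True, True, False)
print(unnormalized(f) / Z)  # P(F1..F4) = (1/Z) * product of clique potentials
```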
192. Roadmap
- Motivation
- Background
- Rule-based Approaches
- Frame-based Approaches
- Undirected Relational Approaches
- Background Markov Networks
- Frame-based Approaches
- Markov Relational Networks [Taskar, Segal & Koller 01; Taskar, Abbeel & Koller 02; Taskar, Guestrin & Koller 04]
- Rule-based Approaches
- Programming Language Approaches
193. Advantages of Undirected Models
- Symmetric, non-causal interactions
- Web: categories of linked pages are correlated
- Social nets: an individual is correlated with peers
- Cannot introduce directed edges because of cycles
- Patterns involving multiple entities
- Web: triangle patterns
- Social nets: transitive relations
194. Relational Markov Networks
- Locality
- local probabilistic dependencies given by relational links
- Universals
- same dependencies hold for all objects linked in a particular pattern
[Figure: template potential over a linked Author1 and Paper1]