Title: Learning First-Order Probabilistic Models with Combining Rules
1. Learning First-Order Probabilistic Models with Combining Rules
- Sriraam Natarajan
- Prasad Tadepalli
- Eric Altendorf
- Thomas G. Dietterich
- Alan Fern
- Angelo Restificar
- School of EECS
- Oregon State University
2. First-order Probabilistic Models
- Combine the expressiveness of first-order logic with the uncertainty modeling of graphical models
- Several formalisms already exist
  - Probabilistic Relational Models (PRMs)
  - Bayesian Logic Programs (BLPs)
  - Stochastic Logic Programs (SLPs)
  - Relational Bayesian Networks (RBNs)
  - Probabilistic Logic Programs (PLPs)
- Parameter sharing and quantification allow compact representation
- Example: the project's difficulty and the project team's competence influence the project's success.
4. Multiple Parents Problem
- Often multiple objects are related to an object by the same relationship
  - One's friends' drinking habits influence one's own
  - A student's GPA depends on the grades in the courses he takes
  - The size of a mosquito population depends on the temperature and the rainfall each day since the last freeze
- The target variable in each of these statements has multiple influents (parents in Bayes net jargon)
5. Multiple Parents for Population
- Variable number of parents
- Large number of parents
- Need for compact parameterization
6. Solution 1: Aggregators
[Diagram: Rain1, Temp1, Rain2, Temp2, Rain3, Temp3 are deterministically aggregated into AverageRain and AverageTemp, which stochastically influence Population.]
Problem: does not take into account the interaction between related parents (Rain and Temp on the same day).
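A minimal sketch of this aggregator approach, assuming a toy predict_population model and made-up daily readings (neither is from the slides):

```python
# Minimal sketch of the aggregator approach (Solution 1). The toy
# predict_population() model and the example readings are illustrative
# assumptions, not from the slides.

def predict_population(avg_rain, avg_temp):
    # Toy stand-in for the stochastic node: probability of a large population.
    return min(1.0, max(0.0, 0.05 * avg_rain + 0.02 * avg_temp))

def aggregate_then_predict(daily_readings):
    """daily_readings: list of (rain, temp) pairs, one per day since the last freeze."""
    n = len(daily_readings)
    avg_rain = sum(r for r, _ in daily_readings) / n   # deterministic aggregation
    avg_temp = sum(t for _, t in daily_readings) / n
    # The stochastic node sees only the aggregates, so the pairing of
    # Rain_i with Temp_i on the same day is lost.
    return predict_population(avg_rain, avg_temp)

print(aggregate_then_predict([(2.0, 15.0), (0.0, 30.0), (5.0, 10.0)]))
```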
7. Solution 2: Combining Rules
[Diagram: each (Rain_i, Temp_i) pair influences its own Population_i node; Population1, Population2, and Population3 feed the final Population node.]
- The top 3 distributions share parameters
- The 3 distributions are combined into one final distribution (see the sketch below)
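A minimal sketch of the mean combining rule for this example, assuming a toy shared conditional model, a two-state Population domain, and made-up readings (all values are illustrative):

```python
# Minimal sketch of the combining-rule approach (Solution 2). Each day's
# (rain, temp) pair yields its own distribution over Population from a
# shared-parameter conditional model; the per-day distributions are then
# combined, here with the mean.

def population_dist_given_day(rain, temp):
    # Shared conditional distribution P(Population | Rain_i, Temp_i) over
    # two toy states ("low", "high"); the numbers are made up.
    p_high = min(1.0, max(0.0, 0.05 * rain + 0.02 * temp))
    return {"low": 1.0 - p_high, "high": p_high}

def mean_combine(dists):
    states = dists[0].keys()
    return {s: sum(d[s] for d in dists) / len(dists) for s in states}

daily = [(2.0, 15.0), (0.0, 30.0), (5.0, 10.0)]
per_day = [population_dist_given_day(r, t) for r, t in daily]
print(mean_combine(per_day))   # final distribution over Population
```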
8. Outline
- First-order Conditional Influence Language
- Learning the parameters of Combining Rules
- Experiments and Results
9.
- First-order Conditional Influence Language
- Learning the parameters of Combining Rules
- Experiments and Results
10. First-order Conditional Influence Language (FOCIL)
- The task and role of a document influence its folder
  if task(t), doc(d), role(d,r,t) then r.id, t.id Qinf d.folder
- The folder of the source of the document influences the folder of the document
  if doc(d1), doc(d2), source(d1,d2) then d1.folder Qinf d2.folder
- The difficulty of the course and the intelligence of the student influence his/her GPA
  if student(s), course(c), takes(s,c) then s.IQ, c.difficulty Qinf s.gpa
11. Relationship to Other Formalisms
- Shares many of the same properties as other statistical relational models.
- Generalizes path expressions in probabilistic relational models to arbitrary conjunctions of literals.
- Unlike BLPs, explicitly distinguishes between conditions, which do not allow uncertainty, and influents, which do.
- Monotonicity relationships can be specified, e.g.
  if person(p) then p.age Q p.height
12. Combining Multiple Instances of a Single Statement
If task(t), doc(d), role(d,r,t) then t.id, r.id Qinf (Mean) d.folder
[Diagram: the instances (t1.id, r1.id) and (t2.id, r2.id) each produce a distribution over d.folder; the Mean combines them into a single distribution over d.folder.]
13. A Different FOCIL Statement for the Same Target Variable
If doc(s), doc(d), source(s,d) then s.folder Qinf (Mean) d.folder
[Diagram: s1.folder and s2.folder each produce a distribution over d.folder; the Mean combines them into a single distribution over d.folder.]
14. Combining Multiple Statements
- Weighted Mean of:
  - If task(t), doc(d), role(d,r,t) then t.id, r.id Qinf (Mean) d.folder
  - If doc(s), doc(d), source(s,d) then s.folder Qinf (Mean) d.folder
15. Unrolled Network for Folder Prediction
[Diagram: the task/role instances (t1.id, r1.id) and (t2.id, r2.id) produce two distributions over d.folder that are combined by a Mean; the source instances s1.folder and s2.folder produce two more distributions over d.folder that are combined by a second Mean; a Weighted Mean then combines the two rule-level distributions into the final d.folder distribution.]
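A minimal sketch of this unrolled computation, assuming an illustrative folder set, per-instance distributions, and rule weights (none of these values come from the slides):

```python
# Hedged sketch of the unrolled folder-prediction network: a Mean within
# each FOCIL rule and a Weighted Mean across the two rules. Folder labels,
# per-instance distributions, and weights are illustrative assumptions.

FOLDERS = ["grants", "courses", "admin"]          # assumed folder labels

def mean(dists):
    return {f: sum(d[f] for d in dists) / len(dists) for f in FOLDERS}

def weighted_mean(dists, weights):
    return {f: sum(w * d[f] for d, w in zip(dists, weights)) for f in FOLDERS}

# Rule 1: one distribution over d.folder per (task, role) instance.
task_role_dists = [
    {"grants": 0.7, "courses": 0.2, "admin": 0.1},
    {"grants": 0.5, "courses": 0.4, "admin": 0.1},
]
# Rule 2: one distribution over d.folder per source document.
source_dists = [
    {"grants": 0.3, "courses": 0.6, "admin": 0.1},
]

rule_level = [mean(task_role_dists), mean(source_dists)]
final = weighted_mean(rule_level, weights=[0.4, 0.6])   # assumed rule weights
print(final)
```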
16.
- First-order Conditional Influence Language
- Learning the parameters of Combining Rules
- Experiments and Results
17. General Unrolled Network
[Diagram: Rule 1 has m1 instances and Rule 2 has m2 instances; instance i of rule j has inputs Xj_i,1 ... Xj_i,k. Each instance produces a distribution over Y, a Mean combines the instances within each rule, and a Weighted Mean combines the two rule-level distributions into the final distribution over Y.]
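Written out, the combination computed by this unrolled network (a reconstruction from the diagram and the weighted-mean slides, not the slides' own notation) is:

```latex
% Two-level combination: a Mean over the m_r instances of each rule r,
% and a Weighted Mean over the rules.
P(Y \mid \mathbf{X})
  = \sum_{r=1}^{R} w_r \, \frac{1}{m_r} \sum_{i=1}^{m_r}
      P_r\!\left(Y \mid X^{r}_{i,1}, \ldots, X^{r}_{i,k}\right),
\qquad
\sum_{r=1}^{R} w_r = 1 .
```

Here each rule's instance distributions P_r share parameters, as stated on the earlier slides.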
18. Gradient Descent for Squared Error
[Slide shows the squared-error objective and its gradient equations; a hedged reconstruction follows.]
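The original equations are not recoverable from the text dump; the following is a hedged reconstruction of the squared-error objective and its gradient with respect to a rule weight, derived from the weighted-mean model above (the slides may also give gradients for the CPT parameters):

```latex
% Squared error over examples e with true label y_e:
E(\mathbf{w}) = \sum_{e} \sum_{y}
    \Bigl( I(y = y_e) - P(y \mid \mathbf{x}_e) \Bigr)^{2},
\qquad
P(y \mid \mathbf{x}_e) = \sum_{r} w_r \, \frac{1}{m_{r,e}}
    \sum_{i=1}^{m_{r,e}} P_r\!\left(y \mid \mathbf{x}^{r}_{e,i}\right)

% Gradient with respect to the combining-rule weight w_r
% (weights are kept non-negative and normalized, e.g. by projection):
\frac{\partial E}{\partial w_r}
  = -2 \sum_{e} \sum_{y}
      \Bigl( I(y = y_e) - P(y \mid \mathbf{x}_e) \Bigr)
      \frac{1}{m_{r,e}} \sum_{i=1}^{m_{r,e}} P_r\!\left(y \mid \mathbf{x}^{r}_{e,i}\right)
```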
19. Gradient Descent for Loglikelihood
[Slide shows the log-likelihood objective and its gradient equations; a hedged reconstruction follows.]
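Again a hedged reconstruction from the same weighted-mean model, not the slide's exact equations:

```latex
% Log-likelihood of the training examples:
LL(\mathbf{w}) = \sum_{e} \log P(y_e \mid \mathbf{x}_e),
\qquad
P(y_e \mid \mathbf{x}_e) = \sum_{r} w_r \, \frac{1}{m_{r,e}}
    \sum_{i=1}^{m_{r,e}} P_r\!\left(y_e \mid \mathbf{x}^{r}_{e,i}\right)

% Gradient with respect to the combining-rule weight w_r:
\frac{\partial LL}{\partial w_r}
  = \sum_{e}
      \frac{ \frac{1}{m_{r,e}} \sum_{i=1}^{m_{r,e}}
             P_r\!\left(y_e \mid \mathbf{x}^{r}_{e,i}\right) }
           { P(y_e \mid \mathbf{x}_e) }
```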
20. Learning the Weights
- Mean Squared Error
- Loglikelihood
21. Expectation-Maximization
[Diagram: the unrolled network of slide 17 with a latent responsibility attached to each rule instance. Within each rule the instance distributions are averaged with uniform weights 1/m1 and 1/m2; the two rule-level distributions are combined by a Weighted Mean with weights w1 and w2 to produce Y.]
22. EM Learning
- Expectation step: compute the responsibilities of each instance of each rule
- Maximization step: compute the maximum-likelihood parameters using the responsibilities as the counts, where n is the number of examples with 2 or more rules instantiated
A sketch of the two steps for the rule weights follows.
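A minimal sketch of these two steps for the rule weights alone, assuming the weighted-mean model of the unrolled network; the per-rule mean instance probabilities and the toy data are illustrative, and the update is a reconstruction rather than the slide's exact equations:

```python
# Hedged sketch of EM for the combining-rule weights only, assuming the
# weighted-mean model P(y|x) = sum_r w_r * mean_i P_r(y | x_{r,i}).
# Each example is a dict mapping rule index -> mean over that rule's
# instances of P_r(y_true | instance); only instantiated rules appear.

def em_weights(examples, num_rules, iterations=50):
    w = [1.0 / num_rules] * num_rules
    for _ in range(iterations):
        # E-step: responsibility of each rule for each example.
        resp_sums = [0.0] * num_rules
        n = 0  # number of examples with 2 or more rules instantiated
        for ex in examples:
            if len(ex) < 2:
                continue
            n += 1
            z = sum(w[r] * p for r, p in ex.items())
            for r, p in ex.items():
                resp_sums[r] += (w[r] * p) / z
        if n == 0:
            break
        # M-step: responsibilities act as counts for the weights.
        w = [resp_sums[r] / n for r in range(num_rules)]
    return w

# Toy usage: rule 1 explains the true label better than rule 0.
data = [{0: 0.2, 1: 0.7}, {0: 0.1, 1: 0.8}, {1: 0.9}, {0: 0.3, 1: 0.6}]
print(em_weights(data, num_rules=2))
```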
23.
- First-order Conditional Influence Language
- Learning the parameters of Combining Rules
- Experiments and Results
24. Experimental Setup
Weighted Mean of:
  If task(t), doc(d), role(d,r,t) then t.id, r.id Qinf (Mean) d.folder.
  If doc(s), doc(d), source(s,d) then s.folder Qinf (Mean) d.folder.
- 500 documents, 6 tasks, 2 roles, 11 folders
- Each document typically has 1-2 task-role pairs
- 25% of documents have a source folder
- 10-fold cross validation
25. Folder Prediction Task
- Mean reciprocal rank
  MRR = (1/N) * Σ_i (n_i / i),
  where n_i is the number of times the true folder was ranked as i and N is the total number of test documents
- Propositional classifiers
  - Decision trees and Naïve Bayes
  - Features are the number of occurrences of each task-role pair and the source document folder
26. Folder Prediction Results (number of documents by rank of the true folder; the MRR row is recomputed below)

Rank   EM      GD-MS   GD-LL   J48     NB
1      349     354     346     351     326
2      107     98      113     100     110
3      22      26      18      28      34
4      15      12      15      6       19
5      6       4       4       6       4
6      0       0       3       0       0
7      1       4       1       2       0
8      0       2       0       0       1
9      0       0       0       6       1
10     0       0       0       0       0
11     0       0       0       0       5
MRR    0.8299  0.8325  0.8274  0.8279  0.797
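The MRR row follows from the rank counts via the formula on the previous slide; a minimal check, assuming N = 500 test documents (the dataset size from the experimental setup), reproduces the reported values up to rounding:

```python
# Recompute MRR = (1/N) * sum_i n_i / i from the rank counts in the table.
counts = {
    "EM":    [349, 107, 22, 15, 6, 0, 1, 0, 0, 0, 0],
    "GD-MS": [354,  98, 26, 12, 4, 0, 4, 2, 0, 0, 0],
    "GD-LL": [346, 113, 18, 15, 4, 3, 1, 0, 0, 0, 0],
    "J48":   [351, 100, 28,  6, 6, 0, 2, 0, 6, 0, 0],
    "NB":    [326, 110, 34, 19, 4, 0, 0, 1, 1, 0, 5],
}

N = 500  # total test documents (assumed from the experimental setup slide)
for method, n in counts.items():
    mrr = sum(c / rank for rank, c in enumerate(n, start=1)) / N
    print(f"{method}: MRR = {mrr:.4f}")
```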
27. Learning the Weights
- Original dataset: the 2nd rule has more weight, i.e., it is more predictive when both rules are applicable
- Modified dataset: the folder names of all the sources were randomized, making the 2nd rule ineffective, so the weight of the 2nd rule decreases

                              EM           GD-MS        GD-LL
Original data set   Weights   ⟨.15, .85⟩   ⟨.22, .78⟩   ⟨.05, .95⟩
Original data set   Score     .8299        .8325        .8274
Modified data set   Weights   ⟨.9, .1⟩     ⟨.84, .16⟩   ⟨1, 0⟩
Modified data set   Score     .7934        .8021        .7939
28. Lessons from Real-world Data
- The propositional learners are almost as good as the first-order learners in this domain!
  - The number of parents is 1-2 in this domain
  - About ¾ of the time only one rule is applicable
  - Ranking of probabilities is easy in this case
- Accurate modeling of the probabilities is still needed for:
  - Making predictions that combine with other predictions
  - Cost-sensitive decision making
29. Synthetic Data Set
- 2 rules with 2 inputs each; w_rule1 = 0.1, w_rule2 = 0.9
- Probability that an example matches a rule: 0.5
- If an example matches a rule, the number of instances is 3-10
- Performance metric: average absolute error in predicted probability (a generator sketch follows)
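A minimal generator sketch consistent with this setup; the two CPTs for P_r(y = 1 | inputs) are illustrative assumptions, since the slides specify only the weights, the match probability, and the instance counts:

```python
# Hedged sketch of a generator for the synthetic setup described above.
import random

W = [0.1, 0.9]                       # w_rule1, w_rule2 (from the slide)
p_rule = [
    {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9},   # rule 1 (assumed CPT)
    {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.8, (1, 1): 0.95},  # rule 2 (assumed CPT)
]

def make_example():
    matched = []
    for r in range(2):
        if random.random() < 0.5:                 # rule matches with probability 0.5
            m = random.randint(3, 10)             # 3-10 instances
            insts = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(m)]
            matched.append((r, insts))
    if not matched:
        return make_example()                     # require at least one matching rule
    # True distribution: weighted mean over matched rules of the mean over instances
    # (weights renormalized when only one rule applies).
    w_sum = sum(W[r] for r, _ in matched)
    p_true = sum(W[r] / w_sum * sum(p_rule[r][x] for x in insts) / len(insts)
                 for r, insts in matched)
    y = 1 if random.random() < p_true else 0
    return matched, p_true, y

print(make_example())
```

The learned models' predicted probabilities would then be compared against p_true to compute the average absolute error.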
30. Synthetic Data Set - Results
31. Synthetic Data Set - GD-MS
32. Synthetic Data Set - GD-LL
33. Synthetic Data Set - EM
34. Conclusions
- Introduced a general instance of the multiple parents problem in first-order probabilistic languages
- Gradient descent and EM successfully learn the parameters of the conditional distributions as well as the parameters of the combining rules (weights)
- First-order methods significantly outperform propositional methods in modeling the distributions when the number of parents is 3 or more
35. Future Work
- We plan to extend these results to more general classes of combining rules
- Develop efficient inference algorithms with combining rules
- Develop compelling applications
- Combining rules and aggregators: can they both be understood as instances of causal independence?