Transcript and Presenter's Notes

Title: Learning First-Order Probabilistic Models with Combining Rules


1
Learning First-Order Probabilistic Models with
Combining Rules
  • Sriraam Natarajan
  • Prasad Tadepalli
  • Eric Altendorf
  • Thomas G. Dietterich
  • Alan Fern
  • Angelo Restificar
  • School of EECS
  • Oregon State University

2
First-order Probabilistic Models
  • Combine the expressiveness of first-order logic
    with the uncertainty modeling of graphical
    models
  • Several formalisms already exist:
  • Probabilistic Relational Models (PRMs)
  • Bayesian Logic Programs (BLPs)
  • Stochastic Logic Programs (SLPs)
  • Relational Bayesian Networks (RBNs)
  • Probabilistic Logic Programs (PLPs)
  • Parameter sharing and quantification allow
    compact representation
  • The project's difficulty and the project
    team's competence influence the project's
    success.

4
Multiple Parents Problem
  • Often multiple objects are related to an object
    by the same relationship
  • One's friends' drinking habits influence one's
    own
  • A student's GPA depends on the grades in the
    courses he takes
  • The size of a mosquito population depends on the
    temperature and the rainfall each day since the
    last freeze
  • The target variable in each of these statements
    has multiple influents (parents in Bayes
    net jargon)

5
Multiple Parents for Population
  • Variable number of parents
  • Large number of parents
  • Need for compact parameterization

6
Solution 1: Aggregators
[Diagram: Rain1, Temp1, Rain2, Temp2, Rain3, Temp3 feed deterministic
AverageRain and AverageTemp nodes, which in turn stochastically
influence Population]
Problem: this does not take into account the interaction between the
related parents Rain and Temp on the same day (contrast with the sketch
below).
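A minimal sketch of the aggregator idea, assuming toy variable names and parameters (`rain`, `temp`, and the linear scoring are illustrative, not the presentation's actual model): the daily readings are collapsed into deterministic averages before a single stochastic CPD is applied.

```python
# Sketch of Solution 1 (aggregators): the day-by-day parents are collapsed
# into deterministic averages, and only the averages feed the stochastic
# Population node. Names and the toy CPD are assumptions for illustration.

def aggregate_parents(rain, temp):
    """Deterministic aggregation into AverageRain and AverageTemp."""
    return sum(rain) / len(rain), sum(temp) / len(temp)

def population_distribution(avg_rain, avg_temp):
    """Stochastic node P(Population | AverageRain, AverageTemp).
    Because only averages are seen, the pairing of Rain_i with Temp_i
    on the same day can no longer influence the prediction."""
    score = 0.03 * avg_rain + 0.02 * avg_temp     # toy parameters
    p_high = min(max(score, 0.0), 1.0)
    return {"high": p_high, "low": 1.0 - p_high}

rain = [2.0, 0.0, 5.0]     # one reading per day since the last freeze
temp = [15.0, 30.0, 22.0]
print(population_distribution(*aggregate_parents(rain, temp)))
```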
7
Solution 2: Combining Rules
[Diagram: each (Raini, Tempi) pair produces its own Populationi
distribution (Population1, Population2, Population3); these are combined
into the final Population distribution]
  • The three per-day Population distributions share
    parameters
  • The 3 distributions are combined into one final
    distribution (see the sketch below)
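For contrast, a minimal sketch of the mean combining rule under the same toy assumptions: every day's (Rain_i, Temp_i) pair gets its own distribution from one shared-parameter CPD, and the per-day distributions are averaged.

```python
# Sketch of Solution 2 (combining rules): one distribution per day from that
# day's (Rain_i, Temp_i) pair, all days sharing the same parameters, combined
# by the mean combining rule. Names and toy parameters are assumptions.

def per_day_distribution(rain_i, temp_i):
    """Shared-parameter CPD applied to a single day's pair of parents."""
    score = 0.03 * rain_i + 0.02 * temp_i         # same parameters for every day
    p_high = min(max(score, 0.0), 1.0)
    return {"high": p_high, "low": 1.0 - p_high}

def mean_combine(dists):
    """Mean combining rule: average the per-instance distributions."""
    return {k: sum(d[k] for d in dists) / len(dists) for k in dists[0]}

rain = [2.0, 0.0, 5.0]
temp = [15.0, 30.0, 22.0]
per_day = [per_day_distribution(r, t) for r, t in zip(rain, temp)]
print(mean_combine(per_day))   # final distribution over Population
```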

8
Outline
  • First-order Conditional Influence Language
  • Learning the parameters of Combining Rules
  • Experiments and Results

9
  • First-order Conditional Influence Language
  • Learning the parameters of Combining Rules
  • Experiments and Results

10
First-order Conditional Influence Language (FOCIL)
  • Task and role of a document influence its folder
  • if task(t), doc(d), role(d,r,t) then
    r.id, t.id Qinf d.folder.
  • The folder of the source of the document
    influences the folder of the document
  • if doc(d1), doc(d2), source(d1,d2) then
    d1.folder Qinf d2.folder
  • The difficulty of the course and the intelligence
    of the student influence his/her GPA
  • if student(s), course(c), takes(s,c) then
    s.IQ, c.difficulty Qinf s.gpa

11
Relationship to Other Formalisms
  • Shares many of the same properties as other
    statistical relational models.
  • Generalizes path expressions in probabilistic
    relational models to arbitrary conjunctions of
    literals.
  • Unlike BLPs, explicitly distinguishes between
    conditions, which do not allow uncertainty, and
    influents, which do.
  • Monotonicity relationships can be specified.
  • if person(p) then p.age Qinf+ p.height

12
Combining Multiple Instances of a Single Statement
If task(t), doc(d), role(d,r,t) then
t.id, r.id Qinf (Mean) d.folder
[Diagram: instances (t1.id, r1.id) and (t2.id, r2.id) each produce a
distribution over d.folder; the Mean combining rule merges them into a
single d.folder distribution]
13
A Different FOCIL Statement for the Same Target
Variable
If doc(s), doc(d), source(s,d) then
s.folder Qinf (Mean) d.folder
[Diagram: s1.folder and s2.folder each produce a distribution over
d.folder; the Mean combining rule merges them into a single d.folder
distribution]
14
Combining Multiple Statements
  • Weighted Mean of the two statements (written out
    in the formula below):
  • If task(t), doc(d), role(d,r,t) then
    t.id, r.id Qinf (Mean) d.folder
  • If doc(s), doc(d), source(s,d) then
    s.folder Qinf (Mean) d.folder
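Written out as a formula (a reconstruction in standard notation; here m1 is the number of task-role pairs of document d, m2 the number of its sources, and w1 + w2 = 1):

```latex
P(d.folder \mid \cdot)
  = w_1 \cdot \frac{1}{m_1} \sum_{j=1}^{m_1} P_1\big(d.folder \mid t_j.id,\, r_j.id\big)
  + w_2 \cdot \frac{1}{m_2} \sum_{j=1}^{m_2} P_2\big(d.folder \mid s_j.folder\big),
\qquad w_1 + w_2 = 1
```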

15
Unrolled Network for Folder Prediction
[Diagram: the task-role instances (t1.id, r1.id) and (t2.id, r2.id) are
combined by Mean into one d.folder distribution, the source instances
s1.folder and s2.folder are combined by Mean into another, and a Weighted
Mean of the two gives the final d.folder distribution]
16
  • First-order Conditional Influence Language
  • Learning the parameters of Combining Rules
  • Experiments and Results

17
General Unrolled Network
[Diagram: rule 1 is instantiated m1 times with inputs X1(1,1)..X1(m1,k)
and rule 2 is instantiated m2 times with inputs X2(1,1)..X2(m2,k); each
instance yields a distribution over Y, the instance distributions within
each rule are combined by Mean, and the two rule-level distributions are
combined by a Weighted mean into the final distribution over Y]
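In symbols (a reconstruction consistent with the diagram, with rule r instantiated m_r times and instance j of rule r having k inputs), the unrolled network computes:

```latex
P(Y \mid \mathbf{X})
  = \sum_{r} w_r \cdot \frac{1}{m_r} \sum_{j=1}^{m_r}
      P_r\big(Y \mid X_r^{j,1}, \dots, X_r^{j,k}\big),
\qquad \sum_{r} w_r = 1
```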
18
Gradient Descent for Squared Error
  • Squared error
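The equation itself is not in the transcript; the following is a plausible reconstruction of the squared-error objective and its gradient with respect to a rule weight under the weighted-mean model above (the sum-to-one constraint on the weights is handled separately):

```latex
E = \frac{1}{2} \sum_{e} \sum_{y} \big( I(y = y_e) - P(y \mid \mathbf{x}_e) \big)^2,
\qquad
\frac{\partial E}{\partial w_r}
  = - \sum_{e} \sum_{y} \big( I(y = y_e) - P(y \mid \mathbf{x}_e) \big)\,
      \frac{1}{m_{r,e}} \sum_{j=1}^{m_{r,e}} P_r\big(y \mid \mathbf{x}_{r,e}^{\,j}\big)
```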
19
Gradient Descent for Loglikelihood
  • Loglikelihood
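Again the slide's formula is missing; a plausible reconstruction of the loglikelihood and its weight gradient under the same model:

```latex
LL = \sum_{e} \log P(y_e \mid \mathbf{x}_e),
\qquad
\frac{\partial LL}{\partial w_r}
  = \sum_{e} \frac{1}{P(y_e \mid \mathbf{x}_e)}\,
      \frac{1}{m_{r,e}} \sum_{j=1}^{m_{r,e}} P_r\big(y_e \mid \mathbf{x}_{r,e}^{\,j}\big)
```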
20
Learning the weights
  • Mean Squared Error
  • Loglikelihood
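The weight-update equations for this slide are likewise not in the transcript. A compact sketch of one gradient step on the rule weights for the loglikelihood case (function and variable names are assumptions, and the sum-to-one constraint is kept by simple renormalization, which may differ from the authors' method):

```python
# Sketch: one gradient step on the combining-rule weights w[r].
# rule_means[e][r] is the mean of rule r's instance distributions for
# example e, evaluated at the true label (assumed precomputed); it is 0.0
# when rule r is not instantiated for e, which this toy version simply
# treats as a zero contribution.

def update_weights_loglik(w, rule_means, lr=0.01):
    grad = [0.0] * len(w)
    for means in rule_means:                            # one entry per example
        p = sum(w_r * m for w_r, m in zip(w, means))    # P(y_e | x_e)
        if p > 0.0:
            for r, m in enumerate(means):
                grad[r] += m / p                        # d loglik / d w_r
    w = [max(w_r + lr * g, 0.0) for w_r, g in zip(w, grad)]
    total = sum(w)
    return [w_r / total for w_r in w]                   # renormalize to sum to 1

w = [0.5, 0.5]
rule_means = [[0.8, 0.3], [0.0, 0.6], [0.7, 0.0]]       # toy precomputed values
print(update_weights_loglik(w, rule_means))
```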

21
Expectation-Maximization
[Diagram: the same unrolled network, with responsibility terms attached to
each instance distribution, 1/m1 and 1/m2 on the Mean nodes, and weights
w1 and w2 on the Weighted mean node feeding Y]
22
EM learning
  • Expectation step: compute the responsibilities of
    each instance of each rule
  • Maximization step: compute the maximum-likelihood
    parameters using the responsibilities as the counts

where n is the number of examples with 2 or more rules
instantiated
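The update that the "where" clause refers to is not shown in the transcript; a plausible reconstruction under the weighted-mean model, with γ denoting the responsibilities and the weight sums taken over the examples where two or more rules apply:

```latex
\text{E-step:}\quad
\gamma_{r,j}^{(e)}
  = \frac{w_r \cdot \frac{1}{m_{r,e}}\, P_r\big(y_e \mid \mathbf{x}_{r,e}^{\,j}\big)}
         {P(y_e \mid \mathbf{x}_e)},
\qquad
\text{M-step:}\quad
w_r = \frac{1}{n} \sum_{e\,:\,\ge 2\text{ rules}} \;\sum_{j=1}^{m_{r,e}} \gamma_{r,j}^{(e)}
```

The CPT entries of each P_r are re-estimated in the M-step using the γ's as fractional counts.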
23
  • First-order Conditional Influence Language
  • Learning the parameters of Combining Rules
  • Experiments and Results

24
Experimental Setup
Weighted Mean:
If task(t), doc(d), role(d,r,t) then
t.id, r.id Qinf (Mean) d.folder.
If doc(s), doc(d), source(s,d) then
s.folder Qinf (Mean) d.folder.
  • 500 documents, 6 tasks, 2 roles, 11 folders
  • Each document typically has 1-2 task-role pairs
  • 25% of documents have a source folder
  • 10-fold cross validation

25
Folder prediction task
  • Mean reciprocal rank: MRR = (1/N) Σi ni (1/i),
    where ni is the number of times the true
    folder was ranked as i and N = Σi ni
  • Propositional classifiers:
  • Decision trees and Naïve Bayes
  • Features are the number of occurrences of each
    task-role pair and the source document folder

26
Rank   EM      GD-MS   GD-LL   J48     NB
1      349     354     346     351     326
2      107     98      113     100     110
3      22      26      18      28      34
4      15      12      15      6       19
5      6       4       4       6       4
6      0       0       3       0       0
7      1       4       1       2       0
8      0       2       0       0       1
9      0       0       0       6       1
10     0       0       0       0       0
11     0       0       0       0       0
MRR    0.8299  0.8325  0.8274  0.8279  0.797
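As a check, MRR can be recomputed directly from the rank counts above; the EM column reproduces the reported 0.8299.

```python
# Recompute mean reciprocal rank from the rank counts in the table above.
def mrr(rank_counts):
    n = sum(rank_counts)
    return sum(c / rank for rank, c in enumerate(rank_counts, start=1)) / n

em_counts = [349, 107, 22, 15, 6, 0, 1, 0, 0, 0, 0]   # ranks 1..11
print(round(mrr(em_counts), 4))                        # 0.8299, matching EM
```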
27
Learning the weights
  • Original dataset: the 2nd rule has more weight ⇒ it
    is more predictive when both rules are applicable
  • Modified dataset: the folder names of all the
    sources were randomized ⇒ the 2nd rule is made
    ineffective ⇒ the weight of the 2nd rule decreases

                              EM            GD-MS         GD-LL
Original data set   Weights   (0.15, 0.85)  (0.22, 0.78)  (0.05, 0.95)
Original data set   Score     0.8299        0.8325        0.8274
Modified data set   Weights   (0.90, 0.10)  (0.84, 0.16)  (1.0, 0.0)
Modified data set   Score     0.7934        0.8021        0.7939
28
Lessons from Real-world Data
  • The propositional learners are almost as good as
    the first-order learners in this domain!
  • The number of parents is 1-2 in this domain
  • About ¾ of the time only one rule is applicable
  • Ranking of probabilities is easy in this case
  • Accurate modeling of the probabilities is still
    needed for:
  • making predictions that combine with other
    predictions
  • cost-sensitive decision making

29
Synthetic Data Set
  • 2 rules with 2 inputs each; wrule1 = 0.1,
    wrule2 = 0.9
  • Probability that an example matches a rule: 0.5
  • If an example matches a rule, the number of
    instances is 3 - 10
  • Performance metric: average absolute error in the
    predicted probability
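A hedged sketch of a data generator matching this description (the per-instance CPDs below are stand-ins, since the true synthetic distributions are not given on the slide; the renormalization of weights over the matching rules is also an assumption):

```python
import random

# Sketch of the synthetic setup: 2 rules with true weights 0.1 and 0.9,
# each rule matches an example with probability 0.5, and a matching rule
# is instantiated 3-10 times.
TRUE_WEIGHTS = [0.1, 0.9]

def true_instance_dist(rule, x1, x2):
    """Stand-in for the true P_r(y = 1 | x1, x2) of rule `rule`."""
    p = 0.2 + 0.3 * x1 + 0.4 * x2 if rule == 0 else 0.1 + 0.7 * x1 * x2
    return min(max(p, 0.0), 1.0)

def sample_example():
    matched = []                                   # (rule, list of (x1, x2) instances)
    for rule in range(2):
        if random.random() < 0.5:                  # rule matches with probability 0.5
            k = random.randint(3, 10)              # 3-10 instances of the rule
            matched.append((rule, [(random.random(), random.random()) for _ in range(k)]))
    if not matched:
        return sample_example()                    # require at least one matching rule
    ws = [TRUE_WEIGHTS[r] for r, _ in matched]
    total = sum(ws)                                # renormalize over the matching rules
    p = sum((w / total) * sum(true_instance_dist(r, x1, x2) for x1, x2 in xs) / len(xs)
            for w, (r, xs) in zip(ws, matched))
    y = 1 if random.random() < p else 0
    return matched, y, p      # p is the target for the average-absolute-error metric

print(sample_example())
```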

30
Synthetic Data Set: Results
31
Synthetic Data Set: GD-MS
32
Synthetic Data Set: GD-LL
33
Synthetic Data Set: EM
34
Conclusions
  • Introduced a general instance of multiple parents
    problem in first-order probabilistic languages
  • Gradient descent and EM successfully learn the
    parameters of the conditional distributions as
    well as the parameters of the combining rules
    (weights)
  • First-order methods significantly outperform
    propositional methods in modeling the
    distributions when the number of parents is at
    least 3

35
Future Work
  • We plan to extend these results to more general
    classes of combining rules
  • Develop efficient inference algorithms with
    combining rules
  • Develop compelling applications
  • Combining rules and aggregators
  • Can they both be understood as instances of
    causal independence?