Transcript and Presenter's Notes

Title: 6.899 Relational Data Learning


1
6.899 Relational Data Learning
  • Yuan Qi
  • MIT Media Lab
  • yuanqi@media.mit.edu
  • May 7, 2002

2
Outline
  • Structure Learning Using Stochastic Logic
    Programming (SLP)
  • Text Classification Using Probabilistic
    Relational Models (PRM)

3
Part 1: Structure Learning Using SLP
  • SLP defines prior over BN structures
  • MCMC sampling BN structures
  • New Sampling Method

4
An SLP Defining a Prior over BN Structures
  • bn([],[],[]).
  • bn([RV|RVs],BN,AncBN) :- bn(RVs,BN2,AncBN2),
  •     connect_no_cycles(RV,BN2,AncBN2,BN,AncBN).
  • An edge: RV parent of H
  • 1/3 : which_edge([H|T],RV,[H-RV|Rest]) :-
  •     choose_edges(T,RV,Rest).
  • An edge: H parent of RV
  • 1/3 : which_edge([H|T],RV,[RV-H|Rest]) :-
  •     choose_edges(T,RV,Rest).
  • No edge
  • 1/3 : which_edge([_H|T],RV,Rest) :-
  •     choose_edges(T,RV,Rest).
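
As a reminder of standard SLP semantics (background, not stated on this slide): the probability the SLP assigns to a structure is proportional to the product of the labels of the stochastic clauses used in its derivation, renormalized over the successful (acyclic) derivations:

\[
  P(\mathrm{BN}) \;\propto\; \prod_{c \,\in\, \mathrm{derivation}(\mathrm{BN})} \ell(c),
\]

where each which_edge choice above contributes a factor of 1/3.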

5
Metropolis-Hastings Sampling
  • p(T) specifies a tree prior over BN structures.
  • Sample a candidate T* from the transition
    distribution q(Ti, T*).
  • Set Ti+1 = T* with the acceptance ratio shown
    below; otherwise set Ti+1 = Ti.
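
The acceptance ratio itself is not reproduced in the transcript; a standard Metropolis-Hastings form for this step (an assumed reconstruction, with p the target distribution over structures) is

\[
  \alpha(T_i, T^{*}) \;=\; \min\!\left\{1,\;
  \frac{p(T^{*})\, q(T^{*}, T_i)}{p(T_i)\, q(T_i, T^{*})}\right\}.
\]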

6
The Transition Kernel (1)
  • The transition kernel can be implemented by
    generating a new derivation (yielding a new model
    M) from the derivation that yields the current
    model Mi. To be specific:
  • Backtrack one step to the most recent choice
    point in the SLD-tree (i.e., the probability
    tree).
  • If at the top of the tree, stop. Otherwise,
    backtrack one more step to the next choice point
    with a predefined backtrack probability pb.

7
The Transition Kernel (2)
  • Once stopped backtracking, choose a new leaf M
    from the choice point by selecting branches
    according to their probabilities attached to them
    (loglinear sampling). However, we may not choose
    the branch that leads back to Mi.
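
A minimal Python sketch of this kernel, under an assumed tree representation (ChoicePoint, sample_derivation, and propose are hypothetical names; the slides do not give an implementation):

```python
import random
from dataclasses import dataclass, field

@dataclass
class ChoicePoint:
    """Hypothetical stand-in for a choice point in the SLD/probability tree:
    each branch carries a probability and, optionally, a subtree."""
    branch_probs: list                             # probabilities attached to branches
    subtrees: list = field(default_factory=list)   # ChoicePoint or None per branch

def sample_derivation(cp):
    """Ordinary SLP sampling below a choice point: follow branches according
    to their probabilities until a leaf (a complete model) is reached."""
    path = []
    while cp is not None:
        i = random.choices(range(len(cp.branch_probs)), cp.branch_probs)[0]
        path.append((cp, i))
        cp = cp.subtrees[i] if cp.subtrees else None
    return path

def propose(path, p_b):
    """Backtrack-and-resample transition kernel (sketch).
    `path` is the current derivation as (choice_point, branch) pairs."""
    # Backtrack one step to the most recent choice point, then keep
    # backtracking with probability p_b until stopping or hitting the top.
    k = len(path) - 1
    while k > 0 and random.random() < p_b:
        k -= 1
    cp, old = path[k]
    # Re-sample a branch at this choice point according to the attached
    # probabilities, excluding the branch leading back to the current model.
    probs = [p if i != old else 0.0 for i, p in enumerate(cp.branch_probs)]
    new = random.choices(range(len(probs)), probs)[0]
    # Complete the derivation below the new branch to obtain the proposal.
    tail = sample_derivation(cp.subtrees[new]) if cp.subtrees and cp.subtrees[new] else []
    return path[:k] + [(cp, new)] + tail
```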

8
Sampling Problems
  • Inefficiency of the previous Metropolis-Hastings
    sampling: with pb = 0.8, the acceptance ratio was
    only around 4%.
  • If pb is small, the samples move slowly but the
    acceptance ratio is higher.
  • If pb is large, the samples make large moves but
    the acceptance ratio is lower.
  • A fixed pb fixes the balance between local jumps
    to neighboring models and big jumps to distant
    ones.
  • An improvement: a cyclic transition kernel with
    pb = 1 - 2^-n for n = 1, 2, ..., 8 (see the
    sketch below).
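
One reading of the cyclic schedule (an interpretation of the garbled formula above):

```python
def cyclic_pb(step, n_max=8):
    """Cycle the backtrack probability through p_b = 1 - 2**(-n), n = 1..n_max,
    sweeping p_b from 0.5 up toward 1 and then repeating, so the sampler
    alternates between more local and more global proposals."""
    n = (step % n_max) + 1
    return 1.0 - 2.0 ** (-n)
```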

9
Adaptive Sampling Strategy: Re-Try the Proposals
  • Suppose a proposal T1 from the proposal
    distribution q1(T, T1) is tried and rejected.
    The rejection suggests that this proposal
    distribution may not be good and a different
    proposal could be tried. Suppose a new sample T2
    is drawn from a new proposal q2(T, T1, T2).
  • But how do we obtain a valid Markov sampling
    chain?

10
Adaptive Sampling Strategy: New Acceptance Ratio
  • If we use the acceptance ratio for the second
    proposal sketched below, then we have a valid
    MCMC sampler for the target distribution, that
    is, the posterior over BN structures.
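
A plausible form of this ratio (an assumption: the standard delayed-rejection rule of Tierney and Mira, with \pi the posterior over structures and \alpha_1 the first-stage acceptance probability) is

\[
  \alpha_2(T, T_1, T_2) \;=\; \min\!\left\{1,\;
  \frac{\pi(T_2)\, q_1(T_2, T_1)\,\bigl[1 - \alpha_1(T_2, T_1)\bigr]\, q_2(T_2, T_1, T)}
       {\pi(T)\,   q_1(T, T_1)\,  \bigl[1 - \alpha_1(T, T_1)\bigr]\,   q_2(T, T_1, T_2)}\right\}.
\]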
11
Part 1: Conclusion
  • To adaptively sample BN structures, we can start
    with a large backtrack probability pb; whenever a
    proposal is rejected, we reduce pb and draw a new
    structure using the reduced backtrack
    probability. This process can be repeated (see
    the sketch below).
  • The adaptive proposal distribution allows the SLP
    sampler to locally tune its parameter to achieve
    a good balance between local jumps to neighboring
    models and big jumps to distant ones. Therefore,
    we expect much more efficient sampling.
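
A minimal sketch of this adaptive loop, assuming the propose kernel sketched earlier and a user-supplied posterior function (both hypothetical names); for brevity it assumes a symmetric proposal and omits the delayed-rejection correction from the previous slides:

```python
import random

def adaptive_step(current, posterior, propose, pb_start=0.9, pb_min=0.1, shrink=0.5):
    """One adaptive sampling step (sketch): start with a large backtrack
    probability p_b; each time a proposal is rejected, shrink p_b and retry."""
    pb = pb_start
    while pb >= pb_min:
        candidate = propose(current, pb)
        # Simplified acceptance: symmetric proposal assumed, so only the
        # posterior ratio appears (the delayed-rejection term is omitted).
        accept = min(1.0, posterior(candidate) / posterior(current))
        if random.random() < accept:
            return candidate        # accepted: jump to the new structure
        pb *= shrink                # rejected: make the next proposal more local
    return current                  # all retries rejected: keep the current structure
```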

12
Part 2: Text Classification Using Probabilistic
Relational Models (PRM)
  • Why use PRMs?
  • SLP: discrete R.V.s
  • PRM: discrete and continuous R.V.s
  • Why relational modeling of text?
  • Author relation
  • Citation relation

13
Modeling Relational Text Data
  • Figure 1. PRM modeling of text (Taskar, Segal,
    and Koller).

Unrolled Bayesian Network
14
Transduction: Training and Testing Together
  • The test data are also included in the model.
  • Transductive EM algorithm (sketched below):
  • E step: belief propagation
  • M step: maximum-likelihood re-estimation
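
A minimal sketch of transductive EM, with a plain Naive Bayes word model standing in for the unrolled relational network (so the E step is an exact posterior computation rather than belief propagation); X_train, X_test are document-by-word count matrices, y_train integer labels, and all names are hypothetical:

```python
import numpy as np

def transductive_em(X_train, y_train, X_test, n_classes, n_iters=20, alpha=1.0):
    """Train and test together: test documents enter the model with soft labels
    that are re-estimated every iteration."""
    X = np.vstack([X_train, X_test])
    n_train = X_train.shape[0]
    # Responsibilities: one-hot (fixed) for training docs, soft for test docs.
    R = np.full((X.shape[0], n_classes), 1.0 / n_classes)
    R[:n_train] = 0.0
    R[np.arange(n_train), y_train] = 1.0

    for _ in range(n_iters):
        # M step: maximum-likelihood (smoothed) re-estimation of class priors
        # and per-class word distributions from expected counts.
        prior = R.sum(axis=0) + alpha
        prior /= prior.sum()
        word = R.T @ X + alpha                      # shape: (classes, vocab)
        word /= word.sum(axis=1, keepdims=True)
        # E step: class posteriors for the test documents (stand-in for
        # belief propagation over the unrolled network).
        logp = np.log(prior) + X[n_train:] @ np.log(word).T
        logp -= logp.max(axis=1, keepdims=True)
        R[n_train:] = np.exp(logp)
        R[n_train:] /= R[n_train:].sum(axis=1, keepdims=True)

    return R[n_train:]                              # soft labels for the test docs
```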

15
Several Problems with the Model in Figure 1
  • Naïve Bayes (independence) assumption for
    generating words
  • Wrong edge direction between words and topic
    nodes
  • Wrong edge direction between a paper and its
    citations.

16
Drawbacks of EM Training and Transduction
  • High-dimensional data, relatively few training
    points
  • Transduction helps training, but it is very
    expensive at test time, since the whole model
    must be retrained for each new data point.

17
New Model and Bayesian Training
  • The new node, h, models a classifier that takes
    as input the words, the aggregated citations, and
    the aggregated authors.

18
Training the new PRM
  • Unrolling this new PRM, we get a Bayesian network
    modeling the text data.
  • Training: expectation propagation, an extension
    of belief propagation.
  • We can also easily incorporate the kernel trick,
    as in SVMs or Gaussian processes, into the
    classifier h (see the sketch below). Note that h
    models the conditional relation between the text
    class and the words, citations, and authors.
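
A sketch of what the inputs to h and a kernelized stand-in classifier could look like; an RBF-kernel SVM substitutes here for the Bayes point machine, and all names (build_features, the count and label arguments) are hypothetical:

```python
import numpy as np
from sklearn.svm import SVC   # RBF-kernel SVM as a stand-in for the
                              # kernelized Bayes point machine classifier h

def build_features(word_counts, cited_labels, author_ids, n_classes, n_authors):
    """Feature construction for one paper: word counts, an aggregated
    (averaged) label distribution over its citations, and an aggregated
    bag-of-authors indicator vector."""
    cite_agg = np.zeros(n_classes)
    for lbl in cited_labels:
        cite_agg[lbl] += 1.0
    if cited_labels:
        cite_agg /= len(cited_labels)
    auth_agg = np.zeros(n_authors)
    for a in author_ids:
        auth_agg[a] = 1.0
    return np.concatenate([word_counts, cite_agg, auth_agg])

# Usage sketch: rows of X are build_features(...) vectors, y are topic labels.
# clf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
# y_pred = clf.predict(X_test)
```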

19
Part 2: Conclusion
  • Benefits of the new approach:
  • No overfitting, unlike maximum-likelihood (ML)
    approaches
  • A choice of whether or not to use transduction
  • A much more powerful classifier (a Bayes point
    machine with kernel expansion) compared to the
    Naïve Bayes method
  • Better relational modeling