1
Pachinko Allocation: DAG-Structured Mixture Models of Topic Correlations
  • Wei Li
  • Andrew McCallum
  • Computer Science Department
  • University of Massachusetts Amherst

With thanks to David Blei, Yee Whye Teh, and Sam Roweis for helpful discussions, and to Michael Jordan for help in naming the model.
2
Statistical Topic Models
  • Discover a low-dimensional set of topics that
    summarize concepts in text collections
  • Also applicable to non-textual data: images and biological findings

3
Latent Dirichlet Allocation
Blei, Ng, Jordan, 2003
[Graphical model: α → θ (per-document topic distribution; plate over N documents) → z (topic; plate over n words) → w (word); β → φ (per-topic multinomial over words; plate over T topics).]
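As a minimal sketch of this generative process (variable names and hyperparameter values are assumptions for illustration, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
T, V, n_words = 4, 10, 20          # topics, vocabulary size, document length
alpha, beta = 0.5, 0.1             # symmetric Dirichlet hyperparameters

phi = rng.dirichlet([beta] * V, size=T)   # per-topic multinomial over words
theta = rng.dirichlet([alpha] * T)        # per-document topic distribution

doc = []
for _ in range(n_words):
    z = rng.choice(T, p=theta)            # sample a topic for this word
    w = rng.choice(V, p=phi[z])           # sample a word from that topic
    doc.append(w)
```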
4
Correlated Topic Model
Blei, Lafferty, 2005
[Graphical model: μ, Σ → η (logistic normal; plate over N documents) → z (plate over n words) → w; β → φ (plate over T topics). Σ is a square matrix of pairwise topic correlations.]
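The key change from LDA is that the topic proportions come from a logistic normal rather than a Dirichlet, so the covariance Σ can encode pairwise topic correlations. A minimal sketch (the specific μ and Σ values below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 4
mu = np.zeros(T)
Sigma = 0.5 * np.eye(T) + 0.2        # covariance encoding pairwise correlations

eta = rng.multivariate_normal(mu, Sigma)   # logistic-normal draw
theta = np.exp(eta) / np.exp(eta).sum()    # softmax -> topic proportions
```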
5
Topic Correlation Representation
7 topics: A, B, C, D, E, F, G. Correlations: {A, B, C, D, E} and {C, D, E, F, G}.
[Figure: how CTM represents these correlation sets pairwise among topics A–G.]
6
Pachinko Machine
7
Pachinko Allocation Model (PAM)
Thanks to Michael Jordan for suggesting the name
Li, McCallum, 2006
Model structure: a directed acyclic graph (DAG), with a Dirichlet over its children at each interior node and words at the leaves. (This is the model structure, not the graphical model.)
For each document: sample a multinomial from each Dirichlet.
For each word in the document: starting from the root, sample a child from successive nodes down to a leaf, and generate the word at that leaf.
Like a Polya tree, but DAG-shaped, with an arbitrary number of children per node.
[Figure: DAG with root θ11; interior nodes θ21, θ22 and θ31, θ32, θ33; leaves θ41–θ45 over word1–word8.]
8
Pachinko Allocation Model
Li, McCallum, 2006
  • DAG may have arbitrary structure
  • arbitrary depth
  • any number of children per node
  • sparse connectivity
  • edges may skip layers

Model structure, not the graphical model
[Figure: the same DAG — root θ11; θ21, θ22; θ31–θ33; leaves θ41–θ45 over word1–word8.]
9
Pachinko Allocation Model
Li, McCallum, 2006
Model structure, not the graphical model
Upper levels: distributions over distributions over topics...
Middle levels: distributions over topic mixtures, representing topic correlations
Leaves: distributions over words (like LDA topics)
Some interior nodes could contain a single multinomial, used for all documents (i.e., a very peaked Dirichlet).
[Figure: the same DAG — root θ11; θ21, θ22; θ31–θ33; leaves θ41–θ45 over word1–word8.]
10
Pachinko Allocation Model
Li, McCallum, 2006
Estimate all these Dirichlets from data. Estimate the model structure from data (number of nodes, and connectivity).
Model structure, not the graphical model
[Figure: the same DAG — root θ11; θ21, θ22; θ31–θ33; leaves θ41–θ45 over word1–word8.]
11
Related Models
  • Latent Dirichlet Allocation
  • Correlated Topic Model
  • Hierarchical LDA
  • Hierarchical Dirichlet Processes

12
Pachinko Allocation Special Cases
Latent Dirichlet Allocation
[Figure: a DAG with a single root θ11 fully connected to one layer of topics θ21–θ25, each a multinomial over word1–word8 — this special case reproduces LDA.]
13
Hierarchical LDA
[Figure: an HLDA topic hierarchy over areas such as CS, AI, NLP, and Robotics; each document is generated from the topics on one root-to-leaf path, e.g. CS → AI → NLP or CS → AI → Robotics.]
14
Pachinko Allocation Special Cases
Hierarchical Latent Dirichlet Allocation (HLDA)
Very low variance Dirichlet at the root. Each leaf of the HLDA topic hierarchy has a distribution over the nodes on its path to the root.
[Figure: the HLDA hierarchy as a PAM DAG — root θ11; θ21–θ24; θ31–θ34; θ41, θ42, θ51 — over word1–word8.]
15
Pachinko Allocation on a Topic Hierarchy
Combining best of HLDA and Pachinko Allocation
[Figure: the PAM DAG (root θ00 with children θ11, θ12) sits above the HLDA hierarchy (θ21–θ24; θ31–θ34; θ41, θ42, θ51; word1–word8), representing correlations among the topic leaves.]
16
Correlated Topic Model
  • CTM captures pairwise correlations.
  • The number of parameters in CTM grows as the
    square of the number of topics.

17
Hierarchical Dirichlet Processes
  • HDP can be used to automatically determine the
    number of topics.
  • HDP captures topic correlations only when the
    data is pre-organized into nested groups.
  • HDP does not learn the topic hierarchy.

18
PAM - Notation
  • V = {x1, ..., xv}: word vocabulary
  • T = {t1, ..., ts}: topics
  • r: the root
  • gi(αi): Dirichlet distribution associated with topic ti

19
PAM - Generative Process
  • To generate a document:
  • For each topic ti, sample a multinomial distribution θi from gi(αi).
  • For each word w in the document:
  • Sample a topic path zw based on the multinomials, starting from the root.
  • Sample the word from the last topic on the path.
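The steps above can be sketched for the four-level case (root → super-topics → sub-topics → words); the sizes and variable names here are assumptions for illustration, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)
S1, S2, V, n_words = 3, 5, 10, 20   # super-topics, sub-topics, vocab, doc length

alpha_r = np.ones(S1)                         # root's Dirichlet parameters
alpha = np.ones((S1, S2))                     # one Dirichlet per super-topic
phi = rng.dirichlet([0.1] * V, size=S2)       # sub-topic multinomials over words

# Per document: sample one multinomial from each interior node's Dirichlet
theta_r = rng.dirichlet(alpha_r)
theta = np.array([rng.dirichlet(a) for a in alpha])

doc = []
for _ in range(n_words):
    z2 = rng.choice(S1, p=theta_r)            # super-topic on the path
    z3 = rng.choice(S2, p=theta[z2])          # sub-topic on the path
    doc.append(rng.choice(V, p=phi[z3]))      # word from the last topic
```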

20
PAM - Likelihood
  • Joint probability of a document d, its topic assignments z(d), and its sampled multinomials θ(d)
  • Marginal probability of a document d (integrating out θ(d) and summing over z(d))

21
Four-level PAM
...with two topic layers, no skipped layers, fully connected from one layer to the next.
[Figure: root θ11 → super-topics θ21–θ23 → sub-topics θ31–θ35 (fixed multinomials) → word1–word8.]
22
Graphical Models
Four-level PAM (with fixed multinomials for
sub-topics)
LDA
[Graphical models, side by side. LDA: α → θ (N plate) → z (n plate) → w; β → φ (T plate). Four-level PAM: α1 and α2 (T plate) → θ2, θ3 (N plate) → z2, z3 (n plate) → w; β → φ (T plate).]
23
Inference Gibbs Sampling
[Graphical model of four-level PAM: α2, α3 → θ2, θ3 (N plate) → z2, z3 (n plate; jointly sampled) → w; β → φ (T plate).]
Dirichlet parameters α are estimated with moment matching.
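One standard moment-matching estimate for Dirichlet parameters matches the per-component sample means and variances (this sketch follows the usual approximation with a geometric-mean combination; the function name is assumed and details may differ from the paper's exact update):

```python
import numpy as np

def dirichlet_moment_match(theta):
    """Estimate Dirichlet parameters from sampled multinomials theta (S x K)."""
    m = theta.mean(axis=0)                    # per-component sample means
    v = theta.var(axis=0)                     # per-component sample variances
    # Each component implies an estimate of the concentration sum alpha0,
    # since var_k = m_k (1 - m_k) / (alpha0 + 1) for a Dirichlet.
    s = m * (1.0 - m) / np.maximum(v, 1e-12) - 1.0
    alpha0 = np.exp(np.mean(np.log(np.maximum(s, 1e-12))))  # geometric mean
    return alpha0 * m
```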
24
Experimental Results
  • Topic clarity by human judgement
  • Likelihood on held-out data
  • Document classification

25
Datasets
  • Rexa (http://rexa.info/): 4000 documents, 278,438 word tokens, 25,597 unique words
  • NIPS: 1647 documents, 114,142 word tokens, 11,708 unique words
  • 20 newsgroups comp5 subset: 4836 documents, 35,567 unique words

26
Example Topics
Topic themes: motion (some generic), eyes, images.

LDA 100 (motion): motion detection field optical flow sensitive moving functional detect contrast light dimensional intensity computer mt measures occlusion temporal edge real
PAM 100 (motion): motion video surface surfaces figure scene camera noisy sequence activation generated analytical pixels measurements assigne advance lated shown closed perceptual
LDA 20 (motion): visual model motion field object image images objects fields receptive eye position spatial direction target vision multiple figure orientation location
PAM 100 (eyes): eye head vor vestibulo oculomotor vestibular vary reflex vi pan rapid semicircular canals responds streams cholinergic rotation topographically detectors ning
PAM 100 (images): image digit faces pixel surface interpolation scene people viewing neighboring sensors patches manifold dataset magnitude transparency rich dynamical amounts tor
27
Blind Topic Evaluation
  • Randomly select 25 similar pairs of topics
    generated from PAM and LDA
  • 5 people
  • Each asked to select the topic in each pair that
    they found more semantically coherent

Topic counts
28
Examples
[Example topic pairs with vote counts: 5 votes vs. 0 votes; 4 votes vs. 1 vote.]
29
Examples
[Example topic pairs with vote counts: 4 votes vs. 1 vote; 1 vote vs. 4 votes.]
30
Topic Correlations
31
Likelihood Comparison
  • Dataset: NIPS
  • Two experiments
  • Varying number of topics
  • Different proportions of training data

32
Likelihood Comparison
  • Varying number of topics

33
Likelihood Comparison
  • Different proportions of training data

34
Likelihood Estimation
  • Variational: perform inference in a simpler model
  • (Gibbs sampling) Harmonic mean: approximate the marginal probability with the harmonic mean of conditional probabilities
  • (Gibbs sampling) Empirical likelihood: estimate the distribution based on empirical samples
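For the harmonic-mean estimator, the marginal log-likelihood is approximated from per-sample conditional log-probabilities log p(d | z^(s)) collected during Gibbs sampling. A minimal sketch (function name assumed; computed in log space for stability):

```python
import numpy as np

def harmonic_mean_log_likelihood(log_cond_probs):
    """Harmonic-mean estimate of log p(d) from samples of log p(d | z^(s)):
    log p(d) ~= -log mean_s exp(-log p(d | z^(s)))."""
    x = -np.asarray(log_cond_probs, dtype=float)
    m = x.max()                                # log-sum-exp shift for stability
    return -(m + np.log(np.mean(np.exp(x - m))))
```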

35
Empirical Likelihood Estimation






36
Document Classification
  • 20 newsgroup comp5 subset
  • 5-way classification (accuracy in %)

Statistically significant with a p-value
37
Conclusion and Future Work
  • Pachinko Allocation provides the flexibility to
    capture arbitrary nested mixtures of topic
    correlations.
  • More applications
  • More advanced DAG structures
  • Nonparametric PAM with nested HDP

38
Non-parametric PAM
[Graphical model sketch: non-parametric PAM via nested HDP — base measure H; stick-breaking weights β0, β1i; concentration parameters γ0, γ1, α1; per-document weights π0, π1i; topic assignments z2, z3; observed words x; plates over N documents and unbounded (inf) topic sets.]