Title: Pachinko Allocation: DAG-Structured Mixture Models of Topic Correlations
1. Pachinko Allocation: DAG-Structured Mixture Models of Topic Correlations
- Wei Li
- Andrew McCallum
- Computer Science Department
- University of Massachusetts Amherst
With thanks to David Blei, Yee Whye Teh, and Sam Roweis for helpful discussions, and thanks to Michael Jordan for help in naming the model.
2. Statistical Topic Models
- Discover a low-dimensional set of topics that summarize concepts in text collections
- Also applicable to non-textual data, e.g., images and biological findings
3. Latent Dirichlet Allocation
Blei, Ng, Jordan, 2003
[Graphical model: Dirichlet prior α → per-document topic distribution θ → topic assignment z → word w; plate over the N words in a document; φ, the per-topic multinomial over words, with plate over T topics.]
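To make the generative story concrete, here is a minimal sketch of LDA's generative process in Python (NumPy; the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def generate_lda_document(alpha, phi, n_words, rng=np.random.default_rng(0)):
    """Generate one document under LDA.
    alpha: Dirichlet prior over topics, shape (T,)
    phi:   per-topic multinomials over words, shape (T, V)
    """
    theta = rng.dirichlet(alpha)                # per-document topic distribution
    words = []
    for _ in range(n_words):
        z = rng.choice(len(alpha), p=theta)     # sample a topic for this word
        w = rng.choice(phi.shape[1], p=phi[z])  # sample the word from topic z
        words.append(w)
    return words
```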
4. Correlated Topic Model
Blei, Lafferty, 2005
[Graphical model: like LDA, but the per-document topic proportions are drawn from a logistic normal with mean μ and covariance Σ instead of a Dirichlet; plates over N words and T topics.]
The covariance Σ is a square matrix of pairwise topic correlations.
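A small sketch of the logistic-normal draw that replaces LDA's Dirichlet in CTM (illustrative names; NumPy):

```python
import numpy as np

def logistic_normal_proportions(mu, sigma, rng=np.random.default_rng(0)):
    """Draw topic proportions from a logistic normal.
    mu:    mean vector, shape (T,)
    sigma: T x T covariance; its off-diagonal entries carry the
           pairwise topic correlations (the square matrix above).
    """
    eta = rng.multivariate_normal(mu, sigma)  # correlated Gaussian draw
    eta -= eta.max()                          # stabilize the softmax
    theta = np.exp(eta)
    return theta / theta.sum()                # map onto the simplex
```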
5. Topic Correlation Representation
7 topics: A, B, C, D, E, F, G. Correlations among {A, B, C, D, E} and among {C, D, E, F, G}.
[Diagram: how CTM represents these two correlation groups with pairwise edges among the topics A through G.]
6. Pachinko Machine
7. Pachinko Allocation Model (PAM)
Thanks to Michael Jordan for suggesting the name
Li, McCallum, 2006
Model structure: a directed acyclic graph (DAG), with a Dirichlet distribution at each interior node over its children, and words at the leaves. (This is the model structure, not the graphical model.)

For each document: sample a multinomial from each node's Dirichlet.

For each word in the document: starting from the root, sample a child from successive nodes down to a leaf, then generate the word at that leaf.

[DAG figure: root θ11; interior nodes θ21, θ22 and θ31, θ32, θ33; leaves θ41 through θ45 over word1 through word8.]

Like a Pólya tree, but DAG-shaped, with an arbitrary number of children per node.
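A minimal sketch of this per-word walk down the DAG, assuming each document has already drawn a multinomial theta[i] at every interior node i (the data structures and names here are illustrative, not from the paper):

```python
import numpy as np

def sample_word(children, theta, leaf_word_dists, root,
                rng=np.random.default_rng(0)):
    """Walk the DAG from the root to a leaf, then emit a word.
    children:        dict: interior node id -> list of child node ids
    theta:           dict: interior node id -> this document's multinomial
                     over that node's children
    leaf_word_dists: dict: leaf node id -> multinomial over the vocabulary
    """
    node = root
    path = [node]
    while node in children:                    # interior node: keep descending
        node = rng.choice(children[node], p=theta[node])
        path.append(node)
    word = rng.choice(len(leaf_word_dists[node]), p=leaf_word_dists[node])
    return word, path                          # the word and its topic path
```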
8. Pachinko Allocation Model
Li, McCallum, 2006
- DAG may have arbitrary structure:
  - arbitrary depth
  - any number of children per node
  - sparse connectivity
  - edges may skip layers
(Same DAG figure as before: model structure, not the graphical model.)
9. Pachinko Allocation Model
Li, McCallum, 2006
(Same DAG figure: model structure, not the graphical model.) Annotations by level:
- Upper interior nodes: distributions over distributions over topics...
- Lower interior nodes: distributions over topic mixtures, representing topic correlations
- Leaves: distributions over words (like LDA topics)
Some interior nodes could contain a single multinomial used for all documents (i.e., a very peaked Dirichlet).
10. Pachinko Allocation Model
Li, McCallum, 2006
Estimate all these Dirichlets from data. Estimate the model structure from data (number of nodes and connectivity).
(Same DAG figure: model structure, not the graphical model.)
11. Related Models
- Latent Dirichlet Allocation
- Correlated Topic Model
- Hierarchical LDA
- Hierarchical Dirichlet Processes
12. Pachinko Allocation Special Cases
Latent Dirichlet Allocation
[DAG figure: a single root θ11 over topics θ21 through θ25, each a multinomial over word1 through word8; this three-level, fully-connected structure is exactly LDA.]
13. Hierarchical LDA
[Tree figure: a topic hierarchy over CS, AI, NLP, and Robotics; each document is generated from a single root-to-leaf path, e.g. CS → AI → NLP or CS → AI → Robotics.]
14. Pachinko Allocation Special Cases
Hierarchical Latent Dirichlet Allocation (HLDA)
HLDA corresponds to a PAM with a very low-variance Dirichlet at the root. Each leaf of the HLDA topic hierarchy has a distribution over the nodes on its path to the root.
[DAG figure: the HLDA hierarchy (root θ11; nodes θ21 through θ24 and θ31 through θ34; leaves θ41, θ42, θ51) over word1 through word8.]
15. Pachinko Allocation on a Topic Hierarchy
Combining the best of HLDA and Pachinko Allocation.
[Figure: a PAM DAG (root θ00 with children θ11, θ12) sits on top of the HLDA hierarchy, representing correlations among the topic leaves.]
16. Correlated Topic Model
- CTM captures pairwise correlations.
- The number of parameters in CTM grows as the
square of the number of topics.
17. Hierarchical Dirichlet Processes
- HDP can be used to automatically determine the number of topics.
- HDP captures topic correlations only when the data is pre-organized into nested groups.
- HDP does not learn the topic hierarchy.
18. PAM - Notation
- V = {x1, ..., xv}: word vocabulary
- T = {t1, ..., ts}: topics
- r: the root
- gi(αi): Dirichlet distribution associated with topic ti
19. PAM - Generative Process
To generate a document:
- For each topic ti, sample a multinomial distribution θi from gi(αi).
- For each word w in the document:
  - Sample a topic path zw based on the multinomials, starting from the root.
  - Sample the word from the last topic on the path.
20. PAM - Likelihood
- Joint probability of d, z(d), and θ(d)
- Marginal probability of d
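The equations on this slide did not survive extraction; the following LaTeX sketch reconstructs them from the generative process above, writing z_w = ⟨z_w1, ..., z_wL⟩ for a word's root-to-leaf path (z_w1 is the root). Consult the PAM paper for the exact form:

```latex
% Joint probability of document d, its topic paths z^{(d)},
% and its sampled multinomials \theta^{(d)}:
P(d, z^{(d)}, \theta^{(d)} \mid \alpha) =
  \prod_{i} g_i\!\left(\theta_i^{(d)}\right)
  \prod_{w \in d}
    \Big( \prod_{k=2}^{L} P\!\left(z_{wk} \mid \theta_{z_{w(k-1)}}^{(d)}\right) \Big)
    P\!\left(w \mid \theta_{z_{wL}}^{(d)}\right)

% Marginal probability of d: integrate out \theta^{(d)} and sum over paths.
P(d \mid \alpha) =
  \int \prod_{i} g_i\!\left(\theta_i^{(d)}\right)
  \prod_{w \in d} \sum_{z_w}
    \Big( \prod_{k=2}^{L} P\!\left(z_{wk} \mid \theta_{z_{w(k-1)}}^{(d)}\right) \Big)
    P\!\left(w \mid \theta_{z_{wL}}^{(d)}\right) d\theta^{(d)}
```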
21. Four-level PAM
...with two topic layers (super-topics and sub-topics), no layer skipping, and full connectivity from one layer to the next.
[Figure: root θ11 → super-topics θ21 through θ23 → sub-topics θ31 through θ35 → word1 through word8; the sub-topics' multinomials over words are fixed across documents.]
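A minimal sketch of the four-level generative process under this structure (illustrative names; the fixed sub-topic multinomials phi play the role of LDA topics):

```python
import numpy as np

def generate_four_level_pam_doc(alpha_root, alpha_super, phi, n_words,
                                rng=np.random.default_rng(0)):
    """Generate one document under four-level PAM.
    alpha_root:  Dirichlet prior over S super-topics, shape (S,)
    alpha_super: per-super-topic Dirichlet priors over K sub-topics, shape (S, K)
    phi:         fixed sub-topic multinomials over words, shape (K, V)
    """
    theta_root = rng.dirichlet(alpha_root)                # over super-topics
    theta_super = np.stack([rng.dirichlet(a) for a in alpha_super])
    words = []
    for _ in range(n_words):
        z2 = rng.choice(len(alpha_root), p=theta_root)    # super-topic
        z3 = rng.choice(phi.shape[0], p=theta_super[z2])  # sub-topic
        words.append(rng.choice(phi.shape[1], p=phi[z3])) # word from fixed phi
    return words
```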
22. Graphical Models
Four-level PAM (with fixed multinomials for
sub-topics)
LDA
[Plate diagrams side by side. Four-level PAM: priors α1, α2 → per-document distributions θ2, θ3 → topic assignments z2, z3 → word w, with fixed multinomials φ over T sub-topics. LDA: prior α → θ → z → w, with φ over T topics. Both with plates over N documents and the n words per document.]
23. Inference: Gibbs Sampling
[Plate diagram for four-level PAM; the topic assignments z2 and z3 for each word are jointly sampled.]
Dirichlet parameters α are estimated with moment matching.
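A minimal sketch of moment matching for a Dirichlet, assuming the standard estimator that matches the sample mean and variance of observed proportions (variable names are illustrative; in PAM the rows would be sampled topic proportions from the Gibbs state):

```python
import numpy as np

def dirichlet_moment_match(props, eps=1e-12):
    """Moment-matching estimate of Dirichlet parameters.
    props: shape (n_samples, K); each row is an observed multinomial,
           e.g. a document's topic proportions from a Gibbs sample.
    """
    m = props.mean(axis=0)                      # component-wise sample means
    v = props.var(axis=0) + eps                 # component-wise sample variances
    # Each component yields an estimate of the Dirichlet precision s;
    # combine the per-component estimates with a geometric mean.
    s_k = np.clip(m * (1.0 - m) / v - 1.0, eps, None)
    s = np.exp(np.log(s_k).mean())
    return s * m                                # alpha_k = s * m_k
```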
24. Experimental Results
- Topic clarity by human judgement
- Likelihood on held-out data
- Document classification
25. Datasets
- Rexa (http://rexa.info/)
  - 4,000 documents, 278,438 word tokens, 25,597 unique words
- NIPS
  - 1,647 documents, 114,142 word tokens, 11,708 unique words
- 20 newsgroups comp5 subset
  - 4,836 documents, 35,567 unique words
26. Example Topics
Topic labels from the slide: images, motion, eyes (some topics are generic).

LDA (100 topics): motion detection field optical flow sensitive moving functional detect contrast light dimensional intensity computer mt measures occlusion temporal edge real

PAM (100 topics): motion video surface surfaces figure scene camera noisy sequence activation generated analytical pixels measurements assigne advance lated shown closed perceptual

LDA (20 topics): visual model motion field object image images objects fields receptive eye position spatial direction target vision multiple figure orientation location

PAM (100 topics): eye head vor vestibulo oculomotor vestibular vary reflex vi pan rapid semicircular canals responds streams cholinergic rotation topographically detectors ning

PAM (100 topics): image digit faces pixel surface interpolation scene people viewing neighboring sensors patches manifold dataset magnitude transparency rich dynamical amounts tor
27. Blind Topic Evaluation
- Randomly select 25 similar pairs of topics generated from PAM and LDA.
- 5 people, each asked to select the topic in each pair that they find more semantically coherent.
[Chart: vote counts per topic.]
28. Examples
[Example topic pairs with vote tallies: 5 votes vs. 0 votes; 4 votes vs. 1 vote.]
29. Examples
[Example topic pairs with vote tallies: 4 votes vs. 1 vote; 1 vote vs. 4 votes.]
30. Topic Correlations
31. Likelihood Comparison
- Dataset: NIPS
- Two experiments
- Varying number of topics
- Different proportions of training data
32. Likelihood Comparison
33. Likelihood Comparison
- Different proportions of training data
34. Likelihood Estimation
- Variational: perform inference in a simpler model.
- (Gibbs sampling) Harmonic mean: approximate the marginal probability with the harmonic mean of conditional probabilities.
- (Gibbs sampling) Empirical likelihood: estimate the distribution based on empirical samples.
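A minimal sketch of the harmonic-mean estimator computed from Gibbs samples (a simple but notoriously high-variance estimator; names are illustrative):

```python
import numpy as np

def harmonic_mean_log_likelihood(log_cond_probs):
    """Harmonic-mean estimate of a document's marginal log-likelihood.
    log_cond_probs: log P(d | z_s) for each of S Gibbs samples z_s.
    Returns log( S / sum_s 1/P(d | z_s) ), computed stably in log space.
    """
    lp = np.asarray(log_cond_probs)
    neg = -lp                                    # work with -log p_s
    m = neg.max()
    log_sum = m + np.log(np.exp(neg - m).sum())  # logsumexp(-log p_s)
    return np.log(len(lp)) - log_sum
```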
35. Empirical Likelihood Estimation
36. Document Classification
- 20 newsgroups comp5 subset
- 5-way classification (accuracy in %)
- Statistically significant with a p-value
37. Conclusion and Future Work
- Pachinko Allocation provides the flexibility to capture arbitrary nested mixtures of topic correlations.
- More applications
- More advanced DAG structures
- Nonparametric PAM with nested HDP
38. Non-parametric PAM
[Plate diagram: a nested HDP-style construction of PAM, with base measure H, stick-breaking weights (β0, β1i), concentration parameters, infinite plates over topics, and per-word topic assignments z2, z3 for the N words of each document d.]