Title: Multiscale Topic Tomography
1Multiscale Topic Tomography
- Ramesh Nallapati, William Cohen,
- Susan Ditmore, John Lafferty
- Kin Ung
- (Johnson and Johnson Group)
2Introduction
- Explosive growth of electronic document collections
- Need for unsupervised techniques for summarization, visualization and analysis
- Many probabilistic graphical models proposed in the recent past
- Latent Dirichlet Allocation
- Correlated Topic Models
- Pachinko Allocation
- Dirichlet Process Mixtures
- ...
- All of the above ignore an important dimension that reveals a huge amount of information: time!
3Introduction
- Recent work that models time
- Topics over Time [Wang and McCallum, KDD06]
- Key ideas
- Each sampled topic generates a word as well as a time stamp
- Beta distribution to model the occurrence probability of topics
- Collapsed Gibbs sampling for inference
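As a rough illustration of the second key idea, the sketch below evaluates a Beta density over time stamps rescaled to (0, 1). The shape parameters, the two hypothetical topics, and the 1880 to 2000 date range are assumptions made up for this example; they are not taken from the ToT paper.

```python
import math

def beta_pdf(t, a, b):
    """Density of Beta(a, b) at a point t in the open interval (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

def normalize_timestamp(year, first_year, last_year):
    """Rescale a year to (0, 1); the small margin keeps log() finite at the ends."""
    eps = 1e-3
    frac = (year - first_year) / (last_year - first_year)
    return min(max(frac, eps), 1 - eps)

# Hypothetical shape parameters: one topic concentrated early, one late.
early_topic = (2.0, 8.0)
late_topic = (8.0, 2.0)

# Assumed date range, for illustration only.
t = normalize_timestamp(1900, 1880, 2000)
density_early = beta_pdf(t, *early_topic)
density_late = beta_pdf(t, *late_topic)
```

With these made-up parameters, a document dated early in the range is far more likely under the early topic than under the late one, which is how the time stamp informs topic assignment in this style of model.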
4Introduction
- Recent work that models time
- Topics over Time (ToT) [Wang and McCallum, KDD06]
5Introduction
- Recent models proposed to address this issue
- Dynamic Topic Models (DTM) [Blei and Lafferty, ICML06]
- Key ideas
- Models evolution of topic content, not just topic occurrence
- Evolution of topic multinomials modeled using a logistic-normal prior
- Approximate variational inference
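A minimal sketch of the logistic-normal idea, assuming a plain Gaussian random walk on the natural parameters followed by a softmax. The vocabulary size, step size `sigma`, and function names are hypothetical and chosen only for illustration; the DTM paper's exact parameterization may differ.

```python
import math
import random

def softmax(xs):
    """Map real-valued natural parameters to a probability vector."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def evolve_topic(num_epochs, vocab_size, sigma, seed=0):
    """Return one topic's word distribution at each epoch."""
    rng = random.Random(seed)
    beta = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
    topics = []
    for _ in range(num_epochs):
        topics.append(softmax(beta))
        # Gaussian drift between consecutive epochs.
        beta = [b + rng.gauss(0.0, sigma) for b in beta]
    return topics

topics = evolve_topic(num_epochs=8, vocab_size=5, sigma=0.1)
```

Because the softmax of a Gaussian is not conjugate to the multinomial, posterior updates for this kind of prior have no closed form, which is why approximate inference is needed.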
6Introduction
- Recent models proposed to address this issue
- Dynamic Topic Models (DTM) [Blei and Lafferty, ICML06]
7Introduction
- Issues with DTM
- Logistic normal is not conjugate to the multinomial
- Results in complicated inference procedures
- Topic tomography: a new time series topic model
- Uses a Poisson process to model word counts
- A wedding of multiscale wavelet analysis with topic models
- Uses conjugate priors
- Efficient inference
- Allows visualization of topic evolution at various time-scales
8Topic Tomography: A sneak preview
9Topic Tomography (TT): what's with the name?
- From the Greek words "tomos" (to cut or section) and "graphein" (to write)
- LDA models how topics are distributed in each document
- Normalization is per document
- TT models how each topic is distributed among documents!
- Normalization is per topic
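The contrast between the two normalizations can be sketched on a small matrix. The counts below are hypothetical, with rows as documents and columns as topics; the function names are illustrative only.

```python
# Hypothetical expected counts: rows are documents, columns are topics.
counts = [
    [8.0, 2.0],
    [1.0, 9.0],
    [1.0, 4.0],
]

def normalize_rows(matrix):
    """Per-document normalization: each row sums to 1."""
    return [[x / sum(row) for x in row] for row in matrix]

def normalize_columns(matrix):
    """Per-topic normalization: each column sums to 1."""
    col_sums = [sum(col) for col in zip(*matrix)]
    return [[x / s for x, s in zip(row, col_sums)] for row in matrix]

per_document = normalize_rows(counts)   # LDA-style view: topic mix within a document
per_topic = normalize_columns(counts)   # TT-style view: a topic's spread over documents
```

The same counts give two different pictures: `per_document[0]` says the first document is mostly about the first topic, while the first column of `per_topic` says how the first topic's mass is shared across the three documents.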
10Topic Tomography model
11Multiscale parameter generation
(Figure: Haar multiscale wavelet representation; axes: scale and epochs)
12Multiscale parameter generation
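One way to read the Haar-style construction is as a recursive split: a total rate at the coarsest scale is divided between the earlier and later half of its time span, and each half is divided again until individual epochs are reached. The sketch below assumes the split fractions are drawn from a Beta distribution; the hyperparameters `a` and `b`, the total rate, and the function name are hypothetical.

```python
import random

def multiscale_rates(total_rate, num_scales, a=1.0, b=1.0, seed=0):
    """Split total_rate over 2**num_scales epochs using Beta-distributed fractions."""
    rng = random.Random(seed)
    rates = [total_rate]
    for _ in range(num_scales):
        finer = []
        for parent in rates:
            frac = rng.betavariate(a, b)        # share given to the earlier half
            finer.append(parent * frac)
            finer.append(parent * (1.0 - frac))
        rates = finer
    return rates

# Three levels of splitting give 8 epochs.
epoch_rates = multiscale_rates(total_rate=100.0, num_scales=3)
```

Summing adjacent pairs of `epoch_rates` recovers the rates at the next coarser scale, which is what allows the same parameters to be inspected at several time-scales.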
13Multiscale Topic Tomography: where is the conjugacy?
- Recall: multiscale canonical parameters are generated using a Beta distribution
- Data likelihood w.r.t. the Poissons can be equivalently expressed in terms of the binomials
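The step from Poissons to binomials rests on a standard identity: two independent Poisson counts have the same joint probability as a Poisson total times a binomial split of that total, with split probability equal to the first rate divided by the sum of the rates. Since the Beta distribution is conjugate to the binomial, a Beta prior on the split fraction updates in closed form. The sketch below checks the identity numerically; the rates and counts are made up for illustration.

```python
import math

def poisson_pmf(k, rate):
    """P(K = k) for K ~ Poisson(rate)."""
    return math.exp(-rate + k * math.log(rate) - math.lgamma(k + 1))

def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def beta_posterior(a, b, left_count, right_count):
    """Beta(a, b) prior on the split fraction, updated by the observed split."""
    return a + left_count, b + right_count

# Hypothetical rates and counts for two adjacent epochs.
rate_left, rate_right = 3.0, 5.0
n_left, n_right = 2, 4

joint = poisson_pmf(n_left, rate_left) * poisson_pmf(n_right, rate_right)

total_rate = rate_left + rate_right
split = rate_left / total_rate
factored = (poisson_pmf(n_left + n_right, total_rate)
            * binomial_pmf(n_left, n_left + n_right, split))
```

Here `joint` and `factored` agree up to floating-point error, and `beta_posterior(1.0, 1.0, n_left, n_right)` gives the updated Beta parameters for the split fraction.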
14Multiscale Topic Tomography
- Parameter learning using mean-field variational EM
15Experiments
- Perplexity analysis on Science data
- Spans 120 years split into 8 epochs, each spanning 15 years
- Documents in each epoch split 50-50 into training and test sets
- Trained three different versions of TT
- Basic TT: basic tomography model with no multiscale analysis, applied to the whole training set
- Multiple TT: same as above, but one model for each epoch
- Multiscale TT: full multiscale version
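The data preparation described above can be sketched as follows. The start year 1880, the synthetic corpus, and the helper names are assumptions for illustration; the slides state only the 120-year span, the 8 epochs of 15 years, and the 50-50 split.

```python
import random

def epoch_index(year, first_year, epoch_length=15, num_epochs=8):
    """Map a year to an epoch index; out-of-range years are clamped to the edge epochs."""
    idx = (year - first_year) // epoch_length
    return min(max(idx, 0), num_epochs - 1)

def split_by_epoch(docs, first_year, seed=0):
    """docs: list of (doc_id, year). Returns {epoch: (train_ids, test_ids)}."""
    rng = random.Random(seed)
    by_epoch = {}
    for doc_id, year in docs:
        by_epoch.setdefault(epoch_index(year, first_year), []).append(doc_id)
    splits = {}
    for epoch, ids in by_epoch.items():
        rng.shuffle(ids)
        half = len(ids) // 2
        splits[epoch] = (ids[:half], ids[half:])
    return splits

# Hypothetical corpus: two documents per year over an assumed 1880-1999 range.
docs = [(i, 1880 + i % 120) for i in range(240)]
splits = split_by_epoch(docs, first_year=1880)
```

Under this setup, Basic TT would be trained on the union of all training halves, and Multiple TT on each epoch's training half separately.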
16Experiments
Perplexity results (figure; curves: Multiple TT, Multiscale TT, LDA, Basic TT)
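For reference, the usual per-token definition of perplexity is the exponential of the negative average log-likelihood of held-out tokens. The slides do not say how perplexity was computed for the Poisson-based models, so the sketch below is only the generic definition, with a single hypothetical predictive word distribution standing in for a trained model.

```python
import math

def perplexity(test_docs, word_probs):
    """test_docs: list of {word: count}; word_probs: {word: predictive probability}."""
    log_lik = 0.0
    num_tokens = 0
    for doc in test_docs:
        for word, count in doc.items():
            log_lik += count * math.log(word_probs[word])
            num_tokens += count
    return math.exp(-log_lik / num_tokens)

# Hypothetical held-out documents and a uniform predictive distribution.
uniform = {"electron": 0.25, "heat": 0.25, "atom": 0.25, "quantum": 0.25}
held_out = [{"electron": 3, "atom": 1}, {"heat": 2, "quantum": 2}]
uniform_perplexity = perplexity(held_out, uniform)
```

Lower values are better: a uniform model over four words scores a perplexity of 4, and any model that predicts the held-out words more accurately scores below that.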
17Experiments: Topic visualization of particle physics
18Experiments: Topic visualization of particle physics
19Experiments: Evolution of content-bearing words in particle physics
(Figure; words plotted: electron, heat, atom, quantum)
20Experiments: Topic occurrence distribution
(Figure; panels: Genetics, Neuroscience, Climate change, Agricultural science)
21Conclusion
- Advantages
- Multiscale tomography has the best features of both DTM and ToT
- In addition, it provides a zoom feature for time-scales
- A natural model for sequence modeling of count data
- Conjugate priors, easier inference
- Limitations
- Cannot generate one document at a time
- Not easily parallelizable
- Future work
- Build a GaP-like model with Gamma weights
22Demo
- Analysis of 32,000 documents from PubMed containing the word "cancer", spanning 32 years
- Will be shown this evening at poster 9
- Also available at
- http://www.cs.cmu.edu/nmramesh/cancer_demo/multiscale_home.html
- Local copy
23Inference: Mean-field variational EM
(Update equations for the variational multinomial and the variational Dirichlet)
24Related Work
- Poisson distribution used in the 2-Poisson model in IR
- Not successful, but inspired the famous BM25
- Gamma-Poisson topic model [Canny, SIGIR04]
- Poisson to model word counts and Gamma to model topic weights
- Does not follow the semantics of a pure generative model
- Optimizes the likelihood of complete data
- Topic tomography model is very similar
- We optimize the likelihood of observed data
- Use Dirichlet to model topic weights
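One plausible reading of "Poisson for word counts, Dirichlet for topic weights" is sketched below: draw a document's topic weights from a Dirichlet, mix per-topic word rates with those weights, and draw each word count from a Poisson with the mixed rate. This is an assumption-laden illustration, not the model as specified in the paper; the rates, hyperparameters, and function names are all hypothetical.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for small rates."""
    limit = math.exp(-rate)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def generate_counts(topic_rates, alpha, rng):
    """topic_rates[k][w]: hypothetical Poisson rate of word w under topic k."""
    theta = sample_dirichlet(alpha, rng)
    vocab_size = len(topic_rates[0])
    counts = []
    for w in range(vocab_size):
        rate = sum(theta[k] * topic_rates[k][w] for k in range(len(theta)))
        counts.append(sample_poisson(rate, rng))
    return theta, counts

rng = random.Random(0)
topic_rates = [[5.0, 1.0, 0.5], [0.5, 1.0, 5.0]]
theta, counts = generate_counts(topic_rates, alpha=[1.0, 1.0], rng=rng)
```

Swapping `sample_dirichlet` for independent Gamma draws without normalization would give weights in the style of the Gamma-Poisson model mentioned above.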
25Related Work
- Multiscale Topic Tomography model originally introduced by Nowak et al. [Nowak and Kolaczyk, IEEE ToIT00]
- Called Poisson inverse problem
- Applied to model gamma ray bursts
- Topic weights assumed to be known
- A simple EM algorithm proposed
- We cast topic modeling as a Poisson inverse problem
- Topic weights unknown
- Variational EM proposed
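For the known-weights case, the classical maximum-likelihood EM iteration for a Poisson inverse problem can be sketched as below. This is the plain textbook update, not the multiscale variant from the cited work, and the mixing matrix and observations are hypothetical.

```python
def poisson_inverse_em(y, P, num_iters=200):
    """Estimate source rates lam from counts y[i] ~ Poisson(sum_j P[i][j] * lam[j]).

    P holds the known, non-negative mixing weights; lam starts at all ones.
    """
    n_obs, n_src = len(P), len(P[0])
    col_sums = [sum(P[i][j] for i in range(n_obs)) for j in range(n_src)]
    lam = [1.0] * n_src
    for _ in range(num_iters):
        # Expected counts under the current estimate.
        pred = [sum(P[i][j] * lam[j] for j in range(n_src)) for i in range(n_obs)]
        # Multiplicative update; observations with zero count contribute nothing.
        lam = [
            lam[j] / col_sums[j]
            * sum(P[i][j] * y[i] / pred[i] for i in range(n_obs) if y[i] > 0)
            for j in range(n_src)
        ]
    return lam

# Hypothetical mixing matrix and observed counts.
P = [[0.8, 0.2],
     [0.2, 0.8]]
y = [12.0, 18.0]
lam = poisson_inverse_em(y, P)
```

Each iteration keeps the estimates non-negative and preserves the total observed count. When the weights themselves are unknown, as in topic modeling, this closed-form update is no longer enough, which is where variational EM comes in.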
26Outline
- Introduction/Motivation
- Related work
- Topic Tomography model
- Basic model
- Multiscale analysis
- Learning and Inference
- Experiments
- Perplexity analysis
- Topic visualizations
- Demo (if time permits)
27Experiments: Multiple senses of the word "reaction"
(Figure; curves: total count, chemistry, particle physics, blood tests)