1 Style & Topic Language Model Adaptation Using HMM-LDA
- Bo-June (Paul) Hsu, James Glass
2 Outline
- Introduction
- LDA
- HMM-LDA
- Experiments
- Conclusions
3 Introduction
- An effective LM needs to not only account for the casual speaking style of lectures but also accommodate the topic-specific vocabulary of the subject matter
- Available training corpora rarely match the target lecture in both style and topic
- In this paper, the syntactic state and semantic topic assignments are investigated using the HMM-LDA model
4 LDA
- A generative probabilistic model of a corpus (the generative process is written out below)
- The topic mixture is drawn from a conjugate Dirichlet prior
- PLSA
- LDA
- Model parameters
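For reference, the standard LDA generative process (standard notation; not necessarily the symbols used on the slide):

    \phi_k \sim \mathrm{Dirichlet}(\beta)          for each topic k
    \theta_d \sim \mathrm{Dirichlet}(\alpha)       for each document d
    z_i \sim \mathrm{Multinomial}(\theta_d)        for each word position i in d
    w_i \sim \mathrm{Multinomial}(\phi_{z_i})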
5 Markov chain Monte Carlo
- A class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its stationary distribution
- The most common application of these algorithms is numerically calculating multi-dimensional integrals
- An ensemble of "walkers" moves around randomly
- A Markov chain is constructed in such a way as to have the integrand as its equilibrium distribution (a minimal sampler sketch follows below)
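As a toy illustration of the "walker" idea (not from the paper), a random-walk Metropolis sampler in Python can estimate an expectation under a target distribution:

    # Toy illustration (not from the paper): random-walk Metropolis sampling
    # from a standard normal target, then estimating the integral E[x^2]
    # from the chain, whose stationary distribution is the target.
    import math
    import random

    def metropolis(log_pdf, n_samples=20000, step=1.0, x0=0.0):
        x, samples = x0, []
        for _ in range(n_samples):
            proposal = x + random.gauss(0.0, step)            # symmetric proposal
            log_accept = log_pdf(proposal) - log_pdf(x)
            if random.random() < math.exp(min(0.0, log_accept)):
                x = proposal                                  # accept; otherwise stay put
            samples.append(x)
        return samples

    samples = metropolis(lambda x: -0.5 * x * x)              # log N(0,1) up to a constant
    print(sum(s * s for s in samples) / len(samples))         # approaches E[x^2] = 1.0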
6 LDA
- Estimate the posterior
- Integrating out the multinomial parameters
- Gibbs sampling (the standard collapsed update is shown below)
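After integrating out the multinomial parameters, the standard collapsed Gibbs sampling update for LDA (Griffiths and Steyvers), which these bullets presumably refer to, is:

    P(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \propto
        \frac{n^{(w_i)}_{-i,k} + \beta}{n^{(\cdot)}_{-i,k} + W\beta}
        \left( n^{(d_i)}_{-i,k} + \alpha \right)

where n^{(w_i)}_{-i,k} is the number of times word w_i is assigned to topic k, n^{(d_i)}_{-i,k} the number of words in document d_i assigned to topic k (both excluding position i), and W the vocabulary size.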
7 Markov chain Monte Carlo (cont.)
- Gibbs sampling: http://en.wikipedia.org/wiki/Gibbs_sampling (a toy example follows below)
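A minimal Gibbs sampler (a toy example, unrelated to the paper's data): draw from a bivariate normal with correlation rho by alternately sampling each variable from its full conditional.

    # Toy illustration (not from the paper): Gibbs sampling from a bivariate
    # normal with correlation rho by alternately drawing each variable from
    # its full conditional distribution.
    import math
    import random

    rho, x, y = 0.8, 0.0, 0.0
    xs, ys = [], []
    for _ in range(20000):
        x = random.gauss(rho * y, math.sqrt(1 - rho * rho))   # draw x | y
        y = random.gauss(rho * x, math.sqrt(1 - rho * rho))   # draw y | x
        xs.append(x)
        ys.append(y)

    # The empirical correlation of the chain approaches rho.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in ys) / n)
    print(cov / (sx * sy))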
8 HMM-LDA
- HMMs generate documents purely based on syntactic relations among unobserved word classes (short-range dependencies)
- Topic models generate documents based on semantic correlations between words, independent of word order (long-range dependencies)
- A major advantage of generative models is modularity: different models are easily combined
- Words are exhibited by a mixture of models or a product of models
- Only a subset of words, the content words, exhibit long-range dependencies
- Replace one probability distribution over words used in the syntactic model with the semantic model
9 HMM-LDA (cont.)
- Notation
- A sequence of words
- A sequence of topic assignments
- A sequence of classes
- One designated class value marks the semantic class
- Each topic is associated with a distribution over words
- Each class is associated with a distribution over words
- Each document has a distribution over topics
- Transitions between consecutive classes follow a per-class distribution
10 HMM-LDA (cont.)
- A document is generated as follows (written out formally below)
- Sample the document's topic mixture from its Dirichlet prior
- For each word in the document
- Draw a topic from the document's topic mixture
- Draw a class from the transition distribution of the previous word's class
- If the class is the semantic class, draw the word from the chosen topic's word distribution; else draw it from the class's word distribution
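In the standard HMM-LDA formulation (Griffiths et al., "Integrating Topics and Syntax"), which these slides appear to follow, the generative process can be written as (the symbols are the standard ones, assumed rather than copied from the slide):

    \theta^{(d)} \sim \mathrm{Dirichlet}(\alpha)          (topic mixture for document d)
    For each word position i in the document:
        z_i \sim \theta^{(d)}                             (semantic topic)
        c_i \sim \pi^{(c_{i-1})}                          (class, from the HMM transition row)
        w_i \sim \phi^{(z_i)}  if c_i is the semantic class
        w_i \sim \phi^{(c_i)}  otherwise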
11 HMM-LDA (cont.)
- Inference
- Document topic distributions are drawn from a Dirichlet prior
- Topic word distributions are drawn from a Dirichlet prior
- The rows of the class transition matrix are drawn from a Dirichlet prior
- Class word distributions are drawn from a Dirichlet prior
- Assume all Dirichlet distributions are symmetric (the assumed priors are listed below)
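Under the same standard formulation, the symmetric Dirichlet priors these bullets presumably refer to are:

    \theta^{(d)} \sim \mathrm{Dirichlet}(\alpha)     (document-topic distributions)
    \phi^{(z)}   \sim \mathrm{Dirichlet}(\beta)      (topic-word distributions)
    \pi^{(c)}    \sim \mathrm{Dirichlet}(\gamma)     (rows of the class transition matrix)
    \phi^{(c)}   \sim \mathrm{Dirichlet}(\delta)     (class-word distributions)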
12 HMM-LDA (cont.)
13 HMM-LDA Analysis
- Lectures corpus
- 3 undergraduate subjects in math, physics, and computer science
- 10 CS lectures for the development set, 10 CS lectures for the test set
- Textbook corpus
- The CS course textbook
- Divided into 271 topic-cohesive documents at every section heading
- Run the Gibbs sampler on the two datasets
- Lectures: 2,800 iterations; Textbook: 2,000 iterations
- Use the lowest-perplexity model as the final model
14 HMM-LDA Analysis (cont.)
- Semantic topics (Lectures): Magnetism, Machine Learning, Linear Algebra, Childhood Memories
- <laugh>: a cursory examination of the data suggests that speakers talking about children tend to laugh more during the lecture
- Although it may not be desirable to capture speaker idiosyncrasies in the topic mixtures, HMM-LDA has clearly demonstrated its ability to capture distinctive semantic topics in a corpus
15 HMM-LDA Analysis (cont.)
- Semantic topics (Textbook)
- In a topically coherent paragraph, 6 of the 7 instances of the words "and" and "or" (underlined on the slide) are correctly classified
- Multi-word topic key phrases can be identified for n-gram topic models
- This demonstrates the context-dependent labeling abilities of the HMM-LDA model
16 HMM-LDA Analysis (cont.)
- Syntactic states (Lectures)
- State 20 is the topic state
- Other states correspond to prepositions, conjunctions, verbs, and hesitation disfluencies
- As demonstrated on spontaneous speech, HMM-LDA yields syntactic states that correspond well to part-of-speech labels, without requiring any labeled training data
17 Discussions
- Although MCMC techniques converge to the global stationary distribution, convergence cannot be guaranteed from observation of the perplexity alone
- Unlike EM algorithms, random sampling may temporarily decrease the model likelihood
- The number of iterations was chosen to be at least double the point at which the perplexity first appeared to converge
18 Language Modeling Experiments
- Baseline model: interpolated Lectures + Textbook trigram model (using modified Kneser-Ney discounting)
- Topic-deemphasized style (trigram) model (Lectures)
- Deemphasize the observed occurrences of topic words and ideally redistribute these counts to all potential topic words (see the sketch below)
- The counts of topic-to-style word transitions are not altered
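A hypothetical sketch of this idea (the function, data names, and exact redistribution rule are illustrative assumptions, not the paper's recipe): n-grams ending in a topic word have their counts spread uniformly over all candidate topic words in the same context, while transitions into style words are left untouched.

    # Hypothetical sketch, not the paper's exact method.
    from collections import defaultdict

    def deemphasize_topic_counts(ngram_counts, topic_words):
        topic_words = set(topic_words)
        adjusted = defaultdict(float)
        for (context, word), count in ngram_counts.items():
            if word in topic_words:
                share = count / len(topic_words)       # spread over all topic words
                for t in topic_words:
                    adjusted[(context, t)] += share
            else:
                adjusted[(context, word)] += count     # style-word transitions unchanged
        return dict(adjusted)

    counts = {(("talk", "about"), "magnetism"): 4, (("talk", "about"), "it"): 6}
    print(deemphasize_topic_counts(counts, ["magnetism", "recursion"]))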
19 Language Modeling Experiments (cont.)
- The Textbook model should ideally receive a higher weight in contexts containing topic words
- Domain trigram model (Textbook)
- Emphasize sequences containing a topic word in the context by doubling their counts (see the sketch below)
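A correspondingly small sketch of the count-doubling step (illustrative names, assumed data layout):

    # Hypothetical sketch: double the count of any n-gram whose context
    # contains at least one topic word, emphasizing topical sequences in
    # the domain (Textbook) model.
    def emphasize_topic_contexts(ngram_counts, topic_words):
        topic_words = set(topic_words)
        return {(context, word): count * 2 if any(w in topic_words for w in context) else count
                for (context, word), count in ngram_counts.items()}

    counts = {(("the", "electric"), "field"): 4, (("you", "know"), "like"): 9}
    print(emphasize_topic_contexts(counts, ["electric", "field"]))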
20 Language Modeling Experiments (cont.)
- Unsmoothed topical trigram model
- Apply HMM-LDA with 100 topics to identify representative words and their associated contexts for each topic
- Topic mixtures for all models
- Mixture weights were tuned on the individual target lectures (a cheating setup)
- 15 of the 100 topics account for over 90% of the total weight
21 Language Modeling Experiments (cont.)
- Since the topic distribution shifts over a long lecture, modeling a lecture with fixed weights may not be optimal
- Update the mixture distribution by linearly interpolating it with the posterior topic distribution given the current word (one formalization is shown below)
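One way to formalize this update (my reading of the bullet; the interpolation rate \gamma is an assumed symbol):

    m_t(k) = (1 - \gamma)\, m_{t-1}(k) + \gamma\, P(k \mid w_t),
    \qquad P(k \mid w_t) \propto P(w_t \mid k)\, m_{t-1}(k)

where m_t is the topic mixture after observing word w_t and P(w_t | k) comes from the topic word distributions.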
22 Language Modeling Experiments (cont.)
- The variation of topic mixtures over a lecture: review the previous lecture -> show an example of computation using accumulators -> focus the lecture on streams as a data structure, with an intervening example that finds pairs of i and j that sum to a prime
23 Language Modeling Experiments (cont.)
24 Conclusions
- HMM-LDA shows great promise for finding structure in unlabeled data, from which we can build more sophisticated models
- Speaker-specific adaptation will be investigated in the future