1
Style & Topic Language Model Adaptation Using HMM-LDA
  • Bo-June (Paul) Hsu, James Glass

2
Outline
  • Introduction
  • LDA
  • HMM-LDA
  • Experiments
  • Conclusions

3
Introduction
  • An effective LM needs to not only account for the
    casual speaking style of lectures but also
    accommodate the topic-specific vocabulary of the
    subject matter
  • Available training corpora rarely match the
    target lecture in both style and topic
  • In this paper, syntactic state and semantic topic assignments are investigated using a combined HMM and LDA model (HMM-LDA)

4
LDA
  • A generative probabilistic model of a corpus (see the sketch after this slide)
  • The topic mixture is drawn from a conjugate Dirichlet prior
  • PLSA
  • LDA
  • Model parameters
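As a rough illustration of this slide, here is a minimal sketch of the LDA generative process in Python; the corpus sizes, hyperparameter values (alpha, beta), and function names are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Minimal sketch of the LDA generative process (illustrative sizes and
# hyperparameters only).
def generate_lda_corpus(D=10, T=20, V=1000, doc_len=200,
                        alpha=0.1, beta=0.01, rng=np.random.default_rng(0)):
    phi = rng.dirichlet([beta] * V, size=T)        # per-topic word distributions
    corpus = []
    for _ in range(D):
        theta = rng.dirichlet([alpha] * T)         # topic mixture from the Dirichlet prior
        doc = []
        for _ in range(doc_len):
            z = rng.choice(T, p=theta)             # draw a topic assignment
            doc.append(rng.choice(V, p=phi[z]))    # draw a word from that topic
        corpus.append(doc)
    return corpus
```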

5
Markov chain Monte Carlo
  • A class of algorithms for sampling from
    probability distributions based on constructing a
    Markov chain that has the desired distribution as
    its stationary distribution
  • The most common application of these algorithms
    is numerically calculating multi-dimensional
    integrals
  • An ensemble of "walkers" moves around randomly (see the toy sampler after this slide)
  • A Markov chain is constructed in such a way as to
    have the integrand as its equilibrium
    distribution
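To make the "walkers" intuition concrete, below is a toy random-walk Metropolis sampler in Python; the target density, step size, and function names are assumptions for illustration only.

```python
import numpy as np

# Toy random-walk Metropolis sampler: an ensemble of "walkers" whose
# equilibrium distribution is the target density (here a standard normal).
def metropolis(log_target, n_walkers=4, n_steps=5000, step=0.5,
               rng=np.random.default_rng(0)):
    x = rng.normal(size=n_walkers)                     # initial walker positions
    chain = []
    for _ in range(n_steps):
        prop = x + step * rng.normal(size=n_walkers)   # propose a random move
        accept = np.log(rng.random(n_walkers)) < log_target(prop) - log_target(x)
        x = np.where(accept, prop, x)                  # accept the move or stay put
        chain.append(x.copy())
    return np.array(chain)

# Use the samples to approximate the integral E[x^2] under N(0, 1) (≈ 1.0).
samples = metropolis(lambda x: -0.5 * x ** 2)
print((samples[1000:] ** 2).mean())
```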

6
LDA
  • Estimate the posterior distribution over topic assignments
  • Integrate out the model parameters
  • Gibbs sampling (a collapsed Gibbs sketch follows this slide)
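Below is a compact sketch of collapsed Gibbs sampling for LDA, where the parameters are integrated out and only the topic assignments are resampled; the hyperparameters alpha and beta and the count-array names are assumptions, and this is not the authors' implementation.

```python
import numpy as np

# Sketch of collapsed Gibbs sampling for LDA: theta and phi are integrated
# out, and each topic assignment z_i is resampled from its conditional
# given all other assignments.
def gibbs_lda(docs, T, V, alpha=0.1, beta=0.01, iters=200,
              rng=np.random.default_rng(0)):
    ndt = np.zeros((len(docs), T))      # document-topic counts
    ntw = np.zeros((T, V))              # topic-word counts
    nt = np.zeros(T)                    # topic totals
    z = []
    for d, doc in enumerate(docs):      # random initialization
        zd = rng.integers(T, size=len(doc))
        z.append(zd)
        for w, t in zip(doc, zd):
            ndt[d, t] += 1; ntw[t, w] += 1; nt[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]             # remove the current assignment from the counts
                ndt[d, t] -= 1; ntw[t, w] -= 1; nt[t] -= 1
                p = (ndt[d] + alpha) * (ntw[:, w] + beta) / (nt + V * beta)
                t = rng.choice(T, p=p / p.sum())
                z[d][i] = t             # add the new assignment back
                ndt[d, t] += 1; ntw[t, w] += 1; nt[t] += 1
    return z, ndt, ntw
```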

7
Markov chain Monte Carlo (cont.)
  • Gibbs Sampling: http://en.wikipedia.org/wiki/Gibbs_sampling

8
HMM-LDA
  • HMMs generate documents purely based on syntactic
    relations among unobserved word classes
  • Short-range dependencies
  • Topic models generate documents based on semantic correlations between words, independent of word order
  • Long-range dependencies
  • A major advantage of generative models is
    modularity
  • Different models are easily combined
  • Words can be exhibited by a mixture of models or a product of models
  • Only a subset of words, content words, exhibit
    long-range dependencies
  • Replace one of the probability distributions over words used in the syntactic model with the semantic model

9
HMM-LDA (cont.)
  • Notation
  • A sequence of words w = (w_1, ..., w_n)
  • A sequence of topic assignments z = (z_1, ..., z_n)
  • A sequence of classes c = (c_1, ..., c_n)
  • c_i = 1 denotes the semantic class
  • The z-th topic is associated with a distribution over words φ^(z)
  • Each syntactic class c is associated with a distribution over words φ^(c)
  • Each document d has a distribution over topics θ^(d)
  • Transitions between classes c_(i-1) and c_i follow a distribution π^(c_(i-1))

10
HMM-LDA (cont.)
  • A document is generated as follows
  • Sample θ^(d) from a Dirichlet(α) prior
  • For each word w_i in the document
  • Draw z_i from θ^(d)
  • Draw c_i from π^(c_(i-1))
  • If c_i = 1, then draw w_i from φ^(z_i), else draw w_i from φ^(c_i) (see the code sketch below)
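Following the generative steps above, here is a small Python sketch of the HMM-LDA generative process, assuming class 1 is the semantic class; the vocabulary size, number of classes, and hyperparameters are illustrative.

```python
import numpy as np

# Sketch of the HMM-LDA generative process: each word gets a class c_i from
# an HMM transition; class 1 (the semantic class) emits from the document's
# topic model, other classes emit from their own word distributions.
def generate_hmm_lda_doc(T=20, C=10, V=1000, doc_len=200,
                         alpha=0.1, rng=np.random.default_rng(0)):
    phi_topic = rng.dirichlet([0.01] * V, size=T)   # per-topic word distributions
    phi_class = rng.dirichlet([0.01] * V, size=C)   # per-class word distributions
    pi = rng.dirichlet([1.0] * C, size=C)           # class transition matrix
    theta = rng.dirichlet([alpha] * T)              # document topic mixture
    words, c_prev = [], 0
    for _ in range(doc_len):
        z = rng.choice(T, p=theta)                  # draw topic z_i from theta
        c = rng.choice(C, p=pi[c_prev])             # draw class c_i from pi
        if c == 1:                                  # semantic class: topic word
            w = rng.choice(V, p=phi_topic[z])
        else:                                       # syntactic class word
            w = rng.choice(V, p=phi_class[c])
        words.append(w)
        c_prev = c
    return words
```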

11
HMM-LDA (cont.)
  • Inference
  • θ^(d) are drawn from Dirichlet(α)
  • φ^(z) are drawn from Dirichlet(β)
  • The rows of the transition matrix π are drawn from Dirichlet(γ)
  • φ^(c) are drawn from Dirichlet(δ)
  • Assume all Dirichlet distributions are symmetric

12
HMM-LDA (cont.)
  • Gibbs Sampling

13
HMM-LDA Analysis
  • Lectures Corpus
  • 3 undergraduate subjects: math, physics, and computer science
  • 10 CS lectures for development set, 10 CS
    lectures for test set
  • Textbook Corpus
  • CS course textbook
  • Divided into 271 topic-cohesive documents at every section heading
  • Run the Gibbs sampler on the two datasets
  • L: 2,800 iterations; T: 2,000 iterations
  • Use the lowest-perplexity model as the final model

14
HMM-LDA Analysis (cont.)
  • Semantic topics (Lectures)

Example topics: Magnetism, Machine Learning, Linear Algebra, Childhood Memories
  • <laugh>: a cursory examination of the data suggests that speakers talking about children tend to laugh more during the lecture
  • Although it may not be desirable to capture
    speaker idiosyncrasies in the topic mixtures,
    HMM-LDA has clearly demonstrated its ability to
    capture distinctive semantic topics in a corpus

15
HMM-LDA Analysis (cont.)
  • Semantic topics (Textbook)
  • A topically coherent paragraph
  • 6 of the 7 instances of the words "and" and "or" (underlined) are correctly classified
  • Multi-word topic key phrases can be identified
    for n-gram topic models

The context-dependent labeling ability of the HMM-LDA model is demonstrated
16
HMM-LDA Analysis (cont.)
  • Syntactic States (Lectures)
  • State 20 is the topic state

Example states: prepositions, conjunctions, verbs, hesitation disfluencies
  • As demonstrated with spontaneous speech, HMM-LDA
    yields syntactic states that have a good
    correspondence to part-of-speech labels, without
    requiring any labeled training data

17
Discussions
  • Although MCMC techniques converge to the global
    stationary distribution, we cannot guarantee
    convergence from observation of the perplexity
    alone
  • Unlike EM algorithms, random sampling may
    actually temporarily decrease the model
    likelihood
  • The number of iterations was chosen to be at least double the point at which the perplexity (PP) first appeared to converge

18
Language Modeling Experiments
  • Baseline model: Lectures + Textbook interpolated trigram model (using modified Kneser-Ney discounting)
  • Topic-deemphasized style (trigram) model
    (Lectures)
  • To deemphasize the observed occurrences of topic words and ideally redistribute these counts to all potential topic words (see the count-adjustment sketch below)
  • The counts of topic-to-style word transitions are not altered
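A minimal sketch of the kind of count adjustment this slide describes, using bigrams for brevity; the down-weighting scheme and the topic_weight value are assumptions, not the paper's exact redistribution method.

```python
from collections import Counter

# Sketch of topic-deemphasized n-gram counting: occurrences of words labeled
# as topic words by HMM-LDA are down-weighted, while transitions ending in
# style words keep their full counts.
def deemphasized_bigram_counts(sentences, topic_words, topic_weight=0.1):
    counts = Counter()
    for sent in sentences:
        for prev, cur in zip(sent, sent[1:]):
            if cur in topic_words:
                counts[(prev, cur)] += topic_weight   # deemphasize topic-word counts
            else:
                counts[(prev, cur)] += 1.0            # topic-to-style transitions unchanged
    return counts
```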

19
Language Modeling Experiments (cont.)
  • Textbook model should ideally have higher weight
    in the contexts containing topic words
  • Domain trigram model (Textbook)
  • Emphasize the sequences containing a topic word in the context by doubling their counts (see the sketch below)
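A corresponding sketch for the domain model: trigrams whose two-word context contains an HMM-LDA topic word have their counts doubled before smoothing; the function and variable names are illustrative.

```python
from collections import Counter

# Sketch of emphasizing domain (Textbook) n-grams: trigrams whose context
# (the two history words) contains a topic word get double weight.
def emphasized_trigram_counts(sentences, topic_words):
    counts = Counter()
    for sent in sentences:
        for w1, w2, w3 in zip(sent, sent[1:], sent[2:]):
            weight = 2 if (w1 in topic_words or w2 in topic_words) else 1
            counts[(w1, w2, w3)] += weight
    return counts
```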

20
Language Modeling Experiments (cont.)
  • Unsmoothed topical trigram model
  • Apply HMM-LDA with 100 topics to identify representative words and their associated contexts for each topic
  • Topic mixtures for all models
  • Mixture weights were tuned on the individual target lectures (cheating)
  • 15 of the 100 topics account for over 90% of the total weight

21
Language Modeling Experiments (cont.)
  • Since the topic distribution shifts over a long lecture, modeling a lecture with fixed weights may not be optimal
  • Update the mixture distribution by linearly interpolating it with the posterior topic distribution given the current word (see the sketch below)
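A small sketch of this update rule, assuming a per-topic word probability table and an interpolation factor lam; both are illustrative assumptions rather than the paper's tuned values.

```python
import numpy as np

# Sketch of adapting the topic mixture within a lecture: after each word,
# the current mixture weights are linearly interpolated with the posterior
# topic distribution given that word.
def update_topic_mixture(weights, word, p_w_given_t, lam=0.9):
    posterior = weights * p_w_given_t[:, word]       # p(t) * p(w | t)
    if posterior.sum() == 0:                         # unseen word: keep the weights
        return weights
    posterior /= posterior.sum()                     # normalize to p(t | w)
    return lam * weights + (1 - lam) * posterior     # linear interpolation

# Example: 3 topics, 5 word types, uniform initial mixture.
p_w_given_t = np.random.default_rng(0).dirichlet([0.5] * 5, size=3)
weights = np.full(3, 1 / 3)
for w in [0, 2, 2, 4]:                               # stream of word ids
    weights = update_topic_mixture(weights, w, p_w_given_t)
print(weights)
```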

22
Language Modeling Experiments (cont.)
  • The variation of topic mixtures

Review previous lecture -> Show an example of computation using accumulators -> Focus the lecture on streams as a data structure, with an intervening example that finds pairs i and j that sum to a prime
23
Language Modeling Experiments (cont.)
  • Experimental results

24
Conclusions
  • HMM-LDA shows great promise for finding structure
    in unlabeled data, from which we can build more
    sophisticated models
  • Speaker-specific adaptation will be investigated
    in the future