1
Style & Topic Language Model Adaptation Using HMM-LDA
  • Bo-June (Paul) Hsu, James Glass

2
Outline
  • Introduction
  • LDA
  • HMM-LDA
  • Experiments
  • Conclusions

3
Introduction
  • An effective LM needs to not only account for the
    casual speaking style of lectures but also
    accommodate the topic-specific vocabulary of the
    subject matter
  • Available training corpora rarely match the
    target lecture in both style and topic
  • In this paper, syntactic state and semantic topic assignments are investigated using a combined HMM and LDA model (HMM-LDA)

4
LDA
  • A generative probabilistic model of a corpus (see the sketch after this slide)
  • The topic mixture is drawn from a conjugate Dirichlet prior
  • PLSA
  • LDA
  • Model parameters
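As a rough illustration of this slide, here is a minimal sketch of the LDA generative process in Python; the corpus sizes, hyperparameter values (alpha, beta), and function names are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Minimal sketch of the LDA generative process (illustrative sizes and
# hyperparameters only).
def generate_lda_corpus(D=10, T=20, V=1000, doc_len=200,
                        alpha=0.1, beta=0.01, rng=np.random.default_rng(0)):
    phi = rng.dirichlet([beta] * V, size=T)        # per-topic word distributions
    corpus = []
    for _ in range(D):
        theta = rng.dirichlet([alpha] * T)         # topic mixture from the Dirichlet prior
        doc = []
        for _ in range(doc_len):
            z = rng.choice(T, p=theta)             # draw a topic assignment
            doc.append(rng.choice(V, p=phi[z]))    # draw a word from that topic
        corpus.append(doc)
    return corpus
```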

5
Markov chain Monte Carlo
  • A class of algorithms for sampling from
    probability distributions based on constructing a
    Markov chain that has the desired distribution as
    its stationary distribution
  • The most common application of these algorithms
    is numerically calculating multi-dimensional
    integrals
  • An ensemble of "walkers" moves around randomly (see the toy sampler after this slide)
  • A Markov chain is constructed in such a way as to
    have the integrand as its equilibrium
    distribution
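To make the "walkers" intuition concrete, below is a toy random-walk Metropolis sampler in Python; the target density, step size, and function names are assumptions for illustration only.

```python
import numpy as np

# Toy random-walk Metropolis sampler: an ensemble of "walkers" whose
# equilibrium distribution is the target density (here a standard normal).
def metropolis(log_target, n_walkers=4, n_steps=5000, step=0.5,
               rng=np.random.default_rng(0)):
    x = rng.normal(size=n_walkers)                     # initial walker positions
    chain = []
    for _ in range(n_steps):
        prop = x + step * rng.normal(size=n_walkers)   # propose a random move
        accept = np.log(rng.random(n_walkers)) < log_target(prop) - log_target(x)
        x = np.where(accept, prop, x)                  # accept the move or stay put
        chain.append(x.copy())
    return np.array(chain)

# Use the samples to approximate the integral E[x^2] under N(0, 1) (≈ 1.0).
samples = metropolis(lambda x: -0.5 * x ** 2)
print((samples[1000:] ** 2).mean())
```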

6
LDA
  • Estimate the posterior distribution over topic assignments
  • Integrate out the model parameters
  • Gibbs sampling (a collapsed Gibbs sketch follows this slide)
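Below is a compact sketch of collapsed Gibbs sampling for LDA, where the parameters are integrated out and only the topic assignments are resampled; the hyperparameters alpha and beta and the count-array names are assumptions, and this is not the authors' implementation.

```python
import numpy as np

# Sketch of collapsed Gibbs sampling for LDA: theta and phi are integrated
# out, and each topic assignment z_i is resampled from its conditional
# given all other assignments.
def gibbs_lda(docs, T, V, alpha=0.1, beta=0.01, iters=200,
              rng=np.random.default_rng(0)):
    ndt = np.zeros((len(docs), T))      # document-topic counts
    ntw = np.zeros((T, V))              # topic-word counts
    nt = np.zeros(T)                    # topic totals
    z = []
    for d, doc in enumerate(docs):      # random initialization
        zd = rng.integers(T, size=len(doc))
        z.append(zd)
        for w, t in zip(doc, zd):
            ndt[d, t] += 1; ntw[t, w] += 1; nt[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]             # remove the current assignment from the counts
                ndt[d, t] -= 1; ntw[t, w] -= 1; nt[t] -= 1
                p = (ndt[d] + alpha) * (ntw[:, w] + beta) / (nt + V * beta)
                t = rng.choice(T, p=p / p.sum())
                z[d][i] = t             # add the new assignment back
                ndt[d, t] += 1; ntw[t, w] += 1; nt[t] += 1
    return z, ndt, ntw
```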

7
Markov chain Monte Carlo (cont.)
  • Gibbs Sampling: http://en.wikipedia.org/wiki/Gibbs_sampling

8
HMM-LDA
  • HMMs generate documents purely based on syntactic
    relations among unobserved word classes
  • Short-range dependencies
  • Topic models generate documents based on semantic correlations between words, independent of word order
  • Long-range dependencies
  • A major advantage of generative models is
    modularity
  • Different models are easily combined
  • Words can be exhibited by a mixture of models or a product of models
  • Only a subset of words, content words, exhibit
    long-range dependencies
  • Replace one of the probability distributions over words used in the syntactic model with the semantic model

9
HMM-LDA (cont.)
  • Notation
  • A sequence of words w = (w_1, ..., w_n)
  • A sequence of topic assignments z = (z_1, ..., z_n)
  • A sequence of classes c = (c_1, ..., c_n)
  • c_i = 1 denotes the semantic class
  • The z-th topic is associated with a distribution over words φ^(z)
  • Each syntactic class c is associated with a distribution over words φ^(c)
  • Each document d has a distribution over topics θ^(d)
  • Transitions between classes c_(i-1) and c_i follow a distribution π^(c_(i-1))

10
HMM-LDA (cont.)
  • A document is generated as follows
  • Sample θ^(d) from a Dirichlet(α) prior
  • For each word w_i in the document
  • Draw z_i from θ^(d)
  • Draw c_i from π^(c_(i-1))
  • If c_i = 1, then draw w_i from φ^(z_i), else draw w_i from φ^(c_i) (see the code sketch below)
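Following the generative steps above, here is a small Python sketch of the HMM-LDA generative process, assuming class 1 is the semantic class; the vocabulary size, number of classes, and hyperparameters are illustrative.

```python
import numpy as np

# Sketch of the HMM-LDA generative process: each word gets a class c_i from
# an HMM transition; class 1 (the semantic class) emits from the document's
# topic model, other classes emit from their own word distributions.
def generate_hmm_lda_doc(T=20, C=10, V=1000, doc_len=200,
                         alpha=0.1, rng=np.random.default_rng(0)):
    phi_topic = rng.dirichlet([0.01] * V, size=T)   # per-topic word distributions
    phi_class = rng.dirichlet([0.01] * V, size=C)   # per-class word distributions
    pi = rng.dirichlet([1.0] * C, size=C)           # class transition matrix
    theta = rng.dirichlet([alpha] * T)              # document topic mixture
    words, c_prev = [], 0
    for _ in range(doc_len):
        z = rng.choice(T, p=theta)                  # draw topic z_i from theta
        c = rng.choice(C, p=pi[c_prev])             # draw class c_i from pi
        if c == 1:                                  # semantic class: topic word
            w = rng.choice(V, p=phi_topic[z])
        else:                                       # syntactic class word
            w = rng.choice(V, p=phi_class[c])
        words.append(w)
        c_prev = c
    return words
```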

11
HMM-LDA (cont.)
  • Inference
  • θ^(d) are drawn from Dirichlet(α)
  • φ^(z) are drawn from Dirichlet(β)
  • The rows of the transition matrix π are drawn from Dirichlet(γ)
  • φ^(c) are drawn from Dirichlet(δ)
  • Assume all Dirichlet distributions are symmetric

12
HMM-LDA (cont.)
  • Gibbs Sampling

13
HMM-LDA Analysis
  • Lectures Corpus
  • 3 undergraduate subjects: math, physics, and computer science
  • 10 CS lectures for development set, 10 CS
    lectures for test set
  • Textbook Corpus
  • CS course textbook
  • Divided into 271 topic-cohesive documents at every section heading
  • Run the Gibbs sampler on the two datasets
  • L: 2,800 iterations; T: 2,000 iterations
  • Use the lowest-perplexity model as the final model

14
HMM-LDA Analysis (cont.)
  • Semantic topics (Lectures)

Example topics: Magnetism, Machine Learning, Linear Algebra, Childhood Memories
  • <laugh>: a cursory examination of the data suggests that speakers talking about children tend to laugh more during the lecture
  • Although it may not be desirable to capture
    speaker idiosyncrasies in the topic mixtures,
    HMM-LDA has clearly demonstrated its ability to
    capture distinctive semantic topics in a corpus

15
HMM-LDA Analysis (cont.)
  • Semantic topics (Textbook)
  • A topically coherent paragraph
  • 6 of the 7 instances of the words "and" and "or" (underlined) are correctly classified
  • Multi-word topic key phrases can be identified
    for n-gram topic models

The context-dependent labeling ability of the HMM-LDA model is demonstrated
16
HMM-LDA Analysis (cont.)
  • Syntactic States (Lectures)
  • State 20 is the topic state

Example states: prepositions, conjunctions, verbs, hesitation disfluencies
  • As demonstrated with spontaneous speech, HMM-LDA
    yields syntactic states that have a good
    correspondence to part-of-speech labels, without
    requiring any labeled training data

17
Discussions
  • Although MCMC techniques converge to the global
    stationary distribution, we cannot guarantee
    convergence from observation of the perplexity
    alone
  • Unlike EM algorithms, random sampling may
    actually temporarily decrease the model
    likelihood
  • The number of iterations was chosen to be at least double the point at which the perplexity (PP) first appeared to converge

18
Language Modeling Experiments
  • Baseline model: Lectures + Textbook interpolated trigram model (using modified Kneser-Ney discounting)
  • Topic-deemphasized style (trigram) model
    (Lectures)
  • To deemphasize the observed occurrences of topic words and ideally redistribute these counts to all potential topic words (see the count-adjustment sketch below)
  • The counts of topic-to-style word transitions are not altered
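A minimal sketch of the kind of count adjustment this slide describes, using bigrams for brevity; the down-weighting scheme and the topic_weight value are assumptions, not the paper's exact redistribution method.

```python
from collections import Counter

# Sketch of topic-deemphasized n-gram counting: occurrences of words labeled
# as topic words by HMM-LDA are down-weighted, while transitions ending in
# style words keep their full counts.
def deemphasized_bigram_counts(sentences, topic_words, topic_weight=0.1):
    counts = Counter()
    for sent in sentences:
        for prev, cur in zip(sent, sent[1:]):
            if cur in topic_words:
                counts[(prev, cur)] += topic_weight   # deemphasize topic-word counts
            else:
                counts[(prev, cur)] += 1.0            # topic-to-style transitions unchanged
    return counts
```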

19
Language Modeling Experiments (cont.)
  • Textbook model should ideally have higher weight
    in the contexts containing topic words
  • Domain trigram model (Textbook)
  • Emphasize the sequences containing a topic word in the context by doubling their counts (see the sketch below)
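A corresponding sketch for the domain model: trigrams whose two-word context contains an HMM-LDA topic word have their counts doubled before smoothing; the function and variable names are illustrative.

```python
from collections import Counter

# Sketch of emphasizing domain (Textbook) n-grams: trigrams whose context
# (the two history words) contains a topic word get double weight.
def emphasized_trigram_counts(sentences, topic_words):
    counts = Counter()
    for sent in sentences:
        for w1, w2, w3 in zip(sent, sent[1:], sent[2:]):
            weight = 2 if (w1 in topic_words or w2 in topic_words) else 1
            counts[(w1, w2, w3)] += weight
    return counts
```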

20
Language Modeling Experiments (cont.)
  • Unsmoothed topical trigram model
  • Apply HMM-LDA with 100 topics to identify representative words and their associated contexts for each topic
  • Topic mixtures for all models
  • Mixture weights were tuned on the individual target lectures (cheating)
  • 15 of the 100 topics account for over 90% of the total weight

21
Language Modeling Experiments (cont.)
  • Since the topic distribution shifts over a long lecture, modeling a lecture with fixed weights may not be optimal
  • Update the mixture distribution by linearly interpolating it with the posterior topic distribution given the current word (see the sketch below)
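A small sketch of this update rule, assuming a per-topic word probability table and an interpolation factor lam; both are illustrative assumptions rather than the paper's tuned values.

```python
import numpy as np

# Sketch of adapting the topic mixture within a lecture: after each word,
# the current mixture weights are linearly interpolated with the posterior
# topic distribution given that word.
def update_topic_mixture(weights, word, p_w_given_t, lam=0.9):
    posterior = weights * p_w_given_t[:, word]       # p(t) * p(w | t)
    if posterior.sum() == 0:                         # unseen word: keep the weights
        return weights
    posterior /= posterior.sum()                     # normalize to p(t | w)
    return lam * weights + (1 - lam) * posterior     # linear interpolation

# Example: 3 topics, 5 word types, uniform initial mixture.
p_w_given_t = np.random.default_rng(0).dirichlet([0.5] * 5, size=3)
weights = np.full(3, 1 / 3)
for w in [0, 2, 2, 4]:                               # stream of word ids
    weights = update_topic_mixture(weights, w, p_w_given_t)
print(weights)
```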

22
Language Modeling Experiments (cont.)
  • The variation of topic mixtures

Review previous lecture -> Show an example of computation using accumulators -> Focus the lecture on streams as a data structure, with an intervening example that finds pairs i and j that sum to a prime
23
Language Modeling Experiments (cont.)
  • Experimental results

24
Conclusions
  • HMM-LDA shows great promise for finding structure
    in unlabeled data, from which we can build more
    sophisticated models
  • Speaker-specific adaptation will be investigated
    in the future