1
Mixture Language Models and EM Algorithm
  • (Lecture for CS397-CXZ Intro Text Info Systems)
  • Sept. 17, 2003
  • ChengXiang Zhai
  • Department of Computer Science
  • University of Illinois, Urbana-Champaign

2
Rest of this Lecture
  • Unigram mixture models
    - Slightly more sophisticated unigram LMs
    - Related to smoothing
  • EM algorithm
    - VERY useful for estimating parameters of a mixture model or when latent/hidden variables are involved
    - Will occur again and again in the course

3
Modeling a Multi-topic Document
A document with 2 types of vocabulary: text mining passages interleaved with food nutrition passages (text mining passage, food nutrition passage, text mining passage, text mining passage, food nutrition passage, ...)

How do we model such a document? How do we generate such a document? How do we estimate our model?

Solution: A mixture model + EM
4
Simple Unigram Mixture Model
Model/topic 1, p(w|θ1): text 0.2, mining 0.1, association 0.01, clustering 0.02, food 0.00001   (weight λ = 0.7)

Model/topic 2, p(w|θ2): food 0.25, nutrition 0.1, healthy 0.05, diet 0.02   (weight 1 − λ = 0.3)

p(w|θ1,θ2) = λ p(w|θ1) + (1 − λ) p(w|θ2)
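As a concrete illustration, here is a minimal Python sketch of this two-topic mixture, using only the word probabilities listed above; treating unlisted words as probability 0 is a simplification of the sketch, not part of the model:

```python
# Two-topic unigram mixture from this slide (partial distributions;
# words not listed are treated as probability 0 in this sketch).
theta1 = {"text": 0.2, "mining": 0.1, "association": 0.01,
          "clustering": 0.02, "food": 0.00001}            # p(w|theta1)
theta2 = {"food": 0.25, "nutrition": 0.1,
          "healthy": 0.05, "diet": 0.02}                  # p(w|theta2)
lam = 0.7                                                 # mixing weight

def p_mix(w):
    """p(w|theta1,theta2) = lam*p(w|theta1) + (1-lam)*p(w|theta2)"""
    return lam * theta1.get(w, 0.0) + (1 - lam) * theta2.get(w, 0.0)

print(p_mix("text"))   # 0.7*0.2 + 0.3*0 = 0.14
print(p_mix("food"))   # 0.7*0.00001 + 0.3*0.25 = 0.075007
```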
5
Parameter Estimation
Likelihood: p(d|θ1,θ2) = Πw [λ p(w|θ1) + (1 − λ) p(w|θ2)]^c(w,d)
  • Estimation scenarios
  • p(w|θ1), p(w|θ2) are known; estimate λ
  • p(w|θ1), λ are known; estimate p(w|θ2)
  • p(w|θ1) is known; estimate λ and p(w|θ2)
  • λ is known; estimate p(w|θ1) and p(w|θ2)
  • Estimate λ, p(w|θ1), and p(w|θ2) (this is clustering)
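To make the estimation target concrete, here is a small sketch of the log-likelihood as a function of λ in the first scenario, where both topic models are known; the document counts are assumed toy values:

```python
import math

theta1 = {"text": 0.2, "mining": 0.1, "food": 0.00001}   # known p(w|theta1), partial
theta2 = {"food": 0.25, "nutrition": 0.1}                # known p(w|theta2), partial
d = {"text": 2, "mining": 2, "food": 1, "nutrition": 1}  # toy document: word -> count

def log_likelihood(lam):
    """log L(lam) = sum_w c(w,d) log[lam*p(w|theta1) + (1-lam)*p(w|theta2)]"""
    return sum(c * math.log(lam * theta1.get(w, 0.0)
                            + (1 - lam) * theta2.get(w, 0.0))
               for w, c in d.items())

for lam in (0.3, 0.5, 0.7):           # the MLE is the lam maximizing this
    print(f"lam={lam}: log L = {log_likelihood(lam):.3f}")
```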
6
Parameter Estimation Example: Given p(w|θ1) and p(w|θ2), estimate λ

Maximum Likelihood: λ* = argmaxλ Σw c(w,d) log[λ p(w|θ1) + (1 − λ) p(w|θ2)]

The Expectation-Maximization (EM) algorithm is a commonly used method. Basic idea: start from some random guess of parameter values, and then iteratively improve our estimates (hill climbing):
E-step: compute the lower bound
M-step: find a new λ that maximizes the lower bound
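A sketch of what one hill-climbing iteration looks like for this scenario, under the standard EM update for a two-component mixture weight (the function and variable names here are my own):

```python
def em_step(lam, counts, p1, p2):
    """One EM update of lam, with p(w|theta1)=p1 and p(w|theta2)=p2 fixed."""
    mass1, total = 0.0, 0.0
    for w, c in counts.items():
        # E-step: posterior probability that w was generated by theta1
        z1 = (lam * p1.get(w, 0.0)
              / (lam * p1.get(w, 0.0) + (1 - lam) * p2.get(w, 0.0)))
        mass1 += c * z1        # expected count of tokens from theta1
        total += c
    # M-step: new lam = expected fraction of tokens from theta1
    return mass1 / total

# Toy usage with the distributions from the earlier slide:
p1 = {"text": 0.2, "mining": 0.1, "food": 0.00001}
p2 = {"food": 0.25, "nutrition": 0.1}
print(em_step(0.5, {"text": 2, "food": 1, "nutrition": 1}, p1, p2))
```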
7
EM Algorithm Intuition
Observed: Doc d

Model/topic 1, p(w|θ1): text 0.2, mining 0.1, association 0.01, clustering 0.02, food 0.00001   (weight λ = ?)

Model/topic 2, p(w|θ2): food 0.25, nutrition 0.1, healthy 0.05, diet 0.02   (weight 1 − λ = ?)

Suppose we know the identity of each word ...

p(w|θ1,θ2) = λ p(w|θ1) + (1 − λ) p(w|θ2)
8
Can We Guess the Identity?
Identity (hidden) variable: zw = 1 (w from θ1), zw = 0 (w from θ2)

zw: 1 1 1 1 0 0 0 1 0 ...

the paper presents a text mining algorithm the paper ...

What's a reasonable guess?
  - depends on λ (why?)
  - depends on p(w|θ1) and p(w|θ2) (how?)

Initially, set λ to some random value, then iterate ...
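The natural guess is the posterior p(zw = 1 | w) = λ p(w|θ1) / [λ p(w|θ1) + (1 − λ) p(w|θ2)]; a minimal sketch with the earlier toy probabilities and an assumed current guess λ = 0.5:

```python
theta1 = {"text": 0.2, "mining": 0.1, "food": 0.00001}
theta2 = {"food": 0.25, "nutrition": 0.1}
lam = 0.5   # assumed current guess of the mixing weight

def guess_identity(w):
    """Posterior p(z_w = 1 | w): probability that w came from theta1."""
    a = lam * theta1.get(w, 0.0)          # weight of the theta1 explanation
    b = (1 - lam) * theta2.get(w, 0.0)    # weight of the theta2 explanation
    return a / (a + b)

print(guess_identity("mining"))   # 1.0: only theta1 can generate "mining"
print(guess_identity("food"))     # ~0.00004: theta2 is far more likely
```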
9
An Example of EM Computation
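A self-contained sketch of the computation on toy numbers (the topic models are fixed to the earlier partial distributions; the document and the initial λ are assumptions; only λ is estimated):

```python
import math

theta1 = {"text": 0.2, "mining": 0.1, "clustering": 0.02, "food": 0.00001}
theta2 = {"food": 0.25, "nutrition": 0.1, "healthy": 0.05}
d = {"text": 3, "mining": 2, "clustering": 1, "food": 2, "nutrition": 1}

def em_lambda(counts, lam=0.5, iters=5):
    for n in range(iters):
        # E-step: posterior that each word token came from theta1
        z1 = {w: lam * theta1.get(w, 0.0)
                 / (lam * theta1.get(w, 0.0)
                    + (1 - lam) * theta2.get(w, 0.0))
              for w in counts}
        # M-step: re-estimate lam as the expected fraction of theta1 tokens
        lam = (sum(c * z1[w] for w, c in counts.items())
               / sum(counts.values()))
        ll = sum(c * math.log(lam * theta1.get(w, 0.0)
                              + (1 - lam) * theta2.get(w, 0.0))
                 for w, c in counts.items())
        print(f"iteration {n + 1}: lam = {lam:.4f}, log-likelihood = {ll:.4f}")
    return lam

em_lambda(d)
```

Each iteration improves (or at least preserves) the log-likelihood, which is the hill-climbing behavior described on the previous slides.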
10
Any Theoretical Guarantee?
  • EM is guaranteed to reach a LOCAL maximum
  • When the local maximum is the global maximum, EM can
    find the global maximum
  • But when there are multiple local maxima,
    special techniques are needed (e.g., try
    different initial values)
  • In our case, there is a unique local maximum
    (why?)

11
A General Introduction to EM
Data: X (observed), H (hidden); Parameter: θ

Incomplete likelihood: L(θ) = log p(X|θ)
Complete likelihood: Lc(θ) = log p(X,H|θ)

EM tries to iteratively maximize the complete likelihood. Starting with an initial guess θ(0):
1. E-step: compute the expectation of the complete likelihood,
   Q(θ; θ(n-1)) = E[Lc(θ) | X, θ(n-1)]
2. M-step: compute θ(n) by maximizing the Q-function,
   θ(n) = argmaxθ Q(θ; θ(n-1))
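In code, the general scheme is just an alternation of two functions; a sketch of the skeleton, where e_step and m_step are assumed callables supplied by the specific model:

```python
def em(theta0, e_step, m_step, iters=100):
    """Generic EM skeleton.

    e_step(theta)     -> posterior over hidden variables, p(H|X, theta)
    m_step(posterior) -> theta maximizing the Q-function
    """
    theta = theta0
    for _ in range(iters):
        posterior = e_step(theta)    # E-step: expectation of complete likelihood
        theta = m_step(posterior)    # M-step: maximize Q(theta; theta_old)
    return theta
```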
12
Convergence Guarantee
Goal: maximize the incomplete likelihood L(θ) = log p(X|θ), i.e., choose θ(n) so that L(θ(n)) − L(θ(n-1)) ≥ 0.

Note that, since p(X,H|θ) = p(H|X,θ) p(X|θ),

  L(θ) = Lc(θ) − log p(H|X,θ)

  L(θ(n)) − L(θ(n-1)) = Lc(θ(n)) − Lc(θ(n-1)) + log [ p(H|X,θ(n-1)) / p(H|X,θ(n)) ]

Taking the expectation w.r.t. p(H|X,θ(n-1)):

  L(θ(n)) − L(θ(n-1)) = Q(θ(n); θ(n-1)) − Q(θ(n-1); θ(n-1)) + D( p(H|X,θ(n-1)) || p(H|X,θ(n)) )

EM chooses θ(n) to maximize Q, and the KL-divergence D is always non-negative. Therefore, L(θ(n)) ≥ L(θ(n-1))!
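This guarantee is easy to check numerically; a small sketch reusing the toy mixture from the earlier slides, asserting that the log-likelihood never decreases across iterations (the tolerance only guards against floating-point rounding):

```python
import math

theta1 = {"text": 0.2, "mining": 0.1, "food": 0.00001}
theta2 = {"food": 0.25, "nutrition": 0.1}
d = {"text": 3, "mining": 2, "food": 2, "nutrition": 1}

def ll(lam):   # incomplete log-likelihood L(lam)
    return sum(c * math.log(lam * theta1.get(w, 0.0)
                            + (1 - lam) * theta2.get(w, 0.0))
               for w, c in d.items())

lam, prev = 0.5, -math.inf
for _ in range(20):
    z1 = {w: lam * theta1.get(w, 0.0)
             / (lam * theta1.get(w, 0.0) + (1 - lam) * theta2.get(w, 0.0))
          for w in d}
    lam = sum(c * z1[w] for w, c in d.items()) / sum(d.values())
    assert ll(lam) >= prev - 1e-12    # L(theta(n)) >= L(theta(n-1))
    prev = ll(lam)
print(f"monotone increase verified; final lam = {lam:.4f}")
```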
13
Another way of looking at EM
[Figure: the likelihood p(X|θ) as a function of θ, together with its lower bound (the Q function). Since L(θ) = L(θ(n-1)) + Q(θ; θ(n-1)) − Q(θ(n-1); θ(n-1)) + D( p(H|X,θ(n-1)) || p(H|X,θ) ), the lower bound touches the likelihood at the current guess θ(n-1), and the next guess is the θ that maximizes the lower bound.]

E-step: computing the lower bound
M-step: maximizing the lower bound
14
What You Should Know
  • Why is the unigram language model so important?
  • What is a unigram mixture language model?
  • How to estimate parameters of simple unigram
    mixture models using EM
  • Know the general idea of EM (EM will be covered
    again later in the course)