Title: Exact Maximum Likelihood Estimation for Word Mixtures
1. Exact Maximum Likelihood Estimation for Word Mixtures
- Yi Zhang Jamie Callan
- Carnegie Mellon University
- {yiz, callan}@cs.cmu.edu
- Wei Xu
- NEC CC Research Lab
- xw@ccrl.sj.nec.com
2. Outline
- Introduction
- Why this problem? Some retrieval applications
- Traditional solution: the EM algorithm
- New algorithm: exact MLE
- Experimental Results
3. Example 1: Model-Based Feedback in the Language Modeling Approach to IR
[Diagram: a query Q is run against documents D to produce results; the top-ranked feedback documents F = {d1, d2, ..., dn} are used to update the query model.]
Based on Zhai & Lafferty's slides at CIKM 2001
4. θF Estimation Based on a Generative Mixture Model
Given the feedback documents F, the collection model P(w|C), and the mixture weight λ, find the MLE of θF.
Based on Zhai & Lafferty's slides at CIKM 2001
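As a reference for this slide, the following LaTeX sketch writes out the feedback log-likelihood of Zhai & Lafferty's generative mixture model; the count notation c(w; F) for word w in F is our assumption for illustration.

```latex
% Sketch of the feedback mixture log-likelihood (after Zhai & Lafferty, CIKM 2001).
% c(w; F) denotes the count of word w in the feedback documents F (notation assumed).
\[
  \log P(F \mid \theta_F)
  = \sum_{w} c(w; F)\,
    \log\bigl( (1-\lambda)\, P(w \mid \theta_F) + \lambda\, P(w \mid C) \bigr)
\]
% The MLE \hat{\theta}_F maximizes this objective over the simplex:
% \sum_w P(w \mid \theta_F) = 1 and P(w \mid \theta_F) \ge 0.
```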
5. Example 2: Model-Based Approach for Novelty Detection in Adaptive Information Filtering
Given θE (general English) and θT (topic), with a document modeled as a mixture of θE, θT, and θnew, find the MLE of θnew.
Based on Zhang & Callan's paper at SIGIR 2002
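A hedged LaTeX sketch of the three-component mixture implied by this slide; the weights α, β, γ and the count notation c(w; d) are our assumptions for illustration, not taken from the paper.

```latex
% Hypothetical form of the novelty-detection mixture for a document d:
% each word is drawn from general English, the topic model, or the new-information model.
% Weights \alpha, \beta, \gamma with \alpha + \beta + \gamma = 1 (assumed notation).
\[
  \log P(d \mid \theta_{new})
  = \sum_{w} c(w; d)\,
    \log\bigl( \alpha\, P(w \mid \theta_E) + \beta\, P(w \mid \theta_T)
               + \gamma\, P(w \mid \theta_{new}) \bigr)
\]
```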
6. Problem Setting and Traditional Solution Using EM
- Observed data are generated by a mixture multinomial distribution r = (r1, r2, r3, ..., rk)
- Given the interpolation weights α and β and another multinomial distribution p = (p1, p2, p3, ..., pk), with ri = α qi + β pi
- Find the maximum likelihood estimate (MLE) of the multinomial distribution q = (q1, q2, q3, ..., qk)
- Traditional solution: the EM algorithm (sketched below)
  - An iterative process that can be computationally expensive
  - Only provides an approximate solution
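As a point of comparison for the exact algorithm on slide 9, here is a minimal, hypothetical Python sketch of the EM baseline for this two-component problem; the function and variable names are ours, not from the paper.

```python
import numpy as np

def em_mixture(f, p, alpha, beta, n_iter=1000, tol=1e-10):
    """Hypothetical EM baseline for the problem on this slide: observed counts f
    drawn from r = alpha*q + beta*p with p, alpha, beta known; estimate q."""
    f = np.asarray(f, dtype=float)
    p = np.asarray(p, dtype=float)
    q = f / f.sum()                      # initialize q at the empirical distribution
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each word token came from q.
        mix = alpha * q + beta * p
        t = np.divide(alpha * q, mix, out=np.zeros_like(q), where=mix > 0)
        # M-step: reassign probability mass in proportion to the expected counts.
        w = f * t
        q = w / w.sum()
        # Stop when the log-likelihood improvement falls below tol.
        seen = f > 0
        ll = np.sum(f[seen] * np.log(alpha * q[seen] + beta * p[seen]))
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return q
```

Each iteration touches every observed word, and many iterations may be needed before the log-likelihood change falls under the tolerance, which is the computational cost the slide refers to.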
7. Finding q (1)
Maximize the log-likelihood

\[ \log L(q) = \sum_{i=1}^{k} f_i \log(\alpha q_i + \beta p_i) \]

under the constraints

\[ \sum_{i=1}^{k} q_i = 1, \qquad q_i \ge 0, \]

where fi is the observed frequency of word i.
8. Finding q (2)
For all the qi such that qi > 0, apply the Lagrange multiplier method and set the derivative with respect to qi to zero:

\[ \frac{\partial}{\partial q_i}\Bigl[ \sum_j f_j \log(\alpha q_j + \beta p_j) - \mu \Bigl( \sum_j q_j - 1 \Bigr) \Bigr] = \frac{\alpha f_i}{\alpha q_i + \beta p_i} - \mu = 0 \quad\Longrightarrow\quad q_i = \frac{f_i}{\mu} - \frac{\beta}{\alpha} p_i \]

This is a closed-form solution for qi, if we know all i such that qi > 0.
Theorem: all the qi greater than 0 correspond to the smallest values of pi/fi. See the detailed proof in our paper.
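For completeness, a short LaTeX sketch of how the multiplier μ follows from the normalization constraint, in the notation above; S denotes the support {i : qi > 0} (our notation).

```latex
% Determining \mu from the normalization constraint (S = \{ i : q_i > 0 \}, notation assumed).
\[
  \sum_{i \in S} q_i
  = \sum_{i \in S} \Bigl( \frac{f_i}{\mu} - \frac{\beta}{\alpha} p_i \Bigr) = 1
  \quad\Longrightarrow\quad
  \mu = \frac{\alpha \sum_{i \in S} f_i}{\alpha + \beta \sum_{i \in S} p_i}
\]
```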
9. Algorithm for Finding the Exact MLE of q
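The algorithm itself appeared as a figure on this slide; below is a minimal Python sketch, assuming the closed form and the support theorem from slide 8 (sort words by pi/fi and take the largest prefix whose last word still gets positive probability). Function and variable names are ours.

```python
import numpy as np

def exact_mle_mixture(f, p, alpha, beta):
    """Sketch of the exact MLE of q, where observed counts f come from
    r = alpha*q + beta*p with p, alpha, beta known (see slides 6-8)."""
    f = np.asarray(f, dtype=float)
    p = np.asarray(p, dtype=float)

    # Only observed words (f_i > 0) can receive positive probability in q.
    idx = np.where(f > 0)[0]

    # By the theorem on slide 8, the support of q consists of the words with
    # the smallest p_i / f_i, so sort the candidates by that ratio.
    order = idx[np.argsort(p[idx] / f[idx])]

    F = np.cumsum(f[order])   # prefix sums of f over candidate supports
    P = np.cumsum(p[order])   # prefix sums of p over candidate supports

    # For support = first n sorted words, mu_n = alpha*F_n / (alpha + beta*P_n)
    # (slide 8); the n-th word itself gets q_n = f_n/mu_n - (beta/alpha)*p_n.
    mu = alpha * F / (alpha + beta * P)
    q_edge = f[order] / mu - (beta / alpha) * p[order]

    # Take the largest prefix whose last (worst-ratio) word stays positive.
    n = int(np.max(np.where(q_edge > 0)[0])) + 1

    q = np.zeros_like(f)
    support = order[:n]
    q[support] = f[support] / mu[n - 1] - (beta / alpha) * p[support]
    return q
```

The returned q sums to 1 by construction, since the closed form from slide 8 already incorporates the normalization constraint; one sort plus one pass replaces the EM iterations.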
10. Experimental Setting for Model-Based Feedback in IR
- 20 relevant documents for a topic (sampled from the AP Wire News and Wall Street Journal datasets, 1988-1990) serve as the observed training data. p is calculated directly from 119,823 documents, as described in (Zhai & Lafferty).
- There are 2,352 unique words in these 20 relevant documents, so at most 2,352 of the qi are nonzero, while 200,542 of the pi are nonzero.
11. The EM result converges to the result calculated directly by our algorithm.
12. Comparing the Speed of Our Algorithm With EM
- EM stops when the change in log-likelihood (LL) is less than 10^-?
- 50,000 runs on a Pentium III 500 MHz PC
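To make the comparison concrete, here is a hypothetical timing harness over the two sketches above (em_mixture and exact_mle_mixture, both our own illustrative names); the vocabulary size, sample size, and weights are arbitrary, not the paper's settings.

```python
import time
import numpy as np

# Hypothetical benchmark over the sketches above; sizes and weights are arbitrary.
rng = np.random.default_rng(0)
k = 5000
p = rng.dirichlet(np.ones(k))                                        # known background model
f = rng.multinomial(2000, rng.dirichlet(np.ones(k))).astype(float)   # observed word counts
alpha, beta = 0.5, 0.5

start = time.perf_counter()
q_exact = exact_mle_mixture(f, p, alpha, beta)
print(f"exact: {time.perf_counter() - start:.4f}s")

start = time.perf_counter()
q_em = em_mixture(f, p, alpha, beta)
print(f"EM:    {time.perf_counter() - start:.4f}s")

# The two estimates should agree up to the EM tolerance (cf. slide 11).
print("max |q_exact - q_em| =", np.abs(q_exact - q_em).max())
```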
13. Conclusion
- We developed a new training algorithm that provides the exact MLE for word mixtures
- It works well both theoretically and empirically
- It can be used in several language-model-based IR applications