1
Probabilistic Latent Semantic Indexing
  • Written by Thomas Hofmann
  • Presentation by Chris Janneck
  • Lehigh University CSE 397/497
  • Sept. 24, 2004

2
Overview
  • Introduction
  • LSI review
  • PLSI introduction
  • Aspect Model
  • Tempered Expectation Maximization
  • Geometrical Description
  • LSI-PLSI comparison
  • Similarities
  • Differences
  • Experiments
  • Conclusion

3
Intro: If it weren't for the people…
  • Problem: effective information retrieval
  • Corpus of data
  • Constantly increasing
  • Primarily text
  • (Human) user requests answers to be retrieved from the corpus
  • Uses a natural-language query: a user-formulated query, often similar to spoken/written language
  • Human languages introduce ambiguity: polysemy, synonymy
  • Simple term matching is no longer sufficient

4
Latent Semantic Indexing (LSI)
  • Popular retrieval-enhancement procedure
  • Reduces dimensionality for faster and (ideally) more relevant results
  • Decomposes the term-doc matrix
  • Using Singular Value Decomposition (SVD)
  • Breaks it into:
  • Term-K matrix (U)
  • K-K (rank-rank) matrix (S)
  • Doc-K matrix (V)
  • Sorts the singular values and eliminates the smaller ones
  • Dimension reduction via truncation (keep only the K largest; see the sketch below)
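A minimal sketch of the truncation step in Python, assuming a toy term-document count matrix (all data here is illustrative):

import numpy as np

# Toy term-document matrix: rows = terms, columns = documents
X = np.array([[2., 0., 1.],
              [0., 3., 1.],
              [1., 1., 0.],
              [0., 1., 2.]])

K = 2  # number of latent dimensions to keep

# SVD: X = U @ diag(s) @ Vt, with singular values s sorted descending
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Truncate to the K largest singular values -> rank-K approximation of X
X_k = U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]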

5
Probabilistic LSI
  • Uses the LSI idea, but grounded in probability theory
  • Comes from the statistical Aspect Model
  • Generates a co-occurrence model based on a non-observed (latent) class
  • This is a mixture model
  • Models a distribution through a mixture (weighted sum) of other distributions, as in the sketch below
  • Independence assumptions
  • Observed pairs (doc, word) are generated randomly
  • Conditional independence: conditioned on the latent class, words are generated independently of the document
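A tiny sketch of the mixture idea: one document's word distribution as a weighted sum of per-class word distributions (all numbers are made up):

import numpy as np

# P(w|z): one word distribution per latent class, over a 4-word vocabulary
P_w_given_z = np.array([[0.70, 0.20, 0.05, 0.05],   # class z = 0
                        [0.05, 0.05, 0.30, 0.60]])  # class z = 1

# P(z|d): mixing weights for one particular document
P_z_given_d = np.array([0.4, 0.6])

# P(w|d) = sum_z P(z|d) P(w|z): a weighted sum of the class distributions
P_w_given_d = P_z_given_d @ P_w_given_z
print(P_w_given_d.sum())  # 1.0 -- still a proper distribution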

6
Aspect Model
  • Generation process (sketched in code below)
  • Choose a doc d with prob P(d)
  • There are N d's
  • Choose a latent class z with (generated) prob P(z|d)
  • There are K z's, and K << N
  • Generate a word w with (generated) prob P(w|z)
  • This creates the pair (d, w), without direct concern for z
  • Joining the probabilities gives P(d, w) = P(d) P(w|d), where P(w|d) = Σ_z P(w|z) P(z|d)

Remember: P(z|d) means the probability of z, given d
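A minimal sketch of this generative process, with made-up parameters (N documents, K latent classes, M vocabulary words; all names and values are illustrative):

import numpy as np

rng = np.random.default_rng(0)
N, K, M = 5, 2, 8  # documents, latent classes, vocabulary size

# Made-up model parameters; each row is a proper probability distribution
P_d = np.full(N, 1.0 / N)                        # P(d)
P_z_given_d = rng.dirichlet(np.ones(K), size=N)  # P(z|d), one row per document
P_w_given_z = rng.dirichlet(np.ones(M), size=K)  # P(w|z), one row per class

def sample_pair():
    d = rng.choice(N, p=P_d)             # choose a doc d with prob P(d)
    z = rng.choice(K, p=P_z_given_d[d])  # choose a latent class z with prob P(z|d)
    w = rng.choice(M, p=P_w_given_z[z])  # generate a word w with prob P(w|z)
    return d, w                          # the pair is observed; z is discarded

pairs = [sample_pair() for _ in range(10)]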
7
Aspect Model (2)
  • Log-likelihood: L = Σ_d Σ_w n(d, w) log P(d, w), where n(d, w) is the count of word w in doc d
  • Maximize this to find P(d), P(z|d), P(w|z)
  • In Bayes' (symmetric) format this ends up as P(d, w) = Σ_z P(z) P(d|z) P(w|z)
  • This is conceptually different from LSI
  • Doc-specific word distributions, P(w|d), are based on a combination of specific classes/factors/aspects, P(w|z)
  • Not just assigned to the nearest cluster (a direct computation of L is sketched below)
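A sketch of computing L from a count matrix, reusing the parameter arrays from the sampler above (the helper name is illustrative):

import numpy as np

def log_likelihood(n_dw, P_d, P_z_given_d, P_w_given_z):
    # L = sum_{d,w} n(d,w) log P(d,w), with P(d,w) = P(d) sum_z P(z|d) P(w|z)
    P_w_given_d = P_z_given_d @ P_w_given_z  # shape (N, M)
    P_dw = P_d[:, None] * P_w_given_d        # joint probability P(d, w)
    mask = n_dw > 0                          # only observed pairs contribute
    return np.sum(n_dw[mask] * np.log(P_dw[mask]))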

8
Tempered Expectation Maximization
  • EM is a common technique for maximum-likelihood estimation
  • Alternates between:
  • E-step: calculate the posterior probabilities of z based on the current parameter estimates
  • M-step: update the parameter estimates based on the calculated posteriors (one full iteration is sketched below)
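A sketch of one plain (untempered) EM iteration for the aspect model, assuming the same array layout as above; an illustration, not Hofmann's exact implementation:

import numpy as np

def em_step(n_dw, P_z_given_d, P_w_given_z):
    # E-step: posterior P(z|d,w) proportional to P(z|d) P(w|z)
    post = P_z_given_d[:, :, None] * P_w_given_z[None, :, :]  # shape (N, K, M)
    post /= post.sum(axis=1, keepdims=True)                   # normalize over z

    # M-step: re-estimate parameters from expected counts n(d,w) P(z|d,w)
    weighted = n_dw[:, None, :] * post                        # shape (N, K, M)
    P_w_given_z = weighted.sum(axis=0)
    P_w_given_z /= P_w_given_z.sum(axis=1, keepdims=True)     # normalize over w
    P_z_given_d = weighted.sum(axis=2)
    P_z_given_d /= P_z_given_d.sum(axis=1, keepdims=True)     # normalize over z
    P_d = n_dw.sum(axis=1) / n_dw.sum()                       # P(d) from counts
    return P_d, P_z_given_d, P_w_given_z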

9
Tempered Expectation Maximization (2)
  • Tempered: include a control parameter β, where β < 1
  • Use this β until performance plateaus, then update β ← ηβ, where η < 1
  • Stop when there is no better performance after a change (a tempered E-step is sketched below)
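A sketch of a tempered E-step under the asymmetric parameterization used above (an assumption; the unnormalized posterior is raised to the power β before normalizing, and η = 0.9 is an illustrative value):

import numpy as np

def tempered_e_step(P_z_given_d, P_w_given_z, beta):
    # Tempered posterior: P(z|d,w) proportional to (P(z|d) P(w|z)) ** beta
    post = (P_z_given_d[:, :, None] * P_w_given_z[None, :, :]) ** beta
    return post / post.sum(axis=1, keepdims=True)

# Schedule: run EM with the current beta until held-out performance plateaus,
# then set beta = eta * beta (eta < 1); stop once lowering beta no longer helps.
beta, eta = 1.0, 0.9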

10
Geometrical Description
  • Probability distributions can now be mapped in a (K-1)-dimensional space
  • Instead of (M-1)-dimensional
  • Since K-1 < M-1, this is a dimension reduction
  • M-1 is the dimension of the space of all possible multinomials
  • Even though discrete points are mapped, the convex hull provides a continuous space

11
Similarities: LSI and PLSI
  • Both use intermediate, latent, non-observed data for classification (hence the L)
  • Can compose a Joint Probability similar to LSI's SVD:
  • U → U_hat = (P(d_i | z_k))
  • V → V_hat = (P(w_j | z_k))
  • S → S_hat = diag(P(z_k))
  • JP = U_hat S_hat V_hat^T
  • JP is similar to the SVD term-doc matrix N
  • Values are calculated probabilistically (the composition is sketched below)
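A sketch of composing JP, assuming P_z is a length-K vector and P_d_given_z, P_w_given_z are (K, N) and (K, M) arrays (these layouts are assumptions for illustration):

import numpy as np

def joint_probability(P_z, P_d_given_z, P_w_given_z):
    U_hat = P_d_given_z.T           # (N, K): U_hat[i, k] = P(d_i | z_k)
    S_hat = np.diag(P_z)            # (K, K): diagonal of class priors P(z_k)
    V_hat = P_w_given_z.T           # (M, K): V_hat[j, k] = P(w_j | z_k)
    return U_hat @ S_hat @ V_hat.T  # (N, M) matrix of joint probabilities P(d_i, w_j)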

12
Differences: LSI and PLSI
  • Basis:
  • LSI: term frequencies (usually); performs dimension reduction via projection or zeroing out the weaker components
  • PLSI: statistical; generates a (mostly random) initial model of the probabilistic relations between W, D, and Z, then refines it until an effective model is produced

13
Experiments
14
Experiments (2)
(Figure: experimental result plots, panels R1-R4)
15
Experiments (3)
16
Experiments (4)
Perplexity: the inverse of the per-word likelihood on held-out data (lower is better); a sketch of the computation follows
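A sketch of the perplexity computation, assuming n_dw is a held-out count matrix and P_w_given_d holds the model's per-document word distributions (names are illustrative):

import numpy as np

def perplexity(n_dw, P_w_given_d):
    # exp( - sum_{d,w} n(d,w) log P(w|d) / sum_{d,w} n(d,w) )
    mask = n_dw > 0
    ll = np.sum(n_dw[mask] * np.log(P_w_given_d[mask]))
    return np.exp(-ll / n_dw.sum())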
17
Conclusions
  • PLSI is a good thing because:
  • Consistently better precision/recall curves than LSI
  • TEM is computationally comparable to SVD
  • Better from a modeling sense
  • Uses the likelihood of sampling and aims to maximize it
  • SVD uses the L2-norm or another implicit Gaussian-noise assumption
  • Polysemy is recognizable
  • By viewing P(w|z)
  • Similar handling of synonymy