Latent Semantic Analysis

Slides: 13
Provided by: alext8

1
Latent Semantic Analysis
2
An Example
  • d1: Romeo and Juliet.
  • d2: Juliet: O happy dagger!
  • d3: Romeo died by dagger.
  • d4: "Live free or die", that's the
    New-Hampshire's motto.
  • d5: Did you know, New-Hampshire is in
    New-England.
  • q: dies, dagger
  • Which documents should be returned, and how
    should they be ranked?

3
Eigenvectors and Eigenvalues
  • Let A be an n × n matrix.
  • If x is an n-dimensional vector, then the
    matrix-vector product
  • Ax
  • is well-defined, and the result is again an
    n-dimensional vector.
  • In general, multiplication by a matrix changes
    the direction of a non-zero vector x, unless the
    vector is special and satisfies
  • Ax = λx
  • for some scalar λ. Such an x is called an
    eigenvector of A, and λ is its eigenvalue.
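
The eigenvector condition can be checked numerically. The sketch below uses NumPy on a small symmetric matrix; the matrix and the test vector v are hypothetical values chosen only for illustration and do not come from the slides.

```python
import numpy as np

# Hypothetical 2 x 2 symmetric matrix, chosen only for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh handles symmetric matrices: eigenvalues come back in ascending
# order, and the eigenvectors are the columns of the second result.
eigenvalues, eigenvectors = np.linalg.eigh(A)

# For each eigenpair, A x and lambda * x agree up to rounding.
residuals = [np.linalg.norm(A @ x - lam * x)
             for lam, x in zip(eigenvalues, eigenvectors.T)]

# A generic vector is not special: A changes its direction.
v = np.array([1.0, 0.0])
Av = A @ v  # [2, 1], which is not a scalar multiple of v

print(eigenvalues, residuals, Av)
```

For this matrix the eigenvalues are 1 and 3, while the vector (1, 0) is mapped to (2, 1) and therefore turns.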

4
Matrix Decomposition
  • Let S be the matrix with n linearly independent
    eigenvectors of A as columns.
  • Let Λ be the diagonal matrix with the eigenvalues
    of A on the diagonal.
  • Then
  • A = S Λ S^{-1}
  • If A is symmetric, the eigenvectors can be chosen
    orthonormal, so that S^{-1} = S^T and
  • A = S Λ S^T
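
A minimal numerical check of the symmetric case, assuming NumPy and a hypothetical 3 × 3 symmetric matrix that is not taken from the slides:

```python
import numpy as np

# Hypothetical symmetric matrix, used only to illustrate the identity.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# For a symmetric input, eigh returns orthonormal eigenvectors as columns.
eigenvalues, S = np.linalg.eigh(A)
Lambda = np.diag(eigenvalues)

# Rebuild A from its eigenvectors and eigenvalues: A = S Lambda S^T.
A_rebuilt = S @ Lambda @ S.T

# Because the columns of S are orthonormal, the inverse equals the transpose.
inverse_is_transpose = np.allclose(np.linalg.inv(S), S.T)

print(np.allclose(A_rebuilt, A), inverse_is_transpose)
```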

5
Singular Value Decomposition
  • Let A be an m × n matrix with entries being real
    numbers and m > n.
  • Consider the n × n square matrix B = A^T A.
  • B is symmetric
  • it has been shown that the eigenvalues of such
    (A^T A) matrices are non-negative.
  • Since they are non-negative we can write them in
    decreasing order as squares of non-negative real
    numbers σ_1^2 ≥ . . . ≥ σ_n^2
  • For some index r (possibly n) the first r numbers
    are positive whereas the rest are zero.
  • Let x_1, . . . , x_r be orthonormal eigenvectors of
    B for the eigenvalues σ_1^2, . . . , σ_r^2.
  • S_1 = [x_1, . . . , x_r]
  • y_1 = (1/σ_1) A x_1, . . . , y_r = (1/σ_r) A x_r
  • S_2 = [y_1, . . . , y_r]
  • We can show that
  • A = S_2 Σ S_1^T
  • Σ is diagonal and the values along the diagonal
    are σ_1, . . . , σ_r, which are called singular
    values.
  • If we denote S_2 by S and S_1 by U we have
    A = S Σ U^T
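
The construction above can be followed step by step in code. This is a sketch assuming NumPy; the 3 × 2 matrix and the tolerance used to decide which eigenvalues count as zero are illustrative choices, not values from the slides.

```python
import numpy as np

# Hypothetical 3 x 2 real matrix (m > n), chosen for illustration.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

# B = A^T A is symmetric with non-negative eigenvalues.
B = A.T @ A
eigenvalues, X = np.linalg.eigh(B)       # ascending order

# Reorder so the eigenvalues are decreasing, as on the slide.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, X = eigenvalues[order], X[:, order]

# r = number of strictly positive eigenvalues (tolerance is an assumption).
r = int(np.sum(eigenvalues > 1e-12))
sigma = np.sqrt(eigenvalues[:r])         # singular values sigma_1..sigma_r

S1 = X[:, :r]                            # columns x_1..x_r
S2 = (A @ S1) / sigma                    # column i is (1/sigma_i) A x_i
Sigma = np.diag(sigma)

A_rebuilt = S2 @ Sigma @ S1.T            # A = S2 Sigma S1^T
print(np.allclose(A_rebuilt, A))
```

In practice one would call `np.linalg.svd` directly; the eigenvalue route is shown only because it mirrors the derivation.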

6
Example
  • d1: Romeo and Juliet.
  • d2: Juliet: O happy dagger!
  • d3: Romeo died by dagger.
  • d4: "Live free or die", that's the
    New-Hampshire's motto.
  • d5: Did you know, New-Hampshire is in
    New-England.
  • q: dies, dagger

7
Document-term matrix
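
The matrix shown on this slide did not survive in the transcript. The sketch below builds one possible count matrix for the five example documents, assuming NumPy. The token lists are an assumption: they presume lowercasing, stop-word removal, and mapping "died" and "dies" to "die", and the slide's own vocabulary may differ. Terms are placed in rows and documents in columns, so that documents correspond to columns as in the later slide on representing documents.

```python
import numpy as np

# Assumed preprocessing result (hypothetical; not given in the slides).
docs = {
    "d1": ["romeo", "juliet"],
    "d2": ["juliet", "happy", "dagger"],
    "d3": ["romeo", "die", "dagger"],
    "d4": ["live", "free", "die", "new-hampshire"],
    "d5": ["new-hampshire", "new-england"],
}

doc_names = list(docs)
terms = sorted({t for tokens in docs.values() for t in tokens})

# A[i, j] = number of times term i occurs in document j.
A = np.array([[docs[d].count(t) for d in doc_names] for t in terms],
             dtype=float)

for term, row in zip(terms, A):
    print(f"{term:15s}", row.astype(int))
```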
8
Latent Concepts
  • Latent Semantic Indexing (LSI) is a method for
    discovering hidden concepts in document data.
  • Each document and term (word) is then expressed
    as a vector with elements corresponding to these
    concepts.
  • Each element in a vector gives the degree of
    participation of the document or term in the
    corresponding concept.
  • Goal is not to describe the concepts verbally,
    but to be able to represent the documents and
    terms in a unified way for exposing
  • document-document,
  • document-term, and
  • term-term similarities
  • which are otherwise hidden

9
Matrix Σ
Matrix A can be written A = S Σ U^T. Let's
"neglect" the last three singular values of Σ as
being too "small", which leaves the 2 × 2 matrix
Σ_2. Also, just keep two columns from S, obtaining
S_2, and two rows from U^T, obtaining U_2^T (the
subscript now counts the singular values kept).
Matrix A is approximated as A_2 = S_2 Σ_2 U_2^T.
In general, A_k = S_k Σ_k U_k^T, where a good
value for k is determined empirically.
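
A minimal sketch of the truncation, assuming NumPy. The function name `rank_k_approximation` and the small input matrix are hypothetical; only `np.linalg.svd` is a real library call.

```python
import numpy as np

def rank_k_approximation(A, k):
    """Return S_k, Sigma_k, U_k^T for the k largest singular values of A."""
    # NumPy returns the factors as (S, singular values, U^T) in the
    # slide's notation, with singular values in decreasing order.
    S, sigma, Ut = np.linalg.svd(A, full_matrices=False)
    return S[:, :k], np.diag(sigma[:k]), Ut[:k, :]

# Hypothetical 4 x 3 matrix, used only for illustration.
A = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])

S_k, Sigma_k, Ut_k = rank_k_approximation(A, k=2)
A_k = S_k @ Sigma_k @ Ut_k

# The Frobenius-norm error equals the square root of the sum of the
# squared singular values that were dropped.
all_sigma = np.linalg.svd(A, compute_uv=False)
error = np.linalg.norm(A - A_k)
expected_error = np.sqrt(np.sum(all_sigma[2:] ** 2))
print(error, expected_error)
```

Comparing the error for several values of k is one simple way to explore the empirical choice of k mentioned above.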
10
Matrices Σ_2, S_2, U_2
11
Representing Documents, Terms, and Queries
  • Represent documents by the column vectors of
    Σ_2 U_2^T
  • Represent terms by the row vectors of S_2 Σ_2
  • Represent queries by the centroid vector of their
    terms
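
The three representations can be sketched end to end for the running example, assuming NumPy. Two things here are assumptions rather than content of the slides: the token lists (lowercased, stop words removed, "died" and "dies" mapped to "die") and the use of cosine similarity to rank documents against the query.

```python
import numpy as np

# Assumed preprocessing result (hypothetical; not given in the slides).
docs = {
    "d1": ["romeo", "juliet"],
    "d2": ["juliet", "happy", "dagger"],
    "d3": ["romeo", "die", "dagger"],
    "d4": ["live", "free", "die", "new-hampshire"],
    "d5": ["new-hampshire", "new-england"],
}
doc_names = list(docs)
terms = sorted({t for tokens in docs.values() for t in tokens})
A = np.array([[docs[d].count(t) for d in doc_names] for t in terms],
             dtype=float)

# Rank-2 factors: S_2, Sigma_2, U_2^T in the slide's notation.
k = 2
S, sigma, Ut = np.linalg.svd(A, full_matrices=False)
S_k, Sigma_k, Ut_k = S[:, :k], np.diag(sigma[:k]), Ut[:k, :]

doc_vectors = Sigma_k @ Ut_k    # one column per document
term_vectors = S_k @ Sigma_k    # one row per term

# Query q = {die, dagger}: centroid of the vectors of its terms.
query_terms = ["die", "dagger"]
q = term_vectors[[terms.index(t) for t in query_terms]].mean(axis=0)

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Assumed ranking rule: sort documents by cosine similarity to q.
scores = {d: cosine(q, doc_vectors[:, j]) for j, d in enumerate(doc_names)}
ranking = sorted(doc_names, key=scores.get, reverse=True)
print(ranking, scores)
```

The scores do not depend on the sign convention of the SVD routine, because flipping a column of S flips the matching row of U^T as well.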

12
Geometry