1
Linear Algebra and Terrorist Threats
  • Finding Relationships in Large Sets of Text

Catherine Crawford, October 31, 2007, Elmhurst College
2
Acknowledgements
  • Relationship Discovery in Large Text Collections
    Using Latent Semantic Indexing
  • R. B. Bradford, SAIC
  • 2006 SIAM Conference on Data Mining Workshop on
    Link Analysis, Counterterrorism and Security
  • William M. Pottenger, Ph.D.
  • DyDAn Center, Rutgers University
  • 2007 DIMACS Reconnect Conference on Data Analysis
    in Law Enforcement and Homeland Security

3
Linear Algebra Concepts
  • For an n x n matrix A, a nonzero vector x is
    called an eigenvector of A if
  • $Ax = \lambda x$ for some scalar $\lambda$.
  • The scalar $\lambda$ is called an eigenvalue.

4
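Example: eigenvalues $\lambda_1 = 6$ and $\lambda_2 = -1$, with corresponding eigenvectors [matrix and eigenvectors not preserved in this transcript]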
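The definition can be checked numerically. The sketch below uses NumPy and a 2 x 2 matrix picked here only because it has eigenvalues 6 and -1; the matrix from the original slide is not preserved, so this A is an assumption.

```python
import numpy as np

# Hypothetical matrix, chosen so that its eigenvalues are 6 and -1.
A = np.array([[1.0, 2.0],
              [5.0, 4.0]])

# eigenvalues[i] pairs with column i of `eigenvectors`.
eigenvalues, eigenvectors = np.linalg.eig(A)

for lam, x in zip(eigenvalues, eigenvectors.T):
    # Each pair must satisfy A x = lambda x.
    assert np.allclose(A @ x, lam * x)
    print(f"lambda = {lam:.1f}, x = {x}")
```

NumPy scales each eigenvector to unit length; any nonzero multiple of x is also an eigenvector for the same eigenvalue.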
5
Diagonalization
  • An n x n matrix A is said to be diagonalizable if
    it is similar to a diagonal matrix.
  • $A = PDP^{-1}$
  • for some diagonal matrix D and invertible matrix
    P.

6
Diagonalization: $A = PDP^{-1}$
  • Columns of P: eigenvectors (n linearly independent)
  • Entries of D: eigenvalues
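As a rough check of the factorization, the sketch below builds P from the eigenvectors and D from the eigenvalues, then rebuilds A. The matrix is a hypothetical example (eigenvalues 6 and -1), not one taken from the slides.

```python
import numpy as np

# Hypothetical example matrix with eigenvalues 6 and -1.
A = np.array([[1.0, 2.0],
              [5.0, 4.0]])

eigenvalues, P = np.linalg.eig(A)   # columns of P: eigenvectors
D = np.diag(eigenvalues)            # diagonal entries of D: eigenvalues

# A = P D P^{-1}
A_rebuilt = P @ D @ np.linalg.inv(P)
assert np.allclose(A, A_rebuilt)

# One payoff: powers of A reduce to powers of the diagonal entries.
A_cubed = P @ np.diag(eigenvalues ** 3) @ np.linalg.inv(P)
assert np.allclose(A_cubed, np.linalg.matrix_power(A, 3))
```

The order of the columns of P must match the order of the entries of D, which `np.linalg.eig` guarantees here.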
7
Limitations
  • What if A does not have n linearly independent
    eigenvectors?
  • What if A is not square, i.e. A is m x n?
  • In either case A cannot be diagonalized, but ...

8
  • If A is m x n, then $A^T A$ is n x n
  • $A^T A$ is symmetric
  • $A^T A$ can be diagonalized, i.e. $A^T A = PDP^{-1}$
  • D is a diagonal matrix with the eigenvalues of
    $A^T A$ as the diagonal entries
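These properties can be sketched with NumPy. The 2 x 3 matrix below is an assumption: it is chosen because its $A^T A$ has the eigenvalues 360, 90, and 0 quoted in the deck's worked example, but the deck's own matrix is not preserved.

```python
import numpy as np

# Assumed 2 x 3 matrix (m = 2, n = 3); not taken from the slides.
A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])

AtA = A.T @ A                      # n x n, here 3 x 3
assert AtA.shape == (3, 3)
assert np.allclose(AtA, AtA.T)     # symmetric

# eigh is NumPy's routine for symmetric matrices; it returns real
# eigenvalues in ascending order and orthonormal eigenvectors.
eigenvalues, P = np.linalg.eigh(AtA)
D = np.diag(eigenvalues)

assert np.allclose(P.T @ P, np.eye(3))   # P is orthogonal, so P^{-1} = P^T
assert np.allclose(AtA, P @ D @ P.T)     # A^T A = P D P^{-1}
```

Because $A^T A$ is symmetric, P can be taken orthogonal, so no explicit matrix inverse is needed.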

9
Singular Values
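The matrix $A^T A$ has eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$.
The singular values of the m x n matrix A are
given by $\sigma_i = \sqrt{\lambda_i}$ for $i = 1, \dots, n$.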
10
Singular Value Decomposition
  • Any m x n matrix A can be factored as
  • $A = U S V^T$
  • where U is an m x m orthogonal matrix,
  • V is an n x n orthogonal matrix, and
  • S is an m x n matrix of the form

11
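$$S = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}$$
r is the rank of A
the bottom zero blocks have m - r rows
the right-hand zero blocks have n - r columns
  • D is an r x r diagonal matrix with the first r
    singular values of A along the diagonal.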

12
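Eigenvalues of $A^T A$ are $\lambda_1 = 360$, $\lambda_2 = 90$, and $\lambda_3 = 0$.
So the singular values of A are $\sigma_1 = \sqrt{360} = 6\sqrt{10}$, $\sigma_2 = \sqrt{90} = 3\sqrt{10}$, and $\sigma_3 = 0$.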
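A minimal numerical sketch of this step follows. The matrix A is an assumption, chosen so that $A^T A$ has eigenvalues 360, 90, and 0; the slide's own matrix is not preserved in the transcript.

```python
import numpy as np

# Assumed matrix whose A^T A has eigenvalues 360, 90 and 0.
A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])

# Eigenvalues of A^T A, sorted largest first.
eigenvalues = np.linalg.eigvalsh(A.T @ A)[::-1]
# Round-off can leave a tiny negative value where the true eigenvalue is 0.
eigenvalues = np.clip(eigenvalues, 0.0, None)

# Singular values are the square roots of those eigenvalues.
singular_values = np.sqrt(eigenvalues)

# NumPy's SVD reports min(m, n) = 2 singular values for a 2 x 3 matrix.
from_svd = np.linalg.svd(A, compute_uv=False)
assert np.allclose(singular_values[:2], from_svd)
print(singular_values)
```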
13
$A = U S V^T$
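The factor matrices for this example are not preserved in the transcript. As a sketch, NumPy can produce one valid factorization for an assumed matrix A (chosen so that $A^T A$ has eigenvalues 360, 90, and 0); the signs of the columns of U and V may differ from those on the original slide.

```python
import numpy as np

# Assumed matrix; the slide's own A is not preserved.
A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])
m, n = A.shape

# full_matrices=True gives U (m x m) and V^T (n x n); s holds the singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

r = np.linalg.matrix_rank(A)       # rank of A
S = np.zeros((m, n))
S[:r, :r] = np.diag(s[:r])         # D in the top-left block, zeros elsewhere

assert np.allclose(U.T @ U, np.eye(m))     # U is orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(n))   # V is orthogonal
assert np.allclose(A, U @ S @ Vt)          # A = U S V^T
```

Here r = 2, so S has m - r = 0 extra zero rows and n - r = 1 extra zero column.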
14
Information Retrieval
  • Given a set of documents and a query
  • (a collection of terms)
  • Return the documents ranked by their similarity
    to the query

15
Search Engines
  • Google, Ask Jeeves, etc.
  • Term Matching
  • Type in words (query) and it returns webpages
    (documents) that contain those words
  • Focus on one document at a time
  • e.g. the query "movies in 2006" returns sites that list
    movies from 2006
  • But what if I want to know the common theme(s),
    if any, of popular movies in 2006?
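Term matching can be sketched in a few lines of plain Python. The mini-collection and the function name `term_match` are made up for illustration; real search engines do far more than this.

```python
# Hypothetical mini-collection; the texts are invented for illustration.
documents = {
    "doc1": "list of movies released in 2006",
    "doc2": "review of a 2006 movie with common themes",
    "doc3": "box office results for 2005",
}

def term_match(query, docs):
    """Return ids of documents that contain every query term."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in docs.items()
            if terms <= set(text.lower().split())]

hits = term_match("movies 2006", documents)
print(hits)
```

Each document is tested on its own, so nothing is inferred across documents, and "movie" in doc2 does not match the query term "movies".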

16
Common Themes Movies 2006
  • Returns sites containing the term "common themes", often in a
    review of a 2006 movie
  • If you are lucky, returns the results of someone else having
    researched the question and posted their comments

17
We want
  • The computer to search several documents, infer
    the answer, and return it to us.
  • (Finding relationships in large sets of text)
  • → Latent Semantic Indexing

18
  • Latent: use information that might not be obvious
  • Semantic: focus on context and meaning rather than just
    matching
  • Indexing: organize data to provide efficient search and
    retrieval
19
Latent Semantic Indexing (LSI)
  • Steps
  • Construct the Term-Document Matrix A
  • Factor A into its Singular Value Decomposition
    (SVD), i.e. $A = U S V^T$
  • Reduce the dimensions of the matrices
  • Compute the final LSI Space
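The four steps can be sketched with NumPy on a toy corpus. Everything below is illustrative: the documents are invented, raw term counts stand in for whatever term weighting a production system would use, and k = 2 is an arbitrary choice.

```python
import numpy as np

# Hypothetical toy corpus.
docs = [
    "group plans attack on cathedral",
    "police protect cathedral in city",
    "group buys weapons",
    "city council funds police",
]

# Step 1: term-document matrix A (rows = terms, columns = documents),
# filled with raw term counts.
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(t) for d in docs] for t in vocab], dtype=float)

# Step 2: SVD, A = U S V^T (s holds the diagonal of S).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Step 3: keep only the k largest singular values.
k = 2
Uk, Sk, Vkt = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# Step 4: the LSI space. Rows of Uk describe terms, columns of Vkt describe
# documents, and Ak = Uk Sk Vkt is the rank-k approximation of A.
Ak = Uk @ Sk @ Vkt
print(A.shape, Uk.shape, Sk.shape, Vkt.shape)
```

Truncating to k dimensions discards the smallest singular values, which is what lets documents that share no terms still end up close together in the reduced space.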

20
Latent Semantic Indexing
[Diagram: Documents → Parsing → Term-by-Document Matrix A → SVD ($A = U S V^T$) → Dimension Reduction ($U_k S_k V_k^T$) → LSI Space]
See Deerwester et al., Indexing by Latent
Semantic Analysis, Journal of the American
Society for Information Science, 41(6), pp.
391-407, October, 1990.
21
Interpretation of SVD
$A = U S V^T$
Concept: inherent, latent, underlying information
22
LSI and Terrorist Threats
  • Database
  • 158,492 Documents
  • English-language News Articles from 2002-2003

23
Entity Extraction (Preprocessing)
[Figure: example results of entity extraction on a document]
24
Entity Extraction Results
  • 334,557 Unique Entities
  • 126,372 Persons
  • 37,706 Locations
  • 170,479 Organizations
  • 101,533 Unique Entities Occurring More than Once

25
LSI Indexing
  • 332,386 Indexed Objects
  • 158,492 Documents
  • 230,853 Individual Terms
  • 101,533 Entity Names
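Since entity names are indexed alongside individual terms, one plausible reading is that each extracted name is treated as a single unit. The sketch below illustrates that idea only; the lookup table `ENTITIES` and the function `tokenize_with_entities` are hypothetical stand-ins for real entity-extraction software, and the substring replacement is deliberately naive.

```python
# Hypothetical entity list standing in for the output of entity-extraction
# software; a real extractor would find these names automatically.
ENTITIES = {
    "strasbourg cathedral": "LOCATION",
    "abu sayyaf group": "ORGANIZATION",
}

def tokenize_with_entities(text, entities=ENTITIES):
    """Split text into tokens, keeping each known entity name as one token."""
    text = text.lower()
    # Replace longer names first so a short name cannot split a longer one.
    for name in sorted(entities, key=len, reverse=True):
        text = text.replace(name, name.replace(" ", "_"))
    return text.split()

tokens = tokenize_with_entities("Police guard Strasbourg Cathedral")
print(tokens)
```

With tokens like these, an entity name becomes its own row in the term-document matrix, next to the rows for ordinary words.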

26
Example
Term-Document Matrix A and query vector q [not preserved in this transcript]
Query: Is GSPC planning an attack on the
cathedral in Strasbourg?
27
Latent Semantic Indexing
Reduced-dimension term-document matrix: $A_k = U_k S_k V_k^T$ (the LSI space)
28
Reduced Dimensions
  • Query Vector
  • Document Vector

Reduced-dimension jth document vector = jth
column of $V_k^T$
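The query-vector formula from this slide is not preserved. A common convention in the LSI literature, assumed here, maps a term-space query q to $\hat{q} = S_k^{-1} U_k^T q$; it is consistent with using the jth column of $V_k^T$ as the jth document vector. The matrix and the helper `fold_in` below are hypothetical.

```python
import numpy as np

# Hypothetical 5-term x 4-document count matrix.
A = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk, Vkt = U[:, :k], s[:k], Vt[:k, :]

def fold_in(q):
    """Map a term-space vector q into the k-dimensional space (assumed convention)."""
    return (Uk.T @ q) / sk          # same as S_k^{-1} U_k^T q

j = 2
doc_j = Vkt[:, j]                   # jth column of V_k^T

# Folding in column j of A reproduces that document's reduced vector.
assert np.allclose(fold_in(A[:, j]), doc_j)

q = np.array([1, 0, 0, 1, 0], dtype=float)   # query using terms 0 and 3
q_hat = fold_in(q)
print(q_hat)
```

The assertion is the point of the sketch: under this convention a query is treated exactly like a new document, so queries and documents live in the same k-dimensional space.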
29
Representation Vectors
[Figure: entity, document, and term vectors in the LSI representation space]
Cosine between vectors is a measure of similarity
30
Similarity Measure
31
Rankings
  • The values of sim( ) rank how similar each document
    is to the query.
  • Document with the highest value is most similar
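Assuming sim( ) is the cosine between the reduced-dimension query vector and each document vector, the ranking can be sketched as follows. The vectors are hypothetical placeholders, not values from the study.

```python
import numpy as np

def sim(u, v):
    """Cosine of the angle between vectors u and v."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical reduced-dimension vectors (k = 2); columns are documents.
doc_vectors = np.array([[0.9, 0.1, -0.5],
                        [0.2, 0.8,  0.5]])
q_hat = np.array([1.0, 0.3])

scores = [sim(q_hat, doc_vectors[:, j]) for j in range(doc_vectors.shape[1])]

# Document indices ordered from most to least similar to the query.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking)
```

Cosine depends only on direction, so a long document and a short one pointing the same way in the LSI space receive the same score.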

32
Entities of Particular Interest
Groups
Individuals
Targets
Weapons
Activities
33
Relationships of Particular Interest
  • Group-Group
  • Person-Group
  • Person-Person
  • Group-Target
  • Group-Weapon

34
Procedure (Cont'd)
Create Matrix of Items to be Compared
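One way to build such a matrix, sketched under assumptions: take the LSI representation vectors of the items to be compared and compute the cosine between every pair. The names and numbers below are placeholders, not data from the study, and `unit_rows` is a hypothetical helper.

```python
import numpy as np

# Hypothetical LSI representation vectors (k = 3) for two groups and three
# targets; names and values are placeholders.
groups = {"group_a": np.array([0.8, 0.1, 0.3]),
          "group_b": np.array([0.1, 0.9, 0.2])}
targets = {"target_x": np.array([0.7, 0.2, 0.4]),
           "target_y": np.array([0.0, 1.0, 0.1]),
           "target_z": np.array([0.3, 0.3, 0.9])}

def unit_rows(vectors):
    """Stack vectors as rows and scale each row to length 1."""
    M = np.vstack(list(vectors.values()))
    return M / np.linalg.norm(M, axis=1, keepdims=True)

# Entry (i, j) is the cosine between group i and target j.
cosines = unit_rows(groups) @ unit_rows(targets).T

for name, row in zip(groups, cosines):
    best = list(targets)[int(np.argmax(row))]
    print(f"{name}: closest target is {best}")
```

Normalizing the rows first turns all pairwise cosines into a single matrix product, which matters when there are many items to compare.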
35
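Terrorist Groups vs. Targets
[Table not preserved; names surviving from it include ... Martyrs Brigade, Abu Sayyaf Group, Trafalgar Square, Strasbourg Cathedral, NATO HQ, Athens ...]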
36
Review of Procedure
  • Pre-Process Text with Entity Extraction Software
  • Create LSI Representation Space
    • Construct the Term-Document Matrix A
    • Factor A into its SVD, i.e. $A = U S V^T$
    • Reduce the dimensions of the matrices
    • Compute the final LSI Space
  • Create Matrix of Items to Be Compared
  • Compute Cosines between Pairs of Representation
    Vectors

37
References
  • Overview of the LSI Technique
    Deerwester et al., Indexing by Latent Semantic
    Analysis, Journal of the American Society for
    Information Science, 41(6), pp. 391-407,
    October, 1990.
  • Review of the LSI Literature
    Dumais, S., Latent Semantic Analysis, in
    Annual Review of Information Science and
    Technology, Vol. 38, Information Today Inc.,
    Medford, New Jersey, 2004, pp. 189-230.
  • Effects of LSI Parameter Choices
    Dumais, S., Enhancing Performance in Latent
    Semantic Indexing (LSI) Retrieval, Bellcore
    Technical Report TM-ARH-017527, 1990.
  • LSI Capture of Higher-order Associations
    Kontostathis, A. and Pottenger, W. M., A
    Framework for Understanding LSI Performance,
    Information Processing & Management, 42(1),
    pp. 56-73, January, 2006.
  • Utility of Matrix Decomposition Techniques in
    Social Network Analysis
    Skillicorn, D., Social Network Analysis via
    Matrix Decompositions, in Emergent Information
    Technologies and Enabling Policies for
    Counter-Terrorism, IEEE-Wiley, 2006.
  • Information Retrieval through LSI
    DIMACS Education Module (High School)

38
Questions?