Title: Document Ontology Extractor (DOE) Research Team: Govind R Maddi, Jun Zhao Chakravarthi S Velvadapu Faculty: Dr.Sadanand Srivastava Dr.James Gil De Lamadrid Joint Project of University of Maryland, Baltimore County Bowie State University Sponsored
1 Document Ontology Extractor (DOE)
Research Team: Govind R Maddi, Jun Zhao, Chakravarthi S Velvadapu
Faculty: Dr. Sadanand Srivastava, Dr. James Gil De Lamadrid
Joint Project of University of Maryland, Baltimore County and Bowie State University
Sponsored by Department Of Defense
2 OVERVIEW
- The system takes text documents as its input
- Performs semantic analysis on these documents
- Generates useful ontology
- Represents it graphically
3 GOAL
- To build an Ontology utilizing
  - Statistical methods
  - A small amount of user feedback
  - Automation
4 Architecture of DOE
Text Document -> Pre-processing -> Normalization -> Latent Semantic Indexing (SVD) -> Document Ontology -> Graph Construction -> GUI
5 INPUT
6 Pre-processing
- Read in the text file
- Extract meaningful terms
- Count their frequencies
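
As a rough illustration of the three pre-processing steps, the sketch below tokenizes a string, drops common words, and counts what remains. The slides do not say how DOE decides which terms are meaningful, so the stop-word list, the minimum term length, and the name `count_terms` are assumptions made only for this example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; DOE's actual filtering rule is not given.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "it"}

def count_terms(text):
    """Return a Counter mapping each retained term to its frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Assumption: a term is "meaningful" if it is not a stop word
    # and has more than two letters.
    return Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)

# In practice the text would be read from the input file.
counts = count_terms("The ontology is built from the documents. "
                     "Documents describe an ontology of terms.")
print(counts["ontology"], counts["documents"])  # 2 2
```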
7 Normalization
- Calculate the weight of each term using
  $$w_{i,k} = \frac{\mathrm{frequency}_{i,k}}{\sum_{j=1}^{n_k} \mathrm{frequency}_{j,k}}$$
8 Normalization (contd)
- Calculate the normalized weight using
  $$W_{i,k} = \frac{w_{i,k}}{\sqrt{\sum_{j=1}^{n_k} w_{j,k}^{2}}}$$
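
The two normalization steps can be sketched for a single document as follows. This assumes that i indexes terms, k indexes documents, and n_k is the number of distinct terms in document k; the slides do not define these symbols, and the function names are illustrative only.

```python
import math

def term_weights(frequencies):
    """w(i,k): each term's frequency divided by the document's total frequency."""
    total = sum(frequencies.values())
    return {term: f / total for term, f in frequencies.items()}

def normalize(weights):
    """W(i,k): weights rescaled so the document vector has unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()}

freqs = {"ontology": 3, "graph": 1}   # hypothetical counts for one document k
w = term_weights(freqs)               # {'ontology': 0.75, 'graph': 0.25}
W = normalize(w)                      # squared entries now sum to 1
```

After the second step every document column has the same length, so long documents do not dominate the term-document matrix.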
9 Latent Semantic Indexing (LSI)
- Statistical method representing documents by statistically independent concepts
- Based on Singular Value Decomposition (SVD)
10 Singular Value Decomposition (SVD)
- A technique that decomposes a given matrix into
three components U, S and V.
11 SVD (contd)
- An m x n term-document matrix A, of rank r, can be expressed as the product
  $A = U S V^T$
- U is the m x r term matrix
- S is the r x r diagonal matrix
- $V^T$ is the r x n document matrix
12 SVD (contd)
- Diagonal of S contains singular values of A in
the descending order.
13 SVD (contd)
- LSI forms a rank-s approximation of A as follows
  $A_s = U_s S_s V_s^T$
- $U_s$: derived from U by removing all but the first s columns
- $S_s$: derived from S by removing all but the largest s singular values
- $V_s^T$: derived from $V^T$ by removing all but the s corresponding rows
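
A minimal NumPy sketch of this truncation is shown below. The matrix values and the choice s = 2 are made up for illustration. `numpy.linalg.svd` with `full_matrices=False` returns min(m, n) singular values rather than exactly r, which makes no difference once only the largest s are kept.

```python
import numpy as np

# Hypothetical 4-term x 3-document matrix of normalized weights.
A = np.array([[0.8, 0.0, 0.7],
              [0.6, 0.1, 0.7],
              [0.0, 0.9, 0.1],
              [0.0, 0.4, 0.0]])

# Thin SVD: sigma holds the singular values in descending order.
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

s = 2                                    # number of concepts to keep (assumed)
U_s = U[:, :s]                           # first s columns of U
S_s = np.diag(sigma[:s])                 # s largest singular values
Vt_s = Vt[:s, :]                         # corresponding s rows of V^T
A_s = U_s @ S_s @ Vt_s                   # rank-s approximation of A

print(U_s.shape, S_s.shape, Vt_s.shape)  # (4, 2) (2, 2) (2, 3)
```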
14 SVD (contd)
[Figure: block diagram of A (m x n) = U (m x r) S (r x r) V^T (r x n), with sub-blocks labelled U_s, S_s and V_s^T]
15 Document Ontology
- Build Concept Nodes and Term Nodes using the document matrix (V) and term matrix (U).
16 Building concept nodes from term matrix (U)
- A concept node contains information about
  - Concept name
  - Terms that belong to that concept
  - Respective weights of terms in that concept
17 Building concept nodes from term matrix (U) (contd)
- Naming convention
  - Generated automatically
  - A hyphenated string of the five most frequent terms in that concept
18 Building concept nodes from term matrix (U) (contd)
- A concept node represents a document
- Each column in U corresponds to a concept node
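
One possible way to turn the columns of the truncated term matrix into concept nodes is sketched below. Several choices here are assumptions rather than DOE's documented behavior: weights are taken as absolute values of the entries of U, a term belongs to a concept when its weight exceeds 70% of that column's largest weight (the default described on the changeConcThreshold slide), and the name is built from the five highest-weight member terms instead of raw frequencies. The name `build_concept_nodes` and the sample data are hypothetical.

```python
import numpy as np

def build_concept_nodes(U_s, terms, cutoff=0.70):
    """Return one dict per column of U_s: concept name, member terms, weights."""
    nodes = []
    for column in U_s.T:
        weights = np.abs(column)                  # assumption: use magnitudes
        keep = weights > cutoff * weights.max()   # terms above the threshold
        members = sorted(((terms[i], float(weights[i]))
                          for i in np.flatnonzero(keep)),
                         key=lambda tw: -tw[1])   # heaviest terms first
        name = "-".join(t for t, _ in members[:5])
        nodes.append({"name": name, "terms": dict(members)})
    return nodes

terms = ["ontology", "graph", "matrix", "vector"]
U_s = np.array([[0.9, 0.1],
                [0.8, 0.0],
                [0.1, 0.7],
                [0.0, 0.6]])
concepts = build_concept_nodes(U_s, terms)
print([c["name"] for c in concepts])  # ['ontology-graph', 'matrix-vector']
```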
19 Building term nodes from term matrix (U)
- A term node contains information about
  - Term name
  - Concepts to which it belongs
  - Its respective weight in each concept
20 Building term nodes from term matrix (U) (contd)
- Naming convention
  - Generated automatically
  - Simply named using the term name
21 Building term nodes from term matrix (U) (contd)
- A term node represents a term
- Each row in U corresponds to a term node
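
The row-wise view can be sketched in the same spirit: each row of the truncated term matrix becomes a term node listing the concepts it belongs to and its weight in each. The membership rule (weight above 70% of the concept's largest weight, using absolute values) is an assumed reading of the threshold slides, and `build_term_nodes` and the concept names are hypothetical.

```python
import numpy as np

def build_term_nodes(U_s, terms, concept_names, cutoff=0.70):
    """Map each term name to {concept name: weight} for the concepts it joins."""
    magnitudes = np.abs(U_s)                  # assumption: use magnitudes
    col_max = magnitudes.max(axis=0)          # largest weight per concept
    nodes = {}
    for i, term in enumerate(terms):
        nodes[term] = {concept_names[j]: float(magnitudes[i, j])
                       for j in range(magnitudes.shape[1])
                       if magnitudes[i, j] > cutoff * col_max[j]}
    return nodes

terms = ["ontology", "graph", "matrix", "vector"]
U_s = np.array([[0.9, 0.1],
                [0.8, 0.0],
                [0.1, 0.7],
                [0.0, 0.6]])
term_nodes = build_term_nodes(U_s, terms, ["concept-1", "concept-2"])
print(term_nodes["matrix"])   # {'concept-2': 0.7}
```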
22 Graph Construction
- A bipartite graph is constructed with concept nodes and term nodes
- A concept node is connected to all term nodes that belong to it.
- A term node is connected to all concept nodes to which it belongs.
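
A bipartite graph of this kind can be held as two adjacency maps, one per side. The sketch below is only an illustration: the function name and the membership data are hypothetical, and DOE's internal graph representation is not described in the slides.

```python
from collections import defaultdict

def build_bipartite_graph(concept_terms):
    """concept_terms maps a concept name to the terms that belong to it."""
    concept_to_terms = {c: set(ts) for c, ts in concept_terms.items()}
    term_to_concepts = defaultdict(set)
    for concept, ts in concept_to_terms.items():
        for term in ts:
            # Every edge is stored in both directions.
            term_to_concepts[term].add(concept)
    return concept_to_terms, dict(term_to_concepts)

# Hypothetical membership; edges only ever join a concept to a term.
c2t, t2c = build_bipartite_graph({
    "Concept 1": ["Term 1", "Term 2", "Term 3"],
    "Concept 2": ["Term 3", "Term 4", "Term 5"],
})
print(sorted(t2c["Term 3"]))   # ['Concept 1', 'Concept 2']
```

Keeping both maps makes the two GUI lookups (terms of a concept, concepts of a term) a single dictionary access each.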
23 Graph Construction (contd)
[Figure: example bipartite graph with term nodes Term 1 to Term 5 and concept nodes Concept 1 and Concept 2]
24 Graphical User Interface
25 GUI (contd)
- GUI consists of
  - Concepts list
  - Terms list
  - Display for bipartite graph
  - Display for list of files in ontology
27 GUI
- To view terms related to a concept, user selects that concept from concepts list
- To view concepts related to a term, user selects that term from terms list
29 GUI (contd)
- To view only terms related to a specific concept
  - Select that concept from concepts list
  - Select checkbox "Display Selected Ones Only"
- Result
  - GUI displays ONLY relations between selected terms and concepts
30 GUI (contd)
- To view only concepts related to a term
  - Select that term from terms list
  - Select checkbox "Display Selected Ones Only"
- Result
  - GUI displays ONLY relations between selected terms and concepts
32 GUI (contd)
- To highlight relationship between a term and a concept
  - Select that term or concept from terms or concepts list
  - Click on line connecting term and concept
34 GUI File Operations
36 GUI Ontology Updates
- Add
- Delete
- changeSVDThreshold
- changeConcThreshold
- foldInDoc
- defaultBuild
38 GUI Ontology Updates
- Add
  - Click on Add
  - Select file to be added from file chooser popup menu
  - Choose whether to build now or not
    - If yes, document is added and displayed
    - If no, GUI remains unchanged
42 GUI Ontology Updates
- Delete
  - Click on Delete
  - Select file to be deleted from file chooser popup menu
  - Choose whether to build now or not
    - If yes, document is deleted and the updated ontology is displayed
    - If no, GUI remains unchanged
46 GUI Ontology Updates
- changeSVDThreshold
  - SVDThreshold controls how many of the largest singular values (s) are selected from S.
  - Default value is 70%, i.e. only the singular values higher than 70% of the highest singular value are selected
  - User can change this default value
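
The threshold rule can be written as a short function that returns s. This is a sketch of the rule as stated on the slide, reading the default of 70 as a percentage of the largest singular value; `select_rank` and the sample values are hypothetical.

```python
import numpy as np

def select_rank(singular_values, threshold_percent=70):
    """Count the singular values above threshold_percent % of the largest one."""
    sigma = np.asarray(singular_values, dtype=float)
    cutoff = (threshold_percent / 100.0) * sigma.max()
    return int(np.sum(sigma > cutoff))

sigma = [10.0, 8.0, 6.5, 2.0]          # made-up singular values
print(select_rank(sigma))              # 2  (cutoff is about 7.0)
print(select_rank(sigma, 50))          # 3  (cutoff is 5.0)
```

Lowering the threshold keeps more singular values and therefore produces more concepts.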
49 GUI Ontology Updates
- changeConcThreshold
  - Controls the number of terms related to a concept based upon term weight
  - Default value is 70%, i.e. only the terms with weights higher than 70% of the highest term weight are selected
  - User can change this default value
52 GUI Ontology Modifications
- Rename
  - Renames a selected concept
- DelTerm
  - Deletes a selected term
- Undo
  - Reverts the last modification and returns to the previous state
58 Future Work
- To investigate less expensive methods for adding new documents
  - Fold-In
  - SVD update
59 Future Work
- Fold-In
  - A method to add new document(s) to an existing ontology
  - Uses the existing data in document addition process
  - Less expensive process than the regular build method
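
For reference, the folding-in formula commonly used with LSI projects a new document's term-weight vector d into the existing concept space as d^T U_s S_s^{-1}, without recomputing the SVD. The sketch below shows that standard formula under made-up data; whether DOE's foldInDoc works exactly this way is not stated in the slides, and `fold_in` is a hypothetical name.

```python
import numpy as np

def fold_in(d, U_s, sigma_s):
    """Project a term-weight vector d (length m) into the s-dimensional concept space."""
    return (d @ U_s) / sigma_s

# Hypothetical 4-term x 3-document matrix and its rank-2 truncation.
A = np.array([[0.8, 0.0, 0.7],
              [0.6, 0.1, 0.7],
              [0.0, 0.9, 0.1],
              [0.0, 0.4, 0.0]])
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
s = 2
U_s, sigma_s, Vt_s = U[:, :s], sigma[:s], Vt[:s, :]

# Sanity check: folding in an existing document reproduces its column of V_s^T.
d_hat = fold_in(A[:, 0], U_s, sigma_s)
print(np.allclose(d_hat, Vt_s[:, 0]))   # True

# A new document only appends a column to V_s^T; U_s and sigma_s are reused as-is.
d_new = np.array([0.7, 0.7, 0.1, 0.0])
Vt_s = np.column_stack([Vt_s, fold_in(d_new, U_s, sigma_s)])
```

Because U_s and the singular values are left untouched, the cost is one matrix-vector product per new document, which is why this is cheaper than a full rebuild. The trade-off is that the concepts themselves do not adapt to the new documents.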
60 Acknowledgements
- We express our appreciation to
- Department Of Defense
- University of Maryland, Baltimore County
- Advisors, Bowie State University