Document Ontology Extractor (DOE). Research Team: Govind R Maddi, Jun Zhao, Chakravarthi S Velvadapu. Faculty: Dr. Sadanand Srivastava, Dr. James Gil De Lamadrid. Joint Project of University of Maryland, Baltimore County and Bowie State University. Sponsored by Department Of Defense.

1
Document Ontology Extractor (DOE)
Research Team: Govind R Maddi, Jun Zhao, Chakravarthi S Velvadapu
Faculty: Dr. Sadanand Srivastava, Dr. James Gil De Lamadrid
Joint Project of University of Maryland, Baltimore County and Bowie State University
Sponsored by Department Of Defense
2
OVERVIEW
  1. The system takes text documents as its input
  2. Performs semantic analysis on these documents
  3. Generates useful ontology
  4. Represents it graphically

3
GOAL
  • To build an Ontology utilizing
  • Statistical methods
  • A small amount of user feedback
  • Automation

4
Architecture of DOE
  • Text Document → Pre-processing → Normalization → Latent Semantic Indexing (SVD) → Document Ontology → Graph Construction → GUI
5
INPUT
  • Text documents

6
Pre-processing
  • Read-in text file
  • Extract meaningful terms
  • Count their frequencies
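
The three pre-processing steps above can be sketched roughly as follows. This is a minimal illustration, not DOE's actual code: the slides do not say how "meaningful" terms are chosen, so the stop-word list, the minimum term length, and the function names are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list; DOE's real filtering rules are not given in the slides.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on", "for"}

def extract_term_frequencies(text):
    """Count lower-cased alphabetic terms, skipping stop words and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    terms = [t for t in tokens if t not in STOP_WORDS and len(t) > 2]
    return Counter(terms)

def term_frequencies_from_file(path):
    """Read a text file and return its term frequencies."""
    with open(path, encoding="utf-8") as handle:
        return extract_term_frequencies(handle.read())

sample = "The ontology of a document is built from the terms in the document."
freqs = extract_term_frequencies(sample)
print(freqs["document"])  # number of times "document" occurs in the sample
```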

7
Normalization
  • Calculate weight of each term using
  • $w_{i,k} = \dfrac{\text{frequency}_{i,k}}{\sum_{j=1}^{n_k} \text{frequency}_{j,k}}$

8
Normalization(contd)
  • Calculate normalized weight using
  • $W_{i,k} = \dfrac{w_{i,k}}{\sqrt{\sum_{j=1}^{n_k} w_{j,k}^{2}}}$
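
As a small worked sketch of the two normalization steps, the code below first divides each term frequency by the document's total frequency and then scales the result to unit length. It assumes that i indexes terms, k indexes documents, and n_k is the number of distinct terms in document k. The slides do not spell this out, and the function names and sample counts are made up.

```python
import math

def relative_weights(freqs):
    """w[i] = frequency[i] / sum of all term frequencies in the document."""
    total = sum(freqs.values())
    return {term: count / total for term, count in freqs.items()}

def normalize(weights):
    """W[i] = w[i] / sqrt(sum of w[j]^2), giving a unit-length document vector."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()}

doc_freqs = {"ontology": 3, "term": 1}   # made-up counts for one document
w = relative_weights(doc_freqs)          # first step: relative frequency
W = normalize(w)                         # second step: length normalization
print(w, W)
```

After the second step every document vector has the same length, so long documents do not dominate the term-document matrix.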

9
Latent Semantic Indexing(LSI)
  • Statistical method representing documents by
    statistically independent concepts
  • Based on Singular Value Decomposition (SVD)

10
Singular Value Decomposition (SVD)
  • A technique that decomposes a given matrix into
    three components U, S and V.

11
SVD (contd)
  • m x n term-document matrix A, of rank r, can be
    expressed as the product
  • $A = U S V^T$
  • U is m x r term matrix
  • S is r x r diagonal matrix
  • $V^T$ is r x n document matrix

12
SVD (contd)
  • Diagonal of S contains the singular values of A in
    descending order.

13
SVD (contd)
  • A rank-s approximation of A is formed from LSI as follows
  • $A \approx U_s S_s V_s^T$
  • $U_s$ - derived from U by removing all but the s
    corresponding columns
  • $S_s$ - derived from S by removing all but the largest
    s singular values
  • $V_s^T$ - derived from $V^T$ by removing all but the s
    corresponding rows
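
The decomposition and truncation described on the SVD slides can be tried out on a toy matrix. The sketch below uses NumPy, which the slides do not mention and is assumed here for convenience. The matrix values and the choice s = 2 are made up for illustration.

```python
import numpy as np

# Made-up 4-term x 3-document matrix of normalized weights (illustrative only).
A = np.array([
    [0.9, 0.8, 0.0],
    [0.4, 0.6, 0.1],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.4],
])

# Compact SVD: U is m x k and Vt is k x n with k = min(m, n),
# which equals the rank r when A has full rank, as it does here.
U, singular_values, Vt = np.linalg.svd(A, full_matrices=False)

s = 2                                   # number of singular values to keep
U_s = U[:, :s]                          # first s columns of U
S_s = np.diag(singular_values[:s])      # the s largest singular values
Vt_s = Vt[:s, :]                        # first s rows of V^T

A_s = U_s @ S_s @ Vt_s                  # rank-s approximation of A
print(np.round(A_s, 2))
```

Because NumPy returns the singular values in descending order, keeping the first s columns of U and the first s rows of V^T matches keeping the s largest singular values.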

14
SVD (contd)

  • Diagram: $A$ (m x n) $= U$ (m x r) $\cdot\, S$ (r x r) $\cdot\, V^T$ (r x n), with the blocks $U_s$, $S_s$ and $V_s^T$ marked
15
Document Ontology
  • Build Concept Nodes and Term Nodes using the document
    matrix (V) and term matrix (U).

16
Building concept nodes from term matrix(U)
  • A concept node contains information about
  • Concept name
  • Terms that belong to that concept
  • Respective weights of terms in that concept

17
Building concept nodes from term matrix(U) (contd)
  • Naming convention
  • Generated automatically
  • A hyphenated string of the five most frequent
    terms in that concept

18
Building concept nodes from term matrix(U) (contd)
  • A concept node represents a document
  • Each column in U corresponds to a concept node
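
One possible way to turn the columns of U into concept nodes is sketched below. Several details are assumptions rather than facts from the slides: the class and function names are hypothetical, the magnitude of a U entry is used as the term's weight, membership uses a 70 percent cutoff in the spirit of the changeConcThreshold setting, and the name is built from the highest-weighted terms rather than from raw frequencies.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptNode:
    name: str
    term_weights: dict = field(default_factory=dict)   # term -> weight in this concept

def build_concept_nodes(terms, U, conc_threshold=0.7, name_size=5):
    """Build one ConceptNode per column of U; row i of U belongs to terms[i]."""
    nodes = []
    for col in range(len(U[0])):
        # Assumption: the magnitude of the U entry is the term's weight.
        weights = {terms[row]: abs(U[row][col]) for row in range(len(terms))}
        cutoff = conc_threshold * max(weights.values())
        kept = {t: w for t, w in weights.items() if w > cutoff}
        ranked = sorted(kept, key=kept.get, reverse=True)
        # Name: hyphenated string of up to name_size top-ranked terms.
        nodes.append(ConceptNode("-".join(ranked[:name_size]), kept))
    return nodes

terms = ["graph", "node", "edge", "music", "song"]
U = [[0.7, 0.1], [0.6, 0.0], [0.5, 0.1], [0.0, 0.8], [0.1, 0.6]]   # made-up values
nodes = build_concept_nodes(terms, U)
print([n.name for n in nodes])
```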

19
Building term nodes from term matrix(U)
  • A term node contains information about
  • Term name
  • Concepts to which it belongs
  • Its respective weight in each concept

20
Building term nodes from term matrix(U) (contd)
  • Naming convention
  • Generated automatically
  • Simply named using the term name

21
Building term nodes from term matrix(U) (contd)
  • A term node represents a term
  • Each row in U corresponds to a term node

22
Graph Construction
  • A bipartite graph is constructed with concept
    nodes and term nodes
  • A concept node is connected to all term nodes
    that belong to it.
  • A term node is connected to all concept nodes to
    which it belongs.
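
The bipartite structure can be held as two adjacency maps, one from concepts to terms and one from terms to concepts. The sketch below is an assumed representation, not DOE's data structure, and the node names and edges in the example are invented.

```python
from collections import defaultdict

def build_bipartite_graph(concept_terms):
    """concept_terms maps concept name -> {term: weight}.

    Returns (concept -> set of terms, term -> set of concepts). Edges only
    ever join a concept to a term, so the graph is bipartite by construction.
    """
    concept_to_terms = {c: set(tw) for c, tw in concept_terms.items()}
    term_to_concepts = defaultdict(set)
    for concept, terms in concept_to_terms.items():
        for term in terms:
            term_to_concepts[term].add(concept)
    return concept_to_terms, dict(term_to_concepts)

# Invented example: Term 3 belongs to both concepts.
concept_terms = {
    "Concept 1": {"Term 1": 0.9, "Term 2": 0.7, "Term 3": 0.7},
    "Concept 2": {"Term 3": 0.8, "Term 4": 0.6, "Term 5": 0.6},
}
c2t, t2c = build_bipartite_graph(concept_terms)
print(sorted(t2c["Term 3"]))
```

Keeping both maps makes the two GUI lookups cheap: selecting a concept reads `c2t`, selecting a term reads `t2c`.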

23
Graph Construction (contd)
  • Diagram: example bipartite graph with term nodes Term 1
    through Term 5 and concept nodes Concept 1 and Concept 2
24
Graphical User Interface (GUI)

25
GUI (contd)
  • GUI consists of
  • Concepts list
  • Terms list
  • Display for bipartite graph
  • Display for list of files in ontology

27
GUI
  • To view terms related to a concept, user selects
    that concept from concepts list
  • To view concepts related to a term, user selects
    that term from terms list

29
GUI (contd)
  • To view only terms related to a specific concept:
  • Select that concept from concepts list
  • Select checkbox "Display Selected Ones Only"
  • Result: GUI displays ONLY relations between selected
    terms and concepts

30
GUI (contd)
  • To view only concepts related to a term:
  • Select that term from terms list
  • Select checkbox "Display Selected Ones Only"
  • Result: GUI displays ONLY relations between selected
    terms and concepts

32
GUI (contd)
  • To highlight relationship between a term and a
    concept
  • Select that term or concept from terms or
    concepts list
  • Click on line connecting term and concept

34
GUI File Operations
  • New
  • Open
  • Save
  • saveAs
  • Close
  • Exit

36
GUI Ontology Updates
  • Add
  • Delete
  • changeSVDThreshold
  • changeConcThreshold
  • foldInDoc
  • defaultBuild

38
GUI Ontology Updates
  • Add
  • Click on Add
  • Select file to be added from file chooser popup
    menu
  • Choose whether to build now or not
  • If yes, document is added and displayed
  • If no, GUI remains unchanged

42
GUI Ontology Updates
  • Delete
  • Click on Delete
  • Select file to be deleted from file chooser popup
    menu
  • Choose whether to build now or not
  • If yes, document is deleted and the updated ontology
    is displayed
  • If no, GUI remains unchanged

46
GUI Ontology Updates
  • changeSVDThreshold
  • SVDThreshold controls how many of the largest
    singular values (s) are selected from S.
  • Default value is 70%, i.e. only the singular
    values higher than 70% of the highest singular
    value are selected
  • User can change this default value
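
The threshold rule can be illustrated with a short sketch. It reads the default of 70 as 70 percent of the largest singular value. The function name and the sample values are hypothetical.

```python
def select_singular_values(singular_values, svd_threshold=0.70):
    """Keep the singular values strictly greater than svd_threshold * the largest one."""
    if not singular_values:
        return []
    cutoff = svd_threshold * max(singular_values)
    return [sv for sv in sorted(singular_values, reverse=True) if sv > cutoff]

values = [10.0, 8.0, 7.5, 6.9, 2.0]        # made-up singular values
kept = select_singular_values(values)       # default threshold of 70 percent
s = len(kept)                               # number of concepts retained
print(s, kept)
```

Lowering the threshold keeps more singular values and therefore more concepts; raising it keeps fewer.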

49
GUI Ontology Updates
  • changeConcThreshold
  • Controls the number of terms related to a concept
    based upon term weight
  • Default value is 70%, i.e. only the terms with
    weights higher than 70% of the highest term
    weight are selected
  • User can change this default value

52
GUI Ontology Modifications
  • Rename
  • Renames a selected concept
  • DelTerm
  • Deletes a selected term
  • Undo
  • Reverts the last modification and returns to the
    previous state

58
Future Work
  • To investigate less expensive methods for adding
    new documents
  • Fold-In
  • SVD update

59
Future Work
  • Fold-In
  • A method to add new document(s) to an existing
    ontology
  • Uses the existing data in document addition
    process
  • Less expensive process than the regular build
    method
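
For reference, the fold-in step as it is commonly described for LSI projects a new document into the existing s-dimensional space without recomputing the SVD. The slides do not say whether DOE's planned fold-in uses exactly this form, so treat it as a sketch. The symbols $d$ (the new document's m x 1 vector of normalized term weights) and $\hat{d}$ (its 1 x s representation) are introduced here for illustration.

```latex
\hat{d} = d^{T} \, U_s \, S_s^{-1}
```

The row $\hat{d}$ is then appended to the document matrix $V_s$. Since $U_s$ and $S_s$ are reused unchanged, the cost is one matrix-vector product per new document, which is why fold-in is cheaper than a full rebuild.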

60
Acknowledgements
  • We express our appreciation to
  • Department Of Defense
  • University of Maryland, Baltimore County
  • Advisors, Bowie State University