Term Cooccurrence Analysis as an Interface to Digital Libraries - PowerPoint PPT Presentation

Provided by: vwInd
Learn more at: http://vw.indiana.edu
Transcript and Presenter's Notes

1
Term Co-occurrence Analysis as an Interface to
Digital Libraries
  • Jan W. Buzydlowski
  • Howard D. White
  • Xia Lin
  • College of Information Science and Technology
  • Drexel University, Philadelphia, Pennsylvania, USA

2
Digital Library Research
  • First Wave
  • How to store it
  • Next Wave
  • How to retrieve it (IR)
  • Text Mining
  • Visual Information Retrieval Interface (VIRI)
  • Term Co-occurrence Analysis (TCA)
  • Co-occurrence vs. lexical associations
  • Maps vs. lists

3
Term Definition
  • Unit of Analysis
  • Words
  • Documents
  • Authors
  • Journals
  • Section of Focus
  • Abstract/Text
  • Title
  • Bibliography
  • Keywords

4
Example
  • Words in Title
  • Term
  • Co-occurrence
  • Analysis
  • Interface
  • Digital
  • Library
  • Authors in Bibliography
  • Salton-G
  • Chen-C
  • White-HD
  • Ding-Y
  • Cleveland-W
  • McCain-K
  • Lin-X
  • Schvaneveldt-R
  • Kamada-T
  • Fruchterman-T

5
Term Co-occurrence Methodology
  • User determines which terms are of interest
  • Via a seed term
  • From a pre-defined list
  • The system returns the pair-wise co-occurrence
    counts of the terms over the collection of records

6
Example
  • Unit: Author; Section: Bibliography
  • User-supplied list: Plato, Aristotle, Smith, Brown
  • For a given data set (N = 4 unique terms):
  • Article 1: Plato, Aristotle, Smith
  • Article 2: Plato, Smith
  • Article 3: Plato, Aristotle, Smith, Brown
  • The following co-citations (C(4,2) = 6) are found:

    COMBINATION            COUNT   ARTICLES
    Plato and Smith        3       1, 2, 3
    Plato and Aristotle    2       1, 3
    Plato and Brown        1       3
    Aristotle and Smith    2       1, 3
    Aristotle and Brown    1       3
    Smith and Brown        1       3
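
The counting step in this example can be reproduced in a few lines. The following is a minimal sketch in Python, not the authors' implementation; the names `articles`, `terms`, and `cooccurrences` are illustrative assumptions, and the data is copied from the slide.

```python
from collections import defaultdict
from itertools import combinations

# Records from the slide: each article's bibliography, reduced to the
# user-supplied authors.
articles = {
    1: ["Plato", "Aristotle", "Smith"],
    2: ["Plato", "Smith"],
    3: ["Plato", "Aristotle", "Smith", "Brown"],
}
terms = ["Plato", "Aristotle", "Smith", "Brown"]


def cooccurrences(records, terms):
    """Map each unordered term pair to the record ids that contain both terms."""
    wanted = set(terms)
    found = defaultdict(list)
    for record_id, record_terms in sorted(records.items()):
        # Keep only the terms of interest, in the order the user listed them,
        # so every pair is always stored under the same key.
        present = sorted(wanted & set(record_terms), key=terms.index)
        for pair in combinations(present, 2):
            found[pair].append(record_id)
    return dict(found)


pairs = cooccurrences(articles, terms)
for (a, b), ids in sorted(pairs.items(), key=lambda kv: -len(kv[1])):
    print(f"{a} and {b}: {len(ids)} (articles {ids})")
```

Pairs that never co-occur are simply absent from the result, so a lookup for such a pair should treat a missing key as a count of zero.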

7
Term Co-occurrence Significance
  • The frequent co-occurrence of term pairs within a
    set of documents indicates a strong association
    between those terms, whereas an infrequent count
    indicates the opposite
  • The association you would expect is borne out by
    the frequency
  • The frequency you compute suggests a level of
    association
  • Pain and Management vs. Pain and Obtainment
  • Plato and Aristotle vs. Plato and Cher
  • Science and Nature vs. Science and National Tattler
  • A and B vs. C and D

8
Term Co-occurrence Uses
  • Allows a user to get a foothold with just one
    term
  • One seed term returns many other related terms
  • Allows a user to get an overview with
    user-supplied/system-supplied terms
  • Co-occurrence counts with visualization

9
Seeding
  • User types in
  • One term, e.g., Plato
  • Boolean expression, e.g., Plato AND Brown
  • System supplies top n terms, in ranked order of
    frequency of co-occurrence with the initial term

10
Example
  • For Plato seed

ARISTOTLE, PLUTARCH, CICERO, HOMER, BIBLE, EURIPIDES,
ARISTOPHANES, XENOPHON, AUGUSTINE, HERODOTUS, KANT-I, AESCHYLUS,
SOPHOCLES, THUCYDIDES, OVID, HESIOD, DIOGENES-LAERTI,
HEIDEGGER-M, DERRIDA-J, PINDAR, NIETZSCHE-F, HEGEL-GWF,
VERGIL, AQUINAS-T
11
Need for Visualization
  • Given a list of user- / system-supplied terms
  • Find the frequency of co-occurrence of each
    pair-wise combination of terms
  • Plato AND Aristotle: 1,920
  • Plato AND Plutarch: 380
  • Too many numbers to take in at once
  • C(25, 2) = (25 × 24) / 2 = 300 pairs
  • Three major visualization techniques
  • Multidimensional Scaling (MDS)
  • Self-Organizing (Kohonen) Maps (SOMs)
  • PathFinder Networks (PFNETs)
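
The pair arithmetic, and the step from pair counts to a form that a mapping technique can consume, can be sketched as below. A symmetric term-by-term matrix is a common input for such techniques; whether the authors' system used this exact layout is not stated in the slides, so treat the function `cooccurrence_matrix` and its layout as assumptions. The sample counts are those of the four-author example on slide 6.

```python
from itertools import combinations
from math import comb


def cooccurrence_matrix(terms, pair_counts):
    """Build a symmetric matrix of counts; the diagonal is left at zero."""
    index = {term: i for i, term in enumerate(terms)}
    n = len(terms)
    matrix = [[0] * n for _ in range(n)]
    for (a, b), count in pair_counts.items():
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = count
    return matrix


terms = ["Plato", "Aristotle", "Smith", "Brown"]
pair_counts = {
    ("Plato", "Smith"): 3,
    ("Plato", "Aristotle"): 2,
    ("Plato", "Brown"): 1,
    ("Aristotle", "Smith"): 2,
    ("Aristotle", "Brown"): 1,
    ("Smith", "Brown"): 1,
}
matrix = cooccurrence_matrix(terms, pair_counts)

# The number of unordered pairs grows quadratically with the number of terms.
print(len(list(combinations(terms, 2))))  # pairs among the 4 example terms
print(comb(25, 2))                        # pairs among 25 terms
```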

12
White's MDS map of 15 co-cited classificationists, ca. 1990
Authors plotted: P Arabie, JH Ward, JC Gower, M Wish, RN Shepard,
RR Sokal, JB Kruskal, SC Johnson, PHA Sneath, JD Carroll, PE Green,
JA Hartigan, HA Skinner, VE McGee, RK Blashfield
14
White's PFNET of co-cited authors in Biblical and
literary hermeneutics, 1988-1997
15
Our System
  • Three-tiered
  • User interface
  • Server
  • Database
  • Real-time and interactive
  • Significant data sources
  • ISI AHCI
  • MedLine
  • Live interface for retrieval

17
User Interface - Seed
18
User Interface - SOM
19
Interface - PFNET
20
Interface - Visual Information Retrieval
Interface (VIRI)
21
User Interface IV
22
Database Interface
  • API
  • String findRel( String, int )
  • int findOcc( String )
  • Implemented on
  • BRS
  • API via a wrapper
  • Oracle
  • API via JDBC
  • Noah
  • Specialized co-occurrence database
  • API via JNI
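
The idea of one small API sitting in front of several back ends can be illustrated with a sketch. The slide gives only the two Java signatures, so everything below is an assumption: the Python names `find_rel` and `find_occ`, the reading of `findRel` as "top n related terms" and of `findOcc` as "number of records containing the term", the list return type (the slide's `findRel` returns a `String`), and the in-memory back end, which stands in for BRS, Oracle, or Noah and is not any of them.

```python
from abc import ABC, abstractmethod
from collections import Counter


class CooccurrenceStore(ABC):
    """Hypothetical Python analogue of the two-call database API on the slide."""

    @abstractmethod
    def find_rel(self, term, n):
        """Assumed meaning: the n terms that co-occur most often with `term`."""

    @abstractmethod
    def find_occ(self, term):
        """Assumed meaning: the number of records that contain `term`."""


class InMemoryStore(CooccurrenceStore):
    """Toy back end for illustration only; not the authors' implementation."""

    def __init__(self, records):
        self._records = [set(r) for r in records]

    def find_rel(self, term, n):
        counts = Counter()
        for record in self._records:
            if term in record:
                counts.update(record - {term})
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        return [t for t, _ in ranked[:n]]

    def find_occ(self, term):
        return sum(1 for record in self._records if term in record)


store = InMemoryStore([
    ["Plato", "Aristotle", "Smith"],
    ["Plato", "Smith"],
    ["Plato", "Aristotle", "Smith", "Brown"],
])
print(store.find_rel("Plato", 2), store.find_occ("Brown"))
```

Because callers depend only on the abstract class, a different back end could be swapped in without touching the user-interface or server code, which is presumably the point of wrapping each database behind the same two calls.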

23
Future Plans
  • User Study
  • Preference
  • Type of map, etc.
  • Cognitive map
  • How well does the map match experts' mental
    models?
  • Larger datasets
  • Additional data sources
