A Comparison of Graphical Techniques for the Display of Co-Occurrence Data

1
A Comparison of Graphical Techniques for the
Display of Co-Occurrence Data
  • Jan W. Buzydlowski, Xia Lin, Howard D. White
  • College of Information Science and Technology
  • Drexel University
  • Philadelphia, PA 19104
  • USA

2
Information Visualization
  • "(Data) Visualization allows for the revelation of
    intricate structure which cannot be absorbed in
    any other way." (Cleveland, 1993)
  • "(Information) Visualization has two aspects,
    structural modeling and graphic
    representation." (C. Chen, 1999)
  • data - model - display

3
Visualization Overview
  • Model - Display
  • Co-Occurrence Model
  • 3 Graphical Displays
  • Data
  • Co-citation counts from the Institute for
    Scientific Information, Philadelphia, PA
  • Obtained from a 10-year Arts & Humanities
    Citation Index database given to Drexel by ISI for
    research purposes

4
Co-Occurrence Model
  • Examples
  • Derivation
  • Metrics

5
Co-Occurrence Data - Example 1
  • Market Basket Analysis
  • a shopping cart holds items purchased
  • e.g., milk, bread, razor blades, newspaper
  • Over all the sales for one day
  • what items are purchased together
  • how can we arrange the items in the store
  • Pampers and beer on Thursdays...

6
Co-Occurrence Data - Example 2
  • Author Co-citation Analysis (ACA)
  • Bibliographic data on a given article holds,
    e.g.,
  • title, keywords, abstract, citations to other
    documents
  • An article might cite, e.g.
  • Plato, Aristotle, Smith, Brown
  • Over a given set of many citing articles
  • Count how many times each pair of authors was
    cited together
  • Resulting co-citation count shows common
    intellectual interest

7
Co-Occurrence Derivation
  • For a given data set (N = 4 unique terms):
  • Article 1: Plato, Aristotle, Smith
  • Article 2: Plato, Smith
  • Article 3: Plato, Aristotle, Smith, Brown
  • The following co-citations (C(4,2) = 6) are found:

    COMBINATION            COUNT   ARTICLES
    Plato and Smith          3     1, 2, 3
    Plato and Aristotle      2     1, 3
    Plato and Brown          1     3
    Aristotle and Smith      2     1, 3
    Aristotle and Brown      1     3
    Smith and Brown          1     3
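
The derivation above can be sketched in a few lines of Python. This is only an illustration using the three example articles; the function name `cocitation_counts` and the dictionary layout are assumptions, not part of the original presentation.

```python
from collections import Counter
from itertools import combinations

# The three citing articles from the example; each set holds the cited authors.
articles = {
    1: {"Plato", "Aristotle", "Smith"},
    2: {"Plato", "Smith"},
    3: {"Plato", "Aristotle", "Smith", "Brown"},
}

def cocitation_counts(articles):
    """Count, for every unordered author pair, the articles that cite both."""
    counts = Counter()
    for cited in articles.values():
        # sorted() gives each pair one canonical order, so (a, b) == (b, a).
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts

counts = cocitation_counts(articles)
for (a, b), n in counts.most_common():
    print(f"{a} and {b}: {n}")
```

Each article contributes one count to every pair of authors it cites, so an article citing k authors adds C(k,2) pairs.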

8
Co-Occurrence Measures
  • Raw counts
  • Additional information
  • Correlations
  • Replace each cell by correlation measure of each
    pair-wise column
  • Conditional Probability
  • Compute each cell by dividing each pair's
    co-occurrence count by the total occurrences of
    one of its terms
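
As a rough sketch of the two derived measures, the Python below works on the co-occurrence matrix of the three-article example. It assumes the diagonal holds each author's own citation count, which the slides do not specify; the function names are hypothetical.

```python
from math import sqrt

authors = ["Plato", "Aristotle", "Smith", "Brown"]
# Symmetric co-occurrence matrix for the three-article example.
# Assumption: the diagonal holds each author's own citation count.
matrix = [
    [3, 2, 3, 1],
    [2, 2, 2, 1],
    [3, 2, 3, 1],
    [1, 1, 1, 1],
]

def conditional_probability(matrix, i, j):
    """P(author i is cited | author j is cited) = count(i, j) / count(j)."""
    return matrix[i][j] / matrix[j][j]

def column_correlation(matrix, i, j):
    """Pearson correlation of columns i and j; None if a column is constant."""
    x = [row[i] for row in matrix]
    y = [row[j] for row in matrix]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a zero-variance column
    return sxy / sqrt(sxx * syy)

print(conditional_probability(matrix, 1, 0))  # Aristotle given Plato
print(column_correlation(matrix, 0, 2))       # Plato vs. Smith columns
```

Note that the conditional probability is not symmetric: Plato given Brown differs from Brown given Plato, so the resulting matrix is no longer symmetric.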

9
Co-Occurrence Structure - Example
10
Graphical Techniques
  • Three Methodologies
  • Multi-dimensional scaling
  • Self-organizing maps
  • Pathfinder networks

11
MDS
14
MDS Methodology
  • Given original distances (similarities), estimate
    coordinates that could give those distances
  • The computed distances should correspond to the
    original distances
  • Stress
  • Added dimensions
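
To make the role of stress concrete, here is a minimal sketch that scores a candidate configuration against the original distances. It assumes Kruskal's stress-1 formula with the original distances used directly as targets (the metric case); the slides do not say which stress formula was used, and the toy distances are invented for illustration.

```python
from math import dist, sqrt

def kruskal_stress(original, coords):
    """Kruskal's stress-1 between original distances and a configuration.

    original: dict mapping (i, j) index pairs to the original distance.
    coords:   list of coordinate tuples, one per item.
    """
    squared_error = 0.0
    squared_total = 0.0
    for (i, j), target in original.items():
        fitted = dist(coords[i], coords[j])  # Euclidean distance in the map
        squared_error += (fitted - target) ** 2
        squared_total += fitted ** 2
    return sqrt(squared_error / squared_total)

# Toy data: three items whose distances form a 3-4-5 right triangle.
original = {(0, 1): 3, (0, 2): 4, (1, 2): 5}
one_dimension = [(0,), (3,), (4,)]
two_dimensions = [(0, 0), (3, 0), (0, 4)]

print(kruskal_stress(original, one_dimension))
print(kruskal_stress(original, two_dimensions))
```

The one-dimensional layout cannot honor all three distances, while the two-dimensional one reproduces them exactly, which is the sense in which added dimensions reduce stress.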

15
SOM
16
Self-Organizing Maps (SOMs)
  • Also known as Kohonen Maps
  • Based on Neural Networks
  • Related to wetware
  • robust techniques
  • If categories are known
  • supervised technique
  • backpropagation learning
  • If categories are sought
  • unsupervised technique
  • competitive learning

17
SOMs
  • Given a 2-D grid of nodes
  • each node has N weights
  • each vector (row) has N terms
  • map each input vector to a node
  • Similar to vector quantization (VQ)

18
SOMs Generation
  • nodes initially given random weights
  • randomly sample an input vector
  • row of co-occurrence matrix
  • with replacement
  • find a node closest to vector
  • Euclidean distance
  • update node weights
  • node weight = node weight + gain term × distance
  • update neighborhood
  • cool gain term and neighborhood
  • repeat
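
The generation steps above might look like the following in plain Python. This is a sketch under stated assumptions: the linear cooling schedule, the Gaussian neighborhood function, the 3 x 3 grid, and the names `train_som` and `best_node` are illustrative choices, not details taken from the presentation.

```python
import random
from math import dist, exp

def train_som(vectors, rows=3, cols=3, steps=500, seed=0):
    """Train a small SOM; returns a dict mapping (row, col) to a weight list."""
    rng = random.Random(seed)
    n = len(vectors[0])
    # Nodes are initially given random weights.
    weights = {(r, c): [rng.random() for _ in range(n)]
               for r in range(rows) for c in range(cols)}
    for t in range(steps):
        # Cool the gain term and the neighborhood radius (assumed linear).
        frac = 1.0 - t / steps
        gain = 0.5 * frac
        radius = max(rows, cols) / 2 * frac
        # Randomly sample an input vector, with replacement.
        x = rng.choice(vectors)
        # Find the node closest to the vector (Euclidean distance).
        winner = min(weights, key=lambda node: dist(weights[node], x))
        # Update the winner and every node inside its grid neighborhood.
        for node, w in weights.items():
            grid_d = dist(node, winner)
            if grid_d <= radius:
                influence = exp(-grid_d ** 2 / (2 * radius ** 2))
                for k in range(n):
                    w[k] += gain * influence * (x[k] - w[k])
    return weights

def best_node(weights, x):
    """Map an input vector to the grid node with the closest weights."""
    return min(weights, key=lambda node: dist(weights[node], x))

# Rows of the example co-occurrence matrix (diagonal = own count, assumed).
matrix = [[3, 2, 3, 1], [2, 2, 2, 1], [3, 2, 3, 1], [1, 1, 1, 1]]
som = train_som(matrix)
for label, row in zip(["Plato", "Aristotle", "Smith", "Brown"], matrix):
    print(label, best_node(som, row))
```

Plotting each label at its best-matching node gives the map display; identical rows, such as Plato and Smith here, necessarily land on the same node.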

19
PF Nets
20
Pathfinder Networks
  • Uses graph notation
  • nodes = authors
  • edges = co-citation counts
  • Co-occurrence is a complete network (weighted,
    undirected)

[Figure: example network with edges Plato - Smith (3), Plato - Aristotle (2), and Aristotle - Smith (2)]
21
Pathfinder Networks Generation
  • Pathfinder Network is generated by varying the
    parameters
  • distance (r)
  • triangle inequality (q)

22
Pathfinder Distance
  • Uses Minkowski metric
  • $d = \left( \sum_i e_i^r \right)^{1/r}$
  • Example
  • $e_1 = 3$, $e_2 = 4$
  • $r = 1 \Rightarrow d = 7 = 3 + 4$
  • Driving distance / ratio data
  • $r = 2 \Rightarrow d = 5 = (9 + 16)^{1/2}$
  • Euclidean distance
  • $r \to \infty \Rightarrow d = 4 = \max(3, 4)$
  • ordinal data
  • rank rather than value
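
The three cases in the example can be checked with a short Python sketch. The function name `minkowski_length` is an assumption made for illustration.

```python
from math import inf

def minkowski_length(edges, r):
    """Length of a path with the given edge weights under the Minkowski r-metric."""
    if r == inf:
        # Limit of the formula as r grows: only the largest edge matters.
        return max(edges)
    return sum(e ** r for e in edges) ** (1 / r)

for r in (1, 2, inf):
    print(r, minkowski_length([3, 4], r))
```

With r = infinity only the ordering of the edge weights affects the result, which is why that setting suits ordinal data.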

23
Pathfinder Triangle Inequality
  • A required property of a metric definition
  • $d(i,j) \le d(i,k) + d(k,j)$
  • But may not be justified
  • in personal judgments
  • If a is similar to b, and b is similar to c,
    there may be no transitive judgment of similarity
    from a to c
  • in set intersections
  • Even though Smith and Jones appear together 12
    times, and Jones and Brown appear together 5 times,
    the overlap between Smith and Brown cannot be
    predicted

24
Pathfinder Triangle Inequality
  • Defines q-triangular
  • check paths of length q to determine if
    inequality is met
  • minimum is 2
  • maximum is n - 1
  • full compliance
  • the longer the length, the fewer the connections

25
Pathfinder Example
26
Pathfinder Network Creation
  • PFNet (r, q)
  • Examine all paths of length q or less.
  • Use Minkowski Metric with parameter r to compute
    path length.
  • If a path of less weight is found, then remove
    the edge.
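
A minimal sketch of this procedure in Python follows. It assumes the edge weights are distances (smaller means more closely related), so similarity data such as co-citation counts would first need converting, and it assumes a complete network. The names `combine` and `pfnet` and the three-node input are illustrative.

```python
from math import inf

def combine(a, b, r):
    """Minkowski length of a path made of two segments of lengths a and b."""
    if r == inf:
        return max(a, b)
    return (a ** r + b ** r) ** (1 / r)

def pfnet(distances, r, q):
    """Return the set of edges kept by PFNet(r, q).

    distances: symmetric dict-of-dicts over a complete network.
    """
    nodes = sorted(distances)
    # best[i][j]: shortest Minkowski length over paths of at most k edges,
    # starting with k = 1 (the direct edges).
    best = {i: {j: distances[i][j] for j in nodes if j != i} for i in nodes}
    for _ in range(q - 1):
        extended = {i: dict(best[i]) for i in nodes}
        for i in nodes:
            for j in nodes:
                if i == j:
                    continue
                for m in nodes:
                    if m in (i, j):
                        continue
                    # Extend the best path i..m by the direct edge m-j.
                    via = combine(best[i][m], distances[m][j], r)
                    if via < extended[i][j]:
                        extended[i][j] = via
        best = extended
    # An edge is removed only when some path is strictly shorter than it.
    return {(i, j) for i in nodes for j in nodes
            if i < j and distances[i][j] <= best[i][j] + 1e-9}

def symmetric(pairs):
    """Build a symmetric dict-of-dicts from {(a, b): weight}."""
    d = {}
    for (a, b), w in pairs.items():
        d.setdefault(a, {})[b] = w
        d.setdefault(b, {})[a] = w
    return d

triangle = symmetric({("Smith", "Jones"): 5,
                      ("Smith", "Brown"): 4,
                      ("Brown", "Jones"): 3})
for r in (1, 2, inf):
    print(r, sorted(pfnet(triangle, r, 2)))
```

Because the Minkowski length of a path can be built up one edge at a time, the shortest length over paths of at most q edges is found by q - 1 rounds of extension rather than by enumerating every path.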

27
Pathfinder - Example
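[Figure: triangle with a Smith - Jones edge of weight 5 and a two-edge path through Brown with weights 4 and 3; q = 2]
  • r = 1 => Smith - Jones is kept (path via Brown: 4 + 3 = 7 > 5)
  • r = 2 => Smith - Jones is kept (path via Brown: (16 + 9)^(1/2) = 5, not less than 5)
  • r = infinity => Smith - Jones is removed (path via Brown: max(4, 3) = 4 < 5)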
28
Comparison of Techniques
  • MDS
  • Reduces dimensions / reveals clusters
  • 2D may be insufficient
  • measurement may not be Euclidean
  • SOM
  • robust
  • no guarantee of convergence/unique solution
  • Pathfinder
  • does not assume ratio data/triangle inequality
  • connections rather than positions are important
  • additional methodology needed for display

29
Comparison of Techniques
  • Similarities
  • Spatial models
  • Differences
  • use of visual space
  • semantic meaning
  • as related to data
  • research in progress

30
Graphical Display of Methodologies
  • MDS
  • assume that 2 dimensions are sufficient
  • x, y for each point already defined
  • SOM
  • grid defines the 2D surface
  • plot each label with the appropriate node
  • Pathfinder
  • only defines the nodes and links
  • need additional methodologies
  • Spring-embedder models
  • Kamada and Kawai (1989)
  • Fruchterman and Reingold (1991)
  • Davidson and Harel (1996)
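
To illustrate why Pathfinder needs a separate layout step, here is a heavily simplified spring-embedder sketch in the spirit of Fruchterman and Reingold's force-directed placement. It omits their frame clamping and uses an assumed cooling schedule, so it should not be read as a faithful implementation of any of the cited algorithms; the node names and links are illustrative input.

```python
import random
from math import sqrt

def spring_layout(nodes, edges, iterations=100, seed=0):
    """Assign 2-D positions so linked nodes pull together and all nodes repel."""
    rng = random.Random(seed)
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    k = sqrt(1.0 / len(nodes))  # ideal edge length for a unit-square drawing
    temperature = 0.1           # caps how far a node may move per iteration
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes, magnitude k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = max(sqrt(dx * dx + dy * dy), 1e-9)
                force = k * k / d
                disp[u][0] += dx / d * force
                disp[u][1] += dy / d * force
                disp[v][0] -= dx / d * force
                disp[v][1] -= dy / d * force
        # Attraction along each link, magnitude d^2 / k.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = max(sqrt(dx * dx + dy * dy), 1e-9)
            force = d * d / k
            disp[u][0] -= dx / d * force
            disp[u][1] -= dy / d * force
            disp[v][0] += dx / d * force
            disp[v][1] += dy / d * force
        # Move each node along its net displacement, limited by temperature.
        for v in nodes:
            dx, dy = disp[v]
            d = max(sqrt(dx * dx + dy * dy), 1e-9)
            step = min(d, temperature)
            pos[v][0] += dx / d * step
            pos[v][1] += dy / d * step
        temperature *= 0.95  # cool so the layout settles
    return pos

# Illustrative input: nodes and links as a Pathfinder network would supply them.
nodes = ["Smith", "Brown", "Jones"]
links = [("Smith", "Brown"), ("Brown", "Jones")]
layout = spring_layout(nodes, links)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

The positions carry no meaning of their own here; only the links do, which is the contrast with MDS and SOM drawn in the comparison slides.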

31
Graphical Comparison of Three Methods
  • Data
  • Institute for Scientific Information
  • Arts and Humanities Database (AHCI)
  • 1988 - 1997
  • 1.26 million records
  • Example
  • Given Plato, find related authors
  • Interface described in IV 2000 Paper
  • CSNA 2000 Paper
  • (Lin, Buzydlowski, White)

32
25 Authors Co-cited with Plato
  • PLATO (4928)
  • ARISTOTLE (1861)
  • PLUTARCH (838)
  • CICERO (699)
  • HOMER (627)
  • BIBLE (552)
  • EURIPIDES (515)
  • ARISTOPHANES (474)
  • XENOPHON (459)
  • AUGUSTINE (432)
  • HERODOTUS (425)
  • KANT-I (385)
  • AESCHYLUS (374)
  • SOPHOCLES (363)
  • THUCYDIDES (363)
  • OVID (334)
  • HESIOD (325)
  • DIOGENES-LAERTIUS (317)
  • HEIDEGGER-M (312)
  • DERRIDA-J (304)
  • PINDAR (292)
  • NIETZSCHE-F (278)
  • HEGEL-GWF (264)
  • VERGIL (259)
  • AQUINAS-T (255)

33
300 Pair-wise co-citations
  • 1 PLATO AND ARISTOTLE - 1940 docs
  • 2 PLATO AND PLUTARCH - 872 docs
  • ...
  • 300 VERGIL AND AQUINAS-T - 38 docs
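
The figure of 300 pairs follows from choosing 2 of the 25 authors, C(25,2) = 25 × 24 / 2. A quick check in Python, using placeholder labels rather than the real author names:

```python
from itertools import combinations
from math import comb

# Placeholder labels standing in for the 25 authors.
labels = [f"author_{i}" for i in range(1, 26)]
pairs = list(combinations(labels, 2))

print(len(pairs), comb(25, 2))
```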

34
Visualization allows for the revelation of
intricate structure which cannot be absorbed in
any other way...
35
2D MDS map of 25 authors co-cited with Plato
37
PFNet of 25 authors co-cited with Plato
[Figure: Pathfinder network linking the 25 authors listed on slide 32; the links between nodes are not recoverable from the transcript]
38
Conclusion
  • Slides available at
  • faculty.cis.drexel.edu/jbuzydlo/
  • janb_at_drexel.edu

39
Bibliography
  • Chen, C., Information Visualization and
    Virtual Environments, 1999.
  • Cleveland, W. S., Visualizing Data, Hobart
    Press, 1993.
  • Davidson, R., Harel, D., Drawing Graphs Nicely
    Using Simulated Annealing, ACM Transactions on
    Graphics, 15(4): 301-331 (1996).
  • Fruchterman, T. M. J., Reingold, E. M., Graph
    Drawing by Force-Directed Placement, Software:
    Practice and Experience, 21: 1129-1164 (1991).
  • Kamada, T., Kawai, S., An Algorithm for Drawing
    General Undirected Graphs, Information Processing
    Letters, 31(1): 7-15 (1989).