Title: A Comparison of Graphical Techniques for the Display of Co-Occurrence Data
1A Comparison of Graphical Techniques for the
Display of Co-Occurrence Data
- Jan W. Buzydlowski, Xia Lin, Howard D. White
- College of Information Science and Technology
- Drexel University
- Philadelphia, PA 19104
- USA
2Information Visualization
- (Data) Visualization allows for the revelation of intricate structure which cannot be absorbed in any other way. (Cleveland, 1993)
- (Information) Visualization has two aspects, structural modeling and graphic representation. (C. Chen, 1999)
- data - model - display
3Visualization Overview
- Model - Display
- Co-Occurrence Model
- 3 Graphical Displays
- Data
- Co-citation counts from the Institute for Scientific Information (ISI), Philadelphia, PA
- Obtained from a 10-year Arts & Humanities Citation Index database given to Drexel by ISI for research purposes
4Co-Occurrence Model
- Examples
- Derivation
- Metrics
5Co-Occurrence Data - Example 1
- Market Basket Analysis
- a shopping cart holds items purchased
- e.g., milk, bread, razor blades, newspaper
- Over all the sales for one day
- what items are purchased together
- how can we arrange the items in the store
- Pampers and beer on Thursdays...
6Co-Occurrence Data - Example 2
- Author Co-citation Analysis (ACA)
- Bibliographic data on a given article holds, e.g.,
- title, keywords, abstract, citations to other documents
- An article might cite, e.g.
- Plato, Aristotle, Smith, Brown
- Over a given set of many citing articles
- Count how many times each pair of authors was cited together
- Resulting co-citation count shows common intellectual interest
7Co-Occurrence Derivation
- For a given data set (N = 4 unique terms)
- Article 1: Plato, Aristotle, Smith
- Article 2: Plato, Smith
- Article 3: Plato, Aristotle, Smith, Brown
- The following co-citations (C(4,2) = 6) are found

COMBINATION             COUNT   ARTICLES
Plato and Smith         3       1, 2, 3
Plato and Aristotle     2       1, 3
Plato and Brown         1       3
Aristotle and Smith     2       1, 3
Aristotle and Brown     1       3
Smith and Brown         1       3
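
The derivation above can be sketched in a few lines of Python. This is only an illustration: the `articles` dictionary and the function name `cocitation_counts` are made up for the example, and each article's citation list is treated as a set so that an author counts once per article.

```python
from collections import Counter
from itertools import combinations

# The three citing articles from the slide (article id -> cited authors).
articles = {
    1: ["Plato", "Aristotle", "Smith"],
    2: ["Plato", "Smith"],
    3: ["Plato", "Aristotle", "Smith", "Brown"],
}

def cocitation_counts(articles):
    """Count, for every unordered author pair, the articles citing both."""
    counts = Counter()
    citing = {}
    for art_id, cited in articles.items():
        # Sorting makes each pair a canonical (alphabetical) tuple.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
            citing.setdefault(pair, []).append(art_id)
    return counts, citing

counts, citing = cocitation_counts(articles)
for pair, n in counts.most_common():
    print(pair, n, citing[pair])
```

With N = 4 authors the loop can produce at most C(4,2) = 6 distinct pairs, which matches the six rows of the table.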
8Co-Occurrence Measures
- Raw counts
- Additional information
- Correlations
- Replace each cell with the correlation between the corresponding pair of columns
- Conditional Probability
- Compute each cell by dividing each pair's co-occurrence count by the total number of occurrences
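
As a rough sketch of the two derived measures, the code below starts from the raw-count matrix of the three-article example. Two points are assumptions rather than anything stated on the slides: the diagonal is filled with the number of articles citing each author, and "total occurring" is read as that per-author total, so a cell becomes P(column author | row author). The helper names are hypothetical.

```python
import math

authors = ["Plato", "Aristotle", "Smith", "Brown"]
# Raw co-citation counts; diagonal = articles citing that author (assumed).
C = [
    [3, 2, 3, 1],
    [2, 2, 2, 1],
    [3, 2, 3, 1],
    [1, 1, 1, 1],
]

def pearson(x, y):
    """Pearson correlation; returns 0.0 when a column has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def correlation_matrix(C):
    """Replace each cell by the correlation of the two matching columns."""
    cols = list(zip(*C))
    return [[pearson(ci, cj) for cj in cols] for ci in cols]

def conditional_matrix(C):
    """Cell (i, j) = count(i and j) / count(i), i.e. P(j | i)."""
    n = len(C)
    return [[C[i][j] / C[i][i] for j in range(n)] for i in range(n)]

corr = correlation_matrix(C)
cond = conditional_matrix(C)
```

Unlike raw counts and correlations, the conditional measure is not symmetric: here every article citing Brown also cites Plato, but only one in three articles citing Plato also cites Brown.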
9Co-Occurrence Structure - Example
10Graphical Techniques
- Three Methodologies
- Multi-dimensional scaling
- Self-organizing maps
- Pathfinder networks
11MDS
14MDS Methodology
- Given original distances (similarities), estimate coordinates that could give those distances
- The computed distances should correspond to the original distances
- Stress
- Added dimensions
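
To make "stress" and "added dimensions" concrete, here is a small sketch that scores a candidate configuration against a target distance matrix. It uses one common normalization (Kruskal's stress-1) in its metric form; the slides do not say which stress formula was used, and the three-point data set is invented for illustration.

```python
import math
from itertools import combinations

def kruskal_stress(coords, target):
    """Stress-1: sqrt(sum (d_ij - target_ij)^2 / sum d_ij^2) over all pairs,
    where d_ij is the distance between points i and j in the configuration."""
    num = den = 0.0
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        num += (d - target[i][j]) ** 2
        den += d ** 2
    return math.sqrt(num / den)

# Target distances between three items (a 3-4-5 triangle).
target = [
    [0, 3, 4],
    [3, 0, 5],
    [4, 5, 0],
]

# A one-dimensional layout cannot reproduce all three distances ...
coords_1d = [(0.0,), (3.0,), (-4.0,)]
# ... but adding a second dimension can.
coords_2d = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]

stress_1d = kruskal_stress(coords_1d, target)
stress_2d = kruskal_stress(coords_2d, target)
print(stress_1d, stress_2d)
```

An MDS routine searches for the coordinates that minimize this quantity; the sketch only evaluates it for fixed layouts to show that stress falls as dimensions are added.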
15SOM
16Self-Organizing Maps (SOMs)
- Also known as Kohonen Maps
- Based on Neural Networks
- Related to wetware
- robust techniques
- If categories are known
- supervised technique
- backpropagation learning
- If categories are sought
- unsupervised technique
- competitive learning
17SOMs
- Given a 2-D grid of nodes
- each node has N weights
- each vector (row) has N terms
- map each input vector to a node
- Similar to vector quantization (VQ)
18SOMs Generation
- nodes initially given random weights
- randomly sample an input vector
- row of co-occurrence matrix
- with replacement
- find a node closest to vector
- Euclidean distance
- update node weights
- node weight = node weight + gain term × distance
- update neighborhood
- cool gain term and neighborhood
- repeat
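
The generation steps above can be sketched as follows. This is a minimal illustration, not the implementation behind the slides: the grid size, the linear cooling schedule, the hard-edged neighborhood, and the scaling of the input rows are all assumptions, and `train_som` and `best_node` are hypothetical names.

```python
import math
import random

def best_node(weights, x):
    """Grid position of the node whose weights are closest (Euclidean) to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))

def train_som(data, rows=3, cols=3, steps=1000, gain0=0.5, radius0=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # Nodes are initially given random weights, one weight per input term.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(steps):
        cool = 1.0 - t / steps            # linear cooling (assumed schedule)
        gain = gain0 * cool
        radius = radius0 * cool
        x = rng.choice(data)              # sample a row, with replacement
        winner = best_node(weights, x)
        # Update the winner and every node within the current neighborhood.
        for node, w in weights.items():
            if math.dist(node, winner) <= radius:
                for k in range(dim):
                    w[k] += gain * (x[k] - w[k])
    return weights

authors = ["Plato", "Aristotle", "Smith", "Brown"]
counts = [[3, 2, 3, 1], [2, 2, 2, 1], [3, 2, 3, 1], [1, 1, 1, 1]]
data = [[v / 3 for v in row] for row in counts]   # scale rows into [0, 1]

weights = train_som(data)
placement = {name: best_node(weights, row) for name, row in zip(authors, data)}
print(placement)
```

Authors with identical co-occurrence rows (Plato and Smith in this toy data) necessarily land on the same node; similar rows tend to land on nearby nodes.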
19PF Nets
20Pathfinder Networks
- Uses graph notation
- nodes = authors
- edges = co-citation counts
- Co-occurrence is a complete network (weighted,
undirected)
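[Figure: complete graph on Plato, Smith, and Aristotle with edge weights Plato - Smith 3, Plato - Aristotle 2, Aristotle - Smith 2]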
21Pathfinder Networks Generation
- Pathfinder Network is generated by varying the parameters
- distance (r)
- triangle inequality (q)
22Pathfinder Distance
- Uses Minkowski metric
- $d = \left( \sum_i e_i^r \right)^{1/r}$
- Example
- $e_1 = 3$, $e_2 = 4$
- $r = 1 \Rightarrow d = 7 = 3 + 4$
- Driving distance / ratio data
- $r = 2 \Rightarrow d = 5 = (9 + 16)^{1/2}$
- Euclidean Distance
- $r \to \infty \Rightarrow d = 4 = \max(3, 4)$
- ordinal data
- rank rather than value
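
The three cases of the example can be checked with a short function. The name `minkowski_length` is hypothetical, and the limit as r goes to infinity is handled as a special case that returns the largest edge weight.

```python
import math

def minkowski_length(edges, r):
    """Path length under the Minkowski metric with parameter r."""
    if math.isinf(r):
        return max(edges)                 # limit as r grows without bound
    return sum(e ** r for e in edges) ** (1.0 / r)

edges = [3, 4]
d1 = minkowski_length(edges, 1)           # sum of the weights
d2 = minkowski_length(edges, 2)           # Euclidean length
dinf = minkowski_length(edges, math.inf)  # maximum weight
print(d1, d2, dinf)
```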
23Pathfinder Triangle Inequality
- A required property of a metric definition
- $d(i,j) \le d(i,k) + d(k,j)$
- But may not be justified
- in personal judgments
- If a is similar to b, and b is similar to c, there may be no transitive judgment of similarity from a to c
- in set intersections
- Even though Smith and Jones appear together 12 times, and Jones and Brown appear together 5 times, the overlap between Smith and Brown cannot be predicted
24Pathfinder Triangle Inequality
- Defines q-triangular
- check paths of length q to determine if inequality is met
- minimum is 2
- maximum is n - 1
- full compliance
- the longer the length, the fewer the connections
25Pathfinder Example
26Pathfinder Network Creation
- PFNet (r, q)
- Examine all paths of length q or less.
- Use Minkowski Metric with parameter r to compute path length.
- If a path of less weight is found, then remove the edge.
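
A minimal sketch of this procedure is given below, assuming the edge weights are already distances (co-citation counts are similarities and would need converting first). The function names, the tolerance `tol`, and the three-node weight matrix are illustrative; in particular, which of the two edges through Brown carries weight 4 and which carries 3 is an assumption.

```python
import math

def minkowski(a, b, r):
    """Combine the weight of a path so far (a) with one more edge (b)."""
    if math.isinf(r):
        return max(a, b)
    return (a ** r + b ** r) ** (1.0 / r)

def pathfinder(weights, r, q, tol=1e-9):
    """Return the edges (i, j), i < j, that survive in PFNet(r, q).

    weights: symmetric distance matrix with zeros on the diagonal.
    """
    n = len(weights)
    # best[i][j] = least weight of any path from i to j with at most k links;
    # it starts at k = 1 (direct edges) and is extended one link at a time.
    best = [row[:] for row in weights]
    for _ in range(q - 1):
        nxt = [row[:] for row in best]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                for m in range(n):
                    if m in (i, j):
                        continue
                    cand = minkowski(best[i][m], weights[m][j], r)
                    if cand < nxt[i][j]:
                        nxt[i][j] = cand
        best = nxt
    # Keep an edge only if no path of q or fewer links has less weight.
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if weights[i][j] <= best[i][j] + tol}

# Nodes: 0 = Smith, 1 = Jones, 2 = Brown (weights are illustrative).
W = [
    [0, 5, 4],
    [5, 0, 3],
    [4, 3, 0],
]
kept_r1 = pathfinder(W, r=1, q=2)
kept_r2 = pathfinder(W, r=2, q=2)
kept_inf = pathfinder(W, r=math.inf, q=2)
print(kept_r1, kept_r2, kept_inf)
```

The tolerance keeps an edge when an alternative path merely ties it, which matters for r = 2 in this data, where the two-link path through Brown has length exactly 5.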
27Pathfinder - Example
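[Figure: triangle graph on Smith, Jones, and Brown with q = 2; the Smith - Jones edge has weight 5 and the two edges through Brown have weights 4 and 3]
- $r = 1 \Rightarrow$ Smith - Jones is kept (path via Brown: $4 + 3 = 7 > 5$)
- $r = 2 \Rightarrow$ Smith - Jones is kept (path via Brown: $(16 + 9)^{1/2} = 5$, not less than 5)
- $r \to \infty \Rightarrow$ Smith - Jones is removed (path via Brown: $\max(4, 3) = 4 < 5$)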
28Comparison of Techniques
- MDS
- Reduces dimensions / reveals clusters
- 2D may be insufficient
- measurement may not be Euclidean
- SOM
- robust
- no guarantee of convergence/unique solution
- Pathfinder
- does not assume ratio data/triangle inequality
- connections rather than positions are important
- additional methodology needed for display
29Comparison of Techniques
- Similarities
- Spatial models
- Differences
- use of visual space
- semantic meaning
- as related to data
- research in progress
30Graphical Display of Methodologies
- MDS
- assume that 2 dimensions are sufficient
- x, y for each point already defined
- SOM
- grid defines the 2D surface
- plot each label with the appropriate node
- Pathfinder
- only defines the nodes and links
- need additional methodologies
- Spring-embedder models
- Kamada and Kawai (1989)
- Fruchterman and Reingold (1991)
- Davidson and Harel (1996)
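
Because a Pathfinder network only supplies nodes and links, some layout step has to assign coordinates. The sketch below is a simplified force-directed layout in the spirit of Fruchterman and Reingold (1991): repulsion of k²/d between all node pairs, attraction of d²/k along links, and a cooling temperature that caps each move. It is not the published algorithm in full, and the frame size, cooling rate, iteration count, and function name are assumptions.

```python
import math
import random

def spring_layout(n, edges, iterations=200, size=1.0, seed=0):
    """Place n nodes in a size-by-size frame; edges is a list of (i, j)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(0, size), rng.uniform(0, size)] for _ in range(n)]
    k = math.sqrt(size * size / n)        # ideal distance between nodes
    temp = size / 10.0                    # maximum move per iteration
    for _ in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n)]
        # Repulsive force between every pair of nodes.
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                dist = math.hypot(dx, dy) or 1e-9
                f = k * k / dist
                disp[i][0] += dx / dist * f
                disp[i][1] += dy / dist * f
                disp[j][0] -= dx / dist * f
                disp[j][1] -= dy / dist * f
        # Attractive force along each link.
        for i, j in edges:
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            dist = math.hypot(dx, dy) or 1e-9
            f = dist * dist / k
            disp[i][0] -= dx / dist * f
            disp[i][1] -= dy / dist * f
            disp[j][0] += dx / dist * f
            disp[j][1] += dy / dist * f
        # Move each node, capped by the temperature and kept inside the frame.
        for i in range(n):
            d = math.hypot(disp[i][0], disp[i][1]) or 1e-9
            step = min(d, temp)
            pos[i][0] = min(size, max(0.0, pos[i][0] + disp[i][0] / d * step))
            pos[i][1] = min(size, max(0.0, pos[i][1] + disp[i][1] / d * step))
        temp *= 0.95                      # cool the system
    return pos

# Two links left after pruning a three-node network (illustrative input).
layout = spring_layout(3, [(0, 2), (1, 2)])
print(layout)
```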
31Graphical Comparison of Three Methods
- Data
- Institute for Scientific Information
- Arts and Humanities Database (AHCI)
- 1988 - 1997
- 1.26 million records
- Example
- Given Plato, find related authors
- Interface described in IV 2000 Paper
- CSNA 2000 Paper
- (Lin, Buzydlowski, White)
32 25 Authors Co-cited with Plato
- PLATO (4928)
- ARISTOTLE (1861)
- PLUTARCH (838)
- CICERO (699)
- HOMER (627)
- BIBLE (552)
- EURIPIDES (515)
- ARISTOPHANES (474)
- XENOPHON (459)
- AUGUSTINE (432)
- HERODOTUS (425)
- KANT-I (385)
- AESCHYLUS (374)
- SOPHOCLES (363)
- THUCYDIDES (363)
- OVID (334)
- HESIOD (325)
- DIOGENES-LAERTIUS (317)
- HEIDEGGER-M (312)
- DERRIDA-J (304)
- PINDAR (292)
- NIETZSCHE-F (278)
- HEGEL-GWF (264)
- VERGIL (259)
- AQUINAS-T (255)
33 300 Pair-wise co-citations
- 1 PLATO AND ARISTOTLE - 1940 docs
- 2 PLATO AND PLUTARCH - 872 docs
- ...
- 300 VERGIL AND AQUINAS-T - 38 docs
34Visualization allows for the revelation of
intricate structure which cannot be absorbed in
any other way...
35 2D MDS map of 25 authors co-cited with Plato
37PFNet of 25 authors co-cited with Plato
[Figure: Pathfinder network whose nodes are the 25 authors co-cited with Plato; the links are not recoverable from the transcript]
38Conclusion
- Slides available at
- faculty.cis.drexel.edu/jbuzydlo/
- janb_at_drexel.edu
39Bibliography
- Chen, Chaomei, Information Visualization and Virtual Environments, 1999.
- Cleveland, William S., Visualizing Data, Hobart Press, 1993.
- Davidson, R, Harel, D, Drawing Graphs Nicely Using Simulated Annealing, ACM Transactions on Graphics, 15(4) 301-31 (1996).
- Fruchterman, TMJ, Reingold, EM, Graph Drawing by Force-Directed Placement, Software Practice and Experience, 21 1129-64 (1991).
- Kamada, T, Kawai, S, An Algorithm for Drawing General Undirected Graphs, Information Processing Letters, 31(1) 7-15 (1989).