Visualizing Document Collections - PowerPoint PPT Presentation

About This Presentation
Title:

Visualizing Document Collections

Description:

digital libraries, news archives, web pages, email archives, image gallery. Tasks: search ... Create a 'map' of the document collection. Similar documents near ...

Slides: 30
Provided by: chris1146

Transcript and Presenter's Notes

Title: Visualizing Document Collections


1
Visualizing Document Collections
  • cs5764 Information Visualization
  • Chris North

2
Where are we?
  • Multi-D
  • 1D
  • 2D
  • 3D
  • Trees
  • Graphs
  • Document collections
  • Design Principles
  • Empirical Evaluation
  • Visual Overviews

3
Structured Document Collections
  • Multi-dimensional
  • author, title, date, journal, etc.
  • Trees
  • Dewey Decimal Classification
  • Graphs
  • web, citations

4
Envision
  • Ed Fox et al.
  • Multi-D
  • similar to Spotfire

5
Citation Networks
  • Butterfly Browser
  • Mackinlay et al. (PARC)

Butterfly: left = refs, right = citers; yellow = citers, blue = visited;
3D plot: date, name, citers
6
(No Transcript)
7
Unstructured Document Collections
  • Focus on Full Text
  • Examples
  • digital libraries, news archives, web pages
  • email archives, image galleries
  • Tasks
  • Search
  • Browse
  • Classification, structuring
  • Statistics, keyword usage, languages
  • Subjects, themes, coverage

8
Visualization Strategies
  • Cluster Maps
  • Keyword Query
  • Relationships
  • Reduced representation
  • User controlled layout

9
Cluster Map
  • Create a map of the document collection
  • Similar documents near each other
  • Dissimilar documents far apart
  • Grocery store concept: related items are shelved together

10
Document Vectors
  • Term counts per document:
                  Doc1  Doc2  Doc3
        aardvark     1     2     0
        banana       2     1     0
        chris        0     0     3
  • Similarity between a pair of docs: dot product
  • Lay out documents in a 2-D map by similarity
  • similar to the spring model for graph layout
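A minimal Python sketch of the two ideas on this slide, assuming the toy counts from the term table: dot-product similarity for every pair of documents, then a simple spring layout that places similar documents close together. The mapping from similarity to spring rest length (1 / (1 + similarity)), the step count, the starting positions, and the names `DOCS`, `pairwise_similarity`, and `spring_layout` are illustrative assumptions, not the method of any particular system.

```python
import math

# Term counts from the slide's table: (aardvark, banana, chris) per document.
DOCS = {
    "Doc1": [1, 2, 0],
    "Doc2": [2, 1, 0],
    "Doc3": [0, 0, 3],
}


def dot(u, v):
    """Dot-product similarity between two term-count vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_similarity(docs):
    """Similarity for every unordered pair of documents."""
    names = sorted(docs)
    return {
        (a, b): dot(docs[a], docs[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def spring_layout(docs, steps=500, rate=0.05):
    """Place documents in 2-D so each pair's distance approaches a rest
    length that shrinks as similarity grows (assumed: 1 / (1 + sim))."""
    sim = pairwise_similarity(docs)
    names = sorted(docs)
    n = len(names)
    # Deterministic start: documents evenly spaced on a unit circle.
    pos = {
        name: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
        for i, name in enumerate(names)
    }
    for _ in range(steps):
        for (a, b), s in sim.items():
            rest = 1.0 / (1.0 + s)
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            d = math.hypot(dx, dy) or 1e-9
            # Hooke's law: pull together if too far apart, push apart if too close.
            f = rate * (d - rest) / d
            pos[a][0] += f * dx
            pos[a][1] += f * dy
            pos[b][0] -= f * dx
            pos[b][1] -= f * dy
    return pos


print(pairwise_similarity(DOCS))
# {('Doc1', 'Doc2'): 4, ('Doc1', 'Doc3'): 0, ('Doc2', 'Doc3'): 0}
layout = spring_layout(DOCS)
```

Doc1 and Doc2 share vocabulary, so their spring is short and they end up close together; Doc3 shares no terms with either and settles farther away.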

11
Cluster Algorithms
  • Partition clustering: partition into k subsets
  • Pick k seeds
  • Iteratively attract nearest neighbors
  • Hierarchical clustering: dendrogram
  • Group the nearest-neighbor pair
  • Iterate
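Both clustering families can be sketched in a few lines of Python. The choices here are assumptions made for illustration: the first k points serve as seeds, "iteratively attract" is implemented as k-means-style centroid updates, and "nearest pair" uses single linkage (the closest members of two clusters). Real systems vary on all three. `math.dist` needs Python 3.8 or later.

```python
import math


def kmeans(points, k, iters=20):
    """Partition clustering: pick k seeds, then repeatedly assign each point
    to its nearest centroid and move each centroid to the mean of its group."""
    # Simplification: the first k points serve as seeds.
    centroids = [list(p) for p in points[:k]]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its old centroid
                centroids[i] = [sum(coord) / len(g) for coord in zip(*g)]
    return groups


def agglomerative(points):
    """Hierarchical clustering: repeatedly merge the closest pair of clusters
    (single linkage) and record each merge; the merge list is the dendrogram."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return merges


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans(points, k=2))  # [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
for left, right, d in agglomerative(points):
    print(left, "+", right, "at distance", round(d, 2))
```

Reading the merge list bottom-up gives the dendrogram: the two tight pairs join at distance 1, and the final merge happens at a much larger distance.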

12
Landscapes
  • Wise et al., "Visualizing the non-visual"
  • ThemeScapes, Cartia
  • PNNL
  • Mountain height = cluster size

13
Kohonen Maps
  • Xia Lin, Document Space
  • http://faculty.cis.drexel.edu/sitemap/index.html

14
(No Transcript)
15
WebSOM
  • http://websom.hut.fi/websom/

16
Map.net
  • http://maps.map.net/start

17
Galaxy of News
  • MIT
  • Cluster map with full text and zooming

18
Cluster Map
  • Good
  • Map of collection
  • Major themes and sizes
  • Relationships between themes
  • Scales up
  • Bad
  • Where to locate documents with multiple themes?
  • On both mountains, between mountains, or elsewhere?
  • Relationships between documents, within
    documents?
  • Algorithm becomes (too) critical

19
Keyword Query
  • Keyword query, Search engine
  • Rank ordered list
  • Information Retrieval
  • Visualization of results
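One way to produce the rank-ordered list, sketched in Python under the assumption that a query is a set of keywords turned into a 0/1 vector and scored against each document with the same dot product used for document vectors. The toy counts reuse the slide 10 table; `rank`, `TERMS`, and `DOCS` are hypothetical names.

```python
# Vocabulary and term counts from the slide 10 example table.
TERMS = ["aardvark", "banana", "chris"]
DOCS = {"Doc1": [1, 2, 0], "Doc2": [2, 1, 0], "Doc3": [0, 0, 3]}


def rank(query_words, docs=DOCS, terms=TERMS):
    """Return (doc, score) pairs for documents matching the query, best first.

    The query becomes a 0/1 vector over `terms`; the score is its dot
    product with each document's term-count vector.
    """
    q = [1 if t in query_words else 0 for t in terms]
    scores = {name: sum(a * b for a, b in zip(q, vec)) for name, vec in docs.items()}
    hits = [(name, s) for name, s in scores.items() if s > 0]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


print(rank({"banana"}))  # [('Doc1', 2), ('Doc2', 1)]
print(rank({"zebra"}))   # []: no document uses this keyword
```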

20
Keyword Distributions
  • Hearst, TileBars
  • http://elib.cs.berkeley.edu/tilebars/
  • Keyword distributions within documents
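TileBars draws each retrieved document as a bar divided into text segments, with darker cells where a query term occurs more often in that segment. The Python sketch below approximates this with fixed-size word windows instead of topic-based segmentation (a simplifying assumption); `tile_counts`, `render`, and the shading characters are hypothetical.

```python
import re


def tile_counts(text, terms, tile_size=5):
    """Split a document into consecutive tiles of `tile_size` words and
    count each query term in each tile."""
    words = re.findall(r"[a-z']+", text.lower())
    tiles = [words[i:i + tile_size] for i in range(0, len(words), tile_size)]
    return {t: [tile.count(t) for tile in tiles] for t in terms}


def render(counts, shades=" .:#"):
    """One text row per term; a darker character means more hits in that tile."""
    rows = []
    for term, per_tile in counts.items():
        cells = "".join(shades[min(c, len(shades) - 1)] for c in per_tile)
        rows.append(f"{term:<10}|{cells}|")
    return "\n".join(rows)


doc = ("banana split banana bread cake. "
       "aardvark facts and more facts here. banana")
counts = tile_counts(doc, ["banana", "aardvark"])
print(counts)  # {'banana': [2, 0, 1], 'aardvark': [0, 1, 0]}
print(render(counts))
```

Even in this crude form, the rows show at a glance whether the query terms co-occur in the same part of the document or appear in separate passages.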

21
Document Distributions
  • Korfhage, VIBE
  • http://www.pitt.edu/korfhage/interfaces.html
  • Documents located between query keywords using a
    spring model
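If a document is tied to each keyword by a spring whose stiffness equals the document's weight for that keyword, the forces balance at the weighted average of the keyword positions. A minimal Python sketch of that placement rule, with hypothetical keyword coordinates (`POIS`) and weights taken from the slide 10 term counts; how VIBE itself handles documents matching no keyword is not covered here.

```python
# Hypothetical screen positions for three query keywords (points of interest).
POIS = {"aardvark": (0.0, 0.0), "banana": (1.0, 0.0), "chris": (0.5, 1.0)}


def vibe_position(weights, pois=POIS):
    """Equilibrium point of springs from a document to each keyword.

    Each spring's stiffness is the document's weight for that keyword,
    so the forces cancel at the weighted average of keyword positions."""
    total = sum(weights.get(k, 0) for k in pois)
    if total == 0:
        return None  # no keyword present: this sketch leaves the document unplaced
    x = sum(weights.get(k, 0) * px for k, (px, _) in pois.items()) / total
    y = sum(weights.get(k, 0) * py for k, (_, py) in pois.items()) / total
    return (x, y)


# Weights from the slide 10 table.
print(vibe_position({"aardvark": 1, "banana": 2}))  # Doc1: on the baseline, nearer "banana"
print(vibe_position({"chris": 3}))                  # Doc3: (0.5, 1.0), on top of "chris"
```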

22
VR-VIBE
23
Keyword Query
  • Good
  • Reduces the browsing space
  • Map according to the user's interests
  • Bad
  • What keywords do I use?
  • What about other related documents that don't use
    these keywords?
  • No initial overview
  • Mega-hit, zero-hit problem

24
Relationships
  • Show inter-relationships
  • Matrix or Complete Graph
  • Similarity measure between all pairs of docs
  • Threshold level
  • Salton
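The all-pairs-plus-threshold idea, sketched in Python. Cosine similarity is used as one common vector-space choice; the slide names only "similarity measure", so both the measure and the threshold of 0.5 are assumptions. `relationship_links` is a hypothetical helper, and the documents are the slide 10 toy counts.

```python
import math

# Term counts from the slide 10 example table.
DOCS = {"Doc1": [1, 2, 0], "Doc2": [2, 1, 0], "Doc3": [0, 0, 3]}


def cosine(u, v):
    """Dot product divided by both vector lengths, so long documents don't dominate."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def relationship_links(docs, threshold):
    """Compare every pair of documents; keep a link only when the
    similarity reaches the threshold."""
    names = sorted(docs)
    links = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            s = cosine(docs[a], docs[b])
            if s >= threshold:
                links.append((a, b, round(s, 3)))
    return links


print(relationship_links(DOCS, threshold=0.5))  # [('Doc1', 'Doc2', 0.8)]
```

Raising the threshold thins the graph to only the strongest relationships; at 0.0 every pair is linked, which is the complete graph.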

25
Variations
  • Docs, paragraphs, themes

26
Relationships
  • Better for smaller, more detailed maps
  • Scale up: network visualization
  • Good
  • Can see more complex relationships between/within
    documents
  • Can act like hyperlinks!
  • Bad
  • Finding specific documents
  • Scaling up is difficult

27
Reduced Visual Representation
  • Bederson, Image browsing

28
User Controlled Layout
  • Card, WebBook and Web Forager
  • http://vtopus.cs.vt.edu/north/infoviz/webbook.mpa

29
Data Mountain
  • Robertson, Data Mountain (Microsoft)