Title: Worcester Polytechnic Institute
1. Worcester Polytechnic Institute
XmdvTool: Interactive Visual Data Exploration
System for High-dimensional Data Sets
http://davis.wpi.edu/xmdv
Matthew O. Ward, Elke A. Rundensteiner, Jing
Yang, Punit Doshi, Geraldine Rosario, Allen R.
Martin, Ying-Huey Fua, Daniel Stroe
This work was partially funded by NSF Grants
IIS-9732897, IRIS-9729878 and IIS-0119276
2. XmdvTool Features
- Hierarchical visualization and interaction tools for exploring very large high-dimensional data sets to discover patterns, trends and outliers
- Applications
  - Bioterrorism Detection
  - Bioinformatics and Drug Discovery
  - Space Science
  - Geology and Geochemistry
  - Systems Monitoring and Performance Evaluation
  - Economics and Business
  - Simulation Design and Analysis
- Multi-platform support (Unix, Linux, Windows)
- Public domain software: http://davis.wpi.edu/xmdv
3. Xmdv Main Features
- Scale-up to High Dimensions: Visual Hierarchical Dimension Reduction
- Scale-up to Large Data Sets: Interactive Hierarchical Displays, Database Backend with MinMax Encoding, Semantic Caching and Adaptive Prefetching
- Interlinked Multi-Displays: Parallel Coordinates, Glyphs, Scatterplot Matrices, Dimensional Stacking
- Visual Interaction Tools: N-Dimensional Brushes, Structure-Based Brushing, InterRing
4. Scale-Up for Large Number of Dimensions
- Solution to High Dimensional Datasets
  - Group Similar Dimensions into Dimension Hierarchy
  - Navigate Dimension Hierarchy by InterRing
  - Form Lower Dimensional Spaces by Dimension Clusters
  - Convey Dimension Cluster Information by Dissimilarity Display
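As a rough illustration of grouping similar dimensions into a dimension hierarchy, the sketch below clusters columns bottom-up. The slides do not say which similarity measure or clustering algorithm XmdvTool uses, so the dissimilarity (1 minus the absolute Pearson correlation), the single-linkage merging, and the names `pearson`, `dissimilarity` and `build_dimension_hierarchy` are all assumptions made for this example.

```python
from itertools import combinations
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant columns."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def dissimilarity(a, b):
    """Assumed measure: strongly correlated dimensions count as similar."""
    return 1.0 - abs(pearson(a, b))


def build_dimension_hierarchy(columns):
    """Agglomerative (single-linkage) clustering of dimensions.

    columns maps a dimension name to its list of values.
    Returns the hierarchy as nested tuples of dimension names.
    """
    # Each cluster is (subtree, names of the dimensions it contains).
    clusters = [(name, [name]) for name in columns]

    def linkage(pair):
        (_, left_members), (_, right_members) = pair
        return min(
            dissimilarity(columns[p], columns[q])
            for p in left_members
            for q in right_members
        )

    while len(clusters) > 1:
        # Merge the two closest clusters into a new internal node.
        left, right = min(combinations(clusters, 2), key=linkage)
        clusters = [c for c in clusters if c is not left and c is not right]
        clusters.append(((left[0], right[0]), left[1] + right[1]))
    return clusters[0][0]
```

Each internal node of the returned tree is a dimension cluster. A lower-dimensional space could then be formed by choosing one representative dimension per selected cluster.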
5-6. Visual Hierarchical Dimension Reduction Process
- A 42-Dimensional Data Set
- A 4-Dimensional Subspace
- Dimension Hierarchy Interaction Tool: InterRing
7. InterRing: Dimension Hierarchy Navigation and Manipulation
- Roll-up/Drill-down
- Rotate
- Zoom in/out
- Modify
- Distort
8. Dissimilarity Display
- Three Axes Method
- Diagonal Plot Method
- Mean-Band Method
- Axis Width Method
9. Scale-up for Large Number of Records
- Solution to Large Scale Datasets
  - Group Similar Records into Data Hierarchy
  - Navigate Data Hierarchy by Structure-Based Brushing
  - Represent Data Clusters by Mean-Band Method
  - Provide Database Backend Support using MinMax Tree, Caching, Prefetching
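To make the idea of representing a data cluster by a mean and a band concrete, here is a minimal sketch. It assumes the band spans the minimum and maximum value of the cluster in each dimension, with the mean drawn inside it. The slides only name the method, so this reading and the function name `mean_band` are assumptions.

```python
def mean_band(cluster):
    """Summarize a cluster of records for a mean-band style display.

    cluster is a non-empty list of records, each a sequence of numbers
    with one value per dimension. Returns one (low, mean, high) triple
    per dimension: the band runs from low to high, the mean lies inside.
    """
    per_dimension = zip(*cluster)  # transpose records into columns
    return [(min(d), sum(d) / len(d), max(d)) for d in per_dimension]


summary = mean_band([(1, 10), (3, 20), (5, 60)])
print(summary)  # [(1, 3.0, 5), (10, 30.0, 60)]
```

In parallel coordinates, each triple would give the lower edge, centre line and upper edge of the cluster's band on one axis, so a whole cluster is drawn as a single shape instead of one polyline per record.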
10. Interactive Hierarchical Display
- Hierarchical Clustering
- Structure-Based Brushing
11-12. Interactive Hierarchical Display
- Flat Display
- Hierarchical Display
- Mean-Band Method in Parallel Coordinates
13. Scalability of Data Access
- Approach
  - Attach database system to visualization front-end
  - MinMax hierarchy encoding
    - Key idea: avoid recursive processing
    - Pre-computed
  - Caching
    - Key idea: reduce response time and network traffic
  - Prefetching
    - Key idea: use application hints and predict user patterns
    - Performed during idle time
14. Scalability of Data Access: MinMax Hierarchy Encoding
- Pre-compute object positions
  - level-of-detail (L)
  - extent values (x, y)
  - preserve tree structure
- New query semantics
  - objects are now rectangles
  - select objects that touch L
  - select objects that touch (x, y)
  - structure-based brush: intersection of two selections
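The query semantics above can be sketched as follows. This is one possible reading, not the XmdvTool implementation: each tree node is flattened into a row holding a horizontal extent interval and a level interval, so a structure-based brush becomes a non-recursive filter over precomputed rows. The leaf-order numbering, the use of tree depth as the level-of-detail value, and the names `Node`, `encode` and `brush` are assumptions of this example.

```python
from dataclasses import dataclass, field
from math import inf


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def encode(root):
    """Pre-compute one rectangle per node: (name, x_lo, x_hi, l_lo, l_hi).

    Leaves get unit-wide extents in left-to-right order; an internal node
    spans its children, which preserves the tree structure. Recursion is
    used only here, once, and never at query time.
    """
    rows = []
    next_leaf = 0

    def visit(node, depth):
        nonlocal next_leaf
        if not node.children:
            lo, hi = next_leaf, next_leaf + 1
            next_leaf += 1
            rows.append((node.name, lo, hi, depth, inf))  # leaf: visible at any deeper level
            return lo, hi
        spans = [visit(child, depth + 1) for child in node.children]
        lo, hi = spans[0][0], spans[-1][1]
        rows.append((node.name, lo, hi, depth, depth + 1))
        return lo, hi

    visit(root, 0)
    return rows


def brush(rows, level, x, y):
    """Select rectangles that touch the level AND overlap the extent (x, y)."""
    return sorted(
        name
        for name, x_lo, x_hi, l_lo, l_hi in rows
        if l_lo <= level < l_hi and x_lo < y and x < x_hi
    )


tree = Node("root", [
    Node("A", [Node("a1"), Node("a2")]),
    Node("B", [Node("b1")]),
    Node("c"),
])
rows = encode(tree)
print(brush(rows, level=2, x=1, y=3))  # ['a2', 'b1']
```

Because both conditions are plain range comparisons on stored columns, the same filter could be expressed as a single range query against a database table.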
15. Scalability of Data Access: Caching
- Purpose
  - reduce response time and network traffic
- Issues
  - visual query cannot directly translate into object IDs
  - high-level cache specification to avoid complete scans
- Semantic caching
  - queries are cached rather than objects
  - minimize cost of cache lookup
  - dynamically adapt cached queries to patterns of queries
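A minimal sketch of caching queries rather than objects, using one-dimensional range queries for brevity. The class `SemanticCache`, the backend callback `fetch` and the containment-only lookup are assumptions made for illustration; the slides do not describe the actual cache design.

```python
class SemanticCache:
    """Cache keyed by query ranges rather than by object IDs."""

    def __init__(self, fetch):
        self.fetch = fetch        # hypothetical backend: (lo, hi) -> list of (key, value) rows
        self.entries = []         # list of ((lo, hi), rows)
        self.backend_calls = 0

    def query(self, lo, hi):
        """Return rows with lo <= key < hi, using the cache when possible."""
        # Lookup compares query descriptions only, so no objects are scanned
        # to decide whether the cache can answer.
        for (c_lo, c_hi), rows in self.entries:
            if c_lo <= lo and hi <= c_hi:
                return [r for r in rows if lo <= r[0] < hi]
        self.backend_calls += 1
        rows = self.fetch(lo, hi)
        # Adapt the cache: drop older queries that the new one fully covers.
        self.entries = [
            e for e in self.entries if not (lo <= e[0][0] and e[0][1] <= hi)
        ]
        self.entries.append(((lo, hi), rows))
        return rows
```

A query that falls inside a previously cached range is answered locally. A fuller design would also handle partial overlaps by fetching only the missing remainder, which this sketch does not attempt.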
16. Scalability of Data Access: Prefetching
- Strategy
  - Speculative (no specific hints)
    - navigation remains local
    - both user and data set influence exploration
  - Adaptive (strategy changes over time)
    - evolves as more knowledge becomes available
  - Non-pure (interruptible prefetching)
    - leave buffer in consistent state
- Requirements
  - non-pure prefetching, large transactions, small object size, semantic caching -> small granularity (object level)
  - speculative, non-pure prefetcher -> cache replacement policy, guessing method
17. Scalability of Data Access: Experimental Evaluation
- Conclusions
  - Caching reduces response time by 80%
  - Prefetching further reduces response time by 30%
  - Designing better prefetching strategies might help further reduce response time
18. Scalability of Data Access: Prefetching
- Mean Strategy
- Random Strategy
- Direction Strategy
- Localized Speculative Strategies
- Exponential Weight Average Strategy
- Focus Strategy
- Vector Strategies
- Data Set Driven Strategy
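The slide only names the strategies, so the sketch below is an interpretation of three of those names rather than their published definitions. It assumes the user's navigation is recorded as a sequence of one-dimensional brush positions and that each strategy guesses the next position to prefetch. All function names and the smoothing parameter `alpha` are hypothetical.

```python
def _steps(positions):
    """Differences between consecutive positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


def direction_strategy(positions):
    """Assume the next move repeats the most recent move."""
    steps = _steps(positions)
    return positions[-1] + (steps[-1] if steps else 0)


def mean_strategy(positions):
    """Assume the next move equals the average of all past moves."""
    steps = _steps(positions)
    return positions[-1] + (sum(steps) / len(steps) if steps else 0)


def ewa_strategy(positions, alpha=0.5):
    """Exponentially weighted average: recent moves count more."""
    steps = _steps(positions)
    if not steps:
        return positions[-1]
    average = steps[0]
    for step in steps[1:]:
        average = alpha * step + (1 - alpha) * average
    return positions[-1] + average


history = [0, 1, 2, 4]
print(direction_strategy(history))  # 6
print(ewa_strategy(history))        # 5.5
```

During idle time, a prefetcher could request the data around the predicted position and abandon the request if the user issues a real query first.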
19. Xmdv System Implementation
- Tools
  - C/C++
  - Tcl/Tk
  - OpenGL
  - Oracle 8i
  - Pro*C
20. Publications (available at http://davis.wpi.edu/xmdv)
- Jing Yang, Matthew O. Ward and Elke A. Rundensteiner, "InterRing: An Interactive Tool for Visually Navigating and Manipulating Hierarchical Structures", InfoVis 2002, to appear
- Punit R. Doshi, Elke A. Rundensteiner, Matthew O. Ward and Daniel Stroe, "Prefetching For Visual Data Exploration", Technical Report WPI-CS-TR-02-07, 2002
- Jing Yang, Matthew O. Ward and Elke A. Rundensteiner, "Interactive Hierarchical Displays: A General Framework for Visualization and Exploration of Large Multivariate Data Sets", Computers and Graphics Journal, 2002, to appear
- Daniel Stroe, Elke A. Rundensteiner and Matthew O. Ward, "Scalable Visual Hierarchy Exploration", Database and Expert Systems Applications, pages 784-793, Sept. 2000
- Ying-Huey Fua, Matthew O. Ward and Elke A. Rundensteiner, "Hierarchical Parallel Coordinates for Exploration of Large Datasets", IEEE Proc. of Visualization, pages 43-50, Oct. 1999
- Ying-Huey Fua, Matthew O. Ward and Elke A. Rundensteiner, "Navigating Hierarchies with Structure-Based Brushes", IEEE Proceedings of Visualization, pages 43-50, Oct. 1999