Title: Multivariate Data Sets
1 Multivariate Data Sets
- CS 7450 - Information Visualization
- Jan. 15, 2002
- John Stasko
2 Data Sets
- Data comes in many different forms
- Typically, not in the way you want it
- How is it stored (in the raw)?
3 Example
- Cars
- make
- model
- year
- miles per gallon
- cost
- number of cylinders
- weight
- ...
4 Example
5 Data Tables
- Often, we take raw data and transform it into a form that is more workable
- Main idea
  - Individual items are called cases
  - Cases have variables (attributes)
6 Data Table Format
           Variable1  Variable2  Variable3  ...
Case1      Value11    Value12    Value13
Case2      Value21    Value22    Value23
Case3      Value31    Value32    Value33
...
(The variables are the dimensions.)
Think of it as a function: f(case1) = <Val11, Val12, ...>
7 Example
People in class
         SSN   Age   Hair     GPA   ...
Mary     145   23    brown    2.9
Jim      294   17    black    3.7
Sally    563   47    blonde   3.4
Mitch    823   29    red      2.1
...
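A data table can be sketched in Python as a mapping from each case to its tuple of variable values, which is exactly the function f(case) = <Val1, Val2, ...> described above. The names `VARIABLES`, `table`, `f`, and `value` are illustrative assumptions; the values come from the "People in class" example.

```python
# Variables (columns) of the "People in class" table; their order fixes the tuple layout.
VARIABLES = ("SSN", "Age", "Hair", "GPA")

# Each case (row) maps to its tuple of variable values.
table = {
    "Mary":  (145, 23, "brown", 2.9),
    "Jim":   (294, 17, "black", 3.7),
    "Sally": (563, 47, "blonde", 3.4),
    "Mitch": (823, 29, "red", 2.1),
}

def f(case):
    """Return the tuple of variable values for one case: f(case) = <Val1, Val2, ...>."""
    return table[case]

def value(case, variable):
    """Look up a single cell by case name and variable name."""
    return f(case)[VARIABLES.index(variable)]

print(f("Mary"))               # (145, 23, 'brown', 2.9)
print(value("Sally", "Hair"))  # blonde
```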
8 Example
[Figure: baseball statistics]
9 Variable Types
- Three main types of variables
- N - Nominal (equal or not equal to other values)
  - Example: gender
- O - Ordinal (obeys < relation, ordered set)
  - Example: fr, so, jr, sr
- Q - Quantitative (can do math on them)
  - Example: age
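The practical consequence of the three types is which operations make sense on a variable. The sketch below records that in Python. The operation names in `ALLOWED_OPS` are an illustrative assumption and not a standard vocabulary. An ordinal comparison needs the explicit order of the set, here the class years from the slide.

```python
from enum import Enum

class VarType(Enum):
    NOMINAL = "N"       # only = and != are meaningful
    ORDINAL = "O"       # ordered set: =, != and < are meaningful
    QUANTITATIVE = "Q"  # numbers: comparisons plus arithmetic

# Illustrative operation names; each type allows everything the previous one does.
ALLOWED_OPS = {
    VarType.NOMINAL: {"==", "!="},
    VarType.ORDINAL: {"==", "!=", "<"},
    VarType.QUANTITATIVE: {"==", "!=", "<", "-", "mean"},
}

# An ordinal variable needs its order spelled out: freshman < sophomore < junior < senior.
CLASS_YEARS = ["fr", "so", "jr", "sr"]

def ordinal_less(a, b, order=CLASS_YEARS):
    """Compare two ordinal values by their position in the ordered set."""
    return order.index(a) < order.index(b)

print(ordinal_less("so", "jr"))             # True
print("-" in ALLOWED_OPS[VarType.ORDINAL])  # False: subtracting class years is not meaningful
```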
10 Metadata
- Descriptive information about the data
- Might be something as simple as the type of a variable, or could be more complex
- For times when the table itself just isn't enough
- Example: if variable1 is 1, then variable3 can only be 3, 7 or 16
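Metadata such as the conditional rule above can be stored next to the table and checked mechanically. The following is a minimal Python sketch. The rule format (`if_var`, `equals`, `then_var`, `allowed`) is a hypothetical structure chosen for illustration, and it reads the example's trigger value as the integer 1.

```python
# Hypothetical rule format encoding the slide's example (trigger value read as 1):
# if variable1 is 1, then variable3 may only be 3, 7 or 16.
RULES = [
    {"if_var": "variable1", "equals": 1, "then_var": "variable3", "allowed": {3, 7, 16}},
]

def violations(case, rules=RULES):
    """Return the rules that a case (a dict of variable -> value) breaks."""
    broken = []
    for rule in rules:
        triggered = case.get(rule["if_var"]) == rule["equals"]
        if triggered and case.get(rule["then_var"]) not in rule["allowed"]:
            broken.append(rule)
    return broken

print(len(violations({"variable1": 1, "variable3": 7})))  # 0: 7 is an allowed value
print(len(violations({"variable1": 1, "variable3": 5})))  # 1: 5 is not allowed
print(len(violations({"variable1": 2, "variable3": 5})))  # 0: rule not triggered
```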
11 How Many Variables?
- Data sets of dimensions 1, 2, 3 are common
- Number of variables per class
  - 1 - Univariate data
  - 2 - Bivariate data
  - 3 - Trivariate data
  - >3 - Hypervariate data
12 Univariate Data
[Figure: Tukey box plot on a 0 to 20 scale, marking the low and high values, the middle 50%, and the mean]
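The numbers a box plot draws can be computed directly. The figure marks the mean, while a classic Tukey box plot marks the median, so this Python sketch returns both. It uses the standard library's `statistics.quantiles` with the "inclusive" method. That is one of several quartile conventions and is an assumption here, so other tools may place the box edges slightly differently.

```python
import statistics

def box_plot_summary(values):
    """Low, middle-50% box (Q1 to Q3), median, high, and mean of a univariate data set."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "low": data[0],
        "q1": q1,          # bottom edge of the middle-50% box
        "median": median,
        "q3": q3,          # top edge of the middle-50% box
        "high": data[-1],
        "mean": statistics.mean(data),
    }

print(box_plot_summary([7, 5, 3, 1, 9]))
```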
13 Bivariate Data
Scatter plot is common
[Figure: scatter plot of price vs. mileage]
14 Trivariate Data
3D scatter plot is possible
[Figure: 3D scatter plot with axes price, horsepower, and mileage]
15 Hypervariate Data
- A number of well-known visualization techniques exist for data sets of 1-3 dimensions
- Line graphs, bar graphs, scatter plots OK
- We see a 3-D world (4-D with time)
- What about data sets with more than 3 variables?
- Often the interesting ones
16 Multiple Views
Give each variable its own display
     A  B  C  D  E
1    4  1  8  3  5
2    6  3  4  2  1
3    5  7  2  4  3
4    2  6  3  1  5
[Figure: one chart per variable, A through E, each showing the values of items 1 to 4]
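The "one display per variable" idea can be sketched in Python with text bar charts standing in for the per-variable charts. It uses the item-by-variable values from the slide's small table. The text-bar rendering and the function names are illustrative assumptions, not the charts shown in the lecture.

```python
# The slide's table: items 1-4 (rows) by variables A-E (columns).
DATA = {
    1: {"A": 4, "B": 1, "C": 8, "D": 3, "E": 5},
    2: {"A": 6, "B": 3, "C": 4, "D": 2, "E": 1},
    3: {"A": 5, "B": 7, "C": 2, "D": 4, "E": 3},
    4: {"A": 2, "B": 6, "C": 3, "D": 1, "E": 5},
}

def view_for(variable, data=DATA):
    """One text bar chart for a single variable: one bar per item, one '#' per unit."""
    return [f"{item} {'#' * row[variable]}" for item, row in sorted(data.items())]

def multiple_views(data=DATA):
    """Give each variable its own display, keyed by variable name."""
    variables = sorted(next(iter(data.values())))
    return {v: view_for(v, data) for v in variables}

for name, chart in multiple_views().items():
    print(name)
    print("\n".join(chart))
```

Each display reads well on its own. Comparing one item across variables, though, means looking back and forth between separate charts.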
17 Scatterplot Matrix
Represent each possible pair of variables in its own 2-D scatterplot
Useful for what? Misses what?
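Building a scatterplot matrix amounts to enumerating variable pairs and extracting one 2-D point list per pair, as this Python sketch shows. The car records are hypothetical values chosen only for illustration. A full matrix also shows each pair mirrored across the diagonal, but the n(n-1)/2 unique pairs already carry every distinct plot.

```python
from itertools import combinations

def scatterplot_matrix(cases, variables):
    """Return one list of (x, y) points per unordered pair of variables.

    cases: list of dicts mapping variable name -> numeric value.
    """
    return {
        (x, y): [(c[x], c[y]) for c in cases]
        for x, y in combinations(variables, 2)
    }

cars = [  # hypothetical car records
    {"price": 20000, "mileage": 30, "horsepower": 150},
    {"price": 35000, "mileage": 22, "horsepower": 300},
    {"price": 15000, "mileage": 38, "horsepower": 110},
]
plots = scatterplot_matrix(cars, ["price", "mileage", "horsepower"])
print(len(plots))                   # 3 unique pairs for 3 variables
print(plots[("price", "mileage")])  # [(20000, 30), (35000, 22), (15000, 38)]
```

The number of plots grows quadratically with the number of variables: 10 unique pairs for 5 variables, 45 for 10.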
18 Chernoff Faces
Encode different variables' values in characteristics of a human face
Cute applets:
http://www.cs.uchicago.edu/wiseman/chernoff/
http://hesketh.com/schampeo/projects/Faces/chernoff.html
19 Star Plots
Space out the n variables at equal angles around a circle
Each spoke encodes a variable's value
[Figure: star plot with spokes Var 1 through Var 5, with a value marked along a spoke]
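The geometry of a star plot is short to write down. Spoke i sits at angle 2πi/n, and its length encodes the case's value on variable i. This Python sketch computes the vertices for one case. Two choices are assumptions rather than part of the slide: normalizing each value by a per-variable maximum, and starting the first spoke at angle 0.

```python
import math

def star_plot_points(values, max_values):
    """Vertices of a star plot glyph for one case.

    Spoke i is at angle 2*pi*i/n (equal spacing around the circle); its length
    is values[i] / max_values[i], i.e. the variable's value normalized to 0..1.
    """
    n = len(values)
    points = []
    for i, (v, vmax) in enumerate(zip(values, max_values)):
        angle = 2 * math.pi * i / n  # equal angular spacing
        r = v / vmax                 # normalized spoke length
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points  # join in order, then back to the first point, to draw the star

pts = star_plot_points([5, 10, 0, 2.5], [10, 10, 10, 10])
print([(round(x, 3), round(y, 3)) for x, y in pts])
```

Each case becomes its own glyph, so a data set is shown as a grid of small stars that can be compared by shape.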
20 Star Plot examples
http://seamonkey.ed.asu.edu/behrens/asu/reports/compre/comp1.html
21 Star Coordinates
E. Kandogan, "Star Coordinates: A Multi-dimensional Visualization Technique with Uniform Treatment of Dimensions," InfoVis 2000 Late-Breaking Hot Topics, Oct. 2000.
Demo
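Star Coordinates differs from a star plot. Instead of drawing one glyph per case, each dimension gets an axis vector from a common origin, and each case becomes a single point at the sum of those vectors scaled by its normalized values. Moving or stretching an axis in the demo corresponds to changing that dimension's vector. Min-max normalization and equally spaced unit axes as the default layout are assumptions of this Python sketch.

```python
import math

def star_coordinates(case, mins, maxs, axes=None):
    """Project one n-dimensional case to a 2-D point, Star Coordinates style.

    Each dimension j has a 2-D axis vector; the point is the sum of those
    vectors, each scaled by the case's value in dimension j normalized to 0..1.
    Default axes: unit vectors at equal angles around the origin.
    """
    n = len(case)
    if axes is None:
        axes = [(math.cos(2 * math.pi * j / n), math.sin(2 * math.pi * j / n))
                for j in range(n)]
    x = y = 0.0
    for value, lo, hi, (ax, ay) in zip(case, mins, maxs, axes):
        t = (value - lo) / (hi - lo)  # min-max normalization
        x += t * ax
        y += t * ay
    return x, y

# With two perpendicular axes the projection reduces to an ordinary scatter plot.
print(star_coordinates([5, 10], [0, 0], [10, 10], axes=[(1, 0), (0, 1)]))  # (0.5, 1.0)
```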
22 Intermission
- Missing students
- Learn names
- Computer accounts
23 Parallel Coordinates
Encode variables along a horizontal row
Each variable's vertical line (axis) specifies its values
[Figure: vertical axes V1, V2, V3, V4, V5 side by side]
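In parallel coordinates, one case becomes a polyline that crosses every axis at that case's value. This Python sketch computes the vertices of that polyline. The plot size, the even spacing of the axes, and y growing upward from the minimum are assumptions made for illustration.

```python
def parallel_coords_polyline(case, mins, maxs, width=400.0, height=100.0):
    """Vertices of the polyline that draws one case in parallel coordinates.

    Axis j is the vertical line x = j * width / (n - 1). The case's value on
    variable j is placed on that line, scaled so the variable's minimum is at
    y = 0 and its maximum at y = height. Joining the vertices left to right
    draws the case.
    """
    n = len(case)
    step = width / (n - 1) if n > 1 else 0.0
    return [
        (j * step, (v - lo) / (hi - lo) * height)
        for j, (v, lo, hi) in enumerate(zip(case, mins, maxs))
    ]

# One hypothetical five-variable case on axes V1..V5, each ranging over 0..10.
print(parallel_coords_polyline([0, 5, 10, 2.5, 7.5], [0] * 5, [10] * 5))
# [(0.0, 0.0), (100.0, 50.0), (200.0, 100.0), (300.0, 25.0), (400.0, 75.0)]
```

Drawing one such polyline per case, and coloring or highlighting a subset, gives the basic, grayscale, and color displays.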
24 Parallel Coords Example
[Figure: basic, grayscale, and color versions of a parallel coordinates display]
25 Application
- System that uses parallel coordinates for information analysis and discovery
- Interactive tool
- Can focus on certain data items
- Color
Taken from A. Inselberg, "Multidimensional Detective," InfoVis '97, 1997.
26 The Problem
- VLSI chip manufacture
- Want high-quality chips (high speed) and a high-yield batch (% of useful chips)
- Able to track defects
- Hypothesis: no defects gives desired chip types
- 473 batches of data
27 The Data
- 16 variables
- X1 - yield
- X2 - quality
- X3-X12 - defects (inverted)
- X13-X16 - physical parameters
28 Parallel Coordinate Display
[Figure: parallel coordinate display with axes grouped as yield, quality, defects, and parameters]
Yikes! But not that bad
Distributions: X1 is normal, X2 is bipolar
29 Top Yield and Quality
[Figure: batches with the top yield and quality selected; annotations mark a split and the defect axes]
Have some defects
30 Minimal Defects
Not the highest yields and quality
31 Best Yields
It appears that some defects are necessary to produce the best chips. Non-intuitive!
32 Another Problem
- Data concerning economic output of a country (fishing, mining, etc.)
- Eight variables
- Fit a model to the data set
- Model describes possible economic outputs
33 Parallel Coordinates
[Figure: the fitted model in parallel coordinates, with the model boundary marked on each side and a value picked on one axis]
34 Xmdv
Toolsuite created by Matthew Ward of WPI
Includes parallel coordinate views
Demo
35 Dimensional Anchors
Attempt to unify many different multivariate vis techniques
Uses 9 DA parameters
P. Hoffman, G. Grinstein, D. Pinkney, "Dimensional Anchors: A Graphic Primitive for Multidimensional Multivariate Information Visualizations," Workshop on New Paradigms in Info Vis, Nov. 1999.
[Figure: one example display]
36 Another Technique
- Database of data items, each of n dimensions
- Issue a query that specifies a target value for each of the dimensions
- Often get back no exact matches
- Want to find near matches
Taken from D. Keim, H.-P. Kriegel, "VisDB: Database Exploration Using Multidimensional Visualization," IEEE CGA, 1994.
37 Relevance Factor
- How close an item is to the query
- Data items have some value that can be numerically quantified
- Each dimension is some distance away from the query item
- Sum these up for total distance
- Relevance is the inverse of distance: the closer the item, the higher its relevance
38 Example
- 5 dimensions, integers 0 to 255
- Query: (6, 210, 73, 45, 92)
- Data item: (8, 200, 73, 50, 91)
- Distance: 2 + 10 + 0 + 5 + 1 = 18
- Relevance: 1275 - 18 = 1257 (1275 = 5 × 255 is the largest possible distance)
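The relevance computation in this example fits in a few lines of Python. The sketch assumes, as the example states, that every dimension is an integer from 0 to 255. The maximum possible total distance is then 5 × 255 = 1275, and relevance is that maximum minus the actual distance, so an exact match scores highest.

```python
def distance(query, item):
    """Total distance: the sum of per-dimension absolute differences."""
    return sum(abs(q - d) for q, d in zip(query, item))

def relevance(query, item, max_value=255):
    """Relevance = largest possible total distance - actual distance.

    Assumes every dimension ranges over 0..max_value.
    """
    max_distance = max_value * len(query)  # 5 * 255 = 1275 for this example
    return max_distance - distance(query, item)

query = (6, 210, 73, 45, 92)
item = (8, 200, 73, 50, 91)
print(distance(query, item))   # 18
print(relevance(query, item))  # 1257
```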
39 Issues
- What if dimensions are real numbers or text strings?
- What if they're the same type, but of different orders of magnitude?
- Have to define some kind of distance, then a weight function to multiply by
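One way to handle these issues, sketched in Python under explicit assumptions, is to put every dimension's distance on a common 0..1 scale first and then apply a per-dimension weight. Numbers are divided by the dimension's range so that large-magnitude dimensions do not swamp small ones. Text is treated as 0 (equal) or 1 (different), which is a deliberately crude choice. The ranges, weights, and example records are hypothetical.

```python
def dim_distance(q, d, value_range=None):
    """Distance in one dimension, scaled to 0..1.

    Numbers: absolute difference divided by the dimension's range.
    Text: 0.0 if equal, 1.0 otherwise (a crude, illustrative choice).
    """
    if isinstance(q, str) or isinstance(d, str):
        return 0.0 if q == d else 1.0
    return abs(q - d) / value_range

def weighted_distance(query, item, ranges, weights):
    """Combine per-dimension distances, multiplying each by its weight."""
    return sum(
        w * dim_distance(q, d, r)
        for q, d, r, w in zip(query, item, ranges, weights)
    )

# Hypothetical dimensions: a GPA (range 4.0), a salary (range 100000), a color name.
query = (3.5, 40000, "red")
item = (3.0, 42000, "blue")
print(weighted_distance(query, item, ranges=(4.0, 100000, None), weights=(1.0, 1.0, 0.5)))
```

Without the range scaling, the 2000-unit salary gap would dwarf the 0.5 GPA gap even though both are small differences for their dimensions.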
40 Technique
- Calculate relevance of all data points
- Sort items based on relevance
- Use spiral technique to order the values
- Color items based on relevance
41 Relevance Colors
[Figure: color scale from low to high relevance]
Empirically established
42 Spiral Method
Highest relevance value in the center; decreasing values grow outward
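The spiral placement can be sketched in Python. Items are sorted by relevance, highest first, and the k-th item goes to the k-th cell of a square spiral walked outward from the center. VisDB draws each item as a pixel, and a grid cell stands in for that pixel here. The starting direction, the clockwise turn, and y increasing downward (screen coordinates) are assumptions.

```python
def spiral_positions(count):
    """Grid cells of a square spiral starting at the center (0, 0).

    Walk right 1, down 1, left 2, up 2, right 3, down 3, ... (y grows downward).
    """
    x = y = 0
    positions = [(x, y)]
    dx, dy = 1, 0  # start heading right
    step = 1
    while len(positions) < count:
        for _ in range(2):  # two legs share each step length
            for _ in range(step):
                if len(positions) == count:
                    return positions
                x, y = x + dx, y + dy
                positions.append((x, y))
            dx, dy = -dy, dx  # turn 90 degrees: right -> down -> left -> up
        step += 1
    return positions[:count]

def spiral_layout(items, relevance):
    """Sort items by relevance (highest first) and assign each a spiral cell."""
    ranked = sorted(items, key=relevance, reverse=True)
    return dict(zip(ranked, spiral_positions(len(ranked))))

scores = {"a": 10, "b": 30, "c": 20}
print(spiral_layout(["a", "b", "c"], scores.get))  # {'b': (0, 0), 'c': (1, 0), 'a': (1, 1)}
```

Because the cell assignment depends only on the ranking, reusing one mapping across several windows keeps each item at the same place in every window.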
43 Display Methodology
Example: five-dimensional data
[Figure: six windows (total relevance, Dim 1, Dim 2, Dim 3, Dim 4, Dim 5), with a spiral in each window]
Items ordered by total relevance
Same item appears in the same place in each window
44 Figure from Paper
45 Example Display
46 Alternative
- Grouping arrangement
- Doesn't use multiple windows
- Create all relevance dimensional depictions for an item and group them
- Spiral out the different data items' depictions
47 Grouping Arrangement
48 Example Display
8 dimensions, 1000 items
[Figure: the same data shown in the grouping arrangement and in the multi-window arrangement]
49 Sources Used
- CMS book
- Referenced articles
- Marti Hearst, SIMS 247 lectures
- C. H. Yu, "Visualization Techniques of Different Dimensions," http://seamonkey.ed.asu.edu/behrens/asu/reports/compre/comp1.html
50 Upcoming
- Cognitive Tasks and Issues
- Multivariate vis tools