Title: Multivariate Data Sets
1Multivariate Data Sets
- CS 7450 - Information Visualization
- Aug. 29, 2000
2Data Sets
- Data comes in many different forms
- Typically, not in the way you want it
- How is it stored (in the raw)?
3Example
- Cars
- make
- model
- year
- miles per gallon
- cost
- number of cylinders
- weight
- ...
4Example
5Data Tables
- Often, we take raw data and transform it into a form that is more workable
- Main idea:
- Individual items are called cases
- Cases have variables (attributes)
6Data Table Format
            Case1    Case2    Case3    ...
Variable1   Value11  Value21  Value31
Variable2   Value12  Value22  Value32
Variable3   Value13  Value23  Value33
...
The variables (rows) are the dimensions
Think of it as a function: f(case1) = <Val11, Val12, ...>
7Example
People in class
        Mary   Jim    Sally   Mitch  ...
SSN     145    294    563     823
Age     23     17     47      29
Hair    brown  black  blonde  red
GPA     2.9    3.7    3.4     2.1
...
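The "function" view of a data table, f(case) = <values>, can be sketched in a few lines. This is a minimal illustration using the people-in-class values from the slide; the names `VARIABLES`, `TABLE`, `f`, and `column` are made up for the sketch.

```python
# Cases mapped to their variable values.
# Variable names and values come from the "People in class" example;
# the identifiers themselves are hypothetical.
VARIABLES = ("SSN", "Age", "Hair", "GPA")

TABLE = {
    "Mary":  (145, 23, "brown", 2.9),
    "Jim":   (294, 17, "black", 3.7),
    "Sally": (563, 47, "blonde", 3.4),
    "Mitch": (823, 29, "red", 2.1),
}


def f(case):
    """Return the tuple of values for one case: f(case) = <Val1, Val2, ...>."""
    return TABLE[case]


def column(variable):
    """Return one variable's value for every case, keyed by case name."""
    i = VARIABLES.index(variable)
    return {case: values[i] for case, values in TABLE.items()}
```

Looking up a case gives its whole tuple, while `column("Age")` pulls a single dimension across all cases.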
8Variable Types
- Three main types of variables
- N-Nominal (equal or not equal to other values)
- Example: gender
- O-Ordinal (obeys < relation, ordered set)
- Example: fr, so, jr, sr
- Q-Quantitative (can do math on them)
- Example: age
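One way to see the difference between the three types is by which operations make sense on them. The sketch below is only illustrative; the function names are hypothetical, and the class-year order is taken from the ordinal example.

```python
# Ordinal values need an explicit order; the list position supplies it.
CLASS_YEARS = ["fr", "so", "jr", "sr"]


def nominal_equal(a, b):
    """Nominal: the only meaningful test is equal / not equal."""
    return a == b


def ordinal_less(a, b, order=CLASS_YEARS):
    """Ordinal: values can be compared with <, using their rank in `order`."""
    return order.index(a) < order.index(b)


def quantitative_mean(values):
    """Quantitative: arithmetic such as averaging is meaningful."""
    return sum(values) / len(values)
```

Averaging hair colors or class years would not mean anything, which is why only the quantitative helper does math on its inputs.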
9Metadata
- Descriptive information about the data
- Might be something as simple as the type of a variable, or could be more complex
- For times when the table itself just isn't enough
- Example: if variable1 is l, then variable3 can only be 3, 7 or 16
10How Many Variables?
- Data sets of dimensions 1,2,3 are common
- Number of variables per class
- 1 - Univariate data
- 2 - Bivariate data
- 3 - Trivariate data
- >3 - Hypervariate data
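The naming scheme is simple enough to state as a lookup. A small sketch, with a hypothetical function name:

```python
def dimensionality_name(num_variables):
    """Name a data set by its number of variables, using the slide's terms."""
    if num_variables < 1:
        raise ValueError("a data set needs at least one variable")
    names = {1: "univariate", 2: "bivariate", 3: "trivariate"}
    # Anything above three variables is hypervariate.
    return names.get(num_variables, "hypervariate")
```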
11Univariate Data
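[Figures: a single-variable display labeled "Bill" (ticks 7, 5, 3, 1); a Tukey box plot marking low, high, the middle 50%, and the mean, on an axis from 0 to 20]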
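The quantities a box plot draws can be computed directly from the data. The sketch below is one possible convention, not necessarily the one behind the slide's figure: quartiles are taken as the medians of the lower and upper halves, and the function name is hypothetical.

```python
from statistics import mean, median


def box_plot_summary(values):
    """Return low, Q1, median, Q3, high and mean for one variable.

    Assumption: quartiles are the medians of the lower and upper halves
    (the middle value is left out of both halves when the count is odd).
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower, upper = data[:half], data[-half:]
    return {
        "low": data[0],
        "q1": median(lower),     # bottom edge of the "middle 50%" box
        "median": median(data),
        "q3": median(upper),     # top edge of the "middle 50%" box
        "high": data[-1],
        "mean": mean(data),
    }
```

Other quartile conventions exist and give slightly different box edges on small data sets.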
12Bivariate Data
Scatter plot is common
[Figure: scatter plot of price vs. mileage]
13Trivariate Data
3D scatter plot is possible
[Figure: 3D scatter plot with axes price, horsepower, mileage]
14Hypervariate Data
- A number of well-known visualization techniques exist for data sets of 1-3 dimensions
- line graphs, bar graphs, scatter plots OK
- We see a 3-D world (4-D with time)
- What about data sets with more than 3 variables?
- Often the interesting ones
15Multiple Views
Give each variable its own display
      A  B  C  D  E
1     4  1  8  3  5
2     6  3  4  2  1
3     5  7  2  4  3
4     2  6  3  1  5
[Figure: separate small displays labeled 1 through 4, each plotted over A B C D E]
16Chernoff Faces
Encode different variables' values in characteristics of a human face
Cute applet: http://www.cs.uchicago.edu/wiseman/chernoff/
17Star Plots
Space out the n variables at equal angles around a circle
Each spoke encodes a variable's value
[Figure: star plot with five spokes, Var 1 through Var 5; the length along a spoke is the value]
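The geometry of a star plot follows directly from the description: variable k of n gets the angle 2*pi*k/n, and its value sets how far out along that spoke the point sits. A minimal sketch (the function name is hypothetical, and values are assumed to be already scaled to a common range):

```python
import math


def star_plot_points(values):
    """Return one (x, y) endpoint per variable for a star plot.

    Variable k of n sits at angle 2*pi*k/n; its value is the distance
    from the center along that spoke.
    """
    n = len(values)
    points = []
    for k, value in enumerate(values):
        angle = 2 * math.pi * k / n
        points.append((value * math.cos(angle), value * math.sin(angle)))
    return points
```

Connecting the returned points in order, and closing the loop back to the first one, gives the star outline for one case.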
18Star Plot examples
http://seamonkey.ed.asu.edu/behrens/asu/reports/compre/comp1.html
19Parallel Coordinates
Encode variables along a horizontal row
A vertical line for each variable specifies its values
[Figure: five vertical axes labeled V1 V2 V3 V4 V5]
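Each data item becomes a polyline that crosses every vertical axis at the height of its value. A rough sketch of that mapping, assuming each variable is rescaled to 0..1 using a known (min, max) range; the function name and the rescaling choice are assumptions, not something the slides specify.

```python
def parallel_coordinates_polyline(item, ranges):
    """Return the (x, y) vertices of one item's polyline.

    x is the index of the variable's vertical axis; y is the item's value
    rescaled to 0..1 within that variable's (min, max) range.
    """
    vertices = []
    for axis_index, (value, (lo, hi)) in enumerate(zip(item, ranges)):
        # A constant variable has no spread; park it mid-axis.
        y = 0.5 if hi == lo else (value - lo) / (hi - lo)
        vertices.append((axis_index, y))
    return vertices
```

Drawing one such polyline per item, all over the same set of axes, produces the parallel coordinates display.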
20Parallel Coords Example
[Figure panels: Basic, Grayscale, Color]
21Application
- System that uses parallel coordinates for information analysis and discovery
- Interactive tool
- Can focus on certain data items
- Color
Taken from A. Inselberg, "Multidimensional Detective," InfoVis '97, 1997.
22The Problem
- VLSI chip manufacture
- Want high quality chips (high speed) and a high yield batch (% of useful chips)
- Able to track defects
- Hypothesis: No defects gives desired chip types
- 473 batches of data
23The Data
- 16 variables
- X1 - yield
- X2 - quality
- X3-X12 - defects (inverted)
- X13-X16 - physical parameters
24Parallel Coordinate Display
[Figure: parallel coordinate display; axes grouped as yield, quality, defects, parameters]
Yikes! But not that bad
Distributions: x1 - normal, x2 - bipolar
25Top Yield and Quality
[Figure: parallel coordinate display; annotations: "split", "defects"]
Have some defects
26Minimal Defects
Not the highest yields and quality
27Best Yields
Appears that some defects are necessary to produce the best chips
Non-intuitive!
28Another Problem
- Data concerning economic output of a country (fishing, mining, etc.)
- Eight variables
- Fit a model to the data set
- Model describes possible economic outputs
29Parallel Coordinates
[Figure: parallel coordinate display annotated with two "Model boundary" labels and "Pick a value"]
30Another Technique
- Database of data items, each of n dimensions
- Issue a query that specifies a target value of the dimensions
- Often get back no exact matches
- Want to find near matches
Taken from D. Keim, H-P Kriegel, "VisDB: Database Exploration Using Multidimensional Visualization," IEEE CG&A, 1994.
31Relevance Factor
- How close an item is to the query
- Data items have some value that can be numerically quantified
- Each dimension is some distance away from query item
- Sum these up for total distance
- Relevance is inverse of distance
32Example
- 5 dimensions, integers 0->255
- Query: 6, 210, 73, 45, 92
- Data item: 8, 200, 73, 50, 91
- Distance: 2 + 10 + 0 + 5 + 1 = 18
- Relevance: 1275 - 18 = 1257
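The arithmetic of this example can be checked with a short sketch. It assumes the distance is the sum of absolute per-dimension differences and that relevance is the largest possible distance (5 x 255 = 1275) minus the actual distance, which is how the example's numbers are formed; the function names are hypothetical.

```python
def distance(query, item):
    """Sum of per-dimension absolute differences between item and query."""
    return sum(abs(q - x) for q, x in zip(query, item))


def relevance(query, item, max_value=255):
    """Largest possible distance minus the actual distance."""
    max_distance = max_value * len(query)   # 5 * 255 = 1275 in the example
    return max_distance - distance(query, item)


QUERY = (6, 210, 73, 45, 92)
ITEM = (8, 200, 73, 50, 91)
```

For the query and data item above, `distance(QUERY, ITEM)` is 18 and `relevance(QUERY, ITEM)` is 1275 - 18 = 1257. An exact match would score the full 1275.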
33Issues
- What if dimensions are real numbers or text strings?
- What if they're the same type, but of different orders of magnitude?
- Have to define some kind of distance, then a weight function to multiply by
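One possible answer to the orders-of-magnitude issue is to rescale each dimension's distance by that dimension's range before weighting it. The sketch below shows that idea for numeric dimensions only; the rescaling rule, the `ranges` and `weights` parameters, and the function name are all assumptions for illustration, and text strings would need their own distance definition.

```python
def weighted_distance(query, item, ranges, weights):
    """Weighted sum of per-dimension distances, each rescaled to 0..1.

    ranges:  (min, max) per dimension, used to put dimensions of different
             orders of magnitude on a common scale (an assumed choice).
    weights: per-dimension importance factors (hypothetical values).
    """
    total = 0.0
    for q, x, (lo, hi), w in zip(query, item, ranges, weights):
        span = hi - lo
        normalized = abs(q - x) / span if span else 0.0
        total += w * normalized
    return total
```

With ranges of (0, 100) and (0, 10000), a difference of 10 in the first dimension and 2000 in the second contribute 0.1 and 0.2 before weighting, so neither dimension swamps the other simply because of its units.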
34Technique
- Calculate relevance of all data points
- Sort items based on relevance
- Use spiral technique to order the values
- Color items based on relevance
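The first, second, and fourth steps can be sketched together; the spiral placement is a separate concern. This sketch assumes integer dimensions in 0..255 and, as a stand-in for the actual color scale, maps relevance onto a 0..255 gray level. Both the gray mapping and the function name are assumptions.

```python
def rank_and_shade(query, items, max_value=255):
    """Return (item, relevance, gray) triples, most relevant first.

    gray is an assumed 0..255 shade: 255 for the highest possible
    relevance, 0 for the lowest.
    """
    max_distance = max_value * len(query)
    scored = []
    for item in items:
        dist = sum(abs(q - x) for q, x in zip(query, item))
        rel = max_distance - dist
        gray = round(255 * rel / max_distance)
        scored.append((item, rel, gray))
    # Sort so the most relevant item comes first.
    scored.sort(key=lambda entry: entry[1], reverse=True)
    return scored
```

The sorted order is what a spiral layout would consume, starting from the most relevant item.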
35Relevance Colors
[Figure: color scale running from low to high relevance]
Empirically established
36Spiral Method
Highest relevance value in center, decreasing values grow outward
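A spiral ordering of grid cells can be generated by walking outward from the center and turning after legs of length 1, 1, 2, 2, 3, 3, and so on. The sketch below uses a square spiral on an integer grid, which is an assumption about the layout; the function name is hypothetical.

```python
def spiral_positions(count):
    """Return `count` grid cells in a square spiral starting at (0, 0)."""
    x = y = 0
    dx, dy = 1, 0          # start by moving right
    step_length = 1        # cells to move before turning
    positions = []
    while len(positions) < count:
        for _ in range(2):                 # two legs share each step length
            for _ in range(step_length):
                if len(positions) == count:
                    return positions
                positions.append((x, y))
                x, y = x + dx, y + dy
            dx, dy = -dy, dx               # turn 90 degrees
        step_length += 1
    return positions
```

Pairing the items, sorted from most to least relevant, with these positions puts the best match at the center and pushes weaker matches outward.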
37Display Methodology
Example: five-dimensional data
[Figure: six windows labeled Total relevance, Dim 1, Dim 2, Dim 3, Dim 4, Dim 5, with a spiral in each window]
Same item appears in same place in each window
Items ordered by total relevance
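The key property of the multi-window display is that one ordering, by total relevance, is shared by every window, while each window colors by its own value. A minimal sketch of that bookkeeping, assuming integer dimensions in 0..255 and absolute-difference distances; the function name and the dictionary layout are hypothetical.

```python
def window_values(query, items, max_value=255):
    """Return per-window relevance lists that share one item ordering.

    Items are ordered by total relevance (smallest total distance first);
    index i in every list refers to the same item, so the item lands in
    the same place in each window.
    """
    def total_distance(item):
        return sum(abs(q - x) for q, x in zip(query, item))

    ordered = sorted(items, key=total_distance)
    windows = {
        "total": [max_value * len(query) - total_distance(it) for it in ordered]
    }
    for d in range(len(query)):
        windows[f"dim {d + 1}"] = [
            max_value - abs(query[d] - it[d]) for it in ordered
        ]
    return ordered, windows
```

Position i along the spiral would then show `windows["total"][i]` in the total-relevance window and `windows["dim 1"][i]` in the first dimension's window, for the same item.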
38Figure from Paper
39Example Display
40Alternative
- Grouping arrangement
- Doesn't use multiple windows
- Create all relevance dimensional depictions for an item and group them
- Spiral out the different data items' depictions
41Grouping Arrangement
42Example Display
8 dimensions, 1000 items
[Figure panels: Grouping, Multi-window]
43Sources Used
- CMS book
- Referenced articles
- Marti Hearst SIMS 247 lectures
- C. H. Yu, Visualization Techniques of Different Dimensions, http://seamonkey.ed.asu.edu/behrens/asu/reports/compre/comp1.html