Multivariate Data Sets - PowerPoint PPT Presentation

Provided by: johns82
Transcript and Presenter's Notes

Title: Multivariate Data Sets


1
Multivariate Data Sets
  • CS 7450 - Information Visualization
  • Aug. 29, 2000

2
Data Sets
  • Data comes in many different forms
  • Typically, not in the way you want it
  • How is it stored (in the raw)?

3
Example
  • Cars
  • make
  • model
  • year
  • miles per gallon
  • cost
  • number of cylinders
  • weight
  • ...

4
Example
  • Web pages

5
Data Tables
  • Often, we take raw data and transform it into a
    form that is more workable
  • Main idea
  • Individual items are called cases
  • Cases have variables (attributes)

6
Data Table Format
             Case1     Case2     Case3     ...
Variable1    Value11   Value21   Value31
Variable2    Value12   Value22   Value32
Variable3    Value13   Value23   Value33
...
(The variables, one per row, are the dimensions.)
Think of it as a function: f(case1) = <Val11, Val12, ...>
7
Example
People in class

        Mary    Jim     Sally    Mitch   ...
SSN     145     294     563      823
Age     23      17      47       29
Hair    brown   black   blonde   red
GPA     2.9     3.7     3.4      2.1
...
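
A minimal sketch of how the "People in class" table might be held in code, assuming a plain dictionary keyed by case. The names `variables`, `people`, `f`, and `column` are illustrative and not from the slides; `f` mirrors the "think of it as a function" idea from the previous slide.

```python
# Variables (attributes), in the order they appear in the table.
variables = ("SSN", "Age", "Hair", "GPA")

# One entry per case; values copied from the example table.
people = {
    "Mary":  (145, 23, "brown", 2.9),
    "Jim":   (294, 17, "black", 3.7),
    "Sally": (563, 47, "blonde", 3.4),
    "Mitch": (823, 29, "red", 2.1),
}

def f(case):
    """Return the tuple of values for one case: f(case) = <val1, val2, ...>."""
    return people[case]

def column(variable):
    """Return one variable (one dimension) across all cases."""
    j = variables.index(variable)
    return {case: values[j] for case, values in people.items()}

print(f("Mary"))
print(column("Age"))
```

Looking up a case gives all of its variables; looking up a variable gives one value per case.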
8
Variable Types
  • Three main types of variables
  • N - Nominal (equal or not equal to other values)
  • Example: gender
  • O - Ordinal (obeys < relation, ordered set)
  • Example: fr, so, jr, sr
  • Q - Quantitative (can do math on them)
  • Example: age
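
One way to read the three types is by which operations are meaningful on each. The sketch below is an assumption-laden illustration: the `VarType` enum, the `ALLOWED` table, and `ordinal_less` are hypothetical names, not part of the lecture.

```python
from enum import Enum

class VarType(Enum):
    NOMINAL = "N"
    ORDINAL = "O"
    QUANTITATIVE = "Q"

# Operations that make sense per type; each type adds to the previous one.
ALLOWED = {
    VarType.NOMINAL: {"=="},
    VarType.ORDINAL: {"==", "<"},
    VarType.QUANTITATIVE: {"==", "<", "+", "-"},
}

# Ordinal example from the slide: class year as an ordered set.
CLASS_YEAR_ORDER = ["fr", "so", "jr", "sr"]

def ordinal_less(a, b, order=CLASS_YEAR_ORDER):
    """Compare two ordinal values by their position in the ordered set."""
    return order.index(a) < order.index(b)

print(ordinal_less("so", "sr"))          # class years can be ranked
print("<" in ALLOWED[VarType.NOMINAL])   # nominal values cannot
print(29 - 23)                           # ages (quantitative) support arithmetic
```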

9
Metadata
  • Descriptive information about the data
  • Might be something as simple as the type of a
    variable, or could be more complex
  • For times when the table itself just isn't enough
  • Example: if variable1 is l, then variable3 can
    only be 3, 7 or 16

10
How Many Variables?
  • Data sets of dimensions 1,2,3 are common
  • Number of variables per class
  • 1 - Univariate data
  • 2 - Bivariate data
  • 3 - Trivariate data
  • >3 - Hypervariate data

11
Univariate Data
  • Representations

[Figure: example labeled "Bill" with values 7 5 3 1; Tukey box plot marking low, high, mean, and the middle 50% of the values, on an axis from 0 to 20]
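
A rough sketch of the numbers behind a box plot like the one on this slide, using Python's standard `statistics` module. The function name `box_plot_summary` and the sample data are made up for illustration. As a simplification, the whiskers here are just the minimum and maximum; other box plot conventions treat outliers separately.

```python
import statistics

def box_plot_summary(values):
    """Return the values a simple box plot is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "low": min(values),
        "q1": q1,            # the box spans q1..q3: the middle 50% of the data
        "median": median,
        "q3": q3,
        "high": max(values),
        "mean": statistics.mean(values),
    }

# Made-up sample on a 0-20 axis, for illustration only.
summary = box_plot_summary([0, 5, 10, 15, 20])
print(summary)
```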
12
Bivariate Data
  • Representations

Scatter plot is common
[Figure: scatter plot with axes price and mileage]
13
Trivariate Data
  • Representations

3D scatter plot is possible
[Figure: 3D scatter plot with axes price, horsepower, and mileage]
14
Hypervariate Data
  • A number of well-known visualization techniques
    exist for data sets of 1-3 dimensions
  • line graphs, bar graphs, scatter plots OK
  • We see a 3-D world (4-D with time)
  • What about data sets with more than 3 variables?
  • Often the interesting ones

15
Multiple Views
Give each variable its own display

      A   B   C   D   E
  1   4   1   8   3   5
  2   6   3   4   2   1
  3   5   7   2   4   3
  4   2   6   3   1   5

[Figure: one display per variable A-E, each showing cases 1-4]
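
A small sketch of the "one display per variable" idea using the table from this slide. The helper name `split_into_views` is hypothetical, and the text bars only stand in for real plots.

```python
# Table from the slide: rows are cases 1-4, columns are variables A-E.
variables = ["A", "B", "C", "D", "E"]
cases = {
    1: [4, 1, 8, 3, 5],
    2: [6, 3, 4, 2, 1],
    3: [5, 7, 2, 4, 3],
    4: [2, 6, 3, 1, 5],
}

def split_into_views(variables, cases):
    """Return one {case: value} mapping per variable, i.e. one display each."""
    return {
        name: {case: row[j] for case, row in cases.items()}
        for j, name in enumerate(variables)
    }

views = split_into_views(variables, cases)
for name, view in views.items():
    print(f"Variable {name}")
    for case, value in view.items():
        # Crude text bar standing in for a real chart.
        print(f"  case {case}: {'#' * value}")
```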
16
Chernoff Faces
Encode different variables' values in
characteristics of a human face
Cute applet:
http://www.cs.uchicago.edu/wiseman/chernoff/
17
Star Plots
Space out the n variables at equal angles around
a circle. Each spoke encodes a variable's value.
[Figure: star plot with five spokes, Var 1 through Var 5; the length along a spoke gives the value]
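
The spoke geometry can be sketched as follows, assuming the first spoke points along the positive x axis and the spokes proceed counterclockwise. That orientation, the function name `star_plot_points`, and the sample values are assumptions for illustration.

```python
import math

def star_plot_points(values):
    """Return the (x, y) end point of each spoke for one case."""
    n = len(values)
    points = []
    for k, value in enumerate(values):
        angle = 2 * math.pi * k / n   # spokes at equal angles around the circle
        points.append((value * math.cos(angle), value * math.sin(angle)))
    return points

# Made-up values for Var 1..Var 5, for illustration only.
values = [3, 1, 4, 1, 5]
points = star_plot_points(values)
for k, (x, y) in enumerate(points, start=1):
    print(f"Var {k}: ({x:.2f}, {y:.2f})")
```

Connecting the end points in order gives the star-shaped outline for that case.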
18
Star Plot examples
http://seamonkey.ed.asu.edu/behrens/asu/reports/compre/comp1.html
19
Parallel Coordinates
Encode variables along a horizontal row. A vertical
line for each variable specifies its values.
[Figure: five parallel vertical axes labeled V1 V2 V3 V4 V5]
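
A minimal sketch of how one case becomes a polyline across the axes, assuming each variable is scaled to the range 0..1 on its own axis. The function name, the per-axis minimums and maximums, and the sample case are all made up for illustration.

```python
def parallel_coordinates_polyline(case, minimums, maximums):
    """Return the (axis_index, height) vertices of one case's polyline."""
    vertices = []
    for i, (value, lo, hi) in enumerate(zip(case, minimums, maximums)):
        # Scale the value to 0..1 along its own vertical axis.
        height = (value - lo) / (hi - lo) if hi > lo else 0.5
        vertices.append((i, height))
    return vertices

# Made-up case over V1..V5, with made-up axis ranges.
case = [2, 50, 0.3, 7, 100]
minimums = [0, 0, 0.0, 0, 0]
maximums = [4, 100, 1.0, 10, 100]

polyline = parallel_coordinates_polyline(case, minimums, maximums)
print(polyline)
```

Drawing every case this way on the same axes produces the overlapping lines seen in parallel coordinate displays.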
20
Parallel Coords Example
[Figure: three example displays: Basic, Grayscale, Color]
21
Application
  • System that uses parallel coordinates for
    information analysis and discovery
  • Interactive tool
  • Can focus on certain data items
  • Color

Taken from A. Inselberg, "Multidimensional
Detective," InfoVis '97, 1997.
22
The Problem
  • VLSI chip manufacture
  • Want high quality chips (high speed) and a high
    yield batch (% of useful chips)
  • Able to track defects
  • Hypothesis: no defects gives desired chip types
  • 473 batches of data

23
The Data
  • 16 variables
  • X1 - yield
  • X2 - quality
  • X3-X12 - defects (inverted)
  • X13-X16 - physical parameters

24
Parallel Coordinate Display
[Figure: parallel coordinate display; axis groups: yield, quality, defects, parameters]
Yikes! But not that bad
Distributions: x1 - normal, x2 - bipolar
25
Top Yield Quality
[Figure: parallel coordinate display annotated "split" and "defects"]
Have some defects
26
Minimal Defects
Not the highest yields and quality
27
Best Yields
Appears that some defects are necessary to
produce the best chips. Non-intuitive!
28
Another Problem
  • Data concerning economic output of a country
    (fishing, mining, etc.)
  • Eight variables
  • Fit a model to the data set
  • Model describes possible economic outputs

29
Parallel Coordinates
[Figure: parallel coordinate display with two lines labeled "Model boundary" and a point labeled "Pick a value"]
30
Another Technique
  • Database of data items, each of n dimensions
  • Issue a query that specifies a target value of
    the dimensions
  • Often get back no exact matches
  • Want to find near matches

Taken from D. Keim, H-P Kriegel, "VisDB: Database
Exploration Using Multidimensional Visualization," IEEE CG&A, 1994.
31
Relevance Factor
  • How close an item is to the query
  • Data items have some value that can be
    numerically quantified
  • Each dimension is some distance away
    from query item
  • Sum these up for total distance
  • Relevance is inverse of distance

32
Example
  • 5 dimensions, integers 0->255
  • Query: 6, 210, 73, 45, 92
  • Data item: 8, 200, 73, 50, 91
  • Distance: 2 + 10 + 0 + 5 + 1 = 18
  • Relevance: 1275 - 18 = 1257 (1275 = 5 x 255, the largest possible distance)
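
The arithmetic in this example can be checked with a short sketch. It assumes distance is the sum of absolute per-dimension differences and relevance is the largest possible distance (5 x 255 = 1275) minus that sum, which is how the numbers on the slide work out. The function names are illustrative.

```python
MAX_VALUE = 255  # each dimension is an integer in 0..255

def distance(query, item):
    """Sum of the per-dimension absolute differences."""
    return sum(abs(q - x) for q, x in zip(query, item))

def relevance(query, item):
    """Largest possible distance minus the actual distance."""
    max_distance = MAX_VALUE * len(query)   # 5 * 255 = 1275 here
    return max_distance - distance(query, item)

query = [6, 210, 73, 45, 92]
item = [8, 200, 73, 50, 91]

print(distance(query, item))    # 18
print(relevance(query, item))   # 1257
```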

33
Issues
  • What if dimensions are real numbers or text
    strings?
  • What if they're the same type, but of different
    orders of magnitude?
  • Have to define some kind of distance, then a
    weight function to multiply by
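
One possible shape for "a distance per dimension, then a weight", sketched under loud assumptions: numeric distances are scaled by a chosen range so that different orders of magnitude become comparable, and text distance is a crude equal/not-equal placeholder. All names, ranges, weights, and sample records below are made up for illustration.

```python
def numeric_distance(a, b, value_range):
    """Absolute difference scaled by the dimension's range (0..1 when in range)."""
    return abs(a - b) / value_range

def text_distance(a, b):
    """Crude placeholder for strings: 0 if equal, 1 otherwise."""
    return 0.0 if a == b else 1.0

def weighted_distance(query, item, distance_fns, weights):
    """Weighted sum of the per-dimension distances."""
    return sum(
        w * fn(q, x)
        for q, x, fn, w in zip(query, item, distance_fns, weights)
    )

# Hypothetical three-dimensional records: (cost, miles per gallon, make).
distance_fns = [
    lambda a, b: numeric_distance(a, b, 50000),  # assumed cost range
    lambda a, b: numeric_distance(a, b, 50),     # assumed mpg range
    text_distance,
]
weights = [1.0, 1.0, 0.5]                        # assumed importance per dimension

d = weighted_distance((20000, 30, "make_a"), (25000, 25, "make_b"),
                      distance_fns, weights)
print(d)
```

Without the scaling, the cost difference (5000) would swamp the mileage difference (5) even though both are 10% of their range.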

34
Technique
  • Calculate relevance of all data points
  • Sort items based on relevance
  • Use spiral technique to order the values
  • Color items based on relevance
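
A sketch of the scoring, sorting, and coloring steps, leaving out spiral placement. It assumes integer dimensions in 0..255 and a plain gray ramp; the color scale on the slides is empirically established, so the gray level here is only a stand-in. Function names and the sample items are made up.

```python
def relevance(query, item, max_value=255):
    """Largest possible distance minus the summed per-dimension distance."""
    dist = sum(abs(q - x) for q, x in zip(query, item))
    return max_value * len(query) - dist

def rank_and_color(query, items, max_value=255):
    """Score every item, sort by relevance (best first), attach a gray level."""
    best_possible = max_value * len(query)
    ranked = sorted(items, key=lambda it: relevance(query, it, max_value),
                    reverse=True)
    result = []
    for it in ranked:
        rel = relevance(query, it, max_value)
        gray = round(255 * rel / best_possible)   # 255 = exact match
        result.append((it, rel, gray))
    return result

query = [6, 210, 73, 45, 92]
items = [
    [255, 0, 0, 0, 0],         # far from the query
    [8, 200, 73, 50, 91],      # near match
    [6, 210, 73, 45, 92],      # exact match
]

ranked = rank_and_color(query, items)
for it, rel, gray in ranked:
    print(it, rel, gray)
```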

35
Relevance Colors
Low
High
Empirically established
36
Spiral Method
Highest relevance value in center, decreasing
values grow outward
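
One way to produce such an ordering is a rectangular spiral over grid cells, sketched below. The exact spiral shape used in the original system may differ; the function name `spiral_positions` and the choice to start by stepping right are assumptions.

```python
def spiral_positions(count):
    """Return `count` grid cells (x, y), starting at (0, 0) and spiraling outward."""
    positions = []
    x = y = 0
    dx, dy = 1, 0        # first leg goes right
    step_length = 1      # cells to walk before turning
    while len(positions) < count:
        for _ in range(2):                 # two legs share each step length
            for _ in range(step_length):
                if len(positions) == count:
                    return positions
                positions.append((x, y))
                x, y = x + dx, y + dy
            dx, dy = -dy, dx               # turn 90 degrees
        step_length += 1
    return positions

# The i-th most relevant item goes to the i-th cell of the spiral.
positions = spiral_positions(9)
print(positions)
```

With items already sorted by relevance, pairing them with these cells puts the best match in the center and less relevant items progressively farther out.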
37
Display Methodology
Example: five-dimensional data
Same item appears in same place in each window
Items ordered by total relevance
[Figure: six windows, one for total relevance and one each for Dim 1 through Dim 5, with a spiral in each window]
38
Figure from Paper
39
Example Display
40
Alternative
  • Grouping arrangement
  • Doesn't use multiple windows
  • Create all relevance dimensional depictions for
    an item and group them
  • Spiral out the different data items' depictions

41
Grouping Arrangement
42
Example Display
8 dimensions, 1000 items
[Figure: two displays of the same data: Grouping, Multi-window]
43
Sources Used
  • CMS book
  • Referenced articles
  • Marti Hearst SIMS 247 lectures
  • C. H. Yu, Visualization Techniques of Different Dimensions,
    http://seamonkey.ed.asu.edu/behrens/asu/reports/compre/comp1.html