Title: Projective Mapping: A Non-iterative Method for the Layout of Multidimensional Data
1Projective Mapping: A Non-iterative Method for the
Layout of Multidimensional Data
- Dr. Karina Assiter
- Wentworth Institute of Technology
- SCI2002
2Outline
- Background
- Projective Mapping
- Experimental Results
- Conclusions
- References
3Background
- Data generation and Data understanding
- Data mining example
- Objective of Layout
- Layout Methods
- MDS Optimization
- Faithfulness in Mapping
- Subjective Layout
4Data generation and data understanding
- Data generation
- Provides Raw material
- Results in large, high-dimensional data sets
- Amount of data in the world's databases doubles every 20 months
- Many attributes for each sample
- Data understanding
- Fuels business growth (Competitive advantage)
- Insights and patterns used to predict the future
- Techniques
- Clustering
- Classifying
- Visualizing (layout)
5Data Mining Example
- Iris Dataset - R.A. Fisher (1930s)
- 150 samples (50 of each type)
- Three types of plants (setosa, versicolor, virginica)
- Four attributes (sepal length/width, petal length/width)
- Techniques
- Classify new plants (learn rules)
- If petal-length < 2.45 then Iris-setosa
- Cluster items that fall together
- If we did not know the type of Iris
- Visualize the dataset (layout)
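The classification rule on this slide can be written as a one-line test. This is a minimal sketch: the function name and the "unknown" fallback are assumptions for illustration, since the slide gives a rule for Iris-setosa only.

```python
def classify_iris(petal_length):
    """Apply the single rule from the slide: petal-length < 2.45 -> Iris-setosa."""
    if petal_length < 2.45:
        return "Iris-setosa"
    # The slide gives no rule for versicolor or virginica,
    # so this sketch leaves them undecided (assumption).
    return "unknown"


print(classify_iris(1.4))
print(classify_iris(4.7))
```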
6Objective of Layout
- Create a mapping F from h samples in an
n-dimensional sample set S to representative
points in an m-dimensional representation set R.
7Layout Methods
- Multidimensional scaling (MDS)
- Position points in R so that the distances match
(as closely as possible) the set of
dissimilarities in S.
- Self-organizing maps
- Mapping from an input data space S onto a
two-dimensional array of nodes R
- Similar samples in S mapped to nearby nodes in R
- Subjective Layout methods
- A sample's placement in R depends upon its
relationship with user-defined (and positioned)
anchors
- Inter-sample relationships may or may not be
preserved.
8Illustration of MDS
9MDS Optimization Methods
- Move points around in R to minimize error
function
- Error function
- How accurately proximity values in S are related
to distances in R
- Example: $E_2 = \frac{1}{\sum_{i<j} d(P_i, P_j)} \sum_{i<j} \frac{\left(d(P_i, P_j) - d(Q_i, Q_j)\right)^2}{d(P_i, P_j)}$
- General characteristics
- Run time complexity of O(h^2)
- h is number of samples
- Iterative (continue until stopping condition)
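The example error function on this slide appears to be Sammon's stress (Sammon 1969, listed in the references): squared differences between distances in S and in R, each weighted by the distance in S and normalized by the sum of distances in S. A minimal sketch of evaluating it once follows; the function names are illustrative, and it assumes Euclidean distance and distinct samples (so no d(Pi, Pj) is zero). An iterative MDS method would re-evaluate this after each adjustment of the points in R.

```python
import math
from itertools import combinations


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sammon_stress(P, Q):
    """Error between samples P (in S) and their layout points Q (in R)."""
    pairs = list(combinations(range(len(P)), 2))  # all i < j: O(h^2) pairs
    total = sum(dist(P[i], P[j]) for i, j in pairs)
    err = sum(
        (dist(P[i], P[j]) - dist(Q[i], Q[j])) ** 2 / dist(P[i], P[j])
        for i, j in pairs
    )
    return err / total


# Three samples lying in the z = 0 plane of a 3-D space.
P = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
Q_exact = [(0, 0), (1, 0), (0, 1)]      # distances preserved
Q_stretched = [(0, 0), (2, 0), (0, 1)]  # one point moved

print(sammon_stress(P, Q_exact))
print(sammon_stress(P, Q_stretched))
```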
10Faithfulness in Mapping
- A mapping F: S -> R of an x-dimensional
subspace of an n-dimensional space into an
m-dimensional reference space is faithful if the
mapping preserves proximities precisely, so that
d(Pi, Pj) = d(F(Pi), F(Pj)) when Pi and Pj are
in S.
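The faithfulness condition can be checked directly by comparing every pairwise distance before and after mapping. This is a minimal sketch assuming Euclidean distance; the function names and the tolerance `tol` are illustrative and not part of the definition.

```python
import math
from itertools import combinations


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_faithful(P, Q, tol=1e-9):
    """True if d(Pi, Pj) equals d(Qi, Qj) for every pair, within tol."""
    return all(
        abs(dist(P[i], P[j]) - dist(Q[i], Q[j])) <= tol
        for i, j in combinations(range(len(P)), 2)
    )


# A 2-D subspace (z = 0) of a 3-D space, mapped by dropping z.
P = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
Q = [(0, 0), (3, 0), (0, 4)]
Q_shrunk = [(0, 0), (1.5, 0), (0, 2)]

print(is_faithful(P, Q))
print(is_faithful(P, Q_shrunk))
```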
11Illustration of Faithfulness
[Figure: mapping F from S to R; points P1-P4 in S correspond to points Q1-Q4 in R]
12Subjective Layout
- A sample's mapping in R depends upon its
relationship with user-defined (and positioned)
anchors
- Inter-object relationships may or may not be
preserved.
- Classified as
- Query-relative (traditional)
- Anchors in T are queries (keywords, mailing
lists, etc.)
- Point-relative
- Anchors in T are representatives of samples from
S
13Illustration of Subjective Layout
14Projective Mapping
- Overview
- Illustration of placing a sample
- Explain reference set T
- Method Details
- Determine sense of projected P in plane P
15Projective Mapping Overview
- Samples in a dataset S are mapped into a
representation set R based on the centroid of
their geometric relationships to a set T of
pre-positioned references
16Illustration of placing a sample
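[Figure: in S, sample P with references Sj and Sk, lines L1 and L2, and point J; in R, the corresponding Rj, Rk, L1, L2, and J locate the result Qi]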
17Reference Set T
- k samples selected from S
- S1, S2, ..., Sk
- Mapped into R
- Arbitrarily
- User positions
- Computed
- Optimization method that preserves distances
18Method Details
I. Determine plane P onto which we wish to project (w1, w2)
II. For each vector, P, in S
    A. For each valid pair of references, Sj and Sk
        1. Determine sense (left or right) of projected P in plane P
           (explained in next slide)
        2. In S, find unit vector B perpendicular to L1 through P
        3. In S, get t1 and t2 from
               J = Sj + (Sk - Sj) t1
               J = P + B t2
        4. In R, find unit vector B' perpendicular to L1', depending on sense
               B' = (Rk1 - Rj1, Rj0 - Rk0)    Sense was left
               B' = (Rj1 - Rk1, Rk0 - Rj0)    Sense was right
        5. Use t1 to determine J' in R
               J' = Rj + (Rk - Rj) t1
        6. Determine scale factor from ratio between
           line Sk - Sj in S and line Rk - Rj in R
               sf = |Rk - Rj| / |Sk - Sj|
        7. Calculate Qi based on t2 and B'
               Qi = J' + (B' t2 sf)
    B. Average results from each pair to get position Q.
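The steps on this slide can be sketched in code as follows. This is an illustrative reading of the slide, not the author's implementation. Assumptions: w1 and w2 are orthonormal, a "valid pair" is one whose references are distinct in both S and R, t2 is taken as the unsigned distance from P to the line, and a sample that projects exactly onto the line is treated as "left". The sample data reuses points from Example I, treating S0-S3 as references; the plane (x, z) and the positions in R are assumed, since the slides do not state them.

```python
import math
from itertools import combinations


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def is_left(P, Sj, Sk, w1, w2):
    """Step 1: side of projected P relative to the projected line Sj-Sk."""
    px, py = dot(P, w1), dot(P, w2)
    jx, jy = dot(Sj, w1), dot(Sj, w2)
    kx, ky = dot(Sk, w1), dot(Sk, w2)
    # Line-equation value at P; positive means "left", matching the sign
    # used in the worked example on the sense slide.
    value = (ky - jy) * (px - jx) + (jx - kx) * (py - jy)
    return value >= 0  # assumption: on-the-line counts as left


def place_sample(P, S_refs, R_refs, w1, w2):
    """Map one sample P in S to a point Q in the 2-D set R."""
    results = []
    for j, k in combinations(range(len(S_refs)), 2):
        Sj, Sk, Rj, Rk = S_refs[j], S_refs[k], R_refs[j], R_refs[k]
        dS, dR = sub(Sk, Sj), sub(Rk, Rj)
        len_S, len_R = norm(dS), norm(dR)
        if len_S == 0 or len_R == 0:
            continue  # assumed meaning of an invalid pair
        left = is_left(P, Sj, Sk, w1, w2)
        # Steps 2-3: foot J of the perpendicular from P onto L1, and t1, t2.
        t1 = dot(sub(P, Sj), dS) / (len_S ** 2)
        J = [s + t1 * d for s, d in zip(Sj, dS)]
        t2 = norm(sub(P, J))
        # Step 4: unit normal to the line Rj-Rk, chosen by sense.
        B = (dR[1] / len_R, -dR[0] / len_R)
        if not left:
            B = (-B[0], -B[1])
        # Step 5: same parameter t1 along the line in R.
        Jr = (Rj[0] + t1 * dR[0], Rj[1] + t1 * dR[1])
        # Step 6: scale factor between the two reference segments.
        sf = len_R / len_S
        # Step 7: result for this pair.
        results.append((Jr[0] + B[0] * t2 * sf, Jr[1] + B[1] * t2 * sf))
    if not results:
        raise ValueError("no valid pair of references")
    n = len(results)
    # Step B: average over all pairs.
    return (sum(q[0] for q in results) / n, sum(q[1] for q in results) / n)


S_refs = [(1, 0, 0), (0, 0, 1), (-1, 0, 0), (0, 0, -1)]
R_refs = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # assumed positions in R
w1, w2 = (1, 0, 0), (0, 0, 1)                # assumed projection plane

print(place_sample((0.5, 0, 0), S_refs, R_refs, w1, w2))
print(place_sample((0, 0, -2), S_refs, R_refs, w1, w2))
```

Because the samples here already lie in the plane spanned by the references, and the references keep their relative positions in R, every pair yields the same point and the average equals the sample's own in-plane coordinates.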
19Determine Sense of Projected P in Plane P
Plane P to project onto: w1 = (1,0,0), w2 = (0,1,0)
B-coord of Sj = (2,-1)
B-coord of Sk = (-2,1)
B-coord of P = (-1,1)
Equation of line between B-coord Sj and
B-coord Sk: 2x + 4y = 0
So 2(-1) + 4(1) > 0: left side of line
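The worked example on this slide can be reproduced numerically. This is a sketch under stated assumptions: the third coordinates of the 3-D points are invented for illustration (the slide gives only the coordinates in the plane), and the function names are hypothetical. The line coefficients are built from the two projected references so that they match the slide's equation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def plane_coords(v, w1, w2):
    """Coordinates of v in the plane spanned by w1 and w2."""
    return (dot(v, w1), dot(v, w2))


def line_value(p, sj, sk):
    """Evaluate a*x + b*y + c at p for the line through sj and sk."""
    a = sk[1] - sj[1]
    b = sj[0] - sk[0]
    c = -(a * sj[0] + b * sj[1])
    return a * p[0] + b * p[1] + c


w1, w2 = (1, 0, 0), (0, 1, 0)

# Third coordinates below are made up; only the first two come from the slide.
Sk = (-2, 1, 5)
Sj = (2, -1, 3)
P = (-1, 1, 7)

sj_b = plane_coords(Sj, w1, w2)  # (2, -1)
sk_b = plane_coords(Sk, w1, w2)  # (-2, 1)
p_b = plane_coords(P, w1, w2)    # (-1, 1)

value = line_value(p_b, sj_b, sk_b)  # 2*(-1) + 4*(1) + 0
sense = "left" if value > 0 else "right"
print(value, sense)
```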
20Experimental Results
- Example I
- Embedded two-dimensional data
- Example II
- Lines to Lines
- Example III
- Rotation
- Example IV
- Identity mapping
- Example V (Optional)
- Cube series
21Example I Embedded two-dimensional data
Samples in S (triangles):
      x     y     z
P0    .5    0     0
P1    0     0     .5
P2    .5    0     -.5
P3    1     0     0
P4    0     0     -2
P5    -1    0     0
Labeled points in S: S0 at (1,0,0), S1 at (0,0,1), S2 at (-1,0,0), S3 at (0,0,-1)
[Figure: the points above plotted in S, and the resulting layout in R]
22Example II Lines to lines
23Example III Rotation
24Example IV Identity Mapping
25Example V Cube series (Optional)
Front face, horizontal plane
Front face T, Plane between horizontal and 45°
Front face T, Plane between 45° and vertical
Front face T, Plane is vertical
Diagonal T, vertical plane
Diagonal T, horizontal plane
Diagonal T, Planes between horizontal and 45°
Diagonal T, Planes between 45° and vertical
26Run-time Complexity
- Inserting one sample
- Depends on k
- The number of references in T
- Upper bound of O(k^2)
- k! / ((k-2)! 2!): all ways to select 2 items from k
- Inserting h samples
- Upper bound of O(k^2 h).
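The pair count behind these bounds can be checked with a few lines. This is a small sketch; the function names are illustrative, and it counts every pair of references as valid.

```python
from itertools import combinations
from math import comb


def pair_count(k):
    """Number of reference pairs: k! / ((k-2)! 2!) = k(k-1)/2."""
    return comb(k, 2)


def pair_evaluations(k, h):
    """Pairs processed when inserting h samples against k references."""
    return h * pair_count(k)


k, h = 4, 150
pairs = list(combinations(range(k), 2))
print(len(pairs), pair_count(k))
print(pair_evaluations(k, h))
```

Since k(k-1)/2 grows as k^2, one insertion is bounded by O(k^2) and h insertions by O(k^2 h).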
27Comparing Methods
Hypercube: samples 00000-11111.
Projective mapping: references in S 00000-00011;
associated references in R 00-11. The plane for
the projection was 00001, 00010.
[Figure: Projective Mapping on the Hypercube]
[Figure: Sammon Mapping on the Hypercube]
Iris dataset: 150 samples, three clusters of
flowers.
[Figure: Sammon Mapping on the Iris dataset]
[Figure: Projective Mapping on the Iris dataset]
28Conclusions
- Benefits of PM
- Drawbacks of PM
- Uses for Projective mapping
- Future Work
29Benefits of PM
- Fast: O(k^2 h)
- Non-iterative
- Guaranteed upper-bound
- Adaptive
- Point-relative Subjective
- In one and two-dimensions
- Faithful mapping of references applied to samples
- Linear transformation of references applied to
samples
- Structure preserving (with distortion)
- Works with fallible and sparse datasets
- Creates consistent layouts
- Not domain specific
30Drawbacks of PM
- With high-dimensional data
- Distortion occurs
- Plane selection is hard to optimize
- Faithfulness requires Euclidean distances
31Uses for Projective Mapping
- Visualization to discover
- Data Dimensionality
- Data Structure
- Relationship between a subset of references and
all other samples
32Future work
- Generalize for
- N-dimensional to n-dimensional mapping
- Vary for testing
- Plane selection
- Data dimensionality
- Reference set
- Selection
- Size
- Visualize as slide show
- multiple-views of dataset
33Selected References
- Assiter, K.A. (2001). Projective Mapping: A non-iterative method for the layout of multidimensional data. Dissertation. Tufts University, Medford, MA.
- Chalmers, M. and P. Chitson (1992). Bead: Explorations in Information Visualization. SIGIR '92: Proceedings of the Fifteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Denmark, ACM Press.
- Cox, T. F. and M. A. Cox (1994). Multidimensional Scaling. Monographs on Statistics and Applied Probability. London, Chapman & Hall.
- Fairchild, K. M., S. E. Poltrock and G. W. Furnas (1988). SemNet: Three-Dimensional Graphic Representations of Large Knowledge Bases. Cognitive Science and its Applications for Human-Computer Interaction. R. Guindon. Hillsdale, New Jersey, Lawrence Erlbaum Associates: 201-233.
- Kruskal, J. B. (1964a). Multidimensional scaling by optimizing goodness of fit to a non-metric hypothesis. Psychometrika 29(1): 1-27. Reprinted in Key Texts in Multidimensional Scaling, P.M. Davies and A.P.M. Coxon, Eds. Heinemann Educational Books, Exeter, N.H., 1982, pp. 59-83.
- Kruskal, J. B. (1964b). Non-metric multidimensional scaling: A numerical method. Psychometrika 29(2): 115-129. Reprinted in Key Texts in Multidimensional Scaling, P.M. Davies and A.P.M. Coxon, Eds. Heinemann Educational Books, Exeter, N.H., 1982, pp. 59-83.
- Olsen, K. A. (1993). Visualization of a document collection: The Vibe System. Information Processing and Management 29(1): 69-81, Pergamon Press Ltd.
- Sammon, J. W. (1969). A Nonlinear Mapping for Data Structure Analysis. IEEE Transactions on Computers 18(5): 401-409, May 1969.
34End
35Multidimensional Scaling
- Metric
- Preserve actual proximities
- Types
- Classical (PVA)
- Least squares
- Optimization method with global error function
- Non-metric
- Preserve rank order of proximities
- Optimization method with global error function
- Spring model methods
- Attractive and repulsive forces between objects
act to either bring them together or push them
apart
- Optimization with local error function
36Notation
S         N-dimensional sample set
Sj, Sk    Projection pair of references in S
J         Point where P is projected perpendicularly to L2
L1        Line that goes through Sj and Sk
L2        Line that goes through J and P
P         Sample in S to be mapped
t1        Placement of J along L1 (time parameter in linear equation)
t2        Distance between J and P (not a time parameter)
R         Two-dimensional representation set
Rj, Rk    Projection pair of references in R
J'        Intersection point between L1' and L2'
L1'       Line that goes through Rj and Rk
L2'       Line that goes through J' and Qi
Qi        Result of two-point n-dimensional projective mapping