Title: Multidimensional Scaling
1. Multidimensional Scaling
2. Agenda
- Multidimensional Scaling
- Goodness of fit measures
- Nosofsky, 1986
3. Proximities
$p_{\text{Amherst, Hadley}} = 4.32$
                 Amherst Belchertown      Hadley    Leverett      Pelham  Shutesbury  Sunderland
Amherst                0        9.94        4.32        7.29        6.81        9.94        7.81
Belchertown                        0       14.06       14.94        8.25       13.96       17.66
Hadley                                         0       11.02       10.93       14.49         9.5
Leverett                                                   0       12.57        7.45        5.18
Pelham                                                                 0        5.71       16.16
Shutesbury                                                                         0       11.04
Sunderland                                                                                     0
4. Configuration (in 2-D)
[Figure: configuration of points, each labeled $x_i$]
5. Configuration (in 1-D)
6. Formal MDS Definition
- $f: p_{ij} \to d_{ij}(X)$
- MDS is a mapping from proximities to corresponding distances in MDS space.
- After a transformation f, the proximities are equal to distances in X.
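One concrete way to obtain a configuration X from the town proximities is classical (Torgerson) scaling, sketched below with NumPy. This is a swapped-in illustration and not necessarily the procedure behind these slides: it assumes f is the identity (the proximities are treated directly as distances), and the names `full_matrix` and `classical_mds` are hypothetical helpers introduced here.

```python
import numpy as np

TOWNS = ["Amherst", "Belchertown", "Hadley", "Leverett",
         "Pelham", "Shutesbury", "Sunderland"]

# Upper triangle of the proximity table, one row per town, starting at the diagonal.
UPPER = [
    [0, 9.94, 4.32, 7.29, 6.81, 9.94, 7.81],
    [0, 14.06, 14.94, 8.25, 13.96, 17.66],
    [0, 11.02, 10.93, 14.49, 9.5],
    [0, 12.57, 7.45, 5.18],
    [0, 5.71, 16.16],
    [0, 11.04],
    [0],
]

def full_matrix(upper_rows):
    """Expand upper-triangle rows into a full symmetric matrix."""
    n = len(upper_rows)
    P = np.zeros((n, n))
    for i, row in enumerate(upper_rows):
        for k, value in enumerate(row):
            j = i + k
            P[i, j] = P[j, i] = value
    return P

def classical_mds(P, n_dims=2):
    """Classical scaling: double-center the squared proximities, then
    use the leading eigenvectors as coordinates (assumes f = identity)."""
    n = P.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (P ** 2) @ J               # double-centered matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]
    lam = np.clip(eigvals[order], 0.0, None)  # guard against small negatives
    return eigvecs[:, order] * np.sqrt(lam)

P = full_matrix(UPPER)
X = classical_mds(P, n_dims=2)
for town, (x, y) in zip(TOWNS, X):
    print(f"{town:12s} {x:8.3f} {y:8.3f}")
```

The coordinates it prints are one of many equivalent answers, since only the distances between the points are constrained.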
7-9. Distances, $d_{ij}$
$d_{\text{Amherst, Hadley}}(X) = 4.32$
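The slides do not spell out how $d_{ij}(X)$ is computed from the configuration. Assuming the usual Euclidean distance between points $x_i$ and $x_j$ in an $m$-dimensional configuration (the symbols $m$ and $a$ are introduced here for illustration), it would read:

```latex
d_{ij}(X) = \sqrt{\sum_{a=1}^{m} \left( x_{ia} - x_{ja} \right)^2}
```

Other distance functions are possible; this is only the most common choice.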
10. Proximities and Distances
Proximities
                 Amherst Belchertown      Hadley    Leverett      Pelham  Shutesbury  Sunderland
Amherst                0        9.94        4.32        7.29        6.81        9.94        7.81
Belchertown                        0       14.06       14.94        8.25       13.96       17.66
Hadley                                         0       11.02       10.93       14.49         9.5
Leverett                                                   0       12.57        7.45        5.18
Pelham                                                                 0        5.71       16.16
Shutesbury                                                                         0       11.04
Sunderland                                                                                     0
Distances
                 Amherst Belchertown      Hadley    Leverett      Pelham  Shutesbury  Sunderland
Amherst                0     10.0577      6.3325      7.4738      7.9313      7.8319      7.8328
Belchertown                        0     12.0455     16.8332      6.7959     12.7215     17.6600
Hadley                                         0     12.0350     13.1492     14.1632      8.1892
Leverett                                                   0     12.2097      7.3591      6.6429
Pelham                                                                 0      6.3360     15.4250
Shutesbury                                                                         0     12.7366
Sunderland                                                                                     0
11. The Role of f
- f relates the proximities to the distances.
- $f(p_{ij}) = d_{ij}(X)$
12. The Role of f
- f can be linear, exponential, etc.
- In psychological data, f is usually assumed only to be some monotonic function.
- That is, if $p_{ij} < p_{kl}$ then $d_{ij}(X) \le d_{kl}(X)$.
- Most psychological data is on an ordinal scale, e.g., rating scales.
13. Looking at Ordinal Relations
(Proximity and distance tables repeated from slide 10.)
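The ordinal condition (if $p_{ij} < p_{kl}$ then $d_{ij}(X) \le d_{kl}(X)$) can be checked mechanically against the tabulated proximities and distances. The sketch below counts how many ordered pairs of town pairs break that condition; `count_order_violations` is a hypothetical helper, and skipping tied proximities is an assumption about how ties should be handled.

```python
from itertools import combinations

# Upper-triangle entries, row by row (Amherst-Belchertown, Amherst-Hadley, ...,
# Shutesbury-Sunderland), copied from the proximity and distance tables.
proximities = [9.94, 4.32, 7.29, 6.81, 9.94, 7.81,
               14.06, 14.94, 8.25, 13.96, 17.66,
               11.02, 10.93, 14.49, 9.5,
               12.57, 7.45, 5.18,
               5.71, 16.16,
               11.04]
distances = [10.0577, 6.3325, 7.4738, 7.9313, 7.8319, 7.8328,
             12.0455, 16.8332, 6.7959, 12.7215, 17.6600,
             12.0350, 13.1492, 14.1632, 8.1892,
             12.2097, 7.3591, 6.6429,
             6.3360, 15.4250,
             12.7366]

def count_order_violations(p, d):
    """Count pairs (a, b) with p[a] < p[b] but d[a] > d[b]."""
    violations = 0
    comparisons = 0
    for a, b in combinations(range(len(p)), 2):
        if p[a] == p[b]:
            continue  # assumption: tied proximities impose no constraint
        comparisons += 1
        smaller, larger = (a, b) if p[a] < p[b] else (b, a)
        if d[smaller] > d[larger]:
            violations += 1
    return violations, comparisons

violations, comparisons = count_order_violations(proximities, distances)
print(f"{violations} of {comparisons} ordered comparisons are violated")
```

A count of zero would mean some monotonic f maps the proximities onto these distances exactly.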
14. Stress
- It is not always possible to perfectly satisfy this mapping.
- Stress is a measure of how closely the model came to satisfying it.
- Stress is essentially the scaled sum of squared error between $f(p_{ij})$ and $d_{ij}(X)$.
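The slides describe stress only informally. One common concrete form is Kruskal's Stress-1, the square root of the squared error divided by the sum of squared distances; choosing that particular formula is an assumption here, as is taking f to be the identity. The function name `stress_1` is hypothetical.

```python
import math

# Upper-triangle entries, row by row, copied from the proximity and distance tables.
proximities = [9.94, 4.32, 7.29, 6.81, 9.94, 7.81,
               14.06, 14.94, 8.25, 13.96, 17.66,
               11.02, 10.93, 14.49, 9.5,
               12.57, 7.45, 5.18,
               5.71, 16.16,
               11.04]
distances = [10.0577, 6.3325, 7.4738, 7.9313, 7.8319, 7.8328,
             12.0455, 16.8332, 6.7959, 12.7215, 17.6600,
             12.0350, 13.1492, 14.1632, 8.1892,
             12.2097, 7.3591, 6.6429,
             6.3360, 15.4250,
             12.7366]

def stress_1(transformed_proximities, distances):
    """Kruskal's Stress-1: sqrt( sum (f(p) - d)^2 / sum d^2 )."""
    squared_error = sum((fp - d) ** 2
                        for fp, d in zip(transformed_proximities, distances))
    scale = sum(d ** 2 for d in distances)
    return math.sqrt(squared_error / scale)

# Assumption: f is the identity, so the proximities are compared directly.
stress = stress_1(proximities, distances)
print(f"Stress-1 = {stress:.4f}")
```

A value of 0 means the mapping is satisfied perfectly; larger values mean a worse fit.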
15. Stress
[Figure: Stress plotted against Dimensions, with the correct dimensionality marked]
16. Distance Invariant Transformations
- Scaling (all of X doubled in size, or flipped)
- Rotation (X rotated 20 degrees left)
- Translation (X moved 2 to the right)
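The transformations listed above can be checked numerically. The sketch below uses a small made-up configuration (the coordinates are hypothetical, not taken from the slides) and assumes Euclidean distances. Rotation, translation, and flipping leave every distance unchanged, while doubling multiplies every distance by the same factor, so ratios and rank order survive.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical 2-D configuration of four points.
X = np.array([[0.0, 0.0],
              [3.0, 1.0],
              [1.0, 4.0],
              [-2.0, 2.5]])
D = pairwise_distances(X)

theta = np.deg2rad(20)                      # 20 degrees counterclockwise
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

X_rotated = X @ R.T                         # rotate each point
X_translated = X + np.array([2.0, 0.0])     # move 2 to the right
X_flipped = X * np.array([-1.0, 1.0])       # mirror about the vertical axis
X_doubled = 2.0 * X                         # double in size

print(np.allclose(D, pairwise_distances(X_rotated)))
print(np.allclose(D, pairwise_distances(X_translated)))
print(np.allclose(D, pairwise_distances(X_flipped)))
print(np.allclose(2.0 * D, pairwise_distances(X_doubled)))
```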
17. Configuration (in 2-D)
18. Rotated Configuration (in 2-D)
19. Uses of MDS
- Visually look for structure in data.
- Discover the dimensions that underlie data.
- Psychological model that explains similarity
judgments in terms of distance in MDS space.
20. Simple Goodness of Fit Measures
- Sum-of-squared error (SSE)
- Chi-Square
- Proportion of variance accounted for (PVAF)
- $R^2$
- Maximum likelihood (ML)
21. Sum of Squared Error
Data   Prediction   (Data - Prediction)^2
   7         5.03                    3.88
   8         6.97                    1.06
   1         2.12                    1.25
   8         8.91                    0.83
   6         6.97                    0.94
                             SSE =   7.97
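The SSE in the table can be reproduced in a few lines. The variable names below are illustrative; the numbers are the Data and Prediction columns from the table.

```python
data = [7, 8, 1, 8, 6]
predictions = [5.03, 6.97, 2.12, 8.91, 6.97]

# Squared difference for each observation, then their sum.
squared_errors = [(y - p) ** 2 for y, p in zip(data, predictions)]
sse = sum(squared_errors)

print([round(e, 2) for e in squared_errors])
print(round(sse, 2))  # prints 7.97
```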
22. Chi-Square
Data   Prediction   (Data - Prediction)^2   (Data - Prediction)^2 / Prediction
   7            5                       4                                 0.80
   8            7                       1                                 0.14
   1            2                       1                                 0.50
   8            9                       1                                 0.11
   6            7                       1                                 0.14
                                                          Chi-Square =    1.70
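The chi-square column divides each squared error by its prediction before summing. A minimal sketch using the Data and Prediction columns from the table (variable names are illustrative):

```python
data = [7, 8, 1, 8, 6]
predictions = [5, 7, 2, 9, 7]

# Each term is (data - prediction)^2 / prediction.
terms = [(y - p) ** 2 / p for y, p in zip(data, predictions)]
chi_square = sum(terms)

print([f"{t:.2f}" for t in terms])
print(f"{chi_square:.2f}")  # prints 1.70
```

Dividing by the prediction means an error of a given size counts for more when the predicted value is small.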
23. Proportion of Variance Accounted for
       ---- Mean Prediction ----    ---- Model Prediction ----
Data   Mean   Error   Error^2      Prediction   Error   Error^2
   7      6       1         1            5.03    1.97      3.88
   8      6       2         4            6.97    1.03      1.06
   1      6      -5        25            2.12   -1.12      1.25
   8      6       2         4            8.91   -0.91      0.83
   6      6       0         0            6.97   -0.97      0.94
                  SST =    34                    SSE =     7.96
(SST - SSE)/SST = (34 - 7.96)/34 = .77
24. $R^2$
       ---- Mean Prediction ----    ---- Model Prediction ----
Data   Mean   Error   Error^2      Prediction   Error   Error^2
   7      6       1         1             5.9     1.1      1.21
   8      6       2         4            10.1    -2.1      4.41
   1      6      -5        25               4      -3         9
   8      6       2         4             5.9     2.1      4.41
   6      6       0         0               1       5        25
                  SST =    34                    SSE =    44.03
(SST - SSE)/SST = (34 - 44.03)/34 = -0.295
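The (SST - SSE)/SST computation used in the two preceding tables can be written as one small function. `pvaf` is a hypothetical name; the inputs are the Data and model Prediction columns from those tables.

```python
def pvaf(data, predictions):
    """(SST - SSE) / SST, where SST is the squared error of predicting the mean."""
    mean = sum(data) / len(data)
    sst = sum((y - mean) ** 2 for y in data)
    sse = sum((y - p) ** 2 for y, p in zip(data, predictions))
    return (sst - sse) / sst

data = [7, 8, 1, 8, 6]
first_model = pvaf(data, [5.03, 6.97, 2.12, 8.91, 6.97])
second_model = pvaf(data, [5.9, 10.1, 4, 5.9, 1])

print(round(first_model, 2), round(second_model, 3))  # prints 0.77 -0.295
```

The second value is negative because that model's SSE exceeds SST, meaning it predicts worse than simply using the mean of the data.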
25. Maximum Likelihood
- Assume we are sampling from a population with probability $f(Y \mid \theta)$.
- The Y is an observation and the $\theta$ are the model parameters.
[Figure: density curve with $\mu = 0$ and the observation Y marked]
$N(-1.7 \mid \mu = 0) = 0.094$
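26. Maximum Likelihood
- With independent observations, $Y_1, \ldots, Y_n$, the joint probability of the sample observations is $f(Y_1, \ldots, Y_n \mid \theta) = \prod_{i=1}^{n} f(Y_i \mid \theta)$.
[Figure: density curve with $\mu = 0$ and observations $Y_1$, $Y_2$, $Y_3$ marked]
$0.094 \times 0.2661 \times 0.3605 = 0.0090$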
27. Maximum Likelihood
- Expressed as a function of the parameters, we have the likelihood function $L(\theta) = \prod_{i=1}^{n} f(Y_i \mid \theta)$.
- The goal is to maximize L with respect to the parameters, $\theta$.
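28. Maximum Likelihood
[Figure: density curve with $\mu = 0$ and observations $Y_1$, $Y_2$, $Y_3$ marked]
$0.094 \times 0.2661 \times 0.3605 = 0.0090$
(Assuming $\sigma = 1$)
[Figure: density curve with $\mu = -1.0167$ and observations $Y_1$, $Y_2$, $Y_3$ marked]
$0.3159 \times 0.3962 \times 0.3398 = 0.0425$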
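The two likelihood products can be reproduced with a normal density, assuming a standard deviation of 1. Only the first observation (-1.7) is stated on the slides; the other two values used below (-0.9 and -0.45) are assumptions, back-calculated from the densities 0.2661 and 0.3605 and from the estimate -1.0167 being the sample mean. The function names are hypothetical.

```python
import math

def normal_pdf(y, mu, sigma=1.0):
    """Density of a normal distribution with mean mu and std sigma at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood(observations, mu, sigma=1.0):
    """Product of the densities of independent observations."""
    return math.prod(normal_pdf(y, mu, sigma) for y in observations)

# Y1 is from the slides; Y2 and Y3 are back-calculated assumptions.
observations = [-1.7, -0.9, -0.45]

mu_hat = sum(observations) / len(observations)   # sample mean
likelihood_at_zero = likelihood(observations, 0.0)
likelihood_at_mean = likelihood(observations, mu_hat)

print(f"mu_hat = {mu_hat:.4f}")
print(f"L(mu = 0)      = {likelihood_at_zero:.4f}")
print(f"L(mu = mu_hat) = {likelihood_at_mean:.4f}")
```

Moving the mean from 0 to the sample mean raises the likelihood, which is the sense in which the sample mean is the better estimate here.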
29. Maximum Likelihood
- Preferred to other methods
  - Has very nice mathematical properties.
  - Easier to interpret.
  - We'll see specifics in a few weeks.
- Often harder (or impossible?) to calculate than other methods.
- Often presented as log likelihood, ln(ML).
  - Easier to compute (sums, not products).
  - Better numerical resolution.
- Sometimes equivalent to other methods.
  - E.g., same as SSE when calculating mean of a distribution.
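The equivalence with SSE can be made explicit under one assumption: the observations $Y_1, \ldots, Y_n$ are independent draws from a normal distribution with unknown mean $\mu$ and known standard deviation $\sigma$. Taking the log turns the product of densities into a sum:

```latex
\ln L(\mu) = \sum_{i=1}^{n} \ln f(Y_i \mid \mu)
           = -\frac{n}{2} \ln\!\left( 2\pi\sigma^2 \right)
             - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( Y_i - \mu \right)^2
```

The first term does not depend on $\mu$, so maximizing $\ln L(\mu)$ is the same as minimizing the sum of squared errors $\sum_i (Y_i - \mu)^2$, and both are achieved at the sample mean.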