Title: Lecture 20 Empirical Orthogonal Functions and Factor Analysis
Slide 1: Lecture 20, Empirical Orthogonal Functions and Factor Analysis
Slide 2: Motivation
In Fourier analysis, the choice of sine and cosine patterns was prescribed by the method. Could we instead use the data itself as a source of information about the shape of the patterns?
Slide 3: Example
Maps of some hypothetical function, say sea surface temperature, forming a sequence in time.
Slide 4: the data (figure: the sequence of maps, arranged along a time axis)
Slide 5: the data (figure continued)
Slide 6: (plot of pattern importance versus pattern number)
Slide 7: Choose just the most important patterns (plot of pattern importance versus pattern number, with the first 3 patterns selected)
Slide 8: the 3 most important patterns
Slide 9: comparison
Original versus reconstruction using only 3 patterns. Note that this process has reduced the noise, since noise has no pattern common to all the images.
Slide 10: amplitudes of patterns (plot of amplitude versus time)
Slide 11: amplitudes of patterns (plot of amplitude versus time). Note that there is no requirement that a pattern be periodic in time.
Slide 12: Discussion: mixing of end-members
Slide 13: ternary diagram
A useful tool for data that have three components (diagram: triangle with vertices labeled A, B, C).
Slide 14: works for 3 end-members, as long as A + B + C = 100%
(Diagram: the percentage of A is read off contour lines running from 0% A on the edge opposite vertex A to 100% A at vertex A, in steps of 25%; similarly for B and C.)
Slide 15: Suppose the data fall near a line on the diagram (ternary diagram with the data points clustered along a line).
Slides 16-17: The two ends of that line can be taken as end-members, or factors, f1 and f2.
Slide 18: The line connecting f1 and f2 is the mixing line.
Slide 19: the data are idealized as lying on the mixing line
Slide 20: You could represent the data exactly by adding a third, noise factor, f3. It doesn't much matter where you put f3, as long as it's not on the line (diagram: f1 and f2 on the mixing line, f3 off the line).
Slide 21: S, the components (A, B, C, ...) in each sample, s

S =
  (A in s1) (B in s1) (C in s1)
  (A in s2) (B in s2) (C in s2)
  (A in s3) (B in s3) (C in s3)
  ...
  (A in sN) (B in sN) (C in sN)

N samples, M components: S is N×M. Note that a sample runs along a row of S.
Slide 22: F, the components (A, B, C, ...) in each factor, f

F =
  (A in f1) (B in f1) (C in f1)
  (A in f2) (B in f2) (C in f2)
  (A in f3) (B in f3) (C in f3)

M components, M factors: F is M×M.
Slide 23: C, the coefficients of the factors

C =
  (f1 in s1) (f2 in s1) (f3 in s1)
  (f1 in s2) (f2 in s2) (f3 in s2)
  (f1 in s3) (f2 in s3) (f3 in s3)
  ...
  (f1 in sN) (f2 in sN) (f3 in sN)

N samples, M factors: C is N×M.
Slide 24: S = C F

Samples (N×M) = Coefficients (N×M) × Factors (M×M)

Each row of S (a sample) is a linear combination of the rows of F (the factors), with the coefficients taken from the corresponding row of C. A small numerical illustration follows.
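For concreteness, a toy MATLAB sketch of S = C F with made-up factor compositions and mixing coefficients (all numbers hypothetical):

- % three hypothetical factors (rows): fractions of components A, B, C
- F = [0.6 0.3 0.1; 0.1 0.2 0.7; 0.2 0.5 0.3];
- % hypothetical coefficients for 3 samples; the third (noise) factor gets zero weight
- C = [0.5 0.5 0.0; 0.8 0.2 0.0; 0.1 0.9 0.0];
- S = C*F    % 3 samples (rows) by 3 components (columns)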
Slide 25: S ≈ Cp Fp

The data can be approximated using only the p most important factors, those with the biggest coefficients:

Samples (N×M) ≈ selected coefficients (N×p) × selected factors (p×M)

Here the columns of C and the rows of F corresponding to the ignored factors (e.g., f3) are simply dropped.
Slide 26: view samples as vectors in space

(Diagram: samples s1, s2, s3 drawn as vectors in the space with axes A, B, C, together with a unit vector f.)

Let the factors be unit vectors. Then the coefficients are the projections (dot products) of the samples onto the factors, as in the sketch below.
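A minimal MATLAB sketch of this projection, assuming the factors are stored as orthonormal rows of a matrix F (as on the later slides):

- C = S*F';    % C(i,j) is the dot product of sample i (row i of S) with factor j (row j of F)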
Slide 27: This suggests a method of choosing factors so that they have large coefficients.

Find the factor f that maximizes E = Σi (si · f)² subject to the constraint f · f = 1. Note that the dot product is squared since it can be negative.
Slide 28: Find the factor f that maximizes E = Σi (si · f)² subject to the constraint L = f · f = 1.

E = Σi (si · f)² = Σi [Σj Sij fj] [Σk Sik fk] = Σj Σk [Σi Sij Sik] fj fk = Σj Σk Mjk fj fk,
with Mjk = Σi Sij Sik, that is, M = SᵀS (a symmetric matrix).

L = Σi fi² = 1. Use Lagrange multipliers, finding the stationary point of Φ = E − λ²L, where λ² is the Lagrange multiplier (written as a square for reasons that will become apparent later). We solved this problem two lectures ago. Its solution is the algebraic eigenvalue problem M f = λ² f. Recall that the eigenvalue is the corresponding value of E.
Slide 29: So the factors solve the algebraic eigenvalue problem SᵀS f = λ² f.

SᵀS is a square matrix with the same number of rows and columns as there are components, so there are as many factors as there are components: the factors span a space of the same dimension as the components.

If you sort the eigenvectors by the size of their eigenvalues, then the ones with the largest eigenvalues have the largest coefficients. So selecting the most important factors is easy. (A MATLAB sketch follows.)
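A minimal MATLAB sketch of this route to the factors (the SVD route on the later slides gives the same result):

- M = S'*S;                                       % symmetric M-by-M matrix
- [V, LAMBDA2] = eig(M);                          % eigenvectors (columns of V) and eigenvalues
- [lambda2, k] = sort(diag(LAMBDA2), 'descend');  % order by decreasing eigenvalue
- V = V(:,k);
- F = V';                                         % factors are the rows of F
- C = S*V;                                        % coefficients: projections of the samples onto the factors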
Slide 30: An important tidbit from the theory of eigenvalues and eigenvectors that we'll use later on.

SᵀS f = λ² f. Let Λ² be a diagonal matrix of the eigenvalues λi², and let V be a matrix whose columns are the corresponding factors, f(i). Then SᵀS = V Λ² Vᵀ.
Slide 31: Note also that the factors are orthogonal: f(i) · f(j) = 0 if i ≠ j. This is a mathematically pleasant property, but it may not always be the physically most relevant choice.

(Two ternary diagrams compare an orthogonal choice of factors, where f1 is close to the mean of the data and f2 contains negative A, with a non-orthogonal choice of f1 and f2.)
Slide 32: Upshot

The eigenvectors of SᵀS f = λ² f with the p largest eigenvalues identify a p-dimensional subspace in which most of the data lie. You can use those eigenvectors as factors, or you can choose any other p factors that span that subspace. In the ternary diagram example, they must lie on the line connecting the two SVD factors.
Slide 33: Singular Value Decomposition (SVD)

Any N×M matrix S can be written as the product of three matrices, S = U Λ Vᵀ, where U is N×N and satisfies UᵀU = UUᵀ = I, V is M×M and satisfies VᵀV = VVᵀ = I, and Λ is an N×M diagonal matrix of singular values.
Slide 34: Now note that if S = U Λ Vᵀ, then

SᵀS = (U Λ Vᵀ)ᵀ (U Λ Vᵀ) = V Λ UᵀU Λ Vᵀ = V Λ² Vᵀ.

Compare with the tidbit mentioned earlier, SᵀS = V Λ² Vᵀ: the SVD V is the same V we were talking about earlier. The columns of V are the eigenvectors f, so F = Vᵀ. So we can use the SVD to calculate the factors, F.
Slide 35: But it's even better than that! Write S = U Λ Vᵀ as

S = (U Λ)(Vᵀ) = C F.

So the coefficients are C = U Λ and, as shown previously, the factors are F = Vᵀ. So we can use the SVD to calculate both the coefficients, C, and the factors, F.
Slide 36: MATLAB code for computing C and F

- [U, LAMBDA, V] = svd(S);
- C = U*LAMBDA;
- F = V';
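When N is large, the full N×N matrix U can be costly to form; a sketch using MATLAB's economy-size SVD, which is equivalent for the purposes of these slides:

- [U, LAMBDA, V] = svd(S, 'econ');   % U is N-by-M, LAMBDA is M-by-M
- C = U*LAMBDA;
- F = V';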
Slide 37: MATLAB code for approximating S ≈ Sp using only the p most important factors

- p = (whatever);
- Up = U(:,1:p);
- LAMBDAp = LAMBDA(1:p,1:p);
- Cp = Up*LAMBDAp;
- Vp = V(:,1:p);
- Fp = (Vp)';
- Sp = Cp*Fp;
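One hedged way to choose p (the "examine Λ" step listed later on slide 40) is to look at how much of the data's total power the leading singular values capture; a sketch, with the 95% threshold purely illustrative:

- lambda = diag(LAMBDA);
- frac = cumsum(lambda.^2) / sum(lambda.^2);   % fraction of power captured by the first k factors
- p = find(frac >= 0.95, 1);                   % smallest p capturing at least 95% of the power
- plot(lambda, 'ko-');                         % or look for a break in the singular-value spectrum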
Slide 38: back to my example
Slide 39: Each pixel is a component of the image, and the patterns are factors. Our derivation assumed that the data (samples, s(i)) were vectors. However, in this example the data are images (matrices), so what I had to do was write out the pixels of each image as a vector.
Slide 40: Steps
1) load the images
2) reorganize the images into S
3) take the SVD of S to get U, Λ and V
4) examine Λ to identify the number of significant factors, p
5) build Sp using only the significant factors
6) reorganize Sp back into images
Slide 41: MATLAB code for reorganizing a sequence of images D(p,q,r), with p = 1...Nx, q = 1...Nx, r = 1...Nt, into the sample matrix S(r,s), with r = 1...Nt, s = 1...Nx²

- for r = 1:Nt                  % time r
-     for p = 1:Nx              % row p
-         for q = 1:Nx          % column q
-             s = Nx*(p-1)+q;   % index s
-             S(r,s) = D(p,q,r);
-         end
-     end
- end
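A vectorized alternative that should produce the same S (assuming D has size Nx-by-Nx-by-Nt); the permute reproduces the s = Nx*(p-1)+q ordering used in the loop:

- S = reshape(permute(D, [2 1 3]), Nx*Nx, Nt)';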
Slide 42: MATLAB code for reorganizing the sample matrix S(r,s), with r = 1...Nt, s = 1...Nx², back into a sequence of images D(p,q,r), with p = 1...Nx, q = 1...Nx, r = 1...Nt

- for r = 1:Nt                               % time r
-     for s = 1:Nx*Nx                        % index s
-         p = floor( (s-1)/Nx + 0.01 ) + 1;  % row p
-         q = s - Nx*(p-1);                  % column q
-         D(p,q,r) = S(r,s);
-     end
- end
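And a vectorized sketch of the reverse mapping, under the same size assumptions:

- D = permute(reshape(S', Nx, Nx, Nt), [2 1 3]);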
Slide 43: Reality of Factors

Are factors intrinsically meaningful, or just a convenient way of representing data?

- Example: suppose the samples are rocks and the components are element concentrations.
- Then thinking of the factors as minerals might make intuitive sense:
- a mineral has a fixed element composition
- a rock is a mixture of minerals
Slide 44: Many rocks but just a few minerals
(Diagram: rocks 1 through 7 plotted as data points, together with minerals (factors) 1, 2, and 3.)
Slide 45: Possibly Desirable Properties of Factors

- Factors are unlike each other (different minerals typically contain different elements)
- A factor contains either large or near-zero components (a mineral typically contains only a few elements)
- Factors have only positive components (minerals are composed of positive amounts of chemical elements)
- Coefficients of the factors are positive (rocks are composed of positive amounts of minerals)
- Coefficients are typically either large or near-zero (rocks are composed of just a few major minerals)
Slide 46: Transformations of Factors

- S = C F
- Suppose we mix the factors together to get a new set of factors:

New factors (M×M) = Transformation (M×M) × Old factors (M×M)

where the new-factor matrix holds the components (A, B, C, ...) in each new factor, the transformation matrix holds the amount of each old factor in each new factor, and the old-factor matrix holds the components in each old factor.

Fnew = T Fold
Slide 47: Transformations of Factors

- Fnew = T Fold
- A requirement is that T⁻¹ exists, else Fnew will not span the same space as Fold.
- S = C F = C I F = (C T⁻¹)(T F) = Cnew Fnew
- So you could try to implement the desirable factor properties by designing an appropriate transformation matrix, T; a sketch of applying such a transformation follows.
- A somewhat restrictive choice of T is T = R, where R is a rotation matrix (rotation matrices satisfy R⁻¹ = Rᵀ).
Slide 48: A method for implementing one of these properties

The property targeted here, from the list on slide 45, is: a factor contains either large or near-zero components (a mineral typically contains only a few elements).
Slide 49:
- A factor contains either large or near-zero components.
- This is more or less equivalent to: lots of variance in the amounts of the components contained in the factor.
Slide 50: Usual formula for the variance of data, x:

σd² = N⁻² [ N Σi xi² − (Σi xi)² ]

Application to a factor, f:

σf² = N⁻² [ N Σi fi⁴ − (Σi fi²)² ]

Note that we are measuring the variance of the squares of the elements of f. Thus a factor has large σf² if the absolute value of its elements has a lot of variation; the sign of the elements is irrelevant.
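A minimal MATLAB sketch of this factor variance, applied to a factor stored as a vector f:

- fsq = f.^2;                                   % squares of the factor's elements
- N = length(f);
- sf2 = ( N*sum(fsq.^2) - sum(fsq)^2 ) / N^2;   % variance of the squared elements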
Slide 51: Varimax Factors

A procedure for maximizing the variance of the factors while still preserving their orthogonality.
Slide 52: Based on rotating pairs of factors in their plane
(Diagram: old factors f1old and f2old rotated by an angle θ into new factors f1new and f2new.)
Slide 53: Rotating a pair of factors in their plane by an amount θ

  f1' = f1
  f2' = cos(θ) f2 + sin(θ) f3
  f3' = −sin(θ) f2 + cos(θ) f3
  f4' = f4

or, as a matrix,

  R =
    1      0        0      0
    0   cos(θ)   sin(θ)    0
    0  −sin(θ)   cos(θ)    0
    0      0        0      1

(called a Givens rotation, by the way).
Slide 54: Varimax Procedure

For a pair of factors, fs and ft, find the θ that maximizes the sum of their variances,

E = N² (σfs'² + σft'²) = [ N Σi f'is⁴ − (Σi f'is²)² ] + [ N Σi f'it⁴ − (Σi f'it²)² ],

where f'is = cos(θ) fis + sin(θ) fit and f'it = −sin(θ) fis + cos(θ) fit.

Just solve dE/dθ = 0.
Slide 55: After much algebra,

θ = ¼ tan⁻¹ { 2 [ N Σi ui vi − (Σi ui)(Σi vi) ] / [ N Σi (ui² − vi²) − ( (Σi ui)² − (Σi vi)² ) ] }

where ui = fis² − fit² and vi = 2 fis fit.
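A minimal MATLAB sketch of this rotation for one pair of factors fs and ft (column vectors of length N); atan2 is used so the quadrant of 4θ is resolved:

- u = fs.^2 - ft.^2;
- v = 2*fs.*ft;
- N = length(fs);
- num = 2*( N*sum(u.*v) - sum(u)*sum(v) );
- den = N*sum(u.^2 - v.^2) - ( sum(u)^2 - sum(v)^2 );
- theta = 0.25*atan2(num, den);              % rotation angle
- fsnew =  cos(theta)*fs + sin(theta)*ft;    % rotated pair
- ftnew = -sin(theta)*fs + cos(theta)*ft;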
Slide 56: Then just apply this rotation to every pair of factors. The result is a new set of factors that are mutually orthogonal but have maximal variance, hence the name Varimax.

Actually, you need to repeat the whole procedure multiple times to get convergence, since subsequent rotations to some extent undo the work of previous rotations.
Slide 57: Example 1

- fs = [½, ½, ½, ½]ᵀ and ft = [½, −½, ½, −½]ᵀ (the worst case: zero variance)
- θ = 45°
- fs' = [1/√2, 0, 1/√2, 0]ᵀ and ft' = [0, −1/√2, 0, −1/√2]ᵀ

(Plot: the sum of variances σfs'² + σft'² versus rotation angle θ, with its maximum at θ = 45°.)
Slide 58: Example 2

- fs = [0.63, 0.31, 0.63, 0.31]ᵀ and ft = [0.31, −0.63, 0.31, −0.63]ᵀ
- θ = 26.56°
- fs' = [0.71, 0.00, 0.71, 0.00]ᵀ and ft' = [0.00, −0.71, 0.00, −0.71]ᵀ

(Plot: the sum of variances σfs'² + σft'² versus rotation angle θ, with its maximum at θ = 26.56°.)
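These examples can be checked with the sketch from slide 55; a minimal check for Example 1:

- fs = [0.5; 0.5; 0.5; 0.5];   ft = [0.5; -0.5; 0.5; -0.5];
- u = fs.^2 - ft.^2;   v = 2*fs.*ft;   N = length(fs);
- num = 2*( N*sum(u.*v) - sum(u)*sum(v) );
- den = N*sum(u.^2 - v.^2) - ( sum(u)^2 - sum(v)^2 );
- theta = 0.25*atan2(num, den)             % pi/4, i.e. 45 degrees
- fsnew = cos(theta)*fs + sin(theta)*ft    % approximately [0.71; 0; 0.71; 0]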