Title: Lecture 20 Empirical Orthogonal Functions and Factor Analysis
Slide 1: Lecture 20, Empirical Orthogonal Functions and Factor Analysis
Slide 2: Motivation
In Fourier analysis, the choice of sine and cosine patterns was prescribed by the method. Could we instead use the data itself as a source of information about the shape of the patterns?
Slide 3: Example
Maps of some hypothetical function, say sea surface temperature, forming a sequence in time.
Slide 4: the data (figure: the sequence of maps, arranged along a time axis)
Slide 5: the data (figure continued)
Slide 6: (plot of pattern importance versus pattern number)
Slide 7: Choose just the most important patterns (plot of pattern importance versus pattern number, with the first 3 patterns selected)
Slide 8: the 3 most important patterns
Slide 9: comparison
Original versus reconstruction using only 3 patterns. Note that this process has reduced the noise, since noise has no pattern common to all the images.
Slide 10: amplitudes of patterns (plot of amplitude versus time)
Slide 11: amplitudes of patterns (plot of amplitude versus time). Note that there is no requirement that a pattern be periodic in time.
Slide 12: Discussion: mixing of end-members
Slide 13: ternary diagram
A useful tool for data that have three components (diagram: triangle with vertices labeled A, B, C).
Slide 14: works for 3 end-members, as long as A + B + C = 100%
(Diagram: the percentage of A is read off contour lines running from 0% A on the edge opposite vertex A to 100% A at vertex A, in steps of 25%; similarly for B and C.)
Slide 15: Suppose the data fall near a line on the diagram (ternary diagram with the data points clustered along a line).
Slides 16-17: The two ends of that line can be taken as end-members, or factors, f1 and f2.
Slide 18: The line connecting f1 and f2 is the mixing line.
Slide 19: the data are idealized as lying on the mixing line
Slide 20: You could represent the data exactly by adding a third, noise factor, f3. It doesn't much matter where you put f3, as long as it's not on the line (diagram: f1 and f2 on the mixing line, f3 off the line).
Slide 21: S, the components (A, B, C, ...) in each sample, s

S =
  (A in s1) (B in s1) (C in s1)
  (A in s2) (B in s2) (C in s2)
  (A in s3) (B in s3) (C in s3)
  ...
  (A in sN) (B in sN) (C in sN)

N samples, M components: S is N×M. Note that a sample runs along a row of S.
Slide 22: F, the components (A, B, C, ...) in each factor, f

F =
  (A in f1) (B in f1) (C in f1)
  (A in f2) (B in f2) (C in f2)
  (A in f3) (B in f3) (C in f3)

M components, M factors: F is M×M.
Slide 23: C, the coefficients of the factors

C =
  (f1 in s1) (f2 in s1) (f3 in s1)
  (f1 in s2) (f2 in s2) (f3 in s2)
  (f1 in s3) (f2 in s3) (f3 in s3)
  ...
  (f1 in sN) (f2 in sN) (f3 in sN)

N samples, M factors: C is N×M.
Slide 24: S = C F

Samples (N×M) = Coefficients (N×M) × Factors (M×M)

Each row of S (a sample) is a linear combination of the rows of F (the factors), with the coefficients taken from the corresponding row of C. A small numerical illustration follows.
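For concreteness, a toy MATLAB sketch of S = C F with made-up factor compositions and mixing coefficients (all numbers hypothetical):

- % three hypothetical factors (rows): fractions of components A, B, C
- F = [0.6 0.3 0.1; 0.1 0.2 0.7; 0.2 0.5 0.3];
- % hypothetical coefficients for 3 samples; the third (noise) factor gets zero weight
- C = [0.5 0.5 0.0; 0.8 0.2 0.0; 0.1 0.9 0.0];
- S = C*F    % 3 samples (rows) by 3 components (columns)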
Slide 25: S ≈ Cp Fp

The data can be approximated using only the p most important factors, those with the biggest coefficients:

Samples (N×M) ≈ selected coefficients (N×p) × selected factors (p×M)

Here the columns of C and the rows of F corresponding to the ignored factors (e.g., f3) are simply dropped.
Slide 26: view samples as vectors in space

(Diagram: samples s1, s2, s3 drawn as vectors in the space with axes A, B, C, together with a unit vector f.)

Let the factors be unit vectors. Then the coefficients are the projections (dot products) of the samples onto the factors, as in the sketch below.
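A minimal MATLAB sketch of this projection, assuming the factors are stored as orthonormal rows of a matrix F (as on the later slides):

- C = S*F';    % C(i,j) is the dot product of sample i (row i of S) with factor j (row j of F)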
Slide 27: This suggests a method of choosing factors so that they have large coefficients.

Find the factor f that maximizes E = Σi (si · f)² subject to the constraint f · f = 1. Note that the dot product is squared since it can be negative.
Slide 28: Find the factor f that maximizes E = Σi (si · f)² subject to the constraint L = f · f = 1.

E = Σi (si · f)² = Σi [Σj Sij fj] [Σk Sik fk] = Σj Σk [Σi Sij Sik] fj fk = Σj Σk Mjk fj fk,
with Mjk = Σi Sij Sik, that is, M = SᵀS (a symmetric matrix).

L = Σi fi² = 1. Use Lagrange multipliers, finding the stationary point of Φ = E − λ²L, where λ² is the Lagrange multiplier (written as a square for reasons that will become apparent later). We solved this problem two lectures ago. Its solution is the algebraic eigenvalue problem M f = λ² f. Recall that the eigenvalue is the corresponding value of E.
Slide 29: So the factors solve the algebraic eigenvalue problem SᵀS f = λ² f.

SᵀS is a square matrix with the same number of rows and columns as there are components, so there are as many factors as there are components: the factors span a space of the same dimension as the components.

If you sort the eigenvectors by the size of their eigenvalues, then the ones with the largest eigenvalues have the largest coefficients. So selecting the most important factors is easy. (A MATLAB sketch follows.)
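A minimal MATLAB sketch of this route to the factors (the SVD route on the later slides gives the same result):

- M = S'*S;                                       % symmetric M-by-M matrix
- [V, LAMBDA2] = eig(M);                          % eigenvectors (columns of V) and eigenvalues
- [lambda2, k] = sort(diag(LAMBDA2), 'descend');  % order by decreasing eigenvalue
- V = V(:,k);
- F = V';                                         % factors are the rows of F
- C = S*V;                                        % coefficients: projections of the samples onto the factors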
Slide 30: An important tidbit from the theory of eigenvalues and eigenvectors that we'll use later on.

SᵀS f = λ² f. Let Λ² be a diagonal matrix of the eigenvalues λi², and let V be a matrix whose columns are the corresponding factors, f(i). Then SᵀS = V Λ² Vᵀ.
Slide 31: Note also that the factors are orthogonal: f(i) · f(j) = 0 if i ≠ j. This is a mathematically pleasant property, but it may not always be the physically most relevant choice.

(Two ternary diagrams compare an orthogonal choice of factors, where f1 is close to the mean of the data and f2 contains negative A, with a non-orthogonal choice of f1 and f2.)
Slide 32: Upshot

The eigenvectors of SᵀS f = λ² f with the p largest eigenvalues identify a p-dimensional subspace in which most of the data lie. You can use those eigenvectors as factors, or you can choose any other p factors that span that subspace. In the ternary diagram example, they must lie on the line connecting the two SVD factors.
Slide 33: Singular Value Decomposition (SVD)

Any N×M matrix S can be written as the product of three matrices, S = U Λ Vᵀ, where U is N×N and satisfies UᵀU = UUᵀ = I, V is M×M and satisfies VᵀV = VVᵀ = I, and Λ is an N×M diagonal matrix of singular values.
Slide 34: Now note that if S = U Λ Vᵀ, then

SᵀS = (U Λ Vᵀ)ᵀ (U Λ Vᵀ) = V Λ UᵀU Λ Vᵀ = V Λ² Vᵀ.

Compare with the tidbit mentioned earlier, SᵀS = V Λ² Vᵀ: the SVD V is the same V we were talking about earlier. The columns of V are the eigenvectors f, so F = Vᵀ. So we can use the SVD to calculate the factors, F.
Slide 35: But it's even better than that! Write S = U Λ Vᵀ as

S = (U Λ)(Vᵀ) = C F.

So the coefficients are C = U Λ and, as shown previously, the factors are F = Vᵀ. So we can use the SVD to calculate both the coefficients, C, and the factors, F.
Slide 36: MATLAB code for computing C and F

- [U, LAMBDA, V] = svd(S);
- C = U*LAMBDA;
- F = V';
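When N is large, the full N×N matrix U can be costly to form; a sketch using MATLAB's economy-size SVD, which is equivalent for the purposes of these slides:

- [U, LAMBDA, V] = svd(S, 'econ');   % U is N-by-M, LAMBDA is M-by-M
- C = U*LAMBDA;
- F = V';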
Slide 37: MATLAB code for approximating S ≈ Sp using only the p most important factors

- p = (whatever);
- Up = U(:,1:p);
- LAMBDAp = LAMBDA(1:p,1:p);
- Cp = Up*LAMBDAp;
- Vp = V(:,1:p);
- Fp = (Vp)';
- Sp = Cp*Fp;
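One hedged way to choose p (the "examine Λ" step listed later on slide 40) is to look at how much of the data's total power the leading singular values capture; a sketch, with the 95% threshold purely illustrative:

- lambda = diag(LAMBDA);
- frac = cumsum(lambda.^2) / sum(lambda.^2);   % fraction of power captured by the first k factors
- p = find(frac >= 0.95, 1);                   % smallest p capturing at least 95% of the power
- plot(lambda, 'ko-');                         % or look for a break in the singular-value spectrum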
Slide 38: back to my example
Slide 39: Each pixel is a component of the image, and the patterns are factors. Our derivation assumed that the data (samples, s(i)) were vectors. However, in this example the data are images (matrices), so what I had to do was write out the pixels of each image as a vector.
Slide 40: Steps
1) load the images
2) reorganize the images into S
3) take the SVD of S to get U, Λ and V
4) examine Λ to identify the number of significant factors, p
5) build Sp using only the significant factors
6) reorganize Sp back into images
Slide 41: MATLAB code for reorganizing a sequence of images D(p,q,r), with p = 1...Nx, q = 1...Nx, r = 1...Nt, into the sample matrix S(r,s), with r = 1...Nt, s = 1...Nx²

- for r = 1:Nt                  % time r
-     for p = 1:Nx              % row p
-         for q = 1:Nx          % column q
-             s = Nx*(p-1)+q;   % index s
-             S(r,s) = D(p,q,r);
-         end
-     end
- end
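A vectorized alternative that should produce the same S (assuming D has size Nx-by-Nx-by-Nt); the permute reproduces the s = Nx*(p-1)+q ordering used in the loop:

- S = reshape(permute(D, [2 1 3]), Nx*Nx, Nt)';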
Slide 42: MATLAB code for reorganizing the sample matrix S(r,s), with r = 1...Nt, s = 1...Nx², back into a sequence of images D(p,q,r), with p = 1...Nx, q = 1...Nx, r = 1...Nt

- for r = 1:Nt                               % time r
-     for s = 1:Nx*Nx                        % index s
-         p = floor( (s-1)/Nx + 0.01 ) + 1;  % row p
-         q = s - Nx*(p-1);                  % column q
-         D(p,q,r) = S(r,s);
-     end
- end
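And a vectorized sketch of the reverse mapping, under the same size assumptions:

- D = permute(reshape(S', Nx, Nx, Nt), [2 1 3]);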
Slide 43: Reality of Factors

Are factors intrinsically meaningful, or just a convenient way of representing data?

- Example: suppose the samples are rocks and the components are element concentrations.
- Then thinking of the factors as minerals might make intuitive sense:
- a mineral has a fixed element composition
- a rock is a mixture of minerals
Slide 44: Many rocks but just a few minerals
(Diagram: rocks 1 through 7 plotted as data points, together with minerals (factors) 1, 2, and 3.)
Slide 45: Possibly Desirable Properties of Factors

- Factors are unlike each other (different minerals typically contain different elements)
- A factor contains either large or near-zero components (a mineral typically contains only a few elements)
- Factors have only positive components (minerals are composed of positive amounts of chemical elements)
- Coefficients of the factors are positive (rocks are composed of positive amounts of minerals)
- Coefficients are typically either large or near-zero (rocks are composed of just a few major minerals)
Slide 46: Transformations of Factors

- S = C F
- Suppose we mix the factors together to get a new set of factors:

New factors (M×M) = Transformation (M×M) × Old factors (M×M)

where the new-factor matrix holds the components (A, B, C, ...) in each new factor, the transformation matrix holds the amount of each old factor in each new factor, and the old-factor matrix holds the components in each old factor.

Fnew = T Fold
Slide 47: Transformations of Factors

- Fnew = T Fold
- A requirement is that T⁻¹ exists, else Fnew will not span the same space as Fold.
- S = C F = C I F = (C T⁻¹)(T F) = Cnew Fnew
- So you could try to implement the desirable factor properties by designing an appropriate transformation matrix, T; a sketch of applying such a transformation follows.
- A somewhat restrictive choice of T is T = R, where R is a rotation matrix (rotation matrices satisfy R⁻¹ = Rᵀ).
Slide 48: A method for implementing one of these properties

The property targeted here, from the list on slide 45, is: a factor contains either large or near-zero components (a mineral typically contains only a few elements).
Slide 49:
- A factor contains either large or near-zero components.
- This is more or less equivalent to: lots of variance in the amounts of the components contained in the factor.
Slide 50: Usual formula for the variance of data, x:

σd² = N⁻² [ N Σi xi² − (Σi xi)² ]

Application to a factor, f:

σf² = N⁻² [ N Σi fi⁴ − (Σi fi²)² ]

Note that we are measuring the variance of the squares of the elements of f. Thus a factor has large σf² if the absolute value of its elements has a lot of variation; the sign of the elements is irrelevant.
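A minimal MATLAB sketch of this factor variance, applied to a factor stored as a vector f:

- fsq = f.^2;                                   % squares of the factor's elements
- N = length(f);
- sf2 = ( N*sum(fsq.^2) - sum(fsq)^2 ) / N^2;   % variance of the squared elements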
Slide 51: Varimax Factors

A procedure for maximizing the variance of the factors while still preserving their orthogonality.
Slide 52: Based on rotating pairs of factors in their plane
(Diagram: old factors f1old and f2old rotated by an angle θ into new factors f1new and f2new.)
Slide 53: Rotating a pair of factors in their plane by an amount θ

  f1' = f1
  f2' = cos(θ) f2 + sin(θ) f3
  f3' = −sin(θ) f2 + cos(θ) f3
  f4' = f4

or, as a matrix,

  R =
    1      0        0      0
    0   cos(θ)   sin(θ)    0
    0  −sin(θ)   cos(θ)    0
    0      0        0      1

(called a Givens rotation, by the way).
Slide 54: Varimax Procedure

For a pair of factors, fs and ft, find the θ that maximizes the sum of their variances,

E = N² (σfs'² + σft'²) = [ N Σi f'is⁴ − (Σi f'is²)² ] + [ N Σi f'it⁴ − (Σi f'it²)² ],

where f'is = cos(θ) fis + sin(θ) fit and f'it = −sin(θ) fis + cos(θ) fit.

Just solve dE/dθ = 0.
Slide 55: After much algebra,

θ = ¼ tan⁻¹ { 2 [ N Σi ui vi − (Σi ui)(Σi vi) ] / [ N Σi (ui² − vi²) − ( (Σi ui)² − (Σi vi)² ) ] }

where ui = fis² − fit² and vi = 2 fis fit.
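A minimal MATLAB sketch of this rotation for one pair of factors fs and ft (column vectors of length N); atan2 is used so the quadrant of 4θ is resolved:

- u = fs.^2 - ft.^2;
- v = 2*fs.*ft;
- N = length(fs);
- num = 2*( N*sum(u.*v) - sum(u)*sum(v) );
- den = N*sum(u.^2 - v.^2) - ( sum(u)^2 - sum(v)^2 );
- theta = 0.25*atan2(num, den);              % rotation angle
- fsnew =  cos(theta)*fs + sin(theta)*ft;    % rotated pair
- ftnew = -sin(theta)*fs + cos(theta)*ft;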
Slide 56: Then just apply this rotation to every pair of factors. The result is a new set of factors that are mutually orthogonal but have maximal variance, hence the name Varimax.

Actually, you need to repeat the whole procedure multiple times to get convergence, since subsequent rotations to some extent undo the work of previous rotations.
Slide 57: Example 1

- fs = [½, ½, ½, ½]ᵀ and ft = [½, −½, ½, −½]ᵀ (the worst case: zero variance)
- θ = 45°
- fs' = [1/√2, 0, 1/√2, 0]ᵀ and ft' = [0, −1/√2, 0, −1/√2]ᵀ

(Plot: the sum of variances σfs'² + σft'² versus rotation angle θ, with its maximum at θ = 45°.)
Slide 58: Example 2

- fs = [0.63, 0.31, 0.63, 0.31]ᵀ and ft = [0.31, −0.63, 0.31, −0.63]ᵀ
- θ = 26.56°
- fs' = [0.71, 0.00, 0.71, 0.00]ᵀ and ft' = [0.00, −0.71, 0.00, −0.71]ᵀ

(Plot: the sum of variances σfs'² + σft'² versus rotation angle θ, with its maximum at θ = 26.56°.)
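These examples can be checked with the sketch from slide 55; a minimal check for Example 1:

- fs = [0.5; 0.5; 0.5; 0.5];   ft = [0.5; -0.5; 0.5; -0.5];
- u = fs.^2 - ft.^2;   v = 2*fs.*ft;   N = length(fs);
- num = 2*( N*sum(u.*v) - sum(u)*sum(v) );
- den = N*sum(u.^2 - v.^2) - ( sum(u)^2 - sum(v)^2 );
- theta = 0.25*atan2(num, den)             % pi/4, i.e. 45 degrees
- fsnew = cos(theta)*fs + sin(theta)*ft    % approximately [0.71; 0; 0.71; 0]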