Title: Review of Linear Algebra
Slide 1: Lecture 3
- Review of Linear Algebra
- Simple least-squares

Slide 2: 9 things you need to remember from Linear Algebra

Slide 3: Number 1: the rule for vector and matrix multiplication

    u = Mv:   u_i = Σ_{k=1}^{N} M_ik v_k
    P = QR:   P_ij = Σ_{k=1}^{N} Q_ik R_kj

The name of the index in the sum is irrelevant; you can call it anything (as long as you're consistent). The sum runs over the nearest-neighbor indices.

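The summation rule can be checked with a small numpy sketch; the matrices and vectors below are made-up examples, not data from the lecture:

```python
import numpy as np

# u_i = sum_k M_ik v_k, written as an explicit loop and with numpy's "@"
M = np.array([[1.0, 2.0], [3.0, 4.0]])
v = np.array([5.0, 6.0])

u_loop = np.array([sum(M[i, k] * v[k] for k in range(2)) for i in range(2)])
u_mat = M @ v  # numpy's matrix-vector product performs the same sum

# P_ij = sum_k Q_ik R_kj for matrix-matrix multiplication
Q = np.array([[1.0, 0.0], [2.0, 1.0]])
R = np.array([[0.0, 1.0], [1.0, 1.0]])
P_loop = np.array([[sum(Q[i, k] * R[k, j] for k in range(2)) for j in range(2)]
                   for i in range(2)])
P_mat = Q @ R
```

The explicit loops and the `@` operator agree element by element, which is exactly the statement of the component-notation rule.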
Slide 4: Number 2: transposition

Rows become columns and columns become rows: (A^T)_ij = A_ji.
The rule for transposition of products is (AB)^T = B^T A^T. Note the reversal of order.

Slide 5: Number 3: the rule for the dot product

    a · b = a^T b = Σ_{i=1}^{N} a_i b_i

Note that a · a is the sum of the squared elements of a: the squared length of a.

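A minimal numpy check of the dot-product rule, with made-up vectors:

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

dot = a @ b      # a . b = a^T b = sum_i a_i b_i
sq_len = a @ a   # a . a = sum of squared elements = squared length of a
```

Here `a @ a` equals `np.linalg.norm(a) ** 2`, confirming that the self dot product is the squared length.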
Slide 6: Number 4: the inverse of a matrix

    A^{-1} A = I   and   A A^{-1} = I

(The inverse exists only when A is square and non-singular.)

I is the identity matrix:

    [ 1  0  ...  0 ]
    [ 0  1  ...  0 ]
    [ ...          ]
    [ 0  0  ...  1 ]

Slide 7: Number 5: solving y = Mx using the inverse

    x = M^{-1} y

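A numpy sketch of solving y = Mx, with a made-up M and y. In numerical practice `np.linalg.solve` is preferred over forming the explicit inverse, but both give the same answer here:

```python
import numpy as np

M = np.array([[2.0, 1.0], [1.0, 3.0]])
y = np.array([5.0, 10.0])

x_inv = np.linalg.inv(M) @ y      # x = M^{-1} y, as on the slide
x_solve = np.linalg.solve(M, y)   # numerically preferred: no explicit inverse
```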
Slide 8: Number 6: multiplication by the identity matrix

    M = IM = MI

In component notation, I_ij = δ_ij (the Kronecker delta; δ is just a name), so

    Σ_{k=1}^{N} δ_ik M_kj = M_ij

To evaluate such a sum: cross out the sum, cross out δ_ik, and change k to i in the rest of the expression.

Slide 9: Number 7: the inverse of a 2×2 matrix

    A = [ a  b ]      A^{-1} = 1/(ad - bc) [  d  -b ]
        [ c  d ]                           [ -c   a ]

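The 2×2 formula can be verified against numpy's general-purpose inverse; the entries a, b, c, d below are arbitrary example values:

```python
import numpy as np

a, b, c, d = 1.0, 2.0, 3.0, 4.0
A = np.array([[a, b], [c, d]])

det = a * d - b * c   # must be nonzero for the inverse to exist
A_inv = (1.0 / det) * np.array([[d, -b], [-c, a]])
```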
Slide 10: Number 8: the inverse of a diagonal matrix

    A = [ a  0  0  ...  0 ]      A^{-1} = [ 1/a   0    0   ...   0  ]
        [ 0  b  0  ...  0 ]               [  0   1/b   0   ...   0  ]
        [ 0  0  c  ...  0 ]               [  0    0   1/c  ...   0  ]
        [ ...             ]               [ ...                     ]
        [ 0  0  0  ...  z ]               [  0    0    0   ...  1/z ]

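A quick numpy check that inverting a diagonal matrix is just element-wise reciprocal of the diagonal (example values only):

```python
import numpy as np

diag = np.array([2.0, 4.0, 5.0])
A = np.diag(diag)
A_inv = np.diag(1.0 / diag)  # invert a diagonal matrix element-wise
```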
Slide 11: Number 9: the rule for taking a derivative

Use component notation, treat every element as an independent variable, and remember that since the elements are independent,

    dx_i / dx_j = δ_ij

(the elements of the identity matrix).

Slide 12: Example

Suppose y = Ax. How does y_i vary as we change x_j? (That's the meaning of the derivative dy_i/dx_j.)

First write the i-th component of y. (We're using i and j, so use a different letter, say k, in the summation!)

    y_i = Σ_{k=1}^{N} A_ik x_k

Then differentiate:

    (d/dx_j) y_i = (d/dx_j) Σ_{k=1}^{N} A_ik x_k = Σ_{k=1}^{N} A_ik (dx_k/dx_j) = Σ_{k=1}^{N} A_ik δ_kj = A_ij

So the derivative dy_i/dx_j is just A_ij. This is analogous to the case for scalars, where the derivative dy/dx of the scalar expression y = ax is just dy/dx = a.

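The result dy_i/dx_j = A_ij can be checked numerically with a finite-difference sketch (example A and x; the step size h is an arbitrary small number):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([0.5, -1.0])
h = 1e-6

# finite-difference estimate of the Jacobian dy_i/dx_j for y = A x:
# perturb one component x_j at a time and see how y responds
J = np.zeros((2, 2))
for j in range(2):
    dx = np.zeros(2)
    dx[j] = h
    J[:, j] = (A @ (x + dx) - A @ x) / h
```

The estimated Jacobian matches A, as the component-notation derivation predicts.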
Slide 13: The best-fitting line

The combination of a_pre and b_pre that has the smallest sum of squared errors. Find it by exhaustive search ("grid search").

Slide 14: Fitting a line to noisy data

    y_obs ≈ a + b x

The observations form the vector y_obs.

Slide 15: Guess values for (a, b)

Prediction: y_pre = a_guess + b_guess x
Prediction error: observed minus predicted, e = y_obs - y_pre
Total error: the sum of squared prediction errors, E = Σ_i e_i^2 = e^T e

Example guesses: a_guess = 2.0, b_guess = 2.4

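The error computation on this slide can be sketched in numpy. The data values below are made up for illustration (the slide's actual data are not given); the guesses are the slide's:

```python
import numpy as np

# hypothetical observations
x = np.array([0.0, 1.0, 2.0, 3.0])
y_obs = np.array([1.1, 3.1, 4.9, 7.2])

a_guess, b_guess = 2.0, 2.4      # guesses from the slide
y_pre = a_guess + b_guess * x    # prediction
e = y_obs - y_pre                # prediction error: observed minus predicted
E = e @ e                        # total error E = e^T e = sum of squared errors
```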
Slide 16: Systematically examine combinations of (a, b) on a 101×101 grid

[Figure: error surface over (a_pre, b_pre), with the minimum total error E marked. Note E is not zero.]

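The grid search the slide describes can be sketched in numpy. The data are synthetic (generated here from a made-up true line plus noise, since the slide's data are not given), and the grid ranges are assumptions:

```python
import numpy as np

# synthetic data: true line y = 1 + 2x plus noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 20)
y_obs = 1.0 + 2.0 * x + 0.5 * rng.standard_normal(20)

# 101 x 101 grid of trial (a, b) values, as on the slide
a_grid = np.linspace(-2.0, 4.0, 101)
b_grid = np.linspace(-1.0, 5.0, 101)
E = np.zeros((101, 101))
for i, a in enumerate(a_grid):
    for j, b in enumerate(b_grid):
        e = y_obs - (a + b * x)   # prediction error for this (a, b)
        E[i, j] = e @ e           # total error E = e^T e

# the best-fitting (a, b) is where E is smallest
i_min, j_min = np.unravel_index(np.argmin(E), E.shape)
a_best, b_best = a_grid[i_min], b_grid[j_min]
```

Because the data are noisy, the minimum of E is greater than zero, as the slide notes.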
Slide 17: Note that E_min is not zero

[Figure: error surface with the best-fitting (a, b) marked, and the corresponding best-fitting line.]

Slide 18: Note that there is some range of values where the error is about the same as the minimum value, E_min

[Figure: error surface with E_min marked. The error is pretty close to E_min everywhere in a region around the minimum: all a's in this range and b's in this range have pretty much the same error.]

Slide 19: Moral

The shape of the error surface controls the accuracy with which (a, b) can be estimated.

Slide 20: What controls the shape of the error surface?

Let's examine the effect of increasing the error in the data.

Slide 21: The minimum error increases, but the shape of the error surface stays pretty much the same

[Figure: two error surfaces. With error in the data of 0.5, E_min = 0.20; with error in the data of 5.0, E_min = 23.5.]

Slide 22: What controls the shape of the error surface?

Let's examine the effect of shifting the x-position of the data.

Slide 23: Big change from simply shifting the x-values of the data

The region of low error is now tilted: (high b, low a) has low error and (low b, high a) has low error, but (high b, high a) and (low a, low b) have high error.

[Figure: tilted error surface for data shifted to x between 0 and 10.]

Slide 24: Meaning of the tilted region of low error

The errors in (a_pre, b_pre) are correlated.

Slide 25: Uncorrelated estimates of intercept and slope

[Figure: data straddling the origin, showing the best-fit line and its intercept alongside a line with an erroneous intercept.]

When the data straddle the origin, if you tweak the intercept up, you can't compensate by changing the slope.

Slide 26: Negative correlation of intercept and slope

[Figure: data to the right of the origin, showing the best-fit line and its intercept, a line with the same slope but an erroneous (higher) intercept, and a lower-slope line that compensates.]

When the data are all to the right of the origin, if you tweak the intercept up, you must lower the slope to compensate.

Slide 27: Positive correlation of intercept and slope

[Figure: data to the left of the origin, showing the best-fit line and its intercept, a line with the same slope as the best-fit line but an erroneous (higher) intercept, and a higher-slope line that compensates.]

When the data are all to the left of the origin, if you tweak the intercept up, you must raise the slope to compensate.

Slide 28: Data near the origin

Possibly good control on the intercept, but lousy control on the slope.

[Figure: data spanning x = -5 to 5; the intercept uncertainty is small, the slope uncertainty is big.]

Slide 29: Data far from the origin

Lousy control on the intercept, but possibly good control on the slope.

[Figure: data spanning x = 0 to 100; the intercept uncertainty is big, the slope uncertainty is small.]

Slide 30: Set up for standard least squares

    y_i = a + b x_i

In matrix form, d = Gm:

    [ y_1 ]   [ 1  x_1 ]
    [ y_2 ] = [ 1  x_2 ] [ a ]
    [ ... ]   [  ...   ] [ b ]
    [ y_N ]   [ 1  x_N ]

Slide 31: Standard least-squares solution

    m_est = (G^T G)^{-1} G^T d

Slide 32: Derivation

Use the fact that the minimum is at dE/dm_i = 0.

    E = Σ_k e_k e_k = Σ_k (d_k - Σ_p G_kp m_p)(d_k - Σ_q G_kq m_q)
      = Σ_k d_k d_k - 2 Σ_k d_k Σ_p G_kp m_p + Σ_k Σ_p G_kp m_p Σ_q G_kq m_q

    dE/dm_i = 0 - 2 Σ_k d_k Σ_p G_kp (dm_p/dm_i)
              + Σ_k Σ_p G_kp (dm_p/dm_i) Σ_q G_kq m_q + Σ_k Σ_p G_kp m_p Σ_q G_kq (dm_q/dm_i)
            = -2 Σ_k d_k Σ_p G_kp δ_pi + Σ_k Σ_p G_kp δ_pi Σ_q G_kq m_q + Σ_k Σ_p G_kp m_p Σ_q G_kq δ_qi
            = -2 Σ_k d_k G_ki + Σ_k G_ki Σ_q G_kq m_q + Σ_k Σ_p G_kp m_p G_ki
            = -2 Σ_k G_ki d_k + 2 Σ_q Σ_k G_ki G_kq m_q = 0

or  -2 G^T d + 2 G^T G m = 0,  so  m = (G^T G)^{-1} G^T d

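The normal-equations result can be verified in numpy; the data below are made up, and the slide's (d, G, m) naming is kept:

```python
import numpy as np

# hypothetical data: x-values and observed y-values ("d" on the slide)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
d = np.array([1.2, 3.1, 4.8, 7.1, 9.2])

# G has a row [1, x_i] per observation; m = [a, b]
G = np.column_stack([np.ones_like(x), x])

# m = (G^T G)^{-1} G^T d -- the normal-equations solution from the derivation
m = np.linalg.inv(G.T @ G) @ G.T @ d

# the same answer from numpy's built-in least-squares solver
m_lstsq, *_ = np.linalg.lstsq(G, d, rcond=None)
```

In practice `np.linalg.lstsq` (or a QR-based solver) is preferred over forming (G^T G)^{-1} explicitly, but the two agree here.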
Slide 33: Why least squares?

Why not least absolute value? Or something else?

Slide 34: Least squares vs. least absolute value

[Figure: the two fits to the same data. One gives a = 1.00, b = 2.02; the other a = 0.94, b = 2.02: nearly the same line.]
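The contrast between the two criteria can be sketched in numpy. The data here are synthetic (a made-up line with one deliberate outlier, since the slide's data are not given), and the least-absolute-value fit is computed by iteratively reweighted least squares, one common approach; the slide itself does not specify an algorithm:

```python
import numpy as np

# synthetic data on the line y = 1 + 2x, with one outlier at the last point
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 1.0 + 2.0 * x
y[5] += 10.0   # the outlier pulls the least-squares fit

G = np.column_stack([np.ones_like(x), x])

# least-squares fit: minimizes the sum of squared errors
m_l2, *_ = np.linalg.lstsq(G, y, rcond=None)

# least-absolute-value fit via iteratively reweighted least squares:
# each pass solves a weighted least-squares problem with weights ~ 1/|residual|
m_l1 = m_l2.copy()
for _ in range(50):
    r = np.abs(y - G @ m_l1)
    w = 1.0 / np.maximum(r, 1e-8)   # cap tiny residuals to avoid division by zero
    W = np.diag(w)
    m_l1 = np.linalg.solve(G.T @ W @ G, G.T @ W @ y)
```

The least-absolute-value fit stays close to the line through the five clean points, while the least-squares fit is dragged toward the outlier, which is one practical reason to consider criteria other than least squares.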