Title: Linear Regression
1  Linear Regression
- Statistical Anecdotes
- Do hospitals make you sick?
- Student's story
- Etymology of regression
Andy Jacobson, July 2006
3  Outline
- Discussion of yesterday's exercise
- The mathematics of regression
- Solution of the normal equations
- Probability and likelihood
- Sample exercise: Mauna Loa CO2
- Sample exercise: TransCom3 inversion
4  http://www.aos.princeton.edu/WWWPUBLIC/sara/statistics_course/andy/R/
- corr_exer.r: 18 July practical
- mauna_loa.r: Today's first example
- transcom3.r: Today's second example
- dot-Rprofile: Rename to ~/.Rprofile (i.e., home dir)
- hclimate.indices.r: Get SOI, NAO, PDO, etc. from CDC
- cov2cor.r: Convert covariance to correlation
- ferret.palette.r: Use nice ferret color palettes
- geo.axes.r: Format degree symbols, etc., for maps
- load.ncdf.r: Quickly load a whole netCDF file
- svd.invert.r: Multiple linear regression using SVD
- mat4.r: Read and write Matlab .mat files (v4 only)
- svd_invert.m: Multiple linear regression using SVD (Matlab)
- atm0_m1.mat: Data for the TransCom3 example
- R-intro.pdf: Basic R documentation
- faraway_pra_book.pdf: Julian Faraway's Practical Regression and ANOVA in R book
5  Multiple Linear Regression
Forward model: b = A x, where b is the vector of data (observations), A is the basis set (design matrix), and x is the vector of parameters to be estimated.
6  Basis Functions
The design matrix A gives the value of each basis function at each observation location: rows index observations, columns index basis functions.
Note that one column of A (e.g., the elements a_i1) may be all ones, to represent the intercept.
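As an illustration (not part of the original slides), here is one way such a design matrix might be built in R, for a hypothetical fit with an intercept, a linear trend, and an annual harmonic on a monthly time axis t:
# Hypothetical design matrix: intercept, trend, and annual harmonic
t <- seq(1990, 2000, by = 1/12)   # monthly time axis in decimal years
A <- cbind(1,                     # column of ones: the intercept
           t - mean(t),           # linear trend
           cos(2 * pi * t),       # annual cycle, cosine part
           sin(2 * pi * t))       # annual cycle, sine part
dim(A)                            # M observations by N basis functions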
7  From the Cost Function to the Normal Equations
Least squares optimization minimizes the sum of squared residuals (misfits to the data). For the time being, we assume that the residuals are IID:
  J(x) = (b - Ax)^T (b - Ax)
Expanding terms:
  J(x) = b^T b - 2 x^T A^T b + x^T A^T A x
The cost is minimized when the derivative with respect to x vanishes:
  dJ/dx = -2 A^T b + 2 A^T A x = 0
Rearranging gives the normal equations:
  A^T A x = A^T b
Optimal parameter values (note that A^T A must be invertible):
  x-hat = (A^T A)^-1 A^T b
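A minimal sketch of this solution in R (illustrative, reusing the design matrix A from the sketch above and a placeholder data vector b):
# Solve the normal equations (A^T A) x = A^T b
b    <- rnorm(nrow(A))                   # placeholder data, for illustration only
xhat <- solve(t(A) %*% A, t(A) %*% b)    # x-hat = (A^T A)^-1 A^T b
# Numerically safer equivalents: qr.solve(A, b) or lm(b ~ A - 1)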
8  x-hat is BLUE
BLUE = Best Linear Unbiased Estimate (the "best" part is not shown here).
9  Practical Solution of the Normal Equations using SVD
If we could pre-multiply the forward equation b = Ax by A^+, the pseudo-inverse of A, we could get our answer directly: x-hat = A^+ b.
For every M x N matrix A, there exists a singular value decomposition (SVD):
  A = U S V^T
where U is M x N, S is N x N, and V is N x N.
S is diagonal and contains the singular values; the columns of U and V are orthonormal.
The pseudo-inverse is thus
  A^+ = V S^-1 U^T
10  Practical Solution of the Normal Equations using SVD (continued)
The pseudo-inverse is
  A^+ = V S^-1 U^T
where S^-1 is diagonal with elements 1/s_i, the reciprocals of the singular values.
11  Practical Solution of the Normal Equations using SVD (continued)
The pseudo-inverse is
  A^+ = V S^-1 U^T
and the parameter uncertainty covariance matrix is
  Cov(x-hat) = (A^T A)^-1 = V S^-2 V^T
with S^-2 diagonal with elements 1/s_i^2 (scaled by the residual variance if the data errors are not unit variance).
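A minimal R sketch of the SVD route (an illustration in the spirit of the svd.invert.r script listed earlier, not necessarily its actual contents):
# Pseudo-inverse solution and parameter covariance via the SVD
s    <- svd(A)                                  # A = U S V^T; s$d holds the singular values
xhat <- s$v %*% diag(1 / s$d) %*% t(s$u) %*% b  # x-hat = V S^-1 U^T b
covx <- s$v %*% diag(1 / s$d^2) %*% t(s$v)      # (A^T A)^-1 = V S^-2 V^T (unit data errors)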
12  Gaussian Probability and Least Squares
Residuals vector: r = b - A x (observations minus predictions).
Probability of r_i (Gaussian with standard deviation sigma):
  p(r_i) = 1 / (sigma sqrt(2 pi)) exp(-r_i^2 / (2 sigma^2))
Likelihood of r:
  L(r) = p(r_1) p(r_2) ... p(r_N)
N.B. Only true if the residuals are uncorrelated (independent).
13  Maximum Likelihood
Log-likelihood of r:
  log L(r) = -(1 / (2 sigma^2)) sum_i r_i^2 + constant
so maximizing the likelihood is the same as minimizing the sum of squared residuals.
Goodness of fit: the statistic
  chi^2 = sum_i (r_i / sigma_i)^2
for N - M degrees of freedom has a known distribution, so regression models such as this can be judged on the probability of getting a given value of chi^2.
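As an illustration (not from the slides), the chi-squared value and its tail probability can be computed in R like this, reusing A, b, and xhat from the sketches above and assuming unit data errors:
# Chi-squared goodness-of-fit check
r     <- b - A %*% xhat                     # residuals
sigma <- 1                                  # assumed data error, for illustration
chi2  <- sum((r / sigma)^2)                 # chi-squared statistic
dof   <- nrow(A) - ncol(A)                  # N - M degrees of freedom
pchisq(chi2, df = dof, lower.tail = FALSE)  # probability of a chi^2 this large or larger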
14  Probability and Least Squares
- Why should we expect Gaussian residuals?
15  Random Processes
z1 <- runif(5000)
16  Random Processes
hist(z1)
17  Random Processes
z1 <- runif(5000)
z2 <- runif(5000)
What is the distribution of (z1 + z2)?
18  Triangular Distribution
hist(z1 + z2)
19  Central Limit Theorem
There are more ways to get a central value than an extreme one.
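A small demonstration of this idea (not from the slides): sums of many independent uniform draws already look nearly Gaussian.
# Central Limit Theorem in action: sum 12 uniforms per sample
z <- matrix(runif(5000 * 12), nrow = 5000)
hist(rowSums(z), breaks = 50)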
20  Probability and Least Squares
- Why should we expect Gaussian residuals?
- (1) Because the Central Limit Theorem is on our side.
- (2) Note that the LS solution is always a minimum-variance solution, which is useful by itself. The maximum-likelihood interpretation is more of a goal than a reality.
21  Weighted Least Squares: More General Data Errors
Minimizing the chi^2 is equivalent to minimizing a cost function containing a covariance matrix C of data errors:
  J(x) = (b - Ax)^T C^-1 (b - Ax)
The data error covariance matrix is often taken to be diagonal. This means that you put different levels of confidence on different observations (confidence assigned by assessing both measurement error and the amount of trust in your basis functions and linear model). Note that this structure still assumes independence between the residuals.
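As an illustrative sketch (not from the slides), a diagonal C amounts to per-observation weights, which R's lm() accepts directly; here sigma_i is a hypothetical vector of data errors:
# Weighted least squares with a diagonal data-error covariance
sigma_i <- runif(nrow(A), 0.5, 2)       # hypothetical per-observation errors
w       <- 1 / sigma_i^2                # weights = inverse error variances
fit     <- lm(b ~ A - 1, weights = w)   # "- 1" because A already has an intercept column
coef(fit)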
22  Covariate Data Errors
Recall the cost function:
  J(x) = (b - Ax)^T C^-1 (b - Ax)
Now allow off-diagonal covariances in C.
N.B. sigma_ij = sigma_ji and sigma_ii = sigma_i^2.
Multivariate normal PDF:
  p(r) = (2 pi)^(-N/2) |C|^(-1/2) exp(-(1/2) r^T C^-1 r)
J propagates without trouble into the likelihood expression: minimizing J still maximizes the likelihood.
23  Fundamental Trick for Weighted and Generalized Least Squares
Transform the system (A, b, C), with data covariance matrix C, into a system (A', b', C'), where C' is the identity matrix.
The Cholesky decomposition computes a matrix square root such that if R = chol(C), then C = R^T R. The transformed system is then A' = (R^T)^-1 A and b' = (R^T)^-1 b.
You can then solve the Ordinary Least Squares problem A'x = b', using for instance the SVD method. Note that x remains in regular, untransformed space.
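A minimal R sketch of this whitening step (illustrative; it assumes A, b, and some data covariance matrix C are already defined, e.g., the diagonal one from the weighted example above):
# Whiten the system with the Cholesky factor, then solve by ordinary least squares
C  <- diag(sigma_i^2)                    # example covariance; off-diagonals are also allowed
R  <- chol(C)                            # upper triangular, with C = t(R) %*% R
Ap <- backsolve(R, A, transpose = TRUE)  # A' = (R^T)^-1 A
bp <- backsolve(R, b, transpose = TRUE)  # b' = (R^T)^-1 b
xw <- qr.solve(Ap, bp)                   # OLS (or SVD) solution in whitened space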