Title: Linear Regression
1  Linear Regression
- Statistical Anecdotes
- Do hospitals make you sick?
- Student's story
- Etymology of regression
Andy Jacobson, July 2006
3  Outline
- Discussion of yesterday's exercise
- The mathematics of regression
- Solution of the normal equations
- Probability and likelihood
- Sample exercise: Mauna Loa CO2
- Sample exercise: TransCom3 inversion
4  http://www.aos.princeton.edu/WWWPUBLIC/sara/statistics_course/andy/R/
- corr_exer.r: 18 July practical
- mauna_loa.r: Today's first example
- transcom3.r: Today's second example
- dot-Rprofile: Rename to ~/.Rprofile (i.e., home dir)
- hclimate.indices.r: Get SOI, NAO, PDO, etc. from CDC
- cov2cor.r: Convert covariance to correlation
- ferret.palette.r: Use nice ferret color palettes
- geo.axes.r: Format degree symbols, etc., for maps
- load.ncdf.r: Quickly load a whole netCDF file
- svd.invert.r: Multiple linear regression using SVD
- mat4.r: Read and write Matlab .mat files (v4 only)
- svd_invert.m: Multiple linear regression using SVD (Matlab)
- atm0_m1.mat: Data for the TransCom3 example
- R-intro.pdf: Basic R documentation
- faraway_pra_book.pdf: Julian Faraway's Practical Regression and ANOVA in R book
5  Multiple Linear Regression
Forward model: b = A x, where b is the vector of data (observations), A is the basis set (design matrix), and x is the vector of parameters to be estimated.
6  Basis Functions
The design matrix A gives the value of each basis function at each observation location: rows index observations, columns index basis functions.
Note that one column of A (e.g., the elements a_i1) may be all ones, to represent the intercept.
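As an illustration (not part of the original slides), here is one way such a design matrix might be built in R, for a hypothetical fit with an intercept, a linear trend, and an annual harmonic on a monthly time axis t:
# Hypothetical design matrix: intercept, trend, and annual harmonic
t <- seq(1990, 2000, by = 1/12)   # monthly time axis in decimal years
A <- cbind(1,                     # column of ones: the intercept
           t - mean(t),           # linear trend
           cos(2 * pi * t),       # annual cycle, cosine part
           sin(2 * pi * t))       # annual cycle, sine part
dim(A)                            # M observations by N basis functions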
7  From the Cost Function to the Normal Equations
Least squares optimization minimizes the sum of squared residuals (misfits to the data). For the time being, we assume that the residuals are IID:
  J(x) = (b - Ax)^T (b - Ax)
Expanding terms:
  J(x) = b^T b - 2 x^T A^T b + x^T A^T A x
The cost is minimized when the derivative with respect to x vanishes:
  dJ/dx = -2 A^T b + 2 A^T A x = 0
Rearranging gives the normal equations:
  A^T A x = A^T b
Optimal parameter values (note that A^T A must be invertible):
  x-hat = (A^T A)^-1 A^T b
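A minimal sketch of this solution in R (illustrative, reusing the design matrix A from the sketch above and a placeholder data vector b):
# Solve the normal equations (A^T A) x = A^T b
b    <- rnorm(nrow(A))                   # placeholder data, for illustration only
xhat <- solve(t(A) %*% A, t(A) %*% b)    # x-hat = (A^T A)^-1 A^T b
# Numerically safer equivalents: qr.solve(A, b) or lm(b ~ A - 1)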
8  x-hat is BLUE
BLUE = Best Linear Unbiased Estimate (the "best" part is not shown here).
9  Practical Solution of the Normal Equations using SVD
If we could pre-multiply the forward equation b = Ax by A^+, the pseudo-inverse of A, we could get our answer directly: x-hat = A^+ b.
For every M x N matrix A, there exists a singular value decomposition (SVD):
  A = U S V^T
where U is M x N, S is N x N, and V is N x N.
S is diagonal and contains the singular values; the columns of U and V are orthonormal.
The pseudo-inverse is thus
  A^+ = V S^-1 U^T
10  Practical Solution of the Normal Equations using SVD (continued)
The pseudo-inverse is
  A^+ = V S^-1 U^T
where S^-1 is diagonal with elements 1/s_i, the reciprocals of the singular values.
11  Practical Solution of the Normal Equations using SVD (continued)
The pseudo-inverse is
  A^+ = V S^-1 U^T
and the parameter uncertainty covariance matrix is
  Cov(x-hat) = (A^T A)^-1 = V S^-2 V^T
with S^-2 diagonal with elements 1/s_i^2 (scaled by the residual variance if the data errors are not unit variance).
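A minimal R sketch of the SVD route (an illustration in the spirit of the svd.invert.r script listed earlier, not necessarily its actual contents):
# Pseudo-inverse solution and parameter covariance via the SVD
s    <- svd(A)                                  # A = U S V^T; s$d holds the singular values
xhat <- s$v %*% diag(1 / s$d) %*% t(s$u) %*% b  # x-hat = V S^-1 U^T b
covx <- s$v %*% diag(1 / s$d^2) %*% t(s$v)      # (A^T A)^-1 = V S^-2 V^T (unit data errors)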
12  Gaussian Probability and Least Squares
Residuals vector: r = b - A x (observations minus predictions).
Probability of r_i (Gaussian with standard deviation sigma):
  p(r_i) = 1 / (sigma sqrt(2 pi)) exp(-r_i^2 / (2 sigma^2))
Likelihood of r:
  L(r) = p(r_1) p(r_2) ... p(r_N)
N.B. Only true if the residuals are uncorrelated (independent).
13  Maximum Likelihood
Log-likelihood of r:
  log L(r) = -(1 / (2 sigma^2)) sum_i r_i^2 + constant
so maximizing the likelihood is the same as minimizing the sum of squared residuals.
Goodness of fit: the statistic
  chi^2 = sum_i (r_i / sigma_i)^2
for N - M degrees of freedom has a known distribution, so regression models such as this can be judged on the probability of getting a given value of chi^2.
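As an illustration (not from the slides), the chi-squared value and its tail probability can be computed in R like this, reusing A, b, and xhat from the sketches above and assuming unit data errors:
# Chi-squared goodness-of-fit check
r     <- b - A %*% xhat                     # residuals
sigma <- 1                                  # assumed data error, for illustration
chi2  <- sum((r / sigma)^2)                 # chi-squared statistic
dof   <- nrow(A) - ncol(A)                  # N - M degrees of freedom
pchisq(chi2, df = dof, lower.tail = FALSE)  # probability of a chi^2 this large or larger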
14  Probability and Least Squares
- Why should we expect Gaussian residuals?
15  Random Processes
z1 <- runif(5000)
16  Random Processes
hist(z1)
17  Random Processes
z1 <- runif(5000)
z2 <- runif(5000)
What is the distribution of (z1 + z2)?
18  Triangular Distribution
hist(z1 + z2)
19  Central Limit Theorem
There are more ways to get a central value than an extreme one.
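A small demonstration of this idea (not from the slides): sums of many independent uniform draws already look nearly Gaussian.
# Central Limit Theorem in action: sum 12 uniforms per sample
z <- matrix(runif(5000 * 12), nrow = 5000)
hist(rowSums(z), breaks = 50)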
20  Probability and Least Squares
- Why should we expect Gaussian residuals?
- (1) Because the Central Limit Theorem is on our side.
- (2) Note that the LS solution is always a minimum-variance solution, which is useful by itself. The maximum-likelihood interpretation is more of a goal than a reality.
21  Weighted Least Squares: More General Data Errors
Minimizing the chi^2 is equivalent to minimizing a cost function containing a covariance matrix C of data errors:
  J(x) = (b - Ax)^T C^-1 (b - Ax)
The data error covariance matrix is often taken to be diagonal. This means that you put different levels of confidence on different observations (confidence assigned by assessing both measurement error and the amount of trust in your basis functions and linear model). Note that this structure still assumes independence between the residuals.
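As an illustrative sketch (not from the slides), a diagonal C amounts to per-observation weights, which R's lm() accepts directly; here sigma_i is a hypothetical vector of data errors:
# Weighted least squares with a diagonal data-error covariance
sigma_i <- runif(nrow(A), 0.5, 2)       # hypothetical per-observation errors
w       <- 1 / sigma_i^2                # weights = inverse error variances
fit     <- lm(b ~ A - 1, weights = w)   # "- 1" because A already has an intercept column
coef(fit)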
22  Covariate Data Errors
Recall the cost function:
  J(x) = (b - Ax)^T C^-1 (b - Ax)
Now allow off-diagonal covariances in C.
N.B. sigma_ij = sigma_ji and sigma_ii = sigma_i^2.
Multivariate normal PDF:
  p(r) = (2 pi)^(-N/2) |C|^(-1/2) exp(-(1/2) r^T C^-1 r)
J propagates without trouble into the likelihood expression: minimizing J still maximizes the likelihood.
23  Fundamental Trick for Weighted and Generalized Least Squares
Transform the system (A, b, C), with data covariance matrix C, into a system (A', b', C'), where C' is the identity matrix.
The Cholesky decomposition computes a matrix square root such that if R = chol(C), then C = R^T R. The transformed system is then A' = (R^T)^-1 A and b' = (R^T)^-1 b.
You can then solve the Ordinary Least Squares problem A'x = b', using for instance the SVD method. Note that x remains in regular, untransformed space.
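A minimal R sketch of this whitening step (illustrative; it assumes A, b, and some data covariance matrix C are already defined, e.g., the diagonal one from the weighted example above):
# Whiten the system with the Cholesky factor, then solve by ordinary least squares
C  <- diag(sigma_i^2)                    # example covariance; off-diagonals are also allowed
R  <- chol(C)                            # upper triangular, with C = t(R) %*% R
Ap <- backsolve(R, A, transpose = TRUE)  # A' = (R^T)^-1 A
bp <- backsolve(R, b, transpose = TRUE)  # b' = (R^T)^-1 b
xw <- qr.solve(Ap, bp)                   # OLS (or SVD) solution in whitened space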