Title: Ordinary least squares regression (OLS)
1. Ordinary least squares regression (OLS)
- Minimize $\sum_{i=1}^{n}(y_i - x_i^T\beta)^2 = \|y - X\beta\|^2$
- Solve the equation system that can be written
- $X^TX\beta = X^Ty$
- $\hat{\beta} = (X^TX)^{-1}X^Ty$
- where $X$ is an $n \times p$ matrix of observed x-variables
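A minimal numpy sketch of the OLS solve via the normal equations (the simulated data and all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # n x p matrix of observed x-variables
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=100)

# Normal equations: X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                              # close to beta_true
```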
2. Ridge regression
- Minimize $\|y - X\beta\|^2 + \lambda\|\beta\|^2$
- where $\lambda$ is a given shrinkage factor
- Solve the equation system that can be written
- $(X^TX + \lambda I)\beta = X^Ty$
- $\hat{\beta} = (X^TX + \lambda I)^{-1}X^Ty$
- where $X$ is a matrix of standardized x-variables
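A corresponding sketch for the ridge system, assuming column-standardized x-variables (the value of the shrinkage factor is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

Xs = (X - X.mean(axis=0)) / X.std(axis=0)    # standardized x-variables
lam = 0.1                                    # given shrinkage factor
p = Xs.shape[1]

# (X'X + lambda*I) beta = X'y
beta_ridge = np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ y)
```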
3. Smoothing a time series of observations
- Minimize $\sum_{t=1}^{n}(y_t - \mu_t)^2 + \lambda\sum_{t=2}^{n-1}(\mu_{t-1} - 2\mu_t + \mu_{t+1})^2$
- where $\lambda$ is a given smoothing factor
- Solve the equation system $(I + \lambda D^TD)\mu = y$, where $D$ is the second-difference matrix,
- i.e. solve an equation system with n unknowns and n equations
4. Smoothing a time series of observations
- Solve the equation system $A\mu = y$
- where $A = I + \lambda D^TD$
5. Smoothing a time series of observations
- Solve the equation system $A\mu = y$
- where $A$ is symmetric and positive definite
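A sketch of the smoothing solve, assuming the second-difference penalty written above (the series and the value of the smoothing factor are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.linspace(0, 4 * np.pi, n)
y = np.sin(t) + rng.normal(scale=0.3, size=n)   # noisy observations

# Second-difference matrix D: each row is (1, -2, 1)
D = np.diff(np.eye(n), n=2, axis=0)
lam = 50.0                                      # given smoothing factor
A = np.eye(n) + lam * D.T @ D                   # symmetric positive definite

mu = np.linalg.solve(A, y)                      # smoothed series: A mu = y
```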
6. Gauss elimination
- Consider the equation system $Ax = b$, where $A$ is a square nonsingular matrix
- Set $A^{(1)} = A$ and $b^{(1)} = b$
- Form $a_{ij}^{(2)} = a_{ij}^{(1)} - \frac{a_{i1}^{(1)}}{a_{11}^{(1)}}\,a_{1j}^{(1)}$ and $b_i^{(2)} = b_i^{(1)} - \frac{a_{i1}^{(1)}}{a_{11}^{(1)}}\,b_1^{(1)}$ for $i, j \geq 2$
- and continue to eliminate variables one by one
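A small illustrative implementation of the elimination scheme, without pivoting (a teaching sketch, not production code):

```python
import numpy as np

def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination without pivoting."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    # Forward elimination: remove x_k from equations k+1, ..., n
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper triangular system
    x = np.empty(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[4.0, 2.0, 1.0], [2.0, 5.0, 3.0], [1.0, 3.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
print(gauss_solve(A, b))          # matches np.linalg.solve(A, b)
```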
7. LU-decomposition
- Any non-singular matrix can (possibly after reordering its rows) be decomposed into a product $A = LU$
- of an upper triangular matrix $U$ and a lower triangular matrix $L$ with ones on the diagonal
- We can then solve the equation system $Ax = b$
- by first solving $Lz = b$
- and then $Ux = z$
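In practice the factorization and the two triangular solves are delegated to a library; a sketch using scipy (lu_factor applies partial pivoting, so it computes $PA = LU$):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4.0, 2.0, 1.0],
              [2.0, 5.0, 3.0],
              [1.0, 3.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])

lu, piv = lu_factor(A)            # PA = LU, pivot indices in piv
x = lu_solve((lu, piv), b)        # forward solve Lz = Pb, then back solve Ux = z
```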
8. LU-decomposition
- The number of additions/multiplications needed for the LU-decomposition of a $p \times p$ matrix is approximately $p^3/3$
- The numerical stability of LU-decomposition can be increased by pivoting the rows of the coefficient matrix
9. Cholesky decomposition
- Any positive definite symmetric matrix $A$ can be uniquely decomposed into a product $A = U^TU$
- where $U$ is an upper triangular matrix with positive diagonal elements
- We can then solve the equation system $Ax = b$
- by first solving $U^Tz = b$
- and then $Ux = z$
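A sketch of the two triangular solves; numpy's cholesky returns the lower factor $L = U^T$, so $A = U^TU$ with $U = L^T$:

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[4.0, 2.0, 1.0],
              [2.0, 5.0, 3.0],
              [1.0, 3.0, 6.0]])    # symmetric positive definite
b = np.array([1.0, 2.0, 3.0])

L = np.linalg.cholesky(A)                      # A = L L', i.e. U = L'
z = solve_triangular(L, b, lower=True)         # solve U'z = b
x = solve_triangular(L.T, z, lower=False)      # solve Ux = z
```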
10. Fitting a single regression model to data
- The matrix $X^TX$ is always symmetric, and it is positive definite provided that $\mathrm{rank}(X) = p$
- Use Cholesky decomposition for fitting a single regression model to data
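A sketch of a regression fit along these lines, using scipy's Cholesky routines on the normal equations (data simulated for illustration):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))      # full column rank, so X'X is positive definite
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

c, low = cho_factor(X.T @ X)             # Cholesky factor of X'X
beta_hat = cho_solve((c, low), X.T @ y)  # solve X'X beta = X'y
```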
11. Stepwise regression and the sweep operator
- Consider the augmented matrix $\begin{pmatrix} X^TX & X^Ty \\ y^TX & y^Ty \end{pmatrix}$
- Sequentially apply the sweep operator to this matrix.
- This yields the least squares estimates and residual sums of squares corresponding to models with just the first k covariates included, $k = 1, \ldots, p$.
- It is easy to update the fit for adding or deleting a covariate (see the sketch below).
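A sketch of one common formulation of the sweep operator applied to the augmented matrix; sign conventions vary between texts, and this version leaves $-(X^TX)^{-1}$ in the upper-left block after all p pivots have been swept:

```python
import numpy as np

def sweep(A, k):
    """Sweep the symmetric matrix A on pivot position k."""
    A = A.copy()
    d = A[k, k]
    row = A[k, :].copy()
    A -= np.outer(A[:, k], row) / d     # a_ij - a_ik * a_kj / d
    A[k, :] = row / d
    A[:, k] = row / d
    A[k, k] = -1.0 / d
    return A

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

Z = np.block([[X.T @ X, (X.T @ y)[:, None]],
              [(y @ X)[None, :], np.array([[y @ y]])]])

for k in range(3):
    Z = sweep(Z, k)                     # sweep in covariate k
    print(Z[:k + 1, 3], Z[3, 3])        # estimates for first k+1 covariates, RSS
```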
12. Fitting a ridge regression model to data
- The introduction of a shrinkage factor $\lambda$ has two effects
- The numerical stability of the equation system $(X^TX + \lambda I)\beta = X^Ty$ is higher than that of the OLS system
- The variance of the obtained predictor is reduced
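A small illustration of the first effect, using nearly collinear columns (the numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 1e-4 * rng.normal(size=100)   # nearly collinear with column 0

G = X.T @ X
print(np.linalg.cond(G))                          # very large: ill-conditioned
print(np.linalg.cond(G + 0.1 * np.eye(3)))        # far smaller after shrinkage
```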
13. Smoothing a time series of observations
- The equation system set up to minimize $\sum_{t=1}^{n}(y_t - \mu_t)^2 + \lambda\sum_{t=2}^{n-1}(\mu_{t-1} - 2\mu_t + \mu_{t+1})^2$
- has a coefficient matrix that is a symmetric, positive definite band matrix
- The upper triangular matrix in the Cholesky decomposition is also a band matrix
- The number of operations needed is O(n)
- The smoothing conditions can be tailored to the application (see the sketch below)
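A sketch of the O(n) banded solve with scipy's solveh_banded, which factors a symmetric positive definite band matrix by banded Cholesky; the dense construction of A below is only for clarity, and in production the bands would be formed directly:

```python
import numpy as np
from scipy.linalg import solveh_banded

rng = np.random.default_rng(0)
n = 1000
y = np.cumsum(rng.normal(size=n))       # a series to smooth

D = np.diff(np.eye(n), n=2, axis=0)     # second-difference matrix
A = np.eye(n) + 50.0 * D.T @ D          # pentadiagonal SPD coefficient matrix

# Pack the main diagonal and the two superdiagonals into upper banded storage
ab = np.zeros((3, n))
for u in range(3):
    ab[2 - u, u:] = np.diagonal(A, offset=u)

mu = solveh_banded(ab, y)               # banded Cholesky solve, O(n) operations
```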
14. Smoothing of time series of data collected over several seasons
[Figure panels: sequential smoothing over seasons; smoothing over years]
15. Smoothing of time series of data representing several sectors
[Figure panels: circular smoothing over sectors; temporal smoothing over years]
16. Regression using QR-decomposition of the X matrix
- Assume that the X-matrix has full rank p
- For any $n \times n$ orthogonal matrix $Q$, $\|y - X\beta\|^2 = \|Q^Ty - Q^TX\beta\|^2$
- Find a $Q$ so that $Q^TX = \begin{pmatrix} R \\ 0 \end{pmatrix}$
- where $R$ is an upper triangular matrix
- The least squares solution is then given by $R\hat{\beta} = Q_1^Ty$
- where $Q_1$ contains the first p columns of Q
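A sketch with numpy's thin QR factorization (data simulated for illustration):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

Q1, R = np.linalg.qr(X)                    # thin QR: Q1 is n x p, R is p x p
beta_hat = solve_triangular(R, Q1.T @ y)   # back substitution: R beta = Q1'y
```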
17. Calculation of sample variances
- Algorithm 1 (two-pass): $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
- Formula 2 (one-pass): $s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n}x_i^2 - n\bar{x}^2\right)$
- Are the two algorithms numerically equivalent?
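The answer is no: the formulas are algebraically identical but not numerically. A small demonstration with data whose mean is large relative to their spread:

```python
import numpy as np

rng = np.random.default_rng(0)
x = 1e8 + rng.normal(size=10_000)       # mean 1e8, standard deviation about 1
n, xbar = len(x), x.mean()

var_two_pass = np.sum((x - xbar) ** 2) / (n - 1)           # numerically stable
var_one_pass = (np.sum(x ** 2) - n * xbar ** 2) / (n - 1)  # cancellation-prone

print(var_two_pass)   # about 1, as expected
print(var_one_pass)   # severely corrupted; can even come out negative
```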
18. Least squares regression with a constant term
- Reparameterize the regression model to $y_i = \alpha + (x_i - \bar{x})^T\beta + \varepsilon_i$
- This often gives a much better conditioned problem
- If the first column of the X-matrix is constant, QR-factorization automatically transforms the problem in this manner.
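An illustration of the conditioning gain from centering, with a covariate whose mean is large compared to its spread:

```python
import numpy as np

rng = np.random.default_rng(0)
x = 1000.0 + rng.normal(size=100)       # large mean, unit spread

X_raw = np.column_stack([np.ones(100), x])              # model with intercept
X_ctr = np.column_stack([np.ones(100), x - x.mean()])   # centered reparameterization

print(np.linalg.cond(X_raw.T @ X_raw))  # enormous
print(np.linalg.cond(X_ctr.T @ X_ctr))  # modest
```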
19. Singular value decomposition
- The singular value decomposition (SVD) of an $n \times p$ matrix $X$ with $n \geq p$ is of the form $X = UDV^T$
- where $U$ is an orthonormal $n \times p$ matrix ($U^TU = I_p$), $D$ is a diagonal matrix with elements $d_1 \geq d_2 \geq \cdots \geq d_p \geq 0$, and $V$ is a $p \times p$ orthonormal matrix ($V^TV = I_p$)
- Normally, SVD provides stable solutions of linear regression problems
- In addition, the columns of $UD$ and the singular values $d_1, d_2, \ldots, d_p$ have interesting statistical interpretations
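A sketch of the SVD-based least squares solution $\hat{\beta} = VD^{-1}U^Ty$ (data simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

U, d, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(d) V'
beta_hat = Vt.T @ ((U.T @ y) / d)                  # beta = V D^{-1} U'y
```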
20. Statistical interpretation of SVD components
- The principal components of a set of data in $\mathbb{R}^p$ provide a sequence of best linear approximations to that data, of all ranks $q \leq p$
- The directions of the extracted vectors are given by $v_1, \ldots, v_p$
- The coordinates of the data points in the new coordinate system are given by the columns of $UD$
- In addition,
- The linear combination $Xv_1$ has the highest variance of all linear combinations of the features for which $v_1$ has length 1.
- The linear combination $Xv_2$ has the highest variance of all linear combinations of the features for which $v_2$ has length 1 and is orthogonal to $v_1$.
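A sketch connecting these statements to the SVD of the column-centered data matrix (the simulated data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))   # correlated features
Xc = X - X.mean(axis=0)                  # center the columns first

U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
directions = Vt.T                        # columns v1, ..., vp
scores = U * d                           # columns of UD: the new coordinates

# Xc @ v1 attains the largest sample variance among unit-length combinations
print(np.var(Xc @ Vt[0], ddof=1))        # equals d[0]**2 / (n - 1)
print(d[0] ** 2 / (len(Xc) - 1))
```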