Title: OLS
1. Chapter 5: Ordinary Least Squares Regression
- We will be discussing:
  - The Linear Regression Model
  - Estimation of the Unknowns in the Regression Model
  - Some Special Cases of the Model
2. The Basic Regression Model
- The model expressed in scalars: $y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + e_i$
- The model elaborated in matrix terms: the $n$ observations stacked into the vectors $y$ and $e$ and the $n \times k$ matrix $X$
- A succinct matrix expression for it: $y = X\beta + e$
3. Prediction Based on a Linear Combination
The expected value of $y$ is given by $E(y) = X\beta$.
Together with the previous slide, we can say that $y = E(y) + e$: each observation is a linear combination of the x's plus error.
Key question: how do we get values for the $\beta$ vector?
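As a concrete illustration, here is a minimal numpy sketch of the model (the dimensions, coefficient values, and variable names are our assumptions, not the slides'):

    import numpy as np

    # A sketch of y = X*beta + e with simulated data.
    rng = np.random.default_rng(0)
    n, k = 100, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # intercept + 2 predictors
    beta = np.array([1.0, 2.0, -0.5])                               # assumed "true" coefficients
    e = rng.normal(size=n)                                          # error input
    y = X @ beta + e                                                # E(y) = X*beta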
4. Parameter Estimation of β
- We will cover two philosophies of parameter estimation: the least squares principle and maximum likelihood. Each of these has the following steps:
  - Pick an objective function f to optimize
  - Find the derivative of f with respect to the unknown parameters
  - Find values of the unknowns where that derivative is zero
- The two differ in the first step. The least squares principle would have us pick $f = e'e$ as our objective function, which we will minimize.
5. We Wish to Minimize e'e
We want to minimize $f = e'e$ over all possible values of the elements in the vector $\beta$.
The function $f$ depends on $\beta$ because $e = y - X\beta$.
6. Minimizing e'e, Cont'd
Substituting $e = y - X\beta$ and expanding gives $f = (y - X\beta)'(y - X\beta) = y'y - y'X\beta - \beta'X'y + \beta'X'X\beta$. The two middle terms are scalars and transposes of each other, so they are the same, and

$$f = y'y - 2y'X\beta + \beta'X'X\beta.$$
7. What Is the Derivative?
Our objective function is the sum $f = y'y - 2y'X\beta + \beta'X'X\beta$.
Now we need to determine the derivative $\partial f / \partial \beta$ and set it to a column of $k$ zeroes.
The derivative of a sum is equal to the sum of the derivatives, so we can handle it in pieces.
8. A Quickie Review of Some Derivative Rules
- The derivative of a constant: $\partial a / \partial \beta = 0$
- The derivative of a linear combination: $\partial (a'\beta) / \partial \beta = a$
- The derivative of a transpose: $\partial (\beta'a) / \partial \beta = a$, since the scalar $\beta'a$ equals its transpose $a'\beta$
- The derivative of a quadratic form: $\partial (\beta'A\beta) / \partial \beta = 2A\beta$ for symmetric $A$
9. The Derivative of the Sum Is the Sum of the Derivatives
$f = y'y - 2y'X\beta + \beta'X'X\beta$, so differentiating term by term:

- $\partial (y'y) / \partial \beta = 0$ (the derivative of a constant)
- $\partial (\beta'X'X\beta) / \partial \beta = 2X'X\beta$ (the derivative of a quadratic form)
- $\partial (-2y'X\beta) / \partial \beta = -2X'y$ (the derivative of a linear combination and the derivative of a transpose)
10. Beta Gets a Hat
Add these all together and set equal to zero:

$$-2X'y + 2X'X\beta = 0$$

And with some algebra:

$$X'X\hat\beta = X'y \quad \text{(this one has a name: the normal equations)}$$
$$\hat\beta = (X'X)^{-1}X'y \quad \text{(this one has a hat)}$$
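A minimal numerical sketch of the formula (our own illustration; variable names and data are arbitrary), cross-checked against numpy's least squares solver:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
    y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=50)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)      # solve the normal equations X'X b = X'y
    beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)  # numpy's built-in least squares
    assert np.allclose(beta_hat, beta_ref)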
11. Is Our Formula Any Good?

12. What Do We Really Mean by Good?
- Unbiasedness: $E(\hat\beta) = \beta$
- Consistency: $\hat\beta$ converges to $\beta$ as $n \to \infty$
- Sufficiency: the distribution of the sample, given the estimator, does not depend on $\beta$
- Efficiency: the variance of the estimator is smaller than that of other unbiased estimators
13. Two Key Assumptions
The behavior of the estimator is driven by the error input $e$.
According to the Gauss-Markov assumption, $V(e) = \sigma^2 I$ looks like

$$V(e) = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix}$$

so the errors are homoscedastic and uncorrelated.
14. The Likelihood Principle
Consider a sample of 3 observations: 10, 11, and 12. What is $\mu$?
[Figure: the sample points 10, 11, and 12 plotted against candidate values of $\mu$]
15. Maximum Likelihood
- According to ML, we should pick values for the parameters that maximize the probability of the sample.
- To do this we need to follow these steps:
  - Derive the probability of an observation
  - Assuming independent observations, calculate the likelihood of the sample using multiplication
  - Take the log of the sample likelihood
  - Derive the derivative of the log likelihood with respect to the parameters
  - Figure out what the parameters must be so that the derivative is equal to a vector of zeroes
- With linear models we can do this analytically using algebra
- With non-linear models we sometimes have to use brute-force hill-climbing routines, as in the sketch below
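For the sample above, a minimal numerical sketch of such a routine (our illustration; $\sigma$ is fixed at 1 for simplicity, which is an assumption, not from the slides):

    import numpy as np
    from scipy.optimize import minimize

    sample = np.array([10.0, 11.0, 12.0])

    def neg_log_likelihood(mu):
        # Negative normal log likelihood, up to constants, with sigma fixed at 1.
        return 0.5 * np.sum((sample - mu) ** 2)

    result = minimize(neg_log_likelihood, x0=np.array([0.0]))
    print(result.x)  # approximately [11.], the sample mean

The optimizer lands on the sample mean, which is what the analytic derivation on the following slides also yields.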
16. The Probability of Observation $y_i$
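Assuming normally distributed errors, the probability of observation $y_i$ takes the standard normal density form:

$$p(y_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(y_i - x_i'\beta)^2}{2\sigma^2} \right)$$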
17. Multiply Out the Probability of the Whole Sample
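With independent observations, the sample likelihood is the product of the individual densities:

$$L(\beta, \sigma^2) = \prod_{i=1}^{n} p(y_i) = (2\pi\sigma^2)^{-n/2} \exp\!\left( -\frac{(y - X\beta)'(y - X\beta)}{2\sigma^2} \right)$$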
18. Take the Log of the Sample Likelihood
- $\ln \exp(a) = a$
- $\ln 1 = 0$
- $\ln a^b = b \ln a$
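Applying these rules to the likelihood above gives the log likelihood:

$$\ln L = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{(y - X\beta)'(y - X\beta)}{2\sigma^2}$$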
19. Figure Out the Derivative and Set It Equal to Zero
From here we are just a couple of easy algebraic steps away from the normal equations and the least squares formula.
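Spelling out those steps (a standard derivation):

$$\frac{\partial \ln L}{\partial \beta} = \frac{1}{\sigma^2} X'(y - X\beta) = 0 \;\Rightarrow\; X'X\hat\beta = X'y \;\Rightarrow\; \hat\beta = (X'X)^{-1}X'y$$

So under normal errors, maximum likelihood and least squares give the same estimator of $\beta$.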
If an ML estimator exists for a model, it is guaranteed to be consistent, asymptotically normally distributed, and asymptotically efficient.
20. Sums of Squares
$$SS_{Error} = y'y - y'X(X'X)^{-1}X'y$$
$$SS_{Error} = SS_{Total} - SS_{Predictable}$$

where $SS_{Total} = y'y$ and $SS_{Predictable} = y'X(X'X)^{-1}X'y$.
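A quick numerical check of the decomposition (our sketch, with simulated data):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
    y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=50)

    y_hat = X @ np.linalg.solve(X.T @ X, X.T @ y)    # predicted values
    e = y - y_hat                                    # residuals
    assert np.isclose(y @ y, y_hat @ y_hat + e @ e)  # SS_Total = SS_Predictable + SS_Error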
21. Using the Covariance Matrix Instead of the Raw SSCP Matrix
The covariance matrix of the y and x variables looks like

$$S = \begin{bmatrix} s_{yy} & s_{yx} \\ s_{xy} & S_{xx} \end{bmatrix}$$

and the slopes can be computed from it as $\hat\beta = S_{xx}^{-1} s_{xy}$.
22. Using Z Scores
We can calculate a standardized version of $\hat\beta$ using Z scores.
Or use the correlation matrix of all the variables: $\hat\beta^* = R_{xx}^{-1} r_{xy}$.
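A sketch showing that the two routes agree (our own example; names and data are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2)) @ np.array([[1.0, 0.5], [0.0, 1.0]])  # correlated predictors
    y = X @ np.array([2.0, -0.5]) + rng.normal(size=200)

    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # z-scored predictors
    zy = (y - y.mean()) / y.std()                  # z-scored outcome
    beta_std = np.linalg.solve(Z.T @ Z, Z.T @ zy)  # regression on Z scores

    R = np.corrcoef(X, rowvar=False)               # R_xx, correlations among the x's
    r_xy = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(2)])
    assert np.allclose(beta_std, np.linalg.solve(R, r_xy))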
23. The Concept of Partialing
Imagine that we divided up the x variables into two sets, $X = [X_1 \; X_2]$, so that the betas were also divided the same way, $\beta' = [\beta_1' \; \beta_2']$.
The model becomes $y = X_1\beta_1 + X_2\beta_2 + e$.
24. The Normal Equations
The normal equations would then be

$$X'X\hat\beta = X'y$$

or, in partitioned form,

$$X_1'X_1\hat\beta_1 + X_1'X_2\hat\beta_2 = X_1'y$$
$$X_2'X_1\hat\beta_1 + X_2'X_2\hat\beta_2 = X_2'y$$

Subtracting $X_1'X_2\hat\beta_2$ from both sides of the first equation gives us

$$X_1'X_1\hat\beta_1 = X_1'(y - X_2\hat\beta_2)$$
25. The Estimator for the First Set
Solving for the estimator for the first set yields

$$\hat\beta_1 = (X_1'X_1)^{-1}X_1'(y - X_2\hat\beta_2)$$

the usual formula, applied to $y$ with the contribution of $X_2$ removed. Factoring the solution so that everything involving $X_2$ is collected in one place produces the matrix

$$I - X_2(X_2'X_2)^{-1}X_2'$$

What is this?
26. The P and M Matrices
Define $P = X(X'X)^{-1}X'$ and define $M = I - P = I - X(X'X)^{-1}X'$.
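Two standard facts about these matrices make the partialing result readable: $P$ projects $y$ onto the column space of $X$, and $M$ produces the residuals:

$$Py = X\hat\beta = \hat y, \qquad My = y - \hat y = e, \qquad P = P' = P^2, \qquad M = M' = M^2$$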
27. An Intercept-Only Model
28. The Intercept-Only Model, Continued

The model becomes $y = \beta_0 \mathbf{1} + e$, where $\mathbf{1}$ is a column of $n$ ones.
29. The P Matrix in This Case
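With $X = \mathbf{1}$, the projection matrix reduces to a matrix whose every entry is $1/n$:

$$P = \mathbf{1}(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}' = \frac{1}{n}\mathbf{1}\mathbf{1}', \qquad Py = \bar{y}\,\mathbf{1}$$

so the intercept-only prediction of every observation is the sample mean.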
30. The M Matrix
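Correspondingly, $M$ becomes the centering matrix:

$$M = I - \frac{1}{n}\mathbf{1}\mathbf{1}', \qquad My = y - \bar{y}\,\mathbf{1}$$

so the residuals are deviations from the mean.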
31. Response Surface Models: Linear vs. Quadratic
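The two models being contrasted are the straight-line form and the form with a squared term:

$$\text{Linear: } y = \beta_0 + \beta_1 x + e \qquad \text{Quadratic: } y = \beta_0 + \beta_1 x + \beta_2 x^2 + e$$

Note that the quadratic model is still linear in the betas, so the same OLS machinery applies.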
32. Response Surface Models: The Sign of the Betas
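In the quadratic model, the sign of $\beta_2$ controls the curvature: a negative $\beta_2$ bends the surface downward (an interior maximum), a positive $\beta_2$ bends it upward (an interior minimum). A minimal sketch (our own simulated data) fitting both forms with the least squares formula from earlier:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=200)
    y = 1.0 + 2.0 * x - 0.3 * x**2 + rng.normal(size=200)  # concave "true" surface

    X_lin = np.column_stack([np.ones_like(x), x])
    X_quad = np.column_stack([np.ones_like(x), x, x**2])
    b_lin = np.linalg.solve(X_lin.T @ X_lin, X_lin.T @ y)
    b_quad = np.linalg.solve(X_quad.T @ X_quad, X_quad.T @ y)

    print(b_quad[2])  # negative: the fitted surface bends downward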