Title: PS 233
1. PS 233
- Intermediate Statistical Methods
- Lecture 7
- Assumptions of the OLS Estimator
2. Categories of Assumptions
- The total number of assumptions necessary depends on how you count them (matrix vs. scalar notation)
- Three categories of assumptions
- Assumptions to calculate b
- Assumptions to show b is unbiased
- Assumptions to calculate variance of b
3. Categories of Assumptions
- Note that these sets of assumptions follow in sequence; each step builds on the previous results
- Thus, if an assumption is necessary to calculate b, it is also necessary to show that b is unbiased, and so on
- If an assumption is only necessary for a later step, earlier results are unaffected
4. Assumptions to Calculate b: X Varies
- Every X takes on at least two distinct values
- Recall our bivariate estimator:
  b = Σ(Xt − X̄)(Yt − Ȳ) / Σ(Xt − X̄)²
- If X does not vary, then the denominator is 0
5. Assumptions to Calculate b: X Varies
- If X does not vary, then our data points form a vertical line
- The slope of a vertical line is undefined
- Conceptually, if we do not observe variation in X, we cannot draw conclusions about how Y varies with X
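A quick numerical sketch (illustrative numbers, not from the lecture) of why the estimator fails when X is constant:

```python
import numpy as np

# Hypothetical data: X takes only one value, while Y varies
X = np.array([3.0, 3.0, 3.0, 3.0])
Y = np.array([1.0, 2.0, 4.0, 3.0])

# Bivariate OLS slope: b = sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2)
denominator = np.sum((X - X.mean()) ** 2)
print(denominator)  # 0.0 -- the slope b is undefined when X does not vary
```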
6. Assumptions to Calculate b: Xs Are Not Perfectly Collinear
- The matrix (X'X)⁻¹ must exist
- Recall our multivariate estimator:
  b = (X'X)⁻¹X'Y
- This means that (X'X) must be of full rank
- No column of X can be a linear combination of the others
7. Assumptions to Calculate b: Xs Are Not Perfectly Collinear
- If one X is a perfect linear function of another, then OLS cannot distinguish between their effects
- If an X has no variation independent of the other Xs, OLS has no information with which to estimate its effect
- This is a more general statement of the previous assumption that X varies
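A small sketch with made-up numbers: when one column of X is a linear function of another, X'X loses full rank and cannot be inverted.

```python
import numpy as np

# Hypothetical design matrix: the third column is twice the second,
# so the columns are perfectly collinear
X = np.array([
    [1.0, 1.0, 2.0],
    [1.0, 2.0, 4.0],
    [1.0, 3.0, 6.0],
    [1.0, 4.0, 8.0],
])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))  # 2, not 3: X'X is not of full rank
try:
    np.linalg.inv(XtX)             # inversion fails for a singular matrix
except np.linalg.LinAlgError as err:
    print(err)
```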
8. Assumptions to Show b Is Unbiased: Xs Are Fixed
- Conceptually, this assumption implies that we could repeat an experiment in which we held X constant and observed new values for e and Y
- This assumption is necessary to allow us to calculate a distribution of b
- Knowing the distribution of b is essential for calculating its mean
9. Assumptions to Show b Is Unbiased: Xs Are Fixed
- Without knowing the mean of b, we cannot know whether E(b) = β
- In addition, without knowing the mean of b or its distribution, we cannot know the variance of b
- In practical terms, we must assume that the independent variables are measured without error
10. Assumptions to Show That b Is Unbiased: E(U) = 0
- Recall our equation for b:
  b = (X'X)⁻¹X'Y = (X'X)⁻¹X'(Xβ + U) = β + (X'X)⁻¹X'U
- X is fixed and non-zero
- Thus, if E(U) = 0, then E(b) = β + (X'X)⁻¹X'E(U) = β
11. Assumptions to Show That b Is Unbiased: E(U) = 0
- Conceptually, this assumption means that we believe our theory, described in the equation Y = Xβ + u, accurately represents our dependent variable
- If E(U) is not equal to zero, then we have the wrong equation: an omitted variable
12. Assumptions to Show That b Is Unbiased: E(U) = 0
- Note that the assumption E(U) = 0 implies that E(U1) = E(U2) = … = E(Un) = 0
- Therefore, because X is fixed, the assumption E(U) = 0 also implies that U is independent of the values of X
- That is, E(UtXtk) = 0
13. Calculating the Variance of b: Degrees of Freedom
- We must have more cases than we have Xs
- In other words, N > K
- Recall that our estimator of b is the result of numerous summation operations
- Each summation has N pieces of information about X in it, but…
14. Calculating the Variance of b: Degrees of Freedom
- Not all N pieces of information about X in the summations are independent
- Take the calculation:
  Σ(Xt − X̄)², where X̄ = ΣXt / N
- Once we calculate X̄, then for the final observation Xn we know what that observation of X must be, given X̄ and the other N − 1 observations
15. Calculating the Variance of b: Degrees of Freedom
- Thus, the degrees of freedom for the summation is N − 1
- We lose one piece of information in estimating the parameter X̄
- For each parameter we estimate, we lose one more piece of independent information, because the parameters depend on the values in the data
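A small numerical sketch (sample values are made up) of the point above: once X̄ is known, the final observation carries no independent information.

```python
import numpy as np

# Hypothetical sample of N = 5 observations
X = np.array([2.0, 5.0, 1.0, 4.0, 3.0])
xbar = X.mean()

# Given xbar and the first N - 1 observations, the last one is pinned down:
# X_n = N * xbar - (sum of the first N - 1 observations)
implied_last = len(X) * xbar - X[:-1].sum()
print(implied_last)  # 3.0, exactly X[-1]: only N - 1 deviations are free
```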
16. Calculating the Variance of b: Sufficient Degrees of Freedom
- Dividing by the degrees of freedom was necessary to make an unbiased estimate of the variance of b
- Recall the formulas:
  s²u = Σe²t / (N − K)   and   Var(b) = s²u (X'X)⁻¹
- If K > N, then the variance of b is undefined
17. Calculating the Variance of b: Sufficient Degrees of Freedom
- Conceptually, this means that the values of the b vector are underdetermined: with more parameters than observations, many different b vectors fit the data perfectly
- Hypothesis tests become impossible
- Stata will not estimate b; one could calculate a b by hand, though it would be useless
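A sketch with assumed numbers of what happens when K > N: with two observations and three regressors, infinitely many coefficient vectors fit the data exactly.

```python
import numpy as np

# Hypothetical case with N = 2 observations and K = 3 regressors
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 3.0, 5.0]])
Y = np.array([4.0, 7.0])

# Two different coefficient vectors that both fit the data exactly:
b1 = np.linalg.lstsq(X, Y, rcond=None)[0]  # minimum-norm exact solution
b2 = b1 + np.linalg.svd(X)[2][-1]          # add a direction from X's null space
print(np.allclose(X @ b1, Y), np.allclose(X @ b2, Y))  # True True
```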
18. Calculating the Variance of b: Error Variance Is Constant
- More specifically, we assume that E(u²t) = σ²u for all t
- To calculate the variance of b, we factored σ²u out of a summation across the N data points
- If σ²u is not constant, then our numerator for σ²b is wrong
19. Calculating the Variance of b: Error Variance Is Constant
- Conceptually, this assumption states that each of our observations is equally reliable as a piece of information for estimating b
- If E(u²t) is not equal to σ²u for every t, then some data points hold more information than others
- This is known as heteroskedasticity
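An illustrative simulation (the data-generating process is an assumption for demonstration): under heteroskedasticity, b remains unbiased, but the constant-variance formula for Var(b) no longer applies.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 5_000
X = np.linspace(1, 10, n)
slopes = []

# Hypothetical heteroskedastic errors: standard deviation grows with X
for _ in range(reps):
    u = rng.normal(scale=X)  # error spread is not constant across observations
    Y = 1 + 2 * X + u
    b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    slopes.append(b)

print(round(np.mean(slopes), 1))  # still about 2: b remains unbiased
# ...but the usual formula for Var(b), which factors out a single
# sigma_u^2, would misstate the true sampling variance here
```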
20. Calculating the Variance of b: The Independence of Errors
- More specifically, we assume that E(UtUs) = 0 for all t ≠ s
- The proof of minimum variance assumed that the matrix E(UU') could be factored out of the variance of b as σ²uI
- All elements of the UU' matrix off the diagonal are assumed to be 0
21. Calculating the Variance of b: The Independence of Errors
- If this is not true, then OLS is not BLUE, and our equation for the variance of b is wrong: usually too small
- Conceptually, this assumption means that the only systematic factor predicting Y is X; u is not systematic
22. Calculating the Variance of b: The Independence of Errors
- If X is not the only systematic cause of Y, then we are not using all the information available to predict Y
- If U is systematic over time, then we should model that process to predict Y
- This problem is known as autocorrelation
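An illustrative simulation (the AR(1) error process and all numbers are assumptions) of the "usually too small" point: with positively autocorrelated errors, the usual independence-based formula badly understates the true sampling variance of b.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, rho = 100, 4_000, 0.8
X = np.arange(n, dtype=float)
denom = np.sum((X - X.mean()) ** 2)
sigma_u2 = 1.0 / (1.0 - rho**2)  # stationary variance of the AR(1) error
slopes = []

# Hypothetical AR(1) errors: u_t = rho * u_{t-1} + e_t, so E(UtUs) != 0
for _ in range(reps):
    e = rng.normal(size=n)
    u = np.zeros(n)
    u[0] = rng.normal(scale=np.sqrt(sigma_u2))
    for t in range(1, n):
        u[t] = rho * u[t - 1] + e[t]
    Y = 1 + 2 * X + u
    slopes.append(np.sum((X - X.mean()) * (Y - Y.mean())) / denom)

naive_var = sigma_u2 / denom  # usual formula assuming independent errors
true_var = np.var(slopes)     # sampling variance actually observed
print(true_var > 3 * naive_var)  # True: the usual formula is far too small
```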
23. Summary of Assumptions
- Rank of X = K
- X is non-stochastic
- E(U) = E(UtXtk) = 0
- N > K
- E(UU') = σ²uI