Title: Lecture 3: Regressions (Chapters 1-5)
1. Lecture 3: Regressions (Chapters 1-5)
2. Steps in empirical economic analysis
- Example 1: the effect of job training on productivity
- Write down an empirical model
- wage = β0 + β1·educ + β2·exper + β3·training + u
- where β0, β1, β2, β3 are parameters
- u is the error term.
- Construct hypotheses of interest
- (e.g., job training would increase wages)
- H0: β3 = 0
- H1: β3 > 0
3. Steps in empirical economic analysis
- Formation of questions of interest
- Based on an economic model or on common sense
- Write down an empirical model
- Construct hypotheses of interest based on the empirical model
4. Example 2: an economic model of crime
- Economic model: cost-benefit analysis of an individual's participation in crime
- Cost: current wage in legal employment, probability of being caught/convicted, average sentence length
- Benefit: wage for criminal activity
- Write down an empirical model
- crime = β0 + β1·wage + β2·freqarr + β3·freqconv + β4·avesen + u
- Construct hypotheses of interest (e.g., a higher wage in legal employment reduces crime)
- H0: β1 = 0; H1: β1 < 0
5. Data structure
- Cross-sectional data
- By individuals (WAGE1.DTA)
- By countries
- Time-series data
- Annual (GDP data)
- Weekly (money supply data)
- Daily (stock prices)
- Pooled cross sections
- Combine two cross-sectional data sets (Table 1.4)
- Panel/longitudinal data (Table 1.5)
- Trace a given set of cross-sectional units over time
8. Ceteris paribus: other things being equal
- Most economic questions are ceteris paribus by nature.
- Example 1: the demand curve. Holding other factors fixed, quantity demanded falls as price rises.
- Example 2: policy analysis of job training and productivity.
- We need to control for enough covariates to answer questions about the causal effect of x on y.
- How many is enough?
- Issues of omitted variables; instrumental variable methods
9. Example 1.4
- Estimate the return to education
- If a person is randomly chosen from the population and given another year of education, by how much will his or her wage increase?
- log(wage) = β0 + β1·educ + β2·exper + β3·exper² + u
- Parameter of interest: β1
- But we don't observe ability or taste for education (an omitted-variables problem)
- We will resolve this problem using the instrumental variable method (Chapter 15).
10. Simple Regression Models
- Simple linear model
- y = β0 + β1·x + u
- where y: dependent variable
- x: independent/explanatory variable, or the regressor
- u: error term or disturbance
- β0: intercept parameter
- β1: slope parameter (the marginal effect of x on y)
11. - Example (a log wage regression that is linear in years of education)
- log(wage) = β0 + β1·educ + u
- where β1 measures the return to an additional year of schooling, holding all other factors fixed.
12. - Consider a simple linear regression model
- y = β0 + β1·x + u
- Use the average value of x to predict the average value of y
- E[y] = E[β0 + β1·x + u] = β0 + β1·E[x] + E[u]
- Zero mean assumption: E[u] = 0
- We get
- E[y] = β0 + β1·E[x]
13. - Use x to predict the average value of y (if we know x): take expectations conditional on x
- E[y|x] = E[β0 + β1·x + u | x] = β0 + β1·x + E[u|x]
- Zero conditional mean assumption: E[u|x] = 0
- So we get E[y|x] = β0 + β1·x,
- called the population regression function (Fig 2.1)
14. Zero conditional mean assumption
- E[u|x] = 0: the expected (or average) value of the error term is zero for any slice of the population described by the value of x.
- Example: no matter how many years of schooling, the average of the error term must equal zero. We say schooling and the error term are uncorrelated, denoted by
- Cov(x,u) = E[xu] = 0
- This is a strong assumption.
- In the wage regression model, what if unobserved ability (contained in u) is correlated with education (x)? The zero conditional mean assumption doesn't hold in general. We study this simple case first, although it often doesn't hold.
15. Derive the OLS Estimators
- The error term must satisfy two conditions:
- Zero mean: E[u] = 0
- Zero conditional mean: E[xu] = 0
- But we don't observe the error term u. In terms of variables that we can observe (x and y):
- Zero mean: E[y - β0 - β1·x] = 0
- Zero conditional mean: E[x(y - β0 - β1·x)] = 0
- The sample counterparts:
- Zero mean: (1/n) Σ (yi - β̂0 - β̂1·xi) = 0
- Zero conditional mean: (1/n) Σ xi(yi - β̂0 - β̂1·xi) = 0
- Two equations and two unknowns
16. - Using the last two equalities, we derive the OLS estimators of the parameters β0 and β1:
- β̂1 = Σ (xi - x̄)(yi - ȳ) / Σ (xi - x̄)², β̂0 = ȳ - β̂1·x̄
- This requires that x has enough variation (Fig 2.3)
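The two sample moment conditions can be solved by hand. A minimal Python sketch; the data are made up for illustration and are not from any course data set:

```python
# Simple OLS from the two sample moment conditions:
#   (1/n) Σ (y_i - b0 - b1*x_i)     = 0   (zero mean)
#   (1/n) Σ x_i (y_i - b0 - b1*x_i) = 0   (zero conditional mean)
# Solving gives b1 = Σ(x_i - x̄)(y_i - ȳ) / Σ(x_i - x̄)², b0 = ȳ - b1*x̄.

def ols_simple(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)   # needs variation in x: sxx > 0
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Made-up data generated exactly as y = 1 + 2x:
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
b0, b1 = ols_simple(x, y)   # recovers intercept 1 and slope 2
```

Because the data lie exactly on a line, the estimator recovers the intercept and slope without error.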
17. Use OLS Estimates for Prediction
- Sample regression function: ŷ = β̂0 + β̂1·x
- Fitted value: ŷi = β̂0 + β̂1·xi
- Residual: ûi = yi - ŷi, the difference between actual y and the fitted value
- Sum of squared residuals: Σ ûi²
18. Goodness of Fit
- Total sum of squares: SST = Σ (yi - ȳ)²
- Explained sum of squares: SSE = Σ (ŷi - ȳ)²
- Residual sum of squares: SSR = Σ ûi²
- R-squared: R² = SSE/SST = 1 - SSR/SST
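These sums of squares are easy to compute once fitted values are in hand. A Python sketch with made-up numbers; yhat holds the OLS fitted values from regressing y on x = [1, 2, 3, 4] (slope 1.6, intercept 0.5):

```python
# Goodness-of-fit decomposition for an OLS fit.
y    = [2.0, 4.0, 5.0, 7.0]    # made-up dependent variable
yhat = [2.1, 3.7, 5.3, 6.9]    # fitted values 0.5 + 1.6*x for x = 1..4

ybar = sum(y) / len(y)
sst = sum((yi - ybar) ** 2 for yi in y)               # total sum of squares
sse = sum((yh - ybar) ** 2 for yh in yhat)            # explained sum of squares
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # residual sum of squares

r2_a = sse / sst        # R² as SSE/SST
r2_b = 1 - ssr / sst    # R² as 1 - SSR/SST; equal because SST = SSE + SSR
                        # holds for an OLS fit with an intercept
```

For this fit SST = 13, SSE = 12.8, SSR = 0.2, so both forms give R² ≈ 0.985.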
19. Examples and STATA Recitation
- Example 2.3, Fig 2.5 (CEOSAL1.DTA)
- Explain CEO salary using the return on equity
- Can you interpret the results?
- Example 2.4 (WAGE1.DTA)
- Estimate the return to education
- Can you interpret the results?
20. Properties of OLS Estimators (1)
- Unbiasedness
- Required conditions:
- 1. Linear in parameters
- 2. Random sampling
- 3. Enough variation in the regressor x
- 4. Zero conditional mean
21. Properties of OLS Estimators (2)
- Best Linear Unbiased Estimator (BLUE)
- "Best" means most efficient; i.e., the variances of the OLS estimators are the smallest among all linear unbiased estimators
- Also called the Gauss-Markov theorem
22. - Required conditions for BLUE:
- 1. Linear in parameters
- 2. Random sampling
- 3. Enough variation in the regressor x
- 4. Zero conditional mean
- 5. Constant variance: the variance of the error term does not vary with the regressor x
- The above assumptions are the Gauss-Markov assumptions.
23. - Assumption 5 is a very strong assumption
- Fig 2.8 (case of constant variance)
- Fig 2.9 (case of heteroskedasticity)
24. Variance of OLS Estimators in Simple Linear Regressions (1)
- By Assumption 5, we have
- Var(y|x) = Var(β0 + β1·x + u | x) = Var(u|x) = σ²
- We call σ² the error variance and σ the standard deviation of the error term.
- An unbiased estimator of the error variance: σ̂² = SSR/(n - 2)
- We call σ̂ the standard error of the regression.
25. Variance of OLS Estimators (2)
- Variance of the OLS estimator: Var(β̂1) = σ² / Σ (xi - x̄)²
- Intuition: when there is more variation in x, the OLS estimate is more accurate.
- But we don't know σ; replace it with the standard error of the regression, σ̂.
- The standard error of the slope estimator is se(β̂1) = σ̂ / √Σ (xi - x̄)²
26. Examples (Problem 2.7)
- What is the standard error of the OLS estimator for the slope?
27. Functional Forms
- Linear models
- Whenever we can transform a nonlinear model into a linear one, we should do so and apply the Gauss-Markov theorem.
- Example 1: the log wage regression
- Nonlinear models
28. Multiple Regression Analysis: Estimation
- Motivation (examples)
- "Other things being equal": explaining the effect of per-student spending on average test scores
- avgscore = β0 + β1·expend + β2·avginc + u
- Extended functional form: suppose family consumption is a quadratic function of family income
- cons = β0 + β1·inc + β2·inc² + u
29. Interpretation of OLS Results
- Partial effect (marginal effect)
- Example 1
- Example 2
30. STATA Recitation
- Determinants of college GPA
- Examples 3.1, 3.4 (GPA1.dta)
- Explaining arrest records
- Example 3.5 (CRIME1.dta)
31. Zero mean assumption
- Consider a multiple linear regression model
- y = β0 + β1·x1 + β2·x2 + ... + βk·xk + u
- Use the average values of the regressors to predict the average value of y
- E[y] = E[β0 + β1·x1 + β2·x2 + ... + βk·xk + u] = β0 + β1·E[x1] + β2·E[x2] + ... + βk·E[xk] + E[u]
- Zero mean assumption: E[u] = 0
- We get
- E[y] = β0 + β1·E[x1] + β2·E[x2] + ... + βk·E[xk]
32. Zero conditional mean condition
- Use the regressors to predict the average value of y; that is, take expectations conditional on x1, ..., xk
- E[y|x] = E[β0 + β1·x1 + β2·x2 + ... + βk·xk + u | x] = β0 + β1·x1 + β2·x2 + ... + βk·xk + E[u | x1, x2, ..., xk]
- Zero conditional mean assumption:
- E[u | x1, x2, ..., xk] = 0
- So we get E[y|x] = β0 + β1·x1 + β2·x2 + ... + βk·xk,
- the population regression function
33. Zero conditional mean assumption
- E[u | x1, x2, ..., xk] = 0: the expected (or average) value of the error term is zero for any slice of the population described by the values of the regressors.
- Example: no matter how many years of schooling, or what your gender or work experience, the average of the error term must equal zero. We say schooling, gender, and work experience are all uncorrelated with the error term, denoted by
- Cov(x1,u) = E[x1·u] = 0
- Cov(x2,u) = E[x2·u] = 0
- ...
- Cov(xk,u) = E[xk·u] = 0
34. Derive the OLS Estimators
- The error term must satisfy two conditions:
- Zero mean: E[u] = 0
- Zero conditional mean: E[xj·u] = 0 for j = 1, ..., k
- Rewrite in terms of the parameters of interest and observed variables (x and y):
- E[y - β0 - β1·x1 - β2·x2 - ... - βk·xk] = 0
- E[xj(y - β0 - β1·x1 - ... - βk·xk)] = 0 for j = 1, ..., k
- The sample counterparts give
- k + 1 equations in k + 1 unknowns
35. - Consider the case with only k = 2 explanatory variables
- Using those k + 1 = 3 equalities, we can derive the OLS estimators of the parameters β0, β1, β2
36. Comparison of Simple and Multiple Regression Estimators
- "Wrong" model: suppose we omit x2, using a simple regression of y on x1
- "True" model: now we include x2, using a multiple regression of y on x1 and x2
- We can show a simple relationship (*) between the two estimators
37. Example (STATA Practice)
- Determinants of college GPA (GPA1.dta)
- Suppose that the true model is colGPA = β0 + β1·hsGPA + β2·ACT + u
- Consequence of omitting an important variable:
- Regress colGPA on ACT (ignoring hsGPA): verify (*)
- Regress ACT on hsGPA (ignoring colGPA). Can you verify (*)?
38. - Intuition behind the OLS estimator formula
- Regress ACT on hsGPA
- Get the residual (call it r̂): the part of ACT that cannot be explained by hsGPA
- Regress colGPA on r̂
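These steps are the "partialling out" (Frisch-Waugh) result: the simple slope of y on the residual equals the multiple-regression coefficient on that regressor. A Python sketch with made-up numbers standing in for hsGPA, ACT, and colGPA (not the GPA1.dta values):

```python
# Partialling out: the multiple-regression coefficient on x2 equals the
# simple-regression slope of y on the residual from regressing x2 on x1.

def slope(x, y):
    # simple OLS slope of y on x (with an intercept)
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
           sum((a - xbar) ** 2 for a in x)

def fitted(x, y):
    n = len(x)
    b1 = slope(x, y)
    b0 = sum(y) / n - b1 * sum(x) / n
    return [b0 + b1 * xi for xi in x]

x1 = [2.0, 3.0, 3.5, 4.0, 2.5, 3.8]        # hsGPA-like (made up)
x2 = [20.0, 24.0, 25.0, 30.0, 22.0, 21.0]  # ACT-like (made up)
y  = [2.4, 3.0, 3.1, 3.9, 2.6, 3.2]        # colGPA-like (made up)

# Steps 1-2: residual of x2 after removing the part explained by x1
r = [a - f for a, f in zip(x2, fitted(x1, x2))]
# Step 3: regress y on the residual; this slope equals the multiple-regression
# coefficient on x2, holding x1 fixed.
b_x2 = slope(r, y)
```

The test below re-derives the coefficient on x2 directly from the two-regressor normal equations and confirms the two numbers coincide.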
39. Goodness of Fit
- Total sum of squares: SST = Σ (yi - ȳ)²
- Explained sum of squares: SSE = Σ (ŷi - ȳ)²
- Residual sum of squares: SSR = Σ ûi²
- R-squared: R² = SSE/SST = 1 - SSR/SST
40. Properties of OLS Estimators (1)
- Unbiasedness
- Required conditions:
- 1. Linear in parameters
- 2. Random sampling
- 3. Enough variation in each regressor x1, ..., xk (no perfect collinearity)
- 4. Zero conditional mean
41. Omitted Variable Bias
- Example (estimating the return to education)
- True model:
- wage = β0 + βeduc·educ + βabil·abil + u
- Let β̂educ and β̂abil be the estimators of βeduc and βabil from regressing wage on educ and abil. We know both are unbiased estimators.
- Incomplete model:
- wage = β0 + βeduc·educ + v
- where v = βabil·abil + u. Let β̃educ be the estimator from regressing wage on educ, ignoring abil. It is biased.
42. - We have shown that E[β̃educ] = βeduc + βabil·δ̃1, where δ̃1 is the slope from regressing abil on educ. With βabil > 0 and ability positively correlated with education (δ̃1 > 0), E[β̃educ] > βeduc.
- We say that omission of the ability variable leads to an overstatement of the return to education; that is, we have a positive bias.
43. Important Fact about Omitted Variable Bias
- Bias in β̃1 when x2 is omitted: the bias equals β2·δ̃1, where δ̃1 is the slope from regressing x2 on x1, so its sign depends on the sign of β2 and on the sign of the correlation between x1 and x2.
- Example (suppose we don't observe povrate):
- avgscore = β0 + β1·expend + β2·povrate + u
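The sign prediction can be checked by simulation. A Python sketch under assumed parameter values (β2 > 0 and Corr(x1, x2) > 0, so the simple-regression slope should be biased upward):

```python
# Omitted-variable bias by simulation, in the made-up model
#   y = b0 + b1*x1 + b2*x2 + u,  with b2 > 0 and Corr(x1, x2) > 0.
# The simple regression of y on x1 should center near b1 + b2*delta1,
# where delta1 is the slope from regressing x2 on x1 (here ~0.8).
import random

random.seed(0)
n = 20000
b0, b1, b2 = 1.0, 0.5, 2.0
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + random.gauss(0, 1) for a in x1]   # x2 positively related to x1
y  = [b0 + b1 * a + b2 * c + random.gauss(0, 1) for a, c in zip(x1, x2)]

def slope(x, y):
    nn = len(x)
    xbar, ybar = sum(x) / nn, sum(y) / nn
    return sum((a - xbar) * (c - ybar) for a, c in zip(x, y)) / \
           sum((a - xbar) ** 2 for a in x)

b1_tilde = slope(x1, y)   # omits x2: roughly b1 + b2*0.8 = 2.1, far above 0.5
delta1   = slope(x1, x2)  # roughly 0.8
```

The omitting regression lands near 2.1 rather than the true 0.5, a clear positive bias.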
44. Omitted variable bias: more general cases
- Example
- wage = β0 + β1·educ + β2·exper + β3·abil + u
- Suppose we omit abil.
- Can you predict the direction of bias in β̂1 when we omit abil?
- It is hard to obtain a clear direction of bias because educ, exper, and abil are pairwise correlated.
45. Properties of OLS Estimators (2)
- Best Linear Unbiased Estimator (BLUE)
- "Best" means most efficient; i.e., the variances of the OLS estimators are the smallest among all linear unbiased estimators
- Also called the Gauss-Markov theorem
46. - Required conditions for BLUE:
- 1. Linear in parameters
- 2. Random sampling
- 3. Enough variation in each regressor
- 4. Zero conditional mean
- 5. Constant variance: the variance of the error term does not vary with the regressors
- The above assumptions are the Gauss-Markov assumptions.
47. Variance of OLS Estimators in Multiple Linear Regressions (1)
- By Assumption 5, we have
- Var(y|x) = Var(β0 + β1·x1 + β2·x2 + ... + u | x) = Var(u|x) = σ²
- We call σ² the error variance and σ the standard deviation of the error term.
- An unbiased estimator of the error variance: σ̂² = SSR/(n - k - 1)
- We call σ̂ the standard error of the regression.
48. Variance of OLS Estimators (2)
- Variance of the OLS estimator: Var(β̂1) = σ² / [SST1(1 - R1²)], where SST1 = Σ (xi1 - x̄1)² and R1² is the R-squared from regressing x1 on the other regressors.
- If there is more variation in x1 that cannot be explained by x2, the estimate is more accurate.
- But we don't know σ; replace it with the standard error of the regression, σ̂.
- The standard error of the slope estimator is se(β̂1) = σ̂ / √[SST1(1 - R1²)]
49. - Note that SST1 = n times the sample variance of x1, so SST1 grows with n.
- Thus we can also write se(β̂1) = σ̂ / [√n · sd(x1) · √(1 - R1²)], which shrinks at rate 1/√n.
50. Multicollinearity
- If x1 and x2 are perfectly correlated (so R1² = 1), then Var(β̂1) = σ² / [SST1(1 - R1²)] is undefined: the denominator is zero.
- Example
- Estimating the effect of various school expenditure categories (e.g., teacher salaries, instructional materials, athletics, ...) on student performance.
- But wealthier schools tend to spend more on everything; i.e., the covariates are highly correlated.
51. - Solutions to multicollinearity:
- Collect more data to get more variation in the covariates
- Drop covariates from the model (but this may lead to biased results)
- Lump variables together (e.g., all expenditure categories lumped into one variable)
52. Variances in Misspecified Models
- True model:
- y = β0 + β1·x1 + β2·x2 + u
- Incomplete model:
- y = β0 + β1·x1 + v
53. Hypothesis Testing (Ch 4)
- In small samples, the normalized slope estimator follows the Student-t distribution
- By the Central Limit Theorem, the normalized slope estimator is standard normal in large samples
54. Examples
- Example 4.1 (WAGE1.dta)
- Is the return to work experience positive?
- Construct hypotheses
- Small sample (use the Student-t table)
- Large sample (use the standard normal table)
55. - Example 4.2 (MEAP93.dta)
- Does school size matter?
- math10 = β0 + β1·teachSal + β2·staff + β3·enroll + u
- Construct hypotheses
- Small sample (use the Student-t table)
- Large sample (use the standard normal table)
56. - Changing the functional form (taking logs)
- math10 = β0 + β1·log(teachSal) + β2·log(staff) + β3·log(enroll) + u
- β̂3 = -1.3 suggests that a one percent increase in enrollment decreases the average math score by about 0.013 points (1.3/100).
- Construct hypotheses
- Small sample (use the Student-t table)
57. Computing and Using p-Values (Appendix C, p. 794)
- Example (Fig C.7): suppose we have a t-statistic of 1.52 for the one-sided alternative β > 0. Then
- p-value = Pr(T > 1.52 | β = 0) = .065 > .05
- So we cannot reject the null hypothesis.
- Example (Fig C.8): suppose we have a t-statistic of -2.13 for the one-sided alternative β < 0. Then
- p-value = Pr(T < -2.13 | β = 0) = .025 < .05
- So we reject the null hypothesis.
58. - Example (two-sided): suppose we have a t-statistic of 1.52 for the two-sided alternative β ≠ 0. Then
- p-value = Pr(T > 1.52 or T < -1.52 | β = 0)
- = 2 · Pr(T > 1.52 | β = 0)
- = .13 > .05
- So we cannot reject the null hypothesis.
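In large samples the t-statistic is approximately standard normal, so these p-values can be approximated with Python's statistics.NormalDist. The slide's figures come from t tables, so the normal-based values below are close to, but not exactly, .065, .025, and .13:

```python
# Large-sample p-values via the standard normal cdf Φ.
from statistics import NormalDist

Phi = NormalDist().cdf

p_right = 1 - Phi(1.52)         # one-sided H1: beta > 0, t = 1.52  -> ~.064
p_left  = Phi(-2.13)            # one-sided H1: beta < 0, t = -2.13 -> ~.017
p_two   = 2 * (1 - Phi(1.52))   # two-sided H1: beta != 0           -> ~.13

reject_right = p_right < 0.05   # False: cannot reject at the 5% level
reject_left  = p_left < 0.05    # True: reject at the 5% level
```

The decisions (reject / cannot reject) match the slide even though the normal approximation shifts the p-values slightly.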
59. Testing One-Sided Alternatives
- H0: β1 = 0
- H1: β1 > 0 (Fig 4.2: rejection region on the right)
- The rejection rules (for the 5 percent significance level):
- Use the t statistic
- Use the confidence interval
- One-sided p-value < .05
- Example 4.1 (WAGE1)
60. Testing One-Sided Alternatives
- H0: β1 = 0
- H1: β1 < 0 (Fig 4.3: rejection region on the left)
- The rejection rules (for the 5 percent significance level):
- Use the t statistic
- Use the confidence interval
- One-sided p-value < .05
- Example 4.2 (MEAP93)
61. Testing Two-Sided Alternatives (Testing for Significance)
- H0: β1 = 0
- H1: β1 ≠ 0 (Fig 4.4: rejection regions on both sides)
- The rejection rules (for the 5 percent significance level):
- Use the t statistic
- Use the confidence interval
- Two-sided p-value < .05
- Example 4.3 (GPA1)
62. Testing Other Hypotheses about Coefficients
- H0: β1 = 1
- H1: β1 > 1
- Use the t statistic: t = (β̂1 - 1)/se(β̂1)
- Use the confidence interval
- One-sided p-value < .05
- Example 4.4 (CAMPUS); Example 4.5 (homework)
63. Testing a Hypothesis about a Single Linear Combination of βs
- Example (compare the returns to education at junior colleges and four-year colleges)
- log(wage) = β0 + β1·jc + β2·univ + β3·exper + u
- H0: β1 = β2
- H1: β1 < β2
64. - The t statistic is t = (β̂1 - β̂2)/se(β̂1 - β̂2), where computing se(β̂1 - β̂2) requires the covariance between β̂1 and β̂2.
- STATA (TWOYEAR): after running the regression, type
- test jc = univ
- test univ = 1
- STATA reports a p-value based on the F-test (see below).
65. Testing Multiple Linear Restrictions: the F-Test
- Test whether a group of variables has no effect on y.
- Example (athletes' salaries)
- log(salary) = β0 + β1·years + β2·gamesyr + β3·bavg + β4·hrunsyr + β5·rbisyr + u
- H0: β3 = 0, β4 = 0, β5 = 0
- H1: H0 is not true
- (This is called a joint hypothesis test.)
- We use the F-test.
- In STATA (MLB1), this is very simple. After running the regression:
- test bavg hrunsyr rbisyr
66. Ideas
- Unrestricted model
- SSRur (sum of squared residuals)
- R²ur
- Restricted model (β3 = 0, β4 = 0, β5 = 0)
- SSRr
- R²r
- Restrictions increase the SSR. If the SSR is almost unchanged by the restrictions, we can safely say that it is all right to set β3 = 0, β4 = 0, β5 = 0.
- This suggests that we can use the difference in SSR to test the hypothesis.
67. The F-Statistic (or F-ratio)
- Numerator degrees of freedom: q; denominator degrees of freedom: n - k - 1
- F = [(SSRr - SSRur)/q] / [SSRur/(n - k - 1)]
- SSRr: sum of squared residuals from the restricted model (β3 = 0, β4 = 0, β5 = 0)
- SSRur: sum of squared residuals from the unrestricted model
- q: number of restrictions (q = 3 in the example)
- k: number of covariates in the unrestricted model (k = 5 in the example)
- The F-statistic is always nonnegative because SSRr ≥ SSRur
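The formula is mechanical once the two SSRs are known. A Python sketch; the SSR values are made up for illustration, and only the dimensions n = 353 and k = 5 echo the MLB1 example:

```python
# F-statistic from restricted vs. unrestricted sums of squared residuals:
#   F = [(SSR_r - SSR_ur) / q] / [SSR_ur / (n - k - 1)]

def f_stat(ssr_r, ssr_ur, q, n, k):
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))

# Made-up SSRs; dropping 3 regressors raises the SSR from 180 to 200.
F = f_stat(ssr_r=200.0, ssr_ur=180.0, q=3, n=353, k=5)
# numerator df = 3, denominator df = 353 - 5 - 1 = 347
```

Since SSRr ≥ SSRur always holds for nested models, the statistic is nonnegative by construction.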
68. Rejection Rules for the F-Test
- If F > the critical value, we reject the null hypothesis (see Table G.3 for critical values).
- p-value = Pr(an F(q, n - k - 1) random variable exceeds the observed F) - see Fig 4.7
- STATA (MLB1, p. 156):
- Derive SSRr and SSRur
- Derive the degrees of freedom
- Derive the value of the F-statistic
- Compare with the critical value
- Conclusion
69. The F-Test Is Very General
- It can give the same result as the t-test (a single-parameter test)
- Example
- It can do a joint hypothesis test
- It can test the significance of the entire regression model (a test of multiple parameters)
- Example
70. - It can test general linear restrictions (p. 162)
- log(price) = β0 + β1·log(assess) + β2·log(lotsize) + β3·log(sqrft) + β4·bdrms + u
- H0: β1 = 1, β2 = 0, β3 = 0, β4 = 0
- Restricted model:
- log(price) = β0 + log(assess) + u
- log(price) - log(assess) = β0 + u
- Derive SSRr and SSRur
- Derive the degrees of freedom
- Derive the value of the F-statistic
- Compare with the critical value
- Conclusion
71. R-Squared Form of the F-Statistic
- Note that SSR = SST(1 - R²)
- We can rewrite the F-statistic as F = [(R²ur - R²r)/q] / [(1 - R²ur)/(n - k - 1)]
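That the two forms agree follows from substituting SSR = SST(1 - R²) into the SSR form; a quick Python check with made-up numbers:

```python
# Check that the SSR form and the R-squared form of the F-statistic agree.
# All numbers are made up for illustration.
sst, r2_ur, r2_r = 500.0, 0.40, 0.30
q, n, k = 3, 100, 5

ssr_ur = sst * (1 - r2_ur)   # SSR = SST * (1 - R²)
ssr_r  = sst * (1 - r2_r)

f_ssr = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
f_r2  = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))
# The SST factor cancels, so the two forms are algebraically identical.
```

The R-squared form is convenient when papers report R² but not the raw sums of squares.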
72. Reporting Regression Results
- Interpret the estimated coefficients
- Link to economic models
- Report standard errors
- R-squared: the goodness-of-fit measure
- Summarize in a table
- Example (4.10, p. 163): homework
73. OLS Asymptotics (Ch 5)
- Large-sample properties of OLS:
- Consistency
- Asymptotic normality
- Recall that the small-sample properties of OLS include:
- Unbiasedness (Conditions 1-4)
- The Gauss-Markov theorem (BLUE) (Conditions 1-4 plus Condition 5, constant variance)
74. Consistency
- Consistency is considered the minimum requirement for an estimator.
- An estimator is consistent if the distribution of the estimator becomes more and more tightly concentrated around the true value as the sample size n grows.
- As n tends to infinity, the distribution of the estimator collapses to the point of the true value (Fig 5.1).
75. Consistency of OLS
- Consider a simple regression model
- y = β0 + β1·x + u
- The formula for the OLS estimator is β̂1 = Σ (xi - x̄)(yi - ȳ) / Σ (xi - x̄)²
- which can be rewritten as β̂1 = β1 + [sample Cov(x,u)] / [sample Var(x)]
76. Conditions for Consistency
- Cov(x,u) = 0 and Var(x) > 0: then the sample covariance term converges to zero and β̂1 converges to β1.
77. A Case of Inconsistency
- When x and u are correlated (i.e., Cov(x,u) does not equal zero), we have problems of inconsistency.
- Example: in the wage regression, what if educ and u (containing unobserved ability) are correlated? The OLS estimators are inconsistent.
- I.e., even in large samples, the distribution of the coefficients will not collapse to the true value. Such an estimator is not very useful.
- Asymptotic bias: plim β̂1 - β1 = Cov(x,u)/Var(x)
78. Example 5.1
- Housing prices and distance from an incinerator
- If the incinerator depresses house prices, the coefficient on distance from the incinerator should be positive.
- If higher housing quality increases the house price, the coefficient on quality should be positive.
- When housing quality is not fully measured or observed, the estimated effect of distance from the incinerator will overstate the true effect, because higher-quality houses tend to be built farther from the incinerator (distance and the omitted quality variable are positively correlated).
79. Large-Sample Inference
- Consistency of an estimator is an important property, but it does not provide information about the accuracy of the estimator.
- In small samples, we have the Gauss-Markov theorem to tell us the degree of accuracy (i.e., the variance) of an OLS estimator. In fact, the OLS estimator is the most accurate among linear unbiased estimators (i.e., BLUE).
- In large samples, we can do even better! The distribution of the OLS estimator looks almost like a normal distribution, so we can use the standard normal table to do hypothesis testing.
80. Theorem 5.2 (Asymptotic Normality of OLS)
- Under the Gauss-Markov assumptions, the OLS estimator is asymptotically normally distributed.
81. - Replacing the parameter σ² with a consistent estimator of σ², the normalized estimator is still asymptotically standard normal.
- In small samples (with normally distributed errors), the same normalization follows the Student-t distribution with n - k - 1 degrees of freedom, for any sample size.
- The s.e. of the OLS estimator shrinks at a rate of 1/√n, the inverse of the square root of the sample size.
82. Example 5.2
- Standard errors in a birth weight equation (BWGHT)
- log(birthweight) = β0 + β1·cigs + β2·log(fincome) + u
- Using the first half of the data (n = 694): se(β̂cigs) = .0013
- Using the full sample (n = 1,388): se(β̂cigs) = .00086
- .0013/.00086 ≈ 1.51, which is almost equal to √(1388/694) = √2 ≈ 1.41.
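The 1/√n shrink rate behind this comparison can be reproduced by simulation. A Python sketch with made-up data: quadrupling the sample size should roughly halve the standard error of the slope:

```python
# se(b1) = sigma_hat / sqrt(SST_x) shrinks like 1/sqrt(n):
# quadrupling n should roughly halve the standard error.
import math
import random

random.seed(2)

def slope_se(n):
    # made-up data from y = 1 + 0.5*x + error
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [1.0 + 0.5 * a + random.gauss(0, 1) for a in x]
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sum((a - xbar) * (c - ybar) for a, c in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    ssr = sum((c - b0 - b1 * a) ** 2 for a, c in zip(x, y))
    sigma2_hat = ssr / (n - 2)            # unbiased error-variance estimator
    return math.sqrt(sigma2_hat / sxx)    # se of the slope estimator

ratio = slope_se(1000) / slope_se(4000)   # roughly 2, matching sqrt(4000/1000)
```

The ratio is not exactly 2 in any one sample, just as the .0013/.00086 ratio in the slide is 1.51 rather than exactly √2.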