Title: Analysis of Cross Section and Panel Data
1. Analysis of Cross Section and Panel Data
- Yan Zhang
- School of Economics, Fudan University
- CCER, Fudan University
2. Introductory Econometrics: A Modern Approach
- Yan Zhang
- School of Economics, Fudan University
- CCER, Fudan University
3. Analysis of Cross Section and Panel Data
- Part 1. Regression Analysis on Cross-Sectional Data
4. Chap 2. The Simple Regression Model: Practice for Learning Multiple Regression
- Bivariate linear regression model
- The slope parameter measures the relationship between y and x holding the other factors in u fixed; it is of primary interest in applied economics.
- The intercept parameter also has its uses, although it is rarely central to an analysis.
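For reference, the model and the two parameters just described:

```latex
y = \beta_0 + \beta_1 x + u
```

Here u collects all factors other than x that affect y.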
5. More Discussion
- Linearity implies that a one-unit change in x has the same effect on y regardless of the initial value of x. This rules out, for example, increasing returns in the wage-education relationship (a functional-form issue).
- Can we draw ceteris paribus conclusions about how x affects y from a random sample of data when we are ignoring all the other factors?
- Only if we make an assumption restricting how the unobservable random variable u is related to the explanatory variable x.
6. Classical Regression Assumptions
- E(u) = 0: a harmless normalization as long as the intercept term is included.
- E(u|x) = E(u) = 0: zero conditional expectation, which is stronger than u and x being merely linearly uncorrelated.
- Meaning: the average value of the unobservables is the same across all slices of the population defined by x.
- PRF (Population Regression Function): E(y|x) = β0 + β1x, something fixed but unknown.
7. OLS
- Minimize the sum of squared residuals.
- Sample regression function (SRF): the estimated counterpart of the PRF.
- The point (x̄, ȳ) is always on the OLS regression line.
(Figure: the SRF as an estimate of the PRF.)
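The minimization problem and its closed-form solutions (standard results for the bivariate model):

```latex
\min_{\hat\beta_0,\hat\beta_1}\ \sum_{i=1}^n \big(y_i-\hat\beta_0-\hat\beta_1 x_i\big)^2,
\qquad
\hat\beta_1=\frac{\sum_i (x_i-\bar x)(y_i-\bar y)}{\sum_i (x_i-\bar x)^2},
\qquad
\hat\beta_0=\bar y-\hat\beta_1\bar x .
```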
8. OLS
- Coefficient of determination R² = 1 − SSR/SST: the fraction of the sample variation in y that is explained by x.
- R² equals the square of the sample correlation coefficient between the actual y and the fitted ŷ.
- Low R-squareds are common with cross-sectional data and do not by themselves make a regression useless.
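A minimal numpy sketch (simulated data; the coefficients 1.0 and 0.5 are arbitrary choices) computing the OLS estimates and checking both characterizations of R²:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200)

# OLS slope and intercept from the closed-form solutions
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# R-squared two ways: 1 - SSR/SST, and squared corr(y, y_hat)
y_hat = b0 + b1 * x
ssr = np.sum((y - y_hat) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2 = 1 - ssr / sst
r2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2
print(b0, b1, r2, r2_corr)  # the two R-squared values agree
```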
9. Units of Measurement
- If the dependent variable is multiplied by a constant c (each value in the sample is multiplied by c), then the OLS intercept and slope estimates are also multiplied by c.
- If an independent variable is divided or multiplied by some nonzero constant c, then its OLS slope coefficient is multiplied or divided by c, respectively; the intercept is unchanged.
- The goodness-of-fit of the model, R-squared, does not depend on the units of measurement of our variables.
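A quick check of these scaling rules on simulated data (the data-generating coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200)   # arbitrary simulated relationship

def ols(x, y):
    """Return (intercept, slope) from a bivariate OLS fit."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

print(ols(x, y))        # baseline estimates
print(ols(x, 100 * y))  # y scaled by c = 100: intercept and slope scale by 100
print(ols(x / 10, y))   # x divided by c = 10: slope times 10, intercept unchanged
```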
10. Functional Form
- Linear vs. nonlinear relationships.
- Logarithmic dependent variable (log-level model):
  - 100·β1 approximates the percentage change in y for a one-unit change in x (a semi-elasticity).
  - Captures an increasing dollar return to education.
  - Other nonlinearities, such as diploma effects, are not captured.
- Bi-logarithmic (log-log) model:
  - β1 is a constant elasticity.
  - A change of units of measurement affects only the intercept: rescaling y by c1 and x by c2 changes the intercept from β0 to β0 + log(c1) − β1·log(c2) (correcting the erratum on p. 45).
- Be proficient at interpreting the coefficients.
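The standard interpretation summary (Wooldridge, Table 2.3):

```latex
\begin{array}{llll}
\text{Model} & \text{Dependent} & \text{Independent} & \text{Interpretation of }\beta_1\\
\text{level-level} & y & x & \Delta y=\beta_1\,\Delta x\\
\text{level-log} & y & \log(x) & \Delta y=(\beta_1/100)\,\%\Delta x\\
\text{log-level} & \log(y) & x & \%\Delta y=100\,\beta_1\,\Delta x\\
\text{log-log} & \log(y) & \log(x) & \%\Delta y=\beta_1\,\%\Delta x
\end{array}
```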
11. Unbiasedness of OLS Estimators
- Statistical properties of OLS: properties of the distribution of the OLS estimators over repeated random samples.
- Assumptions:
  - Linear in parameters (a functional-form restriction, relaxed by more advanced methods).
  - Random sampling (time series data involve nonrandom sampling).
  - Zero conditional mean (its failure makes OLS biased; spurious correlation).
  - Sample variation in the independent variable.
- Theorem (Unbiasedness): under the four assumptions above, E(β̂0) = β0 and E(β̂1) = β1.
12. Variance of OLS Estimators
- Beyond the center of the sampling distribution, we also want to know its spread.
- Additional assumption:
  - Homoskedasticity: Var(u|x) = σ².
  - σ² is the error variance; a larger σ² means that the distribution of the unobservables affecting y is more spread out.
- Theorem (Sampling variance of OLS estimators): under the five assumptions above, the variances below hold.
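The two sampling variances (conditional on the sample values of x):

```latex
\operatorname{Var}(\hat\beta_1)=\frac{\sigma^2}{\sum_i (x_i-\bar x)^2},
\qquad
\operatorname{Var}(\hat\beta_0)=\frac{\sigma^2\, n^{-1}\sum_i x_i^2}{\sum_i (x_i-\bar x)^2}.
```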
13. Variance of y Given x
- Conditional mean: E(y|x) = β0 + β1x.
- Conditional variance: Var(y|x) = Var(u|x) = σ² under homoskedasticity.
- Heteroskedasticity: Var(u|x) varies with x.
14. What Does Var(β̂1) Depend On?
- More variation in the unobservables affecting y makes it more difficult to estimate β1 precisely.
- The more spread out the sample of the xi's, the easier it is to trace the relationship between E(y|x) and x.
- As the sample size increases, so does the total variation in the xi. Therefore, a larger sample size results in a smaller variance of the estimator.
15. Estimating the Error Variance
- Errors (disturbances) vs. residuals:
  - Errors appear in the population model.
  - Residuals come from the estimated equation.
- Theorem (unbiased estimation of σ²): under the five assumptions above, E(σ̂²) = σ².
- σ̂ is the standard error of the regression (SER): it estimates the standard deviation in y after the effect of x has been taken out.
- The standard error of β̂1 is obtained by replacing σ with σ̂.
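In the bivariate model:

```latex
\hat\sigma^2=\frac{1}{n-2}\sum_{i=1}^n \hat u_i^2=\frac{SSR}{n-2},
\qquad
\operatorname{se}(\hat\beta_1)=\frac{\hat\sigma}{\sqrt{\sum_i (x_i-\bar x)^2}} .
```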
16. Regression through the Origin
- The fitted line is forced to pass through (0, 0).
- E.g., income tax revenue as a function of income.
- The OLS estimator: β̃1 = Σ xi yi / Σ xi².
- β̃1 is unbiased only if β0 = 0.
- If the intercept β0 ≠ 0, then β̃1 is a biased estimator of β1.
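Substituting yi = β0 + β1 xi + ui shows the bias directly:

```latex
E(\tilde\beta_1)
=\frac{\beta_0\sum_i x_i+\beta_1\sum_i x_i^2}{\sum_i x_i^2}
=\beta_1+\beta_0\,\frac{\sum_i x_i}{\sum_i x_i^2},
```

which equals β1 only if β0 = 0 (or the sample mean of x happens to be zero).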
17. Chap 3. Multiple Regression Analysis: Estimation
- Advantages of multiple regression analysis:
  - Build better models for predicting the dependent variable.
  - Generalize the functional form, e.g., a marginal propensity to consume that varies with income.
  - Be more amenable to ceteris paribus analysis.
- Chap 3.2:
  - Key assumption: E(u | x1, ..., xk) = 0.
  - Implication: other factors affecting wage are not related on average to educ and exper.
  - In the multiple linear regression model, βj gives the ceteris paribus effect of xj on y.
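The model and key assumption in full:

```latex
y=\beta_0+\beta_1 x_1+\beta_2 x_2+\cdots+\beta_k x_k+u,
\qquad
E(u\mid x_1,\dots,x_k)=0 .
```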
18. Ordinary Least Squares Estimator
- SRF: the sample regression function.
- OLS: minimize the sum of squared residuals.
- F.O.C.: k + 1 first-order conditions (see below).
- Ceteris paribus interpretations:
  - Holding x2, ..., xk fixed, the predicted change in y is β̂1 times the change in x1.
  - Thus, we have controlled for the variables x2, ..., xk when estimating the effect of x1 on y.
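The minimization problem and its first-order conditions:

```latex
\min_{\hat\beta_0,\dots,\hat\beta_k}\sum_{i=1}^n\big(y_i-\hat\beta_0-\hat\beta_1 x_{i1}-\cdots-\hat\beta_k x_{ik}\big)^2,
```
```latex
\sum_{i=1}^n\big(y_i-\hat\beta_0-\cdots-\hat\beta_k x_{ik}\big)=0,
\qquad
\sum_{i=1}^n x_{ij}\big(y_i-\hat\beta_0-\cdots-\hat\beta_k x_{ik}\big)=0,\quad j=1,\dots,k .
```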
19. Holding Other Factors Fixed
- The power of multiple regression analysis is that it provides this ceteris paribus interpretation even though the data have not been collected in a ceteris paribus fashion.
- It allows us to do in non-experimental environments what natural scientists are able to do in a controlled laboratory setting: keep other factors fixed.
20. OLS and Ceteris Paribus Effects
- "Partialling out" steps:
  - (1) Obtain the OLS residuals r̂1 from a multiple regression of x1 on x2, ..., xk.
  - (2) Obtain the OLS estimator β̂1 from a simple regression of y on r̂1.
  - β̂1 measures the effect of x1 on y after x2, ..., xk have been partialled or netted out (verified in the sketch below).
- Two special cases in which the simple regression of y on x1 produces the same OLS estimate on x1 as the regression of y on x1 and x2: the estimated partial effect of x2 is zero, or x1 and x2 are uncorrelated in the sample.
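A minimal numpy sketch of the partialling-out (Frisch-Waugh) result; the data and coefficients are simulated and arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)      # x1 is correlated with x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# (a) Multiple regression of y on (1, x1, x2)
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# (b) Step 1: residuals of x1 from a regression of x1 on (1, x2)
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# (b) Step 2: simple regression of y on the residuals r1
beta1_partialled = (r1 @ y) / (r1 @ r1)

print(beta_full[1], beta1_partialled)   # agree up to floating-point error
```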
21. Goodness-of-Fit
- R² also equals the squared correlation coefficient between the actual and the fitted values of y.
- R² never decreases, and it usually increases, when another independent variable is added to a regression.
- The factor that should determine whether an explanatory variable belongs in a model is whether the explanatory variable has a nonzero partial effect on y in the population.
22. Regression through the Origin
- The properties of OLS derived earlier no longer hold for regression through the origin:
  - The OLS residuals no longer have a zero sample average.
  - R² can actually be negative; one remedy is to calculate it as the squared correlation coefficient between the actual and fitted values.
- If the intercept in the population model is different from zero, then the OLS estimators of the slope parameters will be biased.
23. The Expectation of the OLS Estimator
- Assumptions (for the multiple regression model):
  - MLR.1 Linear in parameters.
  - MLR.2 Random sampling.
  - MLR.3 Zero conditional mean.
  - MLR.4 No perfect collinearity:
    - none of the independent variables is constant,
    - and there are no exact linear relationships among the independent variables; in matrix form, rank(X) = K.
- Theorem (Unbiasedness): under the four assumptions above, E(β̂j) = βj, j = 0, 1, ..., k.
24. Notice 1: Zero Conditional Mean
- Exogenous vs. endogenous explanatory variables.
- Ways the assumption can fail:
  - Misspecification of functional form (Chap 9): omitting a quadratic term, or choosing the level versus the log of a variable.
  - Omitting important factors that are correlated with any independent variable.
  - Measurement error (Chap 15, IV).
  - Simultaneously determining one or more of the x's with y (Chap 16, simultaneous equations).
25. Omitted Variable Bias: The Simple Case
- Problem: excluding a relevant variable, i.e., under-specifying the model.
- Omitted variable bias (misspecification analysis):
  - The true population model includes x1 and x2.
  - The underspecified OLS line regresses y on x1 alone.
  - The expectation of the resulting estimator, and hence the omitted variable bias, is given below.
- Table 3.2 summarizes how the direction of the bias depends on the sign of β2 and the correlation between x1 and x2.
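With true model y = β0 + β1 x1 + β2 x2 + u and an underspecified regression of y on x1:

```latex
E(\tilde\beta_1)=\beta_1+\beta_2\tilde\delta_1,
\qquad
\operatorname{bias}(\tilde\beta_1)=\beta_2\tilde\delta_1,
```

where δ̃1 is the slope from regressing x2 on x1. The sign pattern (Wooldridge, Table 3.2):

```latex
\begin{array}{lcc}
 & \operatorname{Corr}(x_1,x_2)>0 & \operatorname{Corr}(x_1,x_2)<0\\
\beta_2>0 & \text{positive bias} & \text{negative bias}\\
\beta_2<0 & \text{negative bias} & \text{positive bias}
\end{array}
```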
26. Omitted Variable Bias: Nonexistence
- Two cases in which β̃1 is unbiased even though the true population model contains x2:
  - β2 = 0, so x2 does not belong in the model;
  - δ̃1 = 0, i.e., x1 and x2 are uncorrelated in the sample (δ̃1 is the sample covariance between x1 and x2 over the sample variance of x1).
- Even when the relevant variable x2 is omitted, no bias arises as long as x2 is uncorrelated with x1.
- Summary of omitted variable bias: E(β̃1) = β1 + β2 δ̃1, so the bias is β2 δ̃1.
27. The Size of Omitted Variable Bias
- Both direction and size matter: a small bias of either sign need not be a cause for concern.
- The bias is unknown, but we often have some idea:
  - We usually have a pretty good idea about the direction of the partial effect of x2 on y, that is, the sign of β2.
  - In many cases we can make an educated guess about whether x1 and x2 are positively or negatively correlated.
- Terminology: upward bias, downward bias, biased toward zero.
28. Omitted Variable Bias: More General Cases
- Suppose the true model contains x1, x2, and x3, but x3 is omitted; x2 and x3 are uncorrelated, but x1 is correlated with x3.
- Both β̃1 and β̃2 will normally be biased. The only exception is when x1 and x2 are also uncorrelated.
- It is difficult to obtain the direction of the bias in β̃1 and β̃2.
- Approximation: if x1 and x2 are also uncorrelated, then E(β̃1) ≈ β1 + β3 δ̃1, where δ̃1 is the slope from regressing x3 on x1.
29. Notice 2: No Perfect Collinearity
- An assumption only about the x's; it says nothing about the relationship between u and the x's.
- Assumption MLR.4 does allow the independent variables to be correlated; they just cannot be perfectly correlated. Correlation among regressors is precisely what makes ceteris paribus effects worth isolating.
- If we did not allow for any correlation among the independent variables, then multiple regression would not be very useful for econometric analysis.
30. Cases of Perfect Collinearity
- When can independent variables be perfectly collinear? (Software reports a singular matrix.)
- Nonlinear functions of the same variable are not exact linear functions, so x and x² may both be included.
- Do not include the same explanatory variable measured in different units in the same regression equation (see the sketch below).
- More subtle ways: one independent variable can be expressed as an exact linear function of some or all of the other independent variables; the fix is to drop one of them.
- Key: under perfect collinearity the OLS problem has no unique solution.
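A small numpy illustration (hypothetical temperature data): including the same variable in Celsius and Fahrenheit makes the design matrix rank-deficient:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
temp_c = rng.normal(20, 5, size=n)
temp_f = temp_c * 9 / 5 + 32          # the same variable in different units

X = np.column_stack([np.ones(n), temp_c, temp_f])
print(np.linalg.matrix_rank(X))       # 2, not 3: perfect collinearity
# X'X is singular, so the OLS normal equations have no unique solution.
```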
31. Notice 3: Unbiasedness
- The meaning of unbiasedness:
  - An estimate cannot be unbiased: an estimate is a fixed number, obtained from a particular sample, which usually is not equal to the population parameter.
  - When we say that OLS is unbiased under Assumptions MLR.1 through MLR.4, we mean that the procedure by which the OLS estimates are obtained is unbiased when we view the procedure as being applied across all possible random samples.
32. Notice 4: Over-Specification
- Inclusion of an irrelevant variable, i.e., over-specifying the model:
  - does not affect the unbiasedness of the OLS estimators;
  - but can have undesirable effects on the variances of the OLS estimators.
33. Variance of the OLS Estimators
- Added assumption:
  - MLR.5 Homoskedasticity: Var(u | x1, ..., xk) = σ².
  - A larger error variance σ² means that the distribution of the unobservables affecting y is more spread out.
- Gauss-Markov assumptions (for cross-sectional regression): Assumptions MLR.1-MLR.5.
- Theorem (Sampling variance of OLS estimators): under the five assumptions above, the formula below holds.
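The sampling variance of each slope estimator:

```latex
\operatorname{Var}(\hat\beta_j)=\frac{\sigma^2}{SST_j\,(1-R_j^2)},
\qquad
SST_j=\sum_{i=1}^n (x_{ij}-\bar x_j)^2,
```

where Rj² is the R-squared from regressing xj on all the other independent variables (with an intercept).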
34. More about Var(β̂j)
- These statistical properties are conditional on the sample values of x1, x2, ..., xk.
- The error variance σ²: the only way to reduce it is to add more explanatory variables, which is not always possible or desirable.
- The total sample variation in xj, SSTj: increase it by increasing the sample size.
35. Multicollinearity
- The linear relationships among the independent variables.
- Rj² measures how strongly xj is linearly related to the other regressors: the proportion of the total variation in xj that can be explained by the other independent variables.
- If k = 2, R1² is simply the R-squared from regressing x1 on x2.
- High (but not perfect) correlation between two or more of the independent variables is called multicollinearity.
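A sketch of the standard diagnostic, the variance inflation factor VIFj = 1/(1 − Rj²), computed with numpy on hypothetical regressors:

```python
import numpy as np

def vif(X):
    """Variance inflation factors VIF_j = 1 / (1 - R_j^2), column by column."""
    n, k = X.shape
    factors = []
    for j in range(k):
        xj = X[:, j]
        # Regress x_j on an intercept and the remaining columns
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        resid = xj - Z @ np.linalg.lstsq(Z, xj, rcond=None)[0]
        r2 = 1.0 - (resid @ resid) / ((xj - xj.mean()) @ (xj - xj.mean()))
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Hypothetical regressors: x2 is almost a linear function of x1
rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=300)
x3 = rng.normal(size=300)
print(vif(np.column_stack([x1, x2, x3])))   # large VIFs for x1 and x2
```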
36. Micro-numerosity: The Problem of Small Sample Size
- Symptoms: high Rj² and low SSTj.
- One thing is clear: everything else being equal, for estimating βj it is better to have less correlation between xj and the other x's.
- How to address multicollinearity?
  - Increase the sample size.
  - Dropping some variables? If a dropped variable belongs in the population model, this trades multicollinearity for omitted variable bias.
37. Notice: The Influence of Multicollinearity
- A high degree of correlation between certain independent variables can be irrelevant to how well we can estimate other parameters in the model.
- E.g., high correlation between x2 and x3 does not affect Var(β̂1) when x1 is uncorrelated with them.
- Importance for economists: such variables can still serve as controls.
38. Variances in Misspecified Models
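The tradeoff analyzed here (Wooldridge, Section 3.4) compares the estimator from the misspecified simple regression with the one from the full regression:

```latex
\operatorname{Var}(\tilde\beta_1)=\frac{\sigma^2}{SST_1}
\;\le\;
\operatorname{Var}(\hat\beta_1)=\frac{\sigma^2}{SST_1\,(1-R_1^2)},
```

so omitting x2 always (weakly) lowers the variance; the cost is bias whenever β2 ≠ 0 and x1 and x2 are correlated.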
39. Whether or Not to Include x2: Two Favorable Reasons
- The choice of whether or not to include a particular variable in a regression model can be made by analyzing the tradeoff between bias and variance.
- However, when β2 ≠ 0, there are two favorable reasons for including x2 in the model:
  - any bias in β̃1 does not shrink as the sample size grows;
  - the variances of both estimators shrink to zero as n increases.
- Therefore, the multicollinearity induced by adding x2 becomes less important as the sample size grows. In large samples, we would prefer β̂1.
40. Estimating Standard Errors of the OLS Estimators
- Replace the unknown σ² with its unbiased estimator σ̂² = SSR/(n − k − 1), where n − k − 1 is the degrees of freedom.
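The resulting standard errors:

```latex
\hat\sigma^2=\frac{\sum_i \hat u_i^2}{n-k-1},
\qquad
\operatorname{se}(\hat\beta_j)=\frac{\hat\sigma}{\sqrt{SST_j\,(1-R_j^2)}} .
```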
41. Efficiency of OLS: The Gauss-Markov Theorem
- OLS is BLUE:
  - Best: smallest variance;
  - Linear: linear in the yi;
  - Unbiased;
  - Estimator.
- Under the Gauss-Markov assumptions MLR.1-MLR.5, no other linear unbiased estimator has a smaller variance than OLS.
- If any of the assumptions fails (e.g., homoskedasticity), OLS is no longer guaranteed to be best.
42. Classical Linear Model Assumptions: Inference
43. Readings
- Jeffrey M. Wooldridge, Introductory Econometrics: A Modern Approach, Chapters 2 and 3.