Analysis of Cross Section and Panel Data - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Analysis of Cross Section and Panel Data


1
Analysis of Cross Section and Panel Data
  • Yan Zhang
  • School of Economics, Fudan University
  • CCER, Fudan University

2
Introductory Econometrics A Modern
Approach
  • Yan Zhang
  • School of Economics, Fudan University
  • CCER, Fudan University

3
Analysis of Cross Section and Panel Data
  • Part 1. Regression Analysis on Cross Sectional
    Data

4
Chap 2. The Simple Regression Model: Practice
for Learning Multiple Regression
  • Bivariate linear regression model (see the equation below)
  • The slope parameter measures the effect of x on y holding
    the other factors in u fixed; it is of primary interest in
    applied economics.
  • The intercept parameter also has its uses, although it is
    rarely central to an analysis.
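The slide's equation did not survive extraction; the bivariate model is presumably the standard one from Wooldridge Chapter 2,

\[ y = \beta_0 + \beta_1 x + u , \]

with slope parameter \beta_1 and intercept parameter \beta_0.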

5
More Discussion
  • A one-unit change in x has the same effect
    on y, regardless of the initial value of x.
  • Increasing returns: the wage-education relationship
    (functional form)
  • Can we draw ceteris paribus conclusions about how
    x affects y from a random sample of data, when we
    are ignoring all the other factors?
  • Only if we make an assumption restricting how the
    unobservable random variable u is related to the
    explanatory variable x

6
Classical Regression Assumptions
  • E(u) = 0 is a feasible assumption whenever the intercept
    term is included.
  • Zero conditional expectation is stronger than mere linear
    uncorrelatedness between u and x (see the statement below).
  • PRF (Population Regression Function): something fixed but
    unknown
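A restatement of the assumption and the PRF the slide refers to, in standard notation (the slide's own expressions were lost in extraction):

\[ E(u \mid x) = 0 \quad\Longrightarrow\quad E(y \mid x) = \beta_0 + \beta_1 x . \]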

7
OLS
  • Minimize the sum of squared residuals (see the formulas
    below).
  • Sample regression function (SRF)
  • The point (x-bar, y-bar) is always on the OLS regression
    line.

The SRF is the sample estimate of the PRF.
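The minimization problem and the resulting estimators, restated in standard form since the slide's formulas did not survive extraction:

\[ \min_{\hat\beta_0,\hat\beta_1} \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 , \qquad \hat\beta_1 = \frac{\sum_{i}(x_i - \bar x)(y_i - \bar y)}{\sum_{i}(x_i - \bar x)^2} , \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x . \]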
8
OLS
  • Coefficient of determination (see the definition below)
  • The fraction of the sample variation in y that is explained
    by x.
  • Equals the square of the sample correlation coefficient
    between the actual and fitted values of y.
  • Low R-squareds are common in cross-sectional work and do not
    by themselves invalidate a regression.
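Presumably the definition intended here is the textbook one, where SST, SSE, and SSR are the total, explained, and residual sums of squares:

\[ R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST} . \]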

9
Units of Measurement
  • If the dependent variable is multiplied by the constant c
    (which means each value in the sample is multiplied by c),
    then the OLS intercept and slope estimates are also
    multiplied by c (see the numerical check below).
  • If one of the independent variables is divided or multiplied
    by some nonzero constant c, then its OLS slope coefficient
    is respectively multiplied or divided by c.
  • The goodness-of-fit of the model, R-squared, does not depend
    on the units of measurement of our variables.
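A minimal numerical check of these rescaling rules (not from the slides; the simulated data and all names are illustrative only):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

def ols(y, x):
    # Simple OLS of y on x with an intercept: returns (intercept, slope).
    slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    return y.mean() - slope * x.mean(), slope

b0, b1 = ols(y, x)
c = 100.0
b0_c, b1_c = ols(c * y, x)          # rescale the dependent variable by c
print(np.allclose([b0_c, b1_c], [c * b0, c * b1]))   # True: both scale by c
_, b1_x = ols(y, x / c)             # divide the independent variable by c
print(np.allclose(b1_x, c * b1))    # True: the slope is multiplied by c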

10
Functional Form
  • Linear vs. nonlinear
  • Logarithmic dependent variable: log(y) = b0 + b1 x + u
  • 100*b1 is approximately the percentage change in y for a
    one-unit change in x (a semi-elasticity)
  • Allows an increasing return to education
  • Other nonlinearity: the diploma effect
  • Bi-logarithmic (log-log): log(y) = b0 + b1 log(x) + u
  • b1 is a constant elasticity
  • Change of units of measurement: only the intercept changes;
    the expression on p. 45 of the text contains an error and
    should read b0 + log(c1) - b1 log(c2)
  • Be proficient at interpreting the coefficients.

11
Unbiasedness of OLS Estimators
  • Statistical properties of OLS
  • Assumptions
  • Linear in parameters (otherwise: functional form issues,
    more advanced methods)
  • Random sampling (time series data involve nonrandom
    sampling)
  • Zero conditional mean (if violated, OLS is biased; spurious
    correlation)
  • Sample variation in the independent variable (rules out the
    perfectly collinear, constant-regressor case)
  • Theorem (Unbiasedness)
  • Under the four assumptions above, OLS is unbiased (see the
    statement below).
12
Variance of OLS Estimators
  • Assumptions
  • Homoskedasticity: Var(u | x) = sigma^2
  • Error variance sigma^2
  • A larger sigma^2 means that the distribution of the
    unobservables affecting y is more spread out.
  • Theorem (Sampling variance of OLS estimators)
  • Under the five assumptions above, the variances take the
    form given below.
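The sampling-variance formulas the theorem refers to, in the standard simple-regression form (restated because the slide's expressions were lost):

\[ \mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_i (x_i - \bar x)^2} = \frac{\sigma^2}{SST_x} , \qquad \mathrm{Var}(\hat\beta_0) = \frac{\sigma^2 \, n^{-1}\sum_i x_i^2}{SST_x} . \]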

13
Variance of y given x
  • Conditional mean E(y | x) and conditional variance
    Var(y | x) of y
  • Heteroskedasticity: Var(u | x) depends on x

14
What does the variance of the slope estimator depend on?
  • More variation in the unobservables affecting y makes it
    more difficult to precisely estimate the slope.
  • The more spread out the sample of x_i is, the easier it is
    to find the relationship between E(y | x) and x.
  • As the sample size increases, so does the total variation
    in the x_i. Therefore, a larger sample size results in a
    smaller variance of the estimator.

15
Estimating Error Variance
  • Errors (disturbances) vs. residuals
  • Errors: in the population model
  • Residuals: from the estimated equation
  • Theorem (The unbiased estimator of the error variance)
  • Under the five assumptions above, the estimator given below
    is unbiased.
  • Its square root is the standard error of the regression
    (SER): it estimates the standard deviation in y after the
    effect of x has been taken out.
  • Standard error of the slope estimator (see below)
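The standard expressions, restated here because the slide's own formulas were lost:

\[ \hat\sigma^2 = \frac{SSR}{n-2} = \frac{1}{n-2}\sum_{i=1}^{n} \hat u_i^2 , \qquad \mathrm{se}(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{SST_x}} . \]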

16
Regression through the Origin
  • The regression line is forced to pass through the origin
    (0, 0).
  • E.g. income tax revenue as a function of income
  • The OLS estimator minimizes the sum of squared residuals
    with no intercept.
  • The through-origin slope estimator is unbiased only if the
    true intercept is 0.
  • If the intercept is not 0, it is a biased estimator of the
    slope parameter.

17
Chap 3. Multiple Regression Analysis: Estimation
  • Advantages of multiple regression analysis
  • Build better models for predicting the dependent variable.
  • E.g. generalize the functional form.
  • Marginal propensity to consume
  • Be more amenable to ceteris paribus analysis
  • Chap 3.2
  • Key assumption: zero conditional mean of u given all the
    regressors
  • Implication: other factors affecting wage are not related
    on average to educ and exper.
  • Multiple linear regression model (see below)
beta_j: the ceteris paribus effect of xj on y
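The model and key assumption, written out in the textbook's notation (the slide's own equations were lost in extraction):

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u , \qquad E(u \mid x_1, \ldots, x_k) = 0 . \]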
18
Ordinary Least Squares Estimator
  • SRF (sample regression function)
  • OLS: minimize the sum of squared residuals
  • First-order conditions (F.O.C.)
  • Ceteris paribus interpretations
  • Holding x2, ..., xk fixed, the coefficient on x1 gives the
    change in y associated with a change in x1.
  • Thus, we have controlled for the variables x2, ..., xk when
    estimating the effect of x1 on y.

19
Holding Other Factors Fixed
  • The power of multiple regression analysis is that
    it provides this ceteris paribus interpretation
    even though the data have not been collected in a
    ceteris paribus fashion.
  • It allows us to do in non-experimental environments what
    natural scientists are able to do in a controlled
    laboratory setting: keep other factors fixed.

20
OLS and Ceteris Paribus Effects
  • The "partialling out" steps of OLS:
  • (1) obtain the OLS residuals from a multiple regression of
    x1 on x2, ..., xk;
  • (2) run a simple regression of y on these residuals; its
    slope is the multiple-regression coefficient on x1 (see the
    sketch below).
  • It measures the effect of x1 on y after x2, ..., xk have
    been partialled or netted out.
  • Two special cases in which the simple regression of y on x1
    will produce the same OLS estimate on x1 as the regression
    of y on x1 and x2: the estimated coefficient on x2 is zero,
    or x1 and x2 are uncorrelated in the sample.
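A minimal numpy sketch of the partialling-out result; the data are simulated and all names are illustrative, since the slide itself gives no code:

import numpy as np

rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)           # x1 correlated with x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Full multiple regression of y on (1, x1, x2)
X = np.column_stack([np.ones(n), x1, x2])
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Step 1: residuals r1 from regressing x1 on (1, x2)
Z = np.column_stack([np.ones(n), x2])
gamma, *_ = np.linalg.lstsq(Z, x1, rcond=None)
r1 = x1 - Z @ gamma

# Step 2: simple regression of y on r1 (no intercept needed,
# since r1 has mean zero by construction)
beta1_partial = (r1 @ y) / (r1 @ r1)

print(np.allclose(beta_full[1], beta1_partial))   # True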

21
Goodness-of-fit
  • R-squared also equals the squared correlation coefficient
    between the actual and the fitted values of y.
  • R-squared never decreases, and it usually increases, when
    another independent variable is added to a regression.
  • The factor that should determine whether an explanatory
    variable belongs in a model is whether the explanatory
    variable has a nonzero partial effect on y in the
    population.

22
Regression through the origin
  • The properties of OLS derived earlier no longer hold for
    regression through the origin.
  • The OLS residuals no longer have a zero sample average.
  • R-squared can actually be negative; one remedy is to
    calculate it as the squared correlation coefficient between
    the actual and fitted values of y.
  • If the intercept in the population model is different from
    zero, then the OLS estimators of the slope parameters will
    be biased.

23
The Expectation of OLS Estimators
  • Assumptions (for the multiple regression model)
  • Linear in parameters
  • Random sampling
  • Zero conditional mean
  • No perfect collinearity:
  • none of the independent variables is constant,
  • and there are no exact linear relationships among the
    independent variables.
  • Theorem (Unbiasedness)
  • Under the four assumptions above, the OLS estimators are
    unbiased for the population parameters.

rank(X) = K
24
Notice 1: Zero Conditional Mean
  • Exogenous vs. endogenous explanatory variables
  • Misspecification of functional form (Chap 9)
  • Omitting the quadratic term
  • The level or log of a variable
  • Omitting important factors that are correlated with any
    independent variable
  • Measurement error (Chap 15, IV)
  • Simultaneously determining one or more x-s with y
    (Chap 16, simultaneous equations)

25
Omitted Variable Bias: The Simple Case
  • Problem: excluding a relevant variable, i.e.
    under-specifying the model (omitting a variable that
    belongs in the true model)
  • Omitted variable bias (misspecification analysis)
  • The true population model
  • The underspecified OLS line
  • The expectation of the underspecified estimator and the
    resulting omitted variable bias are given below.

Table 3.2 of the text summarizes the direction of the bias
according to the signs of corr(x1, x2) and of the coefficient
on x2.
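The expressions the slide refers to, restated in the textbook's notation (the slide's own equations were lost): with true model y = beta0 + beta1 x1 + beta2 x2 + u and an underspecified regression of y on x1 only,

\[ E(\tilde\beta_1) = \beta_1 + \beta_2 \tilde\delta_1 , \qquad \mathrm{Bias}(\tilde\beta_1) = \beta_2 \tilde\delta_1 , \]

where \tilde\delta_1 is the slope from the sample regression of x2 on x1.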
26
Omitted Variable Bias: Nonexistence
  • Two cases where the underspecified estimator is unbiased:
  • the coefficient on x2 in the true population model is zero,
    or
  • delta1-tilde, the sample covariance between x1 and x2 over
    the sample variance of x1, is zero.
  • If delta1-tilde = 0, then even though x2 appears in the
    population model, omitting it causes no bias because x1 and
    x2 are uncorrelated in the sample.
  • Summary of omitted variable bias
  • The expectation of the underspecified estimator and the
    omitted variable bias (see the formulas above)

27
The Size of Omitted Variable Bias
  • Direction and size
  • A small bias of either sign need not be a cause for concern.
  • The bias is unknown, but we often have some idea of it:
  • we usually have a pretty good idea about the direction of
    the partial effect of x2 on y, that is, the sign of its
    coefficient;
  • in many cases we can make an educated guess about whether
    x1 and x2 are positively or negatively correlated.
  • E.g. upward/downward bias; biased toward zero

28
Omitted Variable Bias: More General Cases
  • Suppose x2 and x3 are uncorrelated, but x1 is correlated
    with x3.
  • Both estimators will normally be biased. The only exception
    is when x1 and x2 are also uncorrelated.
  • It is difficult to obtain the direction of the bias in the
    estimators.
  • An approximation is available if x1 and x2 are also
    uncorrelated.

29
Notice 2 No Perfect Collinearity
  • An assumption only about x-s, nothing about the
    relationship between u and x-s
  • Assumption MLR.4 does allow the independent variables to be
    correlated; they just cannot be perfectly correlated.
    Ceteris paribus effect
  • If we did not allow for any correlation among the
    independent variables, then multiple regression
    would not be very useful for econometric
    analysis.
  • Significance

30
Cases of Perfect Collinearity
  • When can independent variables be perfectly collinear?
    (software reports a singular matrix)
  • Nonlinear functions of the same variable are not exact
    linear functions, so they are allowed.
  • Do not include the same explanatory variable measured in
    different units in the same regression equation.
  • More subtle ways:
  • one independent variable can be expressed as an exact
    linear function of some or all of the other independent
    variables. Drop it.
  • Key

31
Notice 3: Unbiasedness
  • The meaning of unbiasedness
  • An estimate cannot be unbiased: an estimate is a fixed
    number, obtained from a particular sample, which usually is
    not equal to the population parameter.
  • When we say that OLS is unbiased under Assumptions MLR.1
    through MLR.4, we mean that the procedure by which the OLS
    estimates are obtained is unbiased when we view the
    procedure as being applied across all possible random
    samples.

32
Notice 4 Over-Specification
  • Inclusion of an irrelevant variable or
    over-specifying the model
  • does not affect the unbiasedness of the OLS
    estimators.
  • including irrelevant variables can have
    undesirable effects on the variances of the OLS
    estimators.

33
Variance of The OLS Estimators
  • Adding assumptions
  • Homoskedasticity: Var(u | x1, ..., xk) = sigma^2
  • Error variance sigma^2
  • A larger sigma^2 means that the distribution of the
    unobservables affecting y is more spread out.
  • Gauss-Markov assumptions (for cross-sectional regression):
    Assumptions 1-5
  • Theorem (Sampling variance of OLS estimators)
  • Under the five assumptions above, the variances take the
    form given below.
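The sampling-variance formula the theorem refers to, in the standard form (the slide's own expression was lost):

\[ \mathrm{Var}(\hat\beta_j) = \frac{\sigma^2}{SST_j (1 - R_j^2)} , \qquad j = 1, \ldots, k , \]

where SST_j is the total sample variation in x_j and R_j^2 is the R-squared from regressing x_j on all the other independent variables.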

34
More about the Sampling Variance
  • The statistical properties of the regression of y on
    (x1, x2, ..., xk)
  • Error variance: the only way to reduce the error variance
    is to add more explanatory variables, which is not always
    possible or desirable.
  • The total sample variation in xj: SSTj
  • Increase the sample size.

35
Multicollinearity
  • The linear relationships among the independent variables
  • R_j^2 measures the extent to which xj can be explained by
    the other independent variables (see the check below).
  • If k = 2, it is the proportion of the total variation in xj
    that can be explained by the other independent variable.
  • High (but not perfect) correlation between two or more of
    the independent variables is called multicollinearity.
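A small numpy sketch (simulated, illustrative data; not from the slides) showing how R_j^2 from regressing x_j on the other regressors drives the variance inflation factor 1 / (1 - R_j^2):

import numpy as np

rng = np.random.default_rng(2)
n = 300
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.9 * x2 + 0.1 * rng.normal(size=n)   # x1 highly correlated with x2

def r_squared(target, others):
    # R^2 from regressing `target` on a constant and the arrays in `others`.
    Z = np.column_stack([np.ones(len(target))] + list(others))
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    resid = target - Z @ coef
    return 1.0 - resid.var() / target.var()

r2_1 = r_squared(x1, [x2, x3])
print(r2_1, 1.0 / (1.0 - r2_1))   # high R_1^2 -> large variance inflation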

36
Micro-numerosity: The Problem of Small Sample Size
  • High R_j^2
  • Low SSTj
  • One thing is clear: everything else being equal, for
    estimating beta_j it is better to have less correlation
    between xj and the other x-s.
  • How to solve the multicollinearity problem?
  • Increase the sample size.
  • Drop some variables? Doing so can introduce omitted
    variable bias if a dropped variable belongs in the model.

37
Notice: The Influence of Multicollinearity
  • A high degree of correlation between certain independent
    variables can be irrelevant as to how well we can estimate
    other parameters in the model.
  • E.g.
  • Importance for economists: control variables.

38
Variances in Misspecified Models
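The body of this slide did not survive extraction; the comparison it presumably makes is between the slope variances with and without the additional regressor x2:

\[ \mathrm{Var}(\tilde\beta_1) = \frac{\sigma^2}{SST_1} \;\le\; \mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{SST_1 (1 - R_1^2)} , \]

where \tilde\beta_1 comes from the regression that omits x2 and \hat\beta_1 from the regression that includes it.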
39
Whether or Not to Include x2: Two Favorable Reasons
  • The choice of whether or not to include a particular
    variable in a regression model can be made by analyzing the
    tradeoff between bias and variance.
  • However, when the coefficient on x2 is nonzero, there are
    two favorable reasons for including x2 in the model:
  • any bias in the estimator that omits x2 does not shrink as
    the sample size grows;
  • the variances of both estimators shrink to zero as n
    increases.
  • Therefore, the multicollinearity induced by adding x2
    becomes less important as the sample size grows. In large
    samples, we would prefer the estimator that includes x2.

40
Estimating Standard Errors of the OLS
Estimators
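Presumably the formulas on this slide are the standard ones for the multiple regression case:

\[ \mathrm{se}(\hat\beta_j) = \frac{\hat\sigma}{\sqrt{SST_j (1 - R_j^2)}} , \qquad \hat\sigma^2 = \frac{SSR}{n - k - 1} . \]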
41
Efficiency of OLS: The Gauss-Markov Theorem
  • BLUE
  • Best: smallest variance
  • Linear
  • Unbiased

Notes: (1) the comparison is within the class of linear
unbiased estimators; (2) under the Gauss-Markov assumptions,
OLS is BLUE. If one of the assumptions fails (for example,
under heteroskedasticity), OLS is no longer guaranteed to be
best.
42
Classical Linear Model Assumptions: Inference
43
Recommended Reading
  • Jeffrey M. Wooldridge, Introductory Econometrics: A Modern
    Approach, Chap. 2-3.