2. Fixed Effects Models

1
2. Fixed Effects Models
  • 2.1 Basic fixed-effects model
  • 2.2 Exploring panel data
  • 2.3 Estimation and inference
  • 2.4 Model specification and diagnostics
  • 2.5 Model extensions
  • Appendix 2A - Least squares estimation

2
2.1 Basic fixed effects model
  • Basic Elements
  • Subject i is observed on Ti occasions,
    i = 1, ..., n.
  • Ti ≤ T, the maximal number of time periods.
  • The response of interest is yit.
  • The K explanatory variables are xit = (xit1,
    xit2, ..., xitK)′, a vector of dimension K × 1.
  • The population parameters are β = (β1, ..., βK)′,
    a vector of dimension K × 1.

3
Observables Representation of the Linear Model
  • E yit = α + β1 xit1 + β2 xit2 + ... + βK xitK.
  • xit,1, ... , xit,K are nonstochastic variables.
  • Var yit = σ².
  • yit are independent random variables.
  • yit are normally distributed.
  • The observable variables are xit,1, ... , xit,K
    and yit.
  • Think of xit,1, ... , xit,K as defining a
    stratum.
  • We take a random draw, yit, from each stratum.
  • Thus, we treat the xs as nonstochastic
  • We are interested in the distribution of y,
    conditional on the xs.

4
Error Representation of the Linear Model
  • yit = α + β1 xit,1 + β2 xit,2 + ... + βK xit,K
    + eit,
  • where E eit = 0.
  • xit,1, ... , xit,K are nonstochastic
    variables.
  • Var eit = σ².
  • eit are independent random variables.
  • This representation is based on the Gaussian
    theory of errors; it is centered on the
    unobservable variable eit.
  • Here, eit are i.i.d., mean zero random variables.

5
Heterogeneous model
  • We now introduce a subscript on the intercept
    term, to account for heterogeneity.
  • E yit = αi + β1 xit,1 + β2 xit,2 + ... + βK xit,K.
  • For short-hand, we write this as
  • E yit = αi + xit′ β.
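To make the heterogeneous model concrete, here is a minimal simulation sketch in Python; all parameter values are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, K = 50, 5, 2                  # subjects, time periods, covariates
alpha = rng.normal(0.0, 2.0, n)     # subject-specific intercepts (heterogeneity)
beta = np.array([1.5, -0.5])        # common slopes: the parameters of interest

x = rng.normal(size=(n, T, K))      # treated as nonstochastic in the model
e = rng.normal(0.0, 1.0, (n, T))    # i.i.d., mean zero errors
y = alpha[:, None] + x @ beta + e   # y_it = alpha_i + x_it' beta + e_it
```

Later sketches reuse this balanced-panel layout: an (n, T) response array and an (n, T, K) covariate array.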

6
Analysis of covariance model
  • The intercept parameter, αi, varies by subject.
  • The population parameters β do not vary; they
    control for the common effect of the covariates x.
  • Because the errors are mean zero, the expected
    response is E yit = αi + xit′ β.

7
Parameters of interest
  • The common effects of the explanatory variables
    are dictated by the sign and magnitude of the
    betas (βs).
  • These are the parameters of interest
  • The intercept parameters vary by subject and
    account for different behavior of subjects.
  • The intercept parameters control for the
    heterogeneity of subjects.
  • Because they are of secondary interest, the
    intercepts are called nuisance parameters.

8
Time-specific analysis of covariance
  • The basic model is also a traditional analysis of
    covariance model.
  • The basic fixed-effects model focuses on the mean
    response and assumes
  • no serial correlation (correlation over time)
  • no cross-sectional (contemporaneous) correlation
    (correlation between subjects)
  • Hence, no special relationship between subjects
    and time is assumed.
  • By interchanging the roles of i and t, we may
    consider the model
  • yit = λt + xit′ β + εit.
  • The parameters λt are time-specific parameters
    that do not depend on subjects.

9
Subject and time heterogeneity
  • Typically, the number of subjects, n,
    substantially exceeds the maximal number of time
    periods, T.
  • Typically, the heterogeneity among subjects
    explains a greater proportion of variability than
    the heterogeneity among time periods.
  • Thus, we begin with the basic model yit = αi +
    xit′ β + εit.
  • This model allows explicit parameterization of
    the subject-specific heterogeneity.
  • By using binary variables for the time dimension,
    we can easily incorporate time-specific
    parameters; a data-preparation sketch follows.
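As a sketch of that last point, time-specific parameters can be introduced by building binary (dummy) variables for the time index in a long-format data frame; the column names here are hypothetical:

```python
import numpy as np
import pandas as pd

n, T = 4, 3
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n), T),   # i = 0, ..., n-1
    "time": np.tile(np.arange(T), n),        # t = 0, ..., T-1 within subject
})
# Binary indicators for the time dimension; drop the first period so the
# indicators are not collinear with the subject-specific intercepts.
dummies = pd.get_dummies(df["time"], prefix="t", drop_first=True, dtype=float)
df = pd.concat([df, dummies], axis=1)
```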

10
2.2 Exploring panel data
  • Why Explore?
  • Many important features of the data can be
    summarized numerically or graphically without
    reference to a model
  • Data exploration provides hints of the
    appropriate model
  • Many social science data sets are observational -
    they do not arise as the result of a designed
    experiment
  • The data collection mechanism does not dictate
    the model selection process.
  • To draw reliable inferences from the modeling
    procedure, it is important that the data be
    congruent with the model.
  • Exploring the data also alerts us to any unusual
    observations and/or subjects.

11
Data exploration techniques
  • Panel data is a special case of regression data.
  • Techniques applicable to regression are also
    useful for panel data.
  • Some commonly used techniques include
  • Summarize the distribution of y and each x
  • Graphically, through histograms and other density
    estimators
  • Numerically, through basic summary statistics
    (mean, median, standard deviation, minimum and
    maximum).
  • Summarize the relation between y and each
    x
  • Graphically, through scatter plots
  • Numerically, through correlation statistics
  • Summary statistics by time period may be useful
    for detecting temporal patterns.
  • Three more specialized (for panel data)
    techniques are
  • Multiple time series plots
  • Scatterplots with symbols
  • Added variable plots.
  • Section 2.2 discusses additional techniques;
    these are performed after the fit of a
    preliminary model.

12
Multiple time series plots
  • Plot of the response, yit, versus time t.
  • Serially connect the observations belonging to
    each subject.
  • This graph helps detect
  • Patterns over time
  • Unusual observations and/or subjects.
  • Visualize the heterogeneity.
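A minimal matplotlib sketch of a multiple time series plot, using simulated data in place of a real panel:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n, T = 10, 8
t = np.arange(T)
# Simulated responses with subject-specific levels and a mild trend.
y = rng.normal(0, 2, (n, 1)) + 0.3 * t + rng.normal(0, 0.5, (n, T))

for i in range(n):                  # serially connect within each subject
    plt.plot(t, y[i], marker="o", linewidth=1)
plt.xlabel("time period t")
plt.ylabel("response y_it")
plt.title("Multiple time series plot")
plt.show()
```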

13
Scatterplots with symbols
  • Plot of the response, yit, versus an explanatory
    variable, xitj
  • Use a plotting symbol to encode the subject
    number i
  • See the relationship between the response and
    explanatory variable yet account for the varying
    intercepts.
  • Variation: if there is a separation in the xs,
    such as increasing over time,
  • then we can serially connect the observations.
  • We do not need a separate plotting symbol for
    each subject.

14
Basic added variable plot
  • This is a plot of yit − ȳi versus xit − x̄i, that
    is, deviations from subject means.
  • Motivation Typically, the subject-specific
    parameters account for a large portion of the
    variability.
  • This plot allows us to visualize the relationship
    between y and each x, without forcing our eye to
    adjust for the heterogeneity of the
    subject-specific intercepts.
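Under the reconstruction above (deviations from subject means), a sketch of the basic added variable plot looks like this; the data are simulated for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
n, T = 30, 5
alpha = rng.normal(0, 3, (n, 1))    # large subject effects dominate raw plots
x = rng.normal(size=(n, T))
y = alpha + 0.8 * x + rng.normal(0, 0.5, (n, T))

# Remove subject means so the eye need not adjust for the intercepts.
y_dev = y - y.mean(axis=1, keepdims=True)
x_dev = x - x.mean(axis=1, keepdims=True)
plt.scatter(x_dev.ravel(), y_dev.ravel(), s=10)
plt.xlabel("x_it minus subject mean")
plt.ylabel("y_it minus subject mean")
plt.show()
```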

15
Trellis Plot
16
2.3 Estimation and inference
  • Least squares estimates
  • By the Gauss-Markov theorem, the best linear
    unbiased estimates are the ordinary least squares
    (ols) estimates.
  • These are given by
  • b = ( Σi Σt (xit − x̄i)(xit − x̄i)′ )⁻¹ Σi Σt (xit − x̄i)(yit − ȳi)
  • and ai = ȳi − x̄i′ b.
  • Here, ȳi and x̄i are averages of yit and
    xit over time.
  • Time-constant xs prevent one from getting unique
    estimates of b!
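A numerical sketch of these estimates for a balanced panel, assuming the reconstructed formulas above; note that the slope estimate requires within-subject variation in each x:

```python
import numpy as np

def fe_within(y, x):
    """Fixed-effects (within) OLS: y is (n, T), x is (n, T, K)."""
    xd = x - x.mean(axis=1, keepdims=True)          # x_it - xbar_i
    yd = y - y.mean(axis=1, keepdims=True)          # y_it - ybar_i
    X = xd.reshape(-1, x.shape[2])
    b = np.linalg.solve(X.T @ X, X.T @ yd.ravel())  # singular if an x is time-constant
    a = y.mean(axis=1) - x.mean(axis=1) @ b         # a_i = ybar_i - xbar_i' b
    return a, b

rng = np.random.default_rng(3)
n, T, K = 40, 6, 2
x = rng.normal(size=(n, T, K))
y = rng.normal(0, 2, (n, 1)) + x @ np.array([1.0, -2.0]) + rng.normal(size=(n, T))
a, b = fe_within(y, x)
print(b)    # close to (1.0, -2.0)
```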

17
Estimation details
  • Although there are n + K unknown parameters, the
    calculation of the ols estimates requires
    inversion of only a K × K matrix.
  • The ols estimate of b can also be expressed as a
    weighted average of estimates of subject-specific
    parameters.
  • Suppose that all parameters are subject-specific,
    so that the model is yit = ai + xit′ bi + eit.
  • The ols estimate of bi turns out to be
  • bi = Wi⁻¹ Σt (xit − x̄i)(yit − ȳi).
  • Define the weighting matrix
  • Wi = Σt (xit − x̄i)(xit − x̄i)′.
  • With this weight, we can express the ols
    estimate of b as
  • b = ( Σi Wi )⁻¹ Σi Wi bi,
  • a weighted average of subject-specific parameter
    estimates.

18
Properties of estimates
  • Both ai and b have the usual properties of ols
    regression estimators
  • They are unbiased estimators.
  • By the Gauss-Markov theorem, they are minimum
    variance among the class of unbiased estimates.
  • To see this, consider an expression of the ols
    estimate of b,
  • b = ( Σi Wi )⁻¹ Σi Σt (xit − x̄i) yit.
  • That is, b is a linear combination of responses.
  • If the responses are normally distributed, then
    so is b.
  • The variance of b turns out to be
    Var b = σ² ( Σi Wi )⁻¹.

19
ANOVA and standard errors
  • This follows the usual regression set-up.
  • We define the residuals as eit = yit − (ai + xit′ b).
  • The error sum of squares is Error SS = Σi,t eit².
  • The mean square error is
    s² = (Error SS) / (N − (n + K)), where N = Σi Ti;
  • the residual standard deviation is s.
  • The standard errors of the slope estimates are
    the square roots of the diagonal of the
    estimated variance matrix, s² ( Σi Wi )⁻¹.

20
Consistency of estimates
  • As the number of subjects (n) gets large, b
    approaches β.
  • Specifically, weak consistency means convergence
    in probability.
  • This is a direct result of the unbiasedness and
    an assumption that Σi Wi grows without bound.
  • As n gets large, the intercept estimates ai do
    not approach αi.
  • They are inconsistent.
  • Intuitively, this is because the number of
    observations available for estimating each αi is
    Ti, a bounded number.

21
Other large sample approximations
  • Typically, the number of subjects is large
    relative to the number of time periods observed.
  • Thus, in deriving large sample approximations of
    the sampling distributions of estimators, assume
    that n → ∞ although T remains fixed.
  • With this assumption, we have a central limit
    theorem for the slope estimator.
  • That is, b is approximately normally distributed
    even when the responses are not.
  • The approximation improves as n becomes large.
  • Unlike the usual regression set-up, this is not
    true for the intercepts. If the responses are not
    normally distributed, then ai are not even
    approximately normal.

22
2.4 Model specification and diagnostics
  • Pooling Test
  • Added variable plots
  • Influence diagnostics
  • Cross-sectional correlations
  • Heteroscedasticity

23
Pooling test
  • Test whether the intercepts take on a common
    value, say α.
  • Using notation, we wish to test the null
    hypothesis
  • H0: α1 = α2 = ... = αn = α.
  • This can be done using the following partial F
    (Chow) test:
  • Run the full model yit = αi + xit′ β + εit to
    get Error SS and s².
  • Run the reduced model yit = α + xit′ β + εit to
    get (Error SS)reduced.
  • Compute the partial F-statistic,
  • F = ( (Error SS)reduced − Error SS ) / ( (n − 1) s² ).
  • Reject H0 if F exceeds a quantile from an
    F-distribution with numerator degrees of freedom
    df1 = n − 1 and denominator degrees of freedom df2
    = N − (n + K).
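A sketch of the pooling test for a balanced panel, assuming the partial F-statistic as reconstructed above:

```python
import numpy as np
from scipy import stats

def pooling_test(y, x):
    """Partial F (Chow) test of H0: alpha_1 = ... = alpha_n.
    y: (n, T) responses; x: (n, T, K) covariates, balanced panel."""
    n, T, K = x.shape
    N = n * T
    # Full model: subject intercepts absorbed by the within transformation.
    xd = (x - x.mean(axis=1, keepdims=True)).reshape(-1, K)
    yd = (y - y.mean(axis=1, keepdims=True)).ravel()
    b = np.linalg.lstsq(xd, yd, rcond=None)[0]
    error_ss = np.sum((yd - xd @ b) ** 2)
    s2 = error_ss / (N - (n + K))
    # Reduced model: one common intercept.
    Z = np.column_stack([np.ones(N), x.reshape(-1, K)])
    c = np.linalg.lstsq(Z, y.ravel(), rcond=None)[0]
    error_ss_red = np.sum((y.ravel() - Z @ c) ** 2)
    F = (error_ss_red - error_ss) / ((n - 1) * s2)
    return F, stats.f.sf(F, n - 1, N - (n + K))   # statistic and p-value
```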

24
Added variable plot
  • An added variable plot (also called a partial
    regression plot) is a standard graphical device
    used in regression analysis.
  • Purpose To view the relationship between a
    response and an explanatory variable, after
    controlling for the linear effects of other
    explanatory variables.
  • Added variable plots allow us to visualize the
    relationship between y and each x, without
    forcing our eye to adjust for the differences
    induced by the other xs.
  • The basic added variable plot is a special case.

25
Procedure for making an added variable plot
  • Select an explanatory variable, say xj.
  • Run a regression of y on the other explanatory
    variables (omitting xj)
  • calculate the residuals from this regression.
    Call these residuals e1.
  • Run a regression of xj on the other explanatory
    variables (omitting xj)
  • calculate the residuals from this regression.
    Call these residuals e2.
  • The plot of e1 versus e2 is an added variable
    plot.
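The three-step procedure translates directly into code; here is a sketch with simulated regressors (variable names hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
N = 200
X = rng.normal(size=(N, 3))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=N)

def resid(target, Z):
    """Residuals from an OLS regression of target on the columns of Z."""
    coef = np.linalg.lstsq(Z, target, rcond=None)[0]
    return target - Z @ coef

j = 0                                                  # the selected x_j
others = np.column_stack([np.ones(N), np.delete(X, j, axis=1)])
e1 = resid(y, others)          # residuals from y on the other xs
e2 = resid(X[:, j], others)    # residuals from x_j on the other xs
plt.scatter(e2, e1, s=10)      # the added variable plot
plt.xlabel("e2"); plt.ylabel("e1")
plt.show()
```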

26
Correlations and added variable plots
  • Let corr(e1, e2 ) be the correlation between the
    two sets of residuals.
  • It is related to the t-statistic of xj, t(bj),
    from the full regression equation (including xj)
    through
  • corr(e1, e2) = t(bj) / √( t(bj)² + N − K ).
  • Here, K is the number of regression coefficients
    in the full regression equation and N is the
    number of observations.
  • Thus, the t-statistic can be used to determine
    the correlation coefficient of the added variable
    plot without running the three-step procedure.
  • However, unlike correlation coefficients, the
    added variable plot allows us to visualize
    potential nonlinear relationships between y and
    xj .

27
Influence diagnostics
  • Influence diagnostics allow the analyst to
    understand the impact of individual observations
    and/or subjects on the estimated model
  • Traditional diagnostic statistics are
    observation-level
  • of less interest in panel data analysis
  • the effect of unusual observations is absorbed by
    subject-specific parameters.
  • Of greater interest is the impact that an entire
    subject has on the population parameters.
  • We use the statistic Bi(b), which measures the
    distance between b and the estimate obtained when
    subject i is omitted.
  • Here, b(i) is the ols estimate of b calculated with
    the ith subject omitted.

28
Calibration of influence diagnostic
  • The panel data influence diagnostic is similar to
    Cook's distance for regression.
  • Cook's distance is calculated at the
    observation level, yet Bi(b) is at the subject
    level.
  • The statistic Bi(b) has an approximate χ²
    (chi-square) distribution with K degrees of
    freedom.
  • Observations with a large value of Bi(b) may be
    influential on the parameter estimates.
  • Use quantiles of the χ² distribution to quantify
    the adjective "large".
  • Influential observations warrant further
    investigation;
  • they may need correction, additional variable
    specification to accommodate differences, or
    deletion from the data set.
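The transcript omits the exact formula for Bi(b). The sketch below uses one plausible Cook-style choice, the quadratic form (b(i) − b)′ [est. Var b]⁻¹ (b(i) − b), which matches the χ²(K) calibration described above but should not be read as the text's exact definition:

```python
import numpy as np
from scipy import stats

def influence_by_subject(y, x):
    """Leave-one-subject-out influence on b for a balanced panel.
    y: (n, T), x: (n, T, K). Uses an assumed Cook-style quadratic form."""
    n, T, K = x.shape

    def within_fit(keep):
        xd = (x[keep] - x[keep].mean(axis=1, keepdims=True)).reshape(-1, K)
        yd = (y[keep] - y[keep].mean(axis=1, keepdims=True)).ravel()
        return np.linalg.lstsq(xd, yd, rcond=None)[0], xd, yd

    b, xd, yd = within_fit(np.ones(n, dtype=bool))
    s2 = np.sum((yd - xd @ b) ** 2) / (n * T - (n + K))
    var_b_inv = (xd.T @ xd) / s2            # inverse of s^2 (sum_i W_i)^(-1)
    B = np.empty(n)
    for i in range(n):
        keep = np.ones(n, dtype=bool)
        keep[i] = False                     # omit the ith subject
        d = within_fit(keep)[0] - b
        B[i] = d @ var_b_inv @ d
    return B, stats.chi2.ppf(0.95, df=K)    # compare B_i to a chi-square quantile
```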

29
Cross-sectional correlations
  • The basic model assumes independence between
    subjects.
  • Looking at a cross-section of subjects, we assume
    zero cross-sectional correlation, that is, ρij =
    Corr(yit, yjt) = 0 for i ≠ j.
  • Suppose that the true model is yit = λt + xit′ β +
    εit, where λt is a random temporal effect that
    is common to all subjects.
  • This yields Var yit = σλ² + σ².
  • The covariance between observations at the same
    time but from different subjects is Cov(yit,
    yjt) = σλ², for i ≠ j.
  • Thus, the cross-sectional correlation is
  • ρij = σλ² / (σλ² + σ²).

30
Testing for cross-sectional correlations
  • To test H0: ρij = 0 for all i ≠ j, assume that
    Ti = T.
  • Calculate model residuals eit.
  • For each subject i, calculate the ranks of each
    residual.
  • That is, define ri,1, ..., ri,T to be the
    ranks of ei,1, ..., ei,T.
  • Ranks will vary from 1 to T, so the average rank
    is (T + 1)/2.
  • For the ith and jth subjects, calculate the rank
    correlation coefficient (Spearman's correlation)
  • srij = ( 12 / (T(T² − 1)) ) Σt (ri,t − (T+1)/2)(rj,t − (T+1)/2).
  • Calculate the average Spearman's correlation,
    Rave = ( 2/(n(n−1)) ) Σi<j srij, and the average
    squared Spearman's correlation,
    R²ave = ( 2/(n(n−1)) ) Σi<j srij².
  • Here, Σi<j means sum over i = 1, ..., j − 1 and
    j = 2, ..., n.
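A direct (not the short-cut) computation of the average squared Spearman correlation, following the steps above; assumes a balanced panel of residuals:

```python
import numpy as np
from scipy import stats

def r2_ave(resid):
    """Average squared Spearman correlation over subject pairs i < j.
    resid: (n, T) residuals from the fitted model, balanced panel."""
    n, T = resid.shape
    ranks = np.apply_along_axis(stats.rankdata, 1, resid)  # ranks within subject
    dev = ranks - (T + 1) / 2.0                            # center at average rank
    sr = (dev @ dev.T) * 12.0 / (T * (T**2 - 1))           # Spearman correlations
    iu = np.triu_indices(n, k=1)                           # the i < j pairs
    return np.mean(sr[iu] ** 2)
```

Each diagonal entry of sr equals 1, a quick check that the scaling 12/(T(T² − 1)) is right.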

31
Calibration of cross-sectional correlation test
  • We compare R²ave to a distribution that is a
    weighted sum of chi-square random variables
    (Frees, 1995).
  • Specifically, define
  • Q = a(T) ( χ1² − (T − 1) ) + b(T) ( χ2² − T(T − 3)/2 ).
  • Here, χ1² and χ2² are independent chi-square
    random variables with T − 1 and T(T − 3)/2 degrees
    of freedom, respectively.
  • The constants are
  • a(T) = 4(T + 2) / ( 5(T − 1)²(T + 1) )
  • and
  • b(T) = 2(5T + 6) / ( 5T(T − 1)(T + 1) ).

32
Calculation short-cuts
  • Rule of thumb for cut-offs for the Q distribution.
  • To calculate R²ave:
  • Define the quantities Zi,t,u.
  • For each t, u, calculate Σi Zi,t,u and Σi Zi,t,u².
  • We then have R²ave expressed in terms of these
    sums.
  • Here, Σt,u means sum over t = 1, ..., T and u = 1,
    ..., T.
  • Although more complex in appearance, this is a
    much faster computational form for R²ave.
  • Main drawback: the asymptotic distribution is
    only available for balanced data.

33
Heteroscedasticity
  • Carroll and Ruppert (1988) provide a broad
    treatment
  • Here is a test due to Breusch and Pagan (1980).
  • Ha: Var eit = σ² + γ′ wit, where wit is a known
    vector of weighting variables and γ is a
    p-dimensional vector of parameters.
  • H0: Var eit = σ². The procedure is:
  • Fit a regression model and calculate the model
    residuals, rit.
  • Calculate the squared standardized residuals,
    rit* = rit² / s̄², where s̄² is the average of the
    squared residuals.
  • Fit a regression model of rit* on wit.
  • The test statistic is LM = (Regress SSw)/2, where
    Regress SSw is the regression sum of squares from
    the model fit in step 3.
  • Reject the null hypothesis if LM exceeds a
    percentile from a chi-square distribution with p
    degrees of freedom. The percentile is one minus
    the significance level of the test.
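A sketch of the Breusch-Pagan procedure; the standardization s̄² = N⁻¹ Σ rit² (an assumption consistent with the LM = (Regress SS)/2 calibration) is used here:

```python
import numpy as np
from scipy import stats

def breusch_pagan(resid, w):
    """LM test of H0: Var e = sigma^2 vs Ha: Var e = sigma^2 + gamma' w.
    resid: (N,) model residuals; w: (N, p) known weighting variables."""
    N, p = w.shape
    r_star = resid**2 / np.mean(resid**2)      # squared standardized residuals
    Z = np.column_stack([np.ones(N), w])       # regress r* on w (with intercept)
    coef = np.linalg.lstsq(Z, r_star, rcond=None)[0]
    regress_ss = np.sum((Z @ coef - r_star.mean()) ** 2)
    lm = regress_ss / 2.0
    return lm, stats.chi2.sf(lm, df=p)         # reject H0 for large LM
```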

34
2.5 Model extensions
  • In panel data, subjects are measured repeatedly
    over time. Panel data analysis is useful for
    studying subject changes over time.
  • Repeated measurements of a subject tend to be
    intercorrelated.
  • Up to this point, we have used time-varying
    covariates to account for the presence of time in
    the mean response.
  • However, as in time series analysis, it is also
    useful to measure the tendencies in time patterns
    through a correlation structure.

35
Timing of observations
  • We now specify the time periods when the
    observations are made.
  • We assume that we have at most T observations on
    each subject.
  • These observations are made at time periods t1,
    t2, ..., tT.
  • Each subject has observations made at a subset of
    these T time periods, labeled t1, t2, ..., tTi.
  • The subset may vary by subject and thus could be
    denoted by t1,i, t2,i, ..., tTi,i.
  • For brevity, we use the simpler notation scheme
    and drop the second i subscript.
  • This framework, although notationally complex,
    allows for missing data and incomplete
    observations.

36
Temporal covariance matrix
  • For a full set of observations, let R denote the
    T × T temporal (time) variance-covariance matrix.
  • This is defined by R = Var(εi).
  • Let Rrs = Cov(εir, εis) denote the element in the
    rth row and sth column of R.
  • There are at most T(T + 1)/2 unknown elements of R.
  • Denote the dependence of R on parameters by
    R(τ). Here, τ is the vector of unknown parameters
    of R.
  • For the ith observation, we have Var(εi) =
    Ri(τ), a Ti × Ti matrix.
  • The matrix Ri(τ) can be determined by removing
    certain rows and columns of the matrix R(τ).
  • We assume that Ri(τ) is positive-definite and
    only depends on i through its dimension.

37
Special cases of R
  • R = σ² I, where I is a T × T identity matrix.
    This is the case of no serial correlation.
  • R = σ² ( (1 − ρ) I + ρ J ), where J is a T × T
    matrix of 1s. This is the uniform correlation
    model (also called compound symmetry).
  • Consider the model yit = αi + εit, where αi is a
    random cross-sectional effect.
  • This yields Rtt = Var yit = σα² + σε².
  • For r ≠ s, consider Rrs = Cov(yir, yis) = σα².
  • To write this in terms of σ², note that
    Corr(yir, yis) = σα² / (σα² + σε²) = ρ.
  • Thus, Rrs = σ² ρ, with σ² = σα² + σε².

38
More special cases of R
  • Rrs = σ² exp( −φ |tr − ts| ).
  • In the case of observations equally spaced in
    time, we may assume that tr+1 − tr = 1.
    Thus, Rrs = σ² ρ^|r − s|, where ρ = exp(−φ).
  • This is the autoregressive model of order one,
    denoted by AR(1).
  • More generally, for observations equally spaced
    in time, assume
  • Cov(εir, εis) = Cov(εij, εik) for |r − s| = |j − k|.
  • This is a stationarity assumption.
  • It implies homoscedasticity.
  • There are only T unknown elements of R, which is
    then a Toeplitz matrix.
  • Assume only homoscedasticity.
  • There are 1 + T(T − 1)/2 unknown elements of R,
    corresponding to the variance and the correlation
    matrix.
  • Make no assumptions on R.
  • There are T(T + 1)/2 unknown elements of R.
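The uniform-correlation and AR(1) special cases are easy to construct explicitly; a brief sketch:

```python
import numpy as np

def r_uniform(T, sigma2, rho):
    """Uniform correlation (compound symmetry): R = sigma^2((1-rho)I + rho J)."""
    return sigma2 * ((1.0 - rho) * np.eye(T) + rho * np.ones((T, T)))

def r_ar1(T, sigma2, rho):
    """AR(1), equally spaced times: R_rs = sigma^2 * rho^|r-s|."""
    idx = np.arange(T)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

print(r_uniform(4, 1.0, 0.3))   # constant off-diagonal entries
print(r_ar1(4, 1.0, 0.3))       # geometric decay away from the diagonal
```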

39
(No Transcript)
40
Subject-specific slopes
  • Let one or more slopes vary by subject.
  • The fixed effects linear panel data model is
  • yit = zit′ αi + xit′ β + εit.
  • The q explanatory variables are zit = (zit1,
    zit2, ..., zitq)′, a vector of dimension q × 1.
  • The subject-specific parameters are αi = (αi1,
    ..., αiq)′, a vector of dimension q × 1.
  • This is short-hand notation for the model
  • yit = αi1 zit1 + ... + αiq zitq + β1 xit1 + ... +
    βK xitK + εit.
  • The responses between subjects are independent.
  • We allow for temporal correlation through the
    assumption that Var εi = Ri(τ).

41
Assumptions of the Fixed Effects Linear
Longitudinal Data Model
  • E yi = Zi αi + Xi β.
  • xit,1, ... , xit,K and zit,1, ... , zit,q are
    nonstochastic.
  • Var yi = Ri(τ) = Ri.
  • yi are independent random vectors.
  • yit are normally distributed.

42
Least Squares Estimates
  • The estimates are derived in Appendix 2A.2.
  • They are given by generalized least squares:
  • b = ( Σi Xi′ Ri^(−1/2) Qi Ri^(−1/2) Xi )⁻¹ Σi Xi′ Ri^(−1/2) Qi Ri^(−1/2) yi
  • and ai = ( Zi′ Ri⁻¹ Zi )⁻¹ Zi′ Ri⁻¹ ( yi − Xi b ).
  • Here, Qi = Ii − Ri^(−1/2) Zi ( Zi′ Ri⁻¹ Zi )⁻¹ Zi′ Ri^(−1/2).

43
Robust estimation of standard errors
  • It is common practice to ignore serial
    correlation and heteroscedasticity, so that one
    assumes Ri = σ² Ii.
  • Thus, standard errors are computed from a
    model-based variance estimate for b,
  • a sandwich form whose middle term involves Ri.
  • Huber (1967), White (1980) and Liang and Zeger
    (1986) suggested replacing Ri by ei ei′. Here,
    ei is the vector of residuals. Thus, a robust
    standard error of bj is the square root of the
    jth diagonal element of the resulting sandwich
    estimator.
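A sketch of the Huber/White/Liang-Zeger idea: keep the ols point estimates but replace Ri by ei ei′ in a sandwich variance. The Xi blocks here are whatever regressors remain after any fixed-effects transformation; this generic form is an illustration, not necessarily the text's exact expression:

```python
import numpy as np

def robust_se(X_blocks, e_blocks):
    """Sandwich (cluster-robust by subject) standard errors.
    X_blocks[i]: (T_i, K) regressors for subject i; e_blocks[i]: (T_i,) residuals."""
    bread = np.linalg.inv(sum(X.T @ X for X in X_blocks))
    meat = sum(X.T @ np.outer(e, e) @ X            # R_i replaced by e_i e_i'
               for X, e in zip(X_blocks, e_blocks))
    V = bread @ meat @ bread                       # (sum X'X)^-1 meat (sum X'X)^-1
    return np.sqrt(np.diag(V))                     # robust se of each b_j
```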