Title: 2. Fixed Effects Models
12. Fixed Effects Models
- 2.1 Basic fixed-effects model
- 2.2 Exploring panel data
- 2.3 Estimation and inference
- 2.4 Model specification and diagnostics
- 2.5 Model extensions
- Appendix 2A - Least squares estimation
22.1 Basic fixed effects model
- Basic Elements
- Subject i is observed on Ti occasions
- i 1, ..., n,
- Ti ??T, the maximal number of time periods.
- The response of interest is yit.
- The K explanatory variables are xit xit1,
xit2, ..., xitK, a vector of dimension K ? 1. - The population parameters are ? (?1, ..., ?K),
a vector of dimension K ? 1.
3Observables Representation of the Linear Model
- E yit ? ?1 xit1 ? 2 xit2 ... ?K xitK.
- xit,1, ... , xit,K are nonstochastic variables.
- Var yit s 2.
- yit are independent random variables.
- yit are normally distributed.
- The observable variables are xit,1, ... , xit,K
, yit. - Think of xit,1, ... , xit,K as defining a
strata. - We take a random draw, yit , from each strata.
- Thus, we treat the xs as nonstochastic
- We are interested in the distribution of y,
conditional on the xs.
4Error Representation of the Linear Model
- yit ? ?1 xit,1 ?2 xit,2 ... ?K xit,K
eit - where E eit 0.
- xit,1, ... , xit,K are nonstochastic
variables.. - Var eit s 2.
- eit are independent random variables.
- This representation is based on the Gaussian
theory of errors it is centered on the
unobservable variable eit . - Here, eit are i.i.d., mean zero random variables.
5Heterogeneous model
- We now introduce a subscript on the intercept
term, to account for heterogeneity. - E yit ?i ?1 xit,1 ?2 xit,2 ... ?K xit,K.
- For short-hand, we write this as
- E yit ?i xit ?
6Analysis of covariance model
- The intercept parameter, ?, varies by subject.
- The population parameters ? do not but control
for the common effect of the covariates x. - Because the errors are mean zero, the expected
response is E yit ?i xit ?.
7Parameters of interest
- The common effects of the explanatory variables
are dictated by the sign and magnitude of the
betas (?s) - These are the parameters of interest
- The intercept parameters vary by subject and
account for different behavior of subjects. - The intercept parameters control for the
heterogeneity of subjects. - Because they are of secondary interest, the
intercepts are called nuisance parameters.
8Time-specific analysis of covariance
- The basic model also is a traditional analysis of
covariance model. - The basic fixed-effects model focuses on the mean
response and assumes - no serial correlation (correlation over time)
- no cross-sectional (contemporaneous) correlation
(correlation between subjects) - Hence, no special relationship between subjects
and time is assumed. - By interchanging i and t, we may consider the
model - yit ?t xit ? ?it .
- The parameters ?t are time-specific variables
that do not depend on subjects.
9Subject and time heterogeneity
- Typically, the number of subjects, n,
substantially exceeds the maximal number of time
periods, T. - Typically, the heterogeneity among subjects
explains a greater proportion of variability than
the heterogeneity among time periods. - Thus, we begin with the basic model yit ?i
xit ? ?it . - This model allows explicit parameterization of
the subject-specific heterogeneity. - By using binary variables for the time dimension,
we can easily incorporate time-specific
parameters.
102.2 Exploring panel data
- Why Explore?
- Many important features of the data can be
summarized numerically or graphically without
reference to a model - Data exploration provides hints of the
appropriate model - Many social science data sets are observational -
they do not arise as the result of a designed
experiment - The data collection mechanism does not dictate
the model selection process. - To draw reliable inferences from the modeling
procedure, it is important that the data be
congruent with the model. - Exploring the data also alerts us to any unusual
observations and/or subjects.
11Data exploration techniques
- Panel data is a special case of regression data.
- Techniques applicable to regression are also
useful for panel data. - Some commonly used techniques include
- Summarize the distribution of y and each x
- Graphically, through histograms and other density
estimators - Numerically, through basic summary statistics
(mean, median, standard deviation, minimum and
maximum) . - Summarize the relation between between y and each
x - Graphically, through scatter plots
- Numerically, through correlation statistics
- Summary statistics by time period may be useful
for detecting temporal patterns. - Three more specialized (for panel data)
techniques are - Multiple time series plots
- Scatterplots with symbols
- Added variable plots.
- Section 2.2 discusses additional techniques
these are performed after the fit of a
preliminary model.
12Multiple time series plots
- Plot of the response, yit, versus time t.
- Serially connect observations among common
subjects. - This graph helps detect
- Patterns over time
- Unusual observations and/or subjects.
- Visualize the heterogeneity.
13Scatterplots with symbols
- Plot of the response, yit, versus an explanatory
variable, xitj - Use a plotting symbol to encode the subject
number i - See the relationship between the response and
explanatory variable yet account for the varying
intercepts. - Variation If there is a separation in the xs,
such as increasing over time, - then we can serially connect the observations.
- We do not need a separate plotting symbol for
each subject.
14Basic added variable plot
- This is a plot of versus
. - Motivation Typically, the subject-specific
parameters account for a large portion of the
variability. - This plot allows us to visualize the relationship
between y and each x, without forcing our eye to
adjust for the heterogeneity of the
subject-specific intercepts.
15Trellis Plot
162.3 Estimation and inference
- Least squares estimates
- By the Gauss-Markov theorem, the best linear
unbiased estimates are the ordinary least square
(ols) estimates. - These are given by
- and
- Here, and are averages of yit and
xit over time. - Time-constant xs prevent one from getting unique
estimates of b !!!
17Estimation details
- Although there are nK unknown parameters, the
calculation of the ols estimates requires
inversion of only a K K matrix. - The ols estimate of b can also be expressed as a
weighted average of estimates of subject-specific
parameters. - Suppose that all parameters are subject-specific
so that the model is yit ai xit bi eit - The ols estimate of bi turns out to be
- Define the weighting matrix
- With this weight, we can express the ols
estimate of b as - a weighted average of subject-specific parameter
estimates.
18Properties of estimates
- Both ai and b have the usual properties of ols
regression estimators - They are unbiased estimators.
- By the Gauss-Markov theorem, they are minimum
variance among the class of unbiased estimates. - To see this, consider an expression of the ols
estimate of b, - That is, b is a linear combination of responses.
- If the responses are normally distributed, then
so is b. - The variance of b turns out to be
19ANOVA and standard errors
- This follows the usual regression set-up.
- We define the residuals as eit yit - (ai xit
b) . - The error sum of squares is Error SS Sit eit
2. - The mean square error is
- the residual standard deviation is s.
- The standard errors of the slope estimates are
from the square root of the diagonal of the
estimated variance matrix
20Consistency of estimates
- As the number of subjects (n) gets large, then b
approaches b. - Specifically, weak consistency means approaching
(convergence) in probability. - This is a direct result of the unbiasedness and
an assumption that Si Wi grows without bound. - As n gets large, the intercept estimates ai do
not approach ai. - They are inconsistent.
- Intuitively, this is because we assume that the
number of repeated measurements of ai is Ti , a
bounded number.
21Other large sample approximations
- Typically, the number of subjects is large
relative to the number of time periods observed. - Thus, in deriving large sample approximations of
the sampling distributions of estimators, assume
that n ?? although T remains fixed. - With this assumption, we have a central limit
theorem for the slope estimator. - That is, b is approximately normally distributed
even though though responses are not. - The approximation improves as n becomes large.
- Unlike the usual regression set-up, this is not
true for the intercepts. If the responses are not
normally distributed, then ai are not even
approximately normal.
222.4 Model specification and diagnostics
- Pooling Test
- Added variable plots
- Influence diagnostics
- Cross-sectional correlations
- Heteroscedasticity
23Pooling test
- Test whether the intercepts take on a common
value, say a. - Using notation, we wish to test the null
hypothesis - H0 a1 a2 ... an a.
- This can be done using the following partial F-
(Chow) test - Run the full model yit ?i xit ? ?it to
get Error SS and s2 . - Run the reduced model yit ? xit ? ?it to
get (Error SS)reduced . - Compute the partial F-statistic,
- Reject H0 if F exceeds a quantile from an
F-distribution with numerator degrees of freedom
df1 n-1 and denominator degrees of freedom df2
N-(nK).
24Added variable plot
- An added variable plot (also called a partial
regression plot) is a standard graphical device
used in regression analysis - Purpose To view the relationship between a
response and an explanatory variable, after
controlling for the linear effects of other
explanatory variables. - Added variable plots allow us to visualize the
relationship between y and each x, without
forcing our eye to adjust for the differences
induced by the other xs. - The basic added variable plot is a special case.
25Procedure for making an added variable plot
- Select an explanatory variable, say xj.
- Run a regression of y on the other explanatory
variables (omitting xj) - calculate the residuals from this regression.
Call these residuals e1. - Run a regression of xj on the other explanatory
variables (omitting xj) - calculate the residuals from this regression.
Call these residuals e2. - The plot of e1 versus e2 is an added variable
plot.
26Correlations and added variable plots
- Let corr(e1, e2 ) be the correlation between the
two sets of residuals. - It is related to the t-statistic of xj, t(bj ) ,
from the full regression equation (including xj)
through - Here, K is the number of regression coefficients
in the full regression equation and N is the
number of observations. - Thus, the t-statistic can be used to determine
the correlation coefficient of the added variable
plot without running the three step procedure. - However, unlike correlation coefficients, the
added variable plot allows us to visualize
potential nonlinear relationships between y and
xj .
27Influence diagnostics
- Influence diagnostics allow the analyst to
understand the impact of individual observations
and/or subjects on the estimated model - Traditional diagnostic statistics are
observation-level - of less interest in panel data analysis
- the effect of unusual observations is absorbed by
subject-specific parameters. - Of greater interest is the impact that an entire
subject has on the population parameters. - We use the statistic
- Here, b(i) is the ols estimate b calculated with
the ith subject omitted.
28Calibration of influence diagnostic
- The panel data influence diagnostic is similar to
Cooks distance for regression. - Cooks distance is calculated at the
observational level yet Bi(b) is at the subject
level - The statistic Bi(b) has an approximate c2
(chi-square) with K degrees of freedom - Observations with a large value of Bi(b) may be
influential on the parameter estimates. - Use quantiles of the c2 to quantify the adjective
large. - Influential observations warrant further
investigation - they may need correction, additional variable
specification to accommodate differences or
deletion from the data set.
29Cross-sectional correlations
- The basic model assumes independence between
subjects. - Looking at a cross-section of subjects, we assume
zero cross-sectional correlation, that is, rij
Corr (yit ,yjt) 0 for i ? j. - Suppose that the true model is yit lt xitb
eit ,where lt is a random temporal effect that
is common to all subjects. - This yields Var yit sl2 s 2
- The covariance between observations at the same
time but from different subjects is Cov (yit
,yjt) sl2 , i ? j. - Thus, the cross-sectional correlation is
30Testing for cross-sectional correlations
- To test H0 rij 0 for all i ? j, assume that Ti
T . - Calculate model residuals eit.
- For each subject i, calculate the ranks of each
residual. - That is, define ri,1 , ..., ri,T to be the
ranks of ei,1 , ..., ei,T . - Ranks will vary from 1 to T, so the average rank
is (T1)/2. - For the ith and jth subject, calculate the rank
correlation coefficient (Spearmans correlation) - Calculate the average Spearmans correlation and
the average squared Spearmans correlation - Here, Siltj means sum over i1, ..., j-1 and
j2, ..., n.
31Calibration of cross-sectional correlation test
- We compare R2ave to a distribution that is a
weighted sum of chi-square random variables
(Frees, 1995). - Specifically, define
- Q a(T) (c12- (T-1)) b(T) (c22- T(T-3)/2) .
- Here, c12 and c22 are independent chi-square
random variables with T-1 and T(T-3)/2 degrees of
freedom, respectively. - The constants are
- a(T) 4(T2) / (5(T-1)2(T1))
- and
- b(T) 2(5T6) / (5T(T-1)(T1)) .
32Calculation short-cuts
- Rule of thumb for cut-offs for the Q distributon
. - To calculate R2ave
- Define
- For each t, u, calculate Si Zi,t,u and Si Zi,t,u
2. . - We have
- Here, St,u means sum over t1, ..., T and u1,
..., T. - Although more complex in appearance, this is a
much faster computation form for R2ave. - Main drawback - the asymptotic distribution is
only available for balanced data.
33Heteroscedasticity
- Carroll and Ruppert (1988) provide a broad
treatment - Here is a test due to Breusch and Pagan (1980).
- Ha Var eit s 2 g wit, where wit is a known
vector of weighting variables and g is a
p-dimensional vector of parameters. - H0 Var eit s 2. This procedure is
- Fit a regression model and calculate the model
residuals, rit. - Calculate squared standardized residuals,
- Fit a regression model of on wit.
- The test statistic is LM (Regress SSw)/2, where
Regress SSw is the regression sum of squares from
the model fit in step 3. - Reject the null hypothesis if LM exceeds a
percentile from a chi-square distribution with p
degrees of freedom. The percentile is one minus
the significance level of the test.
342.5 Model extensions
- In panel data, subjects are measured repeatedly
over time. Panel data analysis is useful for
studying subject changes over time. - Repeated measurements of a subject tend to be
intercorrelated. - Up to this point, we have used time-varying
covariates to account for the presence of time in
the mean response. - However, as in time series analysis, it is also
useful to measure the tendencies in time patterns
through a correlation structure.
35Timing of observations
- We now specify the time periods when the
observations are made. - We assume that we have at most T observations on
each subject. - These observations are made at time periods t1,
t2, ..., tT. - Each subject has observations made at a subset of
these T time periods, labeled t1, t2, ..., tTi. - The subset may vary by subject and thus could be
denoted by t1i, t2i, ..., tTii. - For brevity, we use the simpler notation scheme
and drop the second i subscript. - This framework, although notationally complex,
allows for missing data and incomplete
observations.
36Temporal covariance matrix
- For a full set of observations, let R denote the
T ? T temporal (time) variance-covariance matrix. - This is defined by R Var (?i)
- Let Rrs Cov (?ir, ?is) is the element in the
rth row and sth column of R. - There are at most T(T1)/2 unknown elements of R.
- Denote this dependence of R on parameters using
R(?). Here, ? is the vector of unknown parameters
of R. - For the ith observation, we have Var (?i )
Ri(?), a Ti ? Ti matrix. - The matrix Ri(?) can be determined by removing
certain rows and columns of the matrix R(?). - We assume that Ri(?) is positive-definite and
only depends on i through its dimension.
37Special cases of R
- R ? 2 I, where I is a T ? T identity matrix.
This is the case of no serial correlation. - R ? 2 ( (1-?) I ? J ), where J is a T ? T
matrix of 1s. This is the uniform correlation
model (also called compound symmetry). - Consider the model yit ?i ?it ,where ?i is a
random cross-sectional effect. - This yields Rtt Var ?it ??? ???.
- For r ? s, consider Rrs Cov (yir ,yis) ??? .
- To write this in terms of ?2, note that
Corr (?it , ?is ) ??? / (??? ???) ?? - Thus, Rrs ? 2 ?.
38More special cases of R
- Rrs ? 2 exp( -? tr - ts ) .
- In the case of equally spaced in time
observations, we may assume that tr1 - tr 1.
Thus, Rrs ? 2 ? r-s , where ? exp (-? ) . - This is the autoregressive of order one model,
denoted by AR(1). - More generally, for equally spaced in time
observations, assume - Cov (?ir , ?is ) Cov (?ij , ?ik ) for r-s
j-k. - This is a stationary assumption.
- It implies homoscedasticity.
- There are only T unknown elements of R,
Toeplitz matrix. - Assume only homoscedasticity.
- There are 1 T(T-1)/2 unknown elements of R,
corresponding to the variance and the correlation
matrix. - Make no assumptions on R.
- There are T(T1)/2 unknown elements of R.
39(No Transcript)
40Subject-specific slopes
- Let one or more slopes vary by subject.
- The fixed effects linear panel data model is
- yit zit ?i xit ? ?it .
- The q explanatory variables are zit zit1,
zit2, ..., zitq, a vector of dimension q ? 1. - The subject-specific parameters are ai (ai1,
..., aiq), a vector of dimension q ? 1. - This is short-hand notation for the model
- yit ?i1 zit1 ... ?iq zitq ?1 xit1... ?K
xitK ?it . - The responses between subjects are independent
- We allow for temporal correlation through the
assumption that Var ?i Ri(?).
41Assumptions of the Fixed Effects Linear
Longitudinal Data Model
- E yi Zi ai Xi ß.
- xit,1, ... , xit,K and zit,1, ... , zit,q are
nonstochastic. - Var yi Ri(t) Ri.
- yi are independent random vectors.
- yit are normally distributed.
42Least Squares Estimates
- The estimates are derived in Appendix 2A.2.
- They are given by
- and
- Here,
43Robust estimation of standard errors
- It is common practice to ignore serial
correlation and heteroscedasticity, so that one
assumes Ri s 2 Ii . - Thus,
- where
- Huber (1967), White (1980) and Liang and Zeger
(1986) suggested replacing Ri by ei ei . Here,
ei is the vector of residuals. Thus, a robust
standard error of bj is