Transcript and Presenter's Notes

Title: 2. Fixed Effects Models


1
2. Fixed Effects Models
  • 2.1 Basic fixed-effects model
  • 2.2 Exploring panel data
  • 2.3 Estimation and inference
  • 2.4 Model specification and diagnostics
  • 2.5 Model extensions
  • Appendix 2A - Least squares estimation

2
2.1 Basic fixed-effects model
  • Basic Elements
  • Subject i is observed on Ti occasions,
    i = 1, ..., n.
  • Ti ≤ T, the maximal number of time periods.
  • The response of interest is yit.
  • The K explanatory variables are xit = (xit1,
    xit2, ..., xitK)′, a vector of dimension K × 1.
  • The population parameters are β = (β1, ..., βK)′,
    a vector of dimension K × 1.

3
Observables Representation of the Linear Model
  • E(yit) = α + β1 xit,1 + β2 xit,2 + ... + βK xit,K.
  • xit,1, ..., xit,K are nonstochastic variables.
  • Var(yit) = σ².
  • yit are independent random variables.
  • yit are normally distributed.
  • The observable variables are xit,1, ..., xit,K,
    yit.
  • Think of xit,1, ..., xit,K as defining a
    stratum.
  • We take a random draw, yit, from each stratum.
  • Thus, we treat the xs as nonstochastic.
  • We are interested in the distribution of y,
    conditional on the xs.

4
Error Representation of the Linear Model
  • yit = α + β1 xit,1 + β2 xit,2 + ... + βK xit,K
    + εit,
  • where E(εit) = 0.
  • xit,1, ..., xit,K are nonstochastic variables.
  • Var(εit) = σ².
  • εit are independent random variables.
  • This representation is based on the Gaussian
    theory of errors; it is centered on the
    unobservable variable εit.
  • Here, the εit are i.i.d., mean-zero random
    variables.

5
Heterogeneous model
  • We now introduce a subscript on the intercept
    term, to account for heterogeneity.
  • E(yit) = αi + β1 xit,1 + β2 xit,2 + ... + βK xit,K.
  • For short-hand, we write this as
  • E(yit) = αi + xit′β.

6
Analysis of covariance model
  • The intercept parameter, αi, varies by subject.
  • The population parameters β do not; they control
    for the common effects of the covariates x.
  • Because the errors are mean zero, the expected
    response is E(yit) = αi + xit′β.

7
Parameters of interest
  • The common effects of the explanatory variables
    are dictated by the sign and magnitude of the
    betas (βs).
  • These are the parameters of interest
  • The intercept parameters vary by subject and
    account for different behavior of subjects.
  • The intercept parameters control for the
    heterogeneity of subjects.
  • Because they are of secondary interest, the
    intercepts are called nuisance parameters.

8
Time-specific analysis of covariance
  • The basic model is also a traditional analysis of
    covariance model.
  • The basic fixed-effects model focuses on the mean
    response and assumes
  • no serial correlation (correlation over time)
  • no cross-sectional (contemporaneous) correlation
    (correlation between subjects)
  • Hence, no special relationship between subjects
    and time is assumed.
  • By interchanging the roles of i and t, we may
    consider the model
  • yit = λt + xit′β + εit.
  • The parameters λt are time-specific parameters
    that do not depend on subjects.

9
Subject and time heterogeneity
  • Typically, the number of subjects, n,
    substantially exceeds the maximal number of time
    periods, T.
  • Typically, the heterogeneity among subjects
    explains a greater proportion of variability than
    the heterogeneity among time periods.
  • Thus, we begin with the basic model
    yit = αi + xit′β + εit.
  • This model allows explicit parameterization of
    the subject-specific heterogeneity.
  • By using binary variables for the time dimension,
    we can easily incorporate time-specific
    parameters.

10
2.2 Exploring panel data
  • Why Explore?
  • Many important features of the data can be
    summarized numerically or graphically without
    reference to a model
  • Data exploration provides hints of the
    appropriate model
  • Many social science data sets are observational -
    they do not arise as the result of a designed
    experiment
  • The data collection mechanism does not dictate
    the model selection process.
  • To draw reliable inferences from the modeling
    procedure, it is important that the data be
    congruent with the model.
  • Exploring the data also alerts us to any unusual
    observations and/or subjects.

11
Data exploration techniques
  • Panel data is a special case of regression data.
  • Techniques applicable to regression are also
    useful for panel data.
  • Some commonly used techniques include
  • Summarize the distribution of y and each x
  • Graphically, through histograms and other density
    estimators
  • Numerically, through basic summary statistics
    (mean, median, standard deviation, minimum and
    maximum).
  • Summarize the relation between y and each x
  • Graphically, through scatter plots
  • Numerically, through correlation statistics
  • Summary statistics by time period may be useful
    for detecting temporal patterns.
  • Three more specialized (for panel data)
    techniques are
  • Multiple time series plots
  • Scatterplots with symbols
  • Added variable plots.
  • Section 2.2 discusses additional techniques;
    these are performed after the fit of a
    preliminary model.

12
Multiple time series plots
  • Plot of the response, yit, versus time t.
  • Serially connect the observations within each
    subject.
  • This graph helps detect
  • Patterns over time
  • Unusual observations and/or subjects.
  • Visualize the heterogeneity (a plotting sketch
    follows below).
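A brief plotting sketch for this kind of graph, assuming (hypothetically) a long-format pandas DataFrame with columns id, t, and y; the Python code is illustrative and not part of the original slides.

```python
# Sketch of a multiple time series ("spaghetti") plot.
# Assumes a long-format DataFrame with hypothetical columns:
# id (subject), t (time period), y (response).
import matplotlib.pyplot as plt
import pandas as pd

def multiple_time_series_plot(df: pd.DataFrame) -> None:
    fig, ax = plt.subplots()
    for subject, group in df.groupby("id"):
        # Serially connect the observations within each subject.
        g = group.sort_values("t")
        ax.plot(g["t"], g["y"], marker="o", alpha=0.5)
    ax.set_xlabel("time period t")
    ax.set_ylabel("response y")
    ax.set_title("Multiple time series plot")
    plt.show()
```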

13
Scatterplots with symbols
  • Plot of the response, yit, versus an explanatory
    variable, xitj
  • Use a plotting symbol to encode the subject
    number i
  • See the relationship between the response and
    explanatory variable yet account for the varying
    intercepts.
  • Variation: if there is a separation in the xs,
    such as increasing over time,
  • then we can serially connect the observations.
  • We do not need a separate plotting symbol for
    each subject.

14
Basic added variable plot
  • This is a plot of yit − ȳi versus xitj − x̄ij,
    the deviations from the subject averages.
  • Motivation Typically, the subject-specific
    parameters account for a large portion of the
    variability.
  • This plot allows us to visualize the relationship
    between y and each x, without forcing our eye to
    adjust for the heterogeneity of the
    subject-specific intercepts.

15
Trellis Plot
16
2.3 Estimation and inference
  • Least squares estimates
  • By the Gauss-Markov theorem, the best linear
    unbiased estimates are the ordinary least squares
    (ols) estimates.
  • These are given by
  • b = (Σi Σt (xit − x̄i)(xit − x̄i)′)⁻¹
    Σi Σt (xit − x̄i)(yit − ȳi)
  • and
  • ai = ȳi − x̄i′ b.
  • Here, ȳi and x̄i are averages of yit and
    xit over time.
  • Time-constant xs prevent one from getting unique
    estimates of b! (A computational sketch follows.)
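A minimal sketch of these estimates, assuming a hypothetical long-format data layout (column names are mine, not from the slides): demean within each subject, then run ols on the deviations.

```python
# Sketch of the fixed-effects (within) estimates b and a_i.
import numpy as np
import pandas as pd

def within_estimates(df, y_col, x_cols, id_col="id"):
    # Deviations from subject averages: y_it - ybar_i and x_it - xbar_i.
    y_dev = (df[y_col] - df.groupby(id_col)[y_col].transform("mean")).to_numpy()
    X_dev = (df[x_cols] - df.groupby(id_col)[x_cols].transform("mean")).to_numpy()
    # b = (sum_it (x - xbar)(x - xbar)')^{-1} sum_it (x - xbar)(y - ybar).
    # Note: a time-constant x makes this matrix singular, as cautioned above.
    b = np.linalg.solve(X_dev.T @ X_dev, X_dev.T @ y_dev)
    # a_i = ybar_i - xbar_i' b, one intercept per subject.
    a = df.groupby(id_col)[y_col].mean() - df.groupby(id_col)[x_cols].mean().to_numpy() @ b
    return a, b
```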

17
Estimation details
  • Although there are n + K unknown parameters, the
    calculation of the ols estimates requires
    inversion of only a K × K matrix.
  • The ols estimate of b can also be expressed as a
    weighted average of estimates of subject-specific
    parameters.
  • Suppose that all parameters are subject-specific,
    so that the model is yit = ai + xit′bi + εit.
  • The ols estimate of bi turns out to be
  • bi = (Σt (xit − x̄i)(xit − x̄i)′)⁻¹
    Σt (xit − x̄i)(yit − ȳi).
  • Define the weighting matrix
  • Wi = Σt (xit − x̄i)(xit − x̄i)′.
  • With this weight, we can express the ols
    estimate of b as
  • b = (Σi Wi)⁻¹ Σi Wi bi,
  • a weighted average of subject-specific parameter
    estimates (see the numerical check below).
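A numerical check of this weighted-average property, as a sketch under the same hypothetical data layout as before; it assumes each subject has enough distinct x values for its own regression.

```python
# Check: b equals the W_i-weighted average of the subject-specific slopes b_i.
import numpy as np

def slope_as_weighted_average(df, y_col, x_cols, id_col="id"):
    W_sum, Wb_sum = 0.0, 0.0
    for _, g in df.groupby(id_col):
        Xd = (g[x_cols] - g[x_cols].mean()).to_numpy()  # demeaned x's
        yd = (g[y_col] - g[y_col].mean()).to_numpy()    # demeaned y
        Wi = Xd.T @ Xd                                  # weighting matrix W_i
        bi = np.linalg.solve(Wi, Xd.T @ yd)             # subject-specific slope b_i
        W_sum, Wb_sum = W_sum + Wi, Wb_sum + Wi @ bi
    return np.linalg.solve(W_sum, Wb_sum)               # matches the pooled b
```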

18
Properties of estimates
  • Both ai and b have the usual properties of ols
    regression estimators
  • They are unbiased estimators.
  • By the Gauss-Markov theorem, they are minimum
    variance among the class of unbiased estimates.
  • To see this, consider an expression of the ols
    estimate of b,
  • b = (Σi Wi)⁻¹ Σi Σt (xit − x̄i) yit.
  • That is, b is a linear combination of responses.
  • If the responses are normally distributed, then
    so is b.
  • The variance of b turns out to be
    Var(b) = σ² (Σi Wi)⁻¹.

19
ANOVA and standard errors
  • This follows the usual regression set-up.
  • We define the residuals as
    eit = yit − (ai + xit′b).
  • The error sum of squares is Error SS = Σit eit².
  • The mean square error is
    s² = Error SS / (N − (n + K));
  • the residual standard deviation is s.
  • The standard errors of the slope estimates are
    the square roots of the diagonal of the
    estimated variance matrix s² (Σi Wi)⁻¹.

20
Consistency of estimates
  • As the number of subjects (n) gets large, b
    approaches β.
  • Specifically, weak consistency means convergence
    in probability.
  • This is a direct result of the unbiasedness and
    an assumption that Σi Wi grows without bound.
  • As n gets large, the intercept estimates ai do
    not approach the true αi.
  • They are inconsistent.
  • Intuitively, this is because the number of
    repeated measurements available for each ai is
    Ti, a bounded number.

21
Other large sample approximations
  • Typically, the number of subjects is large
    relative to the number of time periods observed.
  • Thus, in deriving large sample approximations of
    the sampling distributions of estimators, assume
    that n → ∞ while T remains fixed.
  • With this assumption, we have a central limit
    theorem for the slope estimator.
  • That is, b is approximately normally distributed
    even when the responses are not.
  • The approximation improves as n becomes large.
  • Unlike the usual regression set-up, this is not
    true for the intercepts. If the responses are not
    normally distributed, then the ai are not even
    approximately normal.

22
2.4 Model specification and diagnostics
  • Pooling Test
  • Added variable plots
  • Influence diagnostics
  • Cross-sectional correlations
  • Heteroscedasticity

23
Pooling test
  • Test whether the intercepts take on a common
    value, say α.
  • Using notation, we wish to test the null
    hypothesis
  • H0: α1 = α2 = ... = αn = α.
  • This can be done using the following partial F-
    (Chow) test:
  • Run the full model yit = αi + xit′β + εit to
    get Error SS and s².
  • Run the reduced model yit = α + xit′β + εit to
    get (Error SS)reduced.
  • Compute the partial F-statistic
  • F = [((Error SS)reduced − Error SS) / (n − 1)] / s².
  • Reject H0 if F exceeds a quantile from an
    F-distribution with numerator degrees of freedom
    df1 = n − 1 and denominator degrees of freedom
    df2 = N − (n + K).
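The test is easy to assemble from the two error sums of squares; a sketch (the function and argument names are mine, not from the slides):

```python
# Sketch of the pooling (partial F / Chow) test.
from scipy import stats

def pooling_test(error_ss_full, error_ss_reduced, n, K, N, level=0.05):
    df1 = n - 1                    # numerator degrees of freedom
    df2 = N - (n + K)              # denominator degrees of freedom
    s2 = error_ss_full / df2       # mean square error from the full model
    F = (error_ss_reduced - error_ss_full) / (df1 * s2)
    cutoff = stats.f.ppf(1 - level, df1, df2)
    return F, cutoff, F > cutoff   # True means reject H0 (do not pool)
```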

24
Added variable plot
  • An added variable plot (also called a partial
    regression plot) is a standard graphical device
    used in regression analysis
  • Purpose To view the relationship between a
    response and an explanatory variable, after
    controlling for the linear effects of other
    explanatory variables.
  • Added variable plots allow us to visualize the
    relationship between y and each x, without
    forcing our eye to adjust for the differences
    induced by the other xs.
  • The basic added variable plot is a special case.

25
Procedure for making an added variable plot
  • Select an explanatory variable, say xj.
  • Run a regression of y on the other explanatory
    variables (omitting xj)
  • calculate the residuals from this regression.
    Call these residuals e1.
  • Run a regression of xj on the other explanatory
    variables (omitting xj)
  • calculate the residuals from this regression.
    Call these residuals e2.
  • The plot of e1 versus e2 is an added variable
    plot.
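A direct translation of this three-step procedure into code, as a sketch; X is assumed to be a design matrix (with an intercept column) whose jth column is the variable of interest.

```python
# Sketch of the three-step added variable (partial regression) plot.
import numpy as np
import matplotlib.pyplot as plt

def added_variable_plot(y, X, j):
    others = np.delete(X, j, axis=1)   # omit x_j from the design matrix
    # Step 1: residuals e1 from regressing y on the other variables.
    e1 = y - others @ np.linalg.lstsq(others, y, rcond=None)[0]
    # Step 2: residuals e2 from regressing x_j on the other variables.
    e2 = X[:, j] - others @ np.linalg.lstsq(others, X[:, j], rcond=None)[0]
    # Step 3: plot e1 versus e2.
    plt.scatter(e2, e1, alpha=0.5)
    plt.xlabel("e2 (residuals of x_j)")
    plt.ylabel("e1 (residuals of y)")
    plt.show()
    return e1, e2
```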

26
Correlations and added variable plots
  • Let corr(e1, e2) be the correlation between the
    two sets of residuals.
  • It is related to the t-statistic of xj, t(bj),
    from the full regression equation (including xj)
    through
  • corr(e1, e2) = t(bj) / √(t(bj)² + N − K).
  • Here, K is the number of regression coefficients
    in the full regression equation and N is the
    number of observations.
  • Thus, the t-statistic can be used to determine
    the correlation coefficient of the added variable
    plot without running the three-step procedure.
  • However, unlike correlation coefficients, the
    added variable plot allows us to visualize
    potential nonlinear relationships between y and
    xj.
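The identity above is easy to check numerically; a one-function sketch, where t_j, N, and K come from the full regression:

```python
# corr(e1, e2) recovered from the full-regression t-statistic.
import numpy as np

def corr_from_t(t_j, N, K):
    return t_j / np.sqrt(t_j**2 + N - K)
```

np.corrcoef(e1, e2)[0, 1] on the residuals from the added variable plot should agree with corr_from_t(t_j, N, K).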

27
Influence diagnostics
  • Influence diagnostics allow the analyst to
    understand the impact of individual observations
    and/or subjects on the estimated model
  • Traditional diagnostic statistics are
    observation-level
  • of less interest in panel data analysis
  • the effect of unusual observations is absorbed by
    subject-specific parameters.
  • Of greater interest is the impact that an entire
    subject has on the population parameters.
  • We use the statistic
  • Bi(b) = (b(i) − b)′ [Var(b)]⁻¹ (b(i) − b).
  • Here, b(i) is the ols estimate of b calculated
    with the ith subject omitted.

28
Calibration of influence diagnostic
  • The panel data influence diagnostic is similar to
    Cook's distance for regression.
  • Cook's distance is calculated at the observation
    level, yet Bi(b) is at the subject level.
  • The statistic Bi(b) has an approximate χ²
    (chi-square) distribution with K degrees of
    freedom.
  • Subjects with a large value of Bi(b) may be
    influential on the parameter estimates.
  • Use quantiles of the χ² distribution to quantify
    the adjective "large."
  • Influential observations warrant further
    investigation;
  • they may need correction, additional variable
    specification to accommodate differences, or
    deletion from the data set.
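A leave-one-subject-out sketch of Bi(b). The quadratic-form version of the statistic is my reading of the chi-square calibration above, and the fit and var_b_inv arguments are hypothetical (e.g., the within_estimates() sketch and the inverse of s²(Σi Wi)⁻¹).

```python
# Sketch of the subject-level influence diagnostic B_i(b).
import numpy as np
from scipy import stats

def influence_diagnostics(df, fit, var_b_inv, id_col="id", level=0.95):
    b_full = fit(df)                                # b from all subjects
    B = {}
    for subject in df[id_col].unique():
        b_i = fit(df[df[id_col] != subject])        # b_(i): subject i omitted
        d = b_i - b_full
        B[subject] = float(d @ var_b_inv @ d)       # quadratic form in Var(b)^{-1}
    cutoff = stats.chi2.ppf(level, df=len(b_full))  # chi-square with K df
    return B, cutoff                                # B_i(b) > cutoff: "large"
```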

29
Cross-sectional correlations
  • The basic model assumes independence between
    subjects.
  • Looking at a cross-section of subjects, we assume
    zero cross-sectional correlation, that is,
    rij = Corr(yit, yjt) = 0 for i ≠ j.
  • Suppose that the true model is
    yit = λt + xit′β + εit, where λt is a random
    temporal effect that is common to all subjects.
  • This yields Var(yit) = σλ² + σ².
  • The covariance between observations at the same
    time but from different subjects is
    Cov(yit, yjt) = σλ², for i ≠ j.
  • Thus, the cross-sectional correlation is
    rij = σλ² / (σλ² + σ²).

30
Testing for cross-sectional correlations
  • To test H0: rij = 0 for all i ≠ j, assume that
    Ti = T.
  • Calculate model residuals eit.
  • For each subject i, calculate the ranks of each
    residual.
  • That is, define ri,1, ..., ri,T to be the
    ranks of ei,1, ..., ei,T.
  • Ranks will vary from 1 to T, so the average rank
    is (T + 1)/2.
  • For the ith and jth subjects, calculate the rank
    correlation coefficient (Spearman's correlation)
  • srij = [12 / (T(T² − 1))] Σt (ri,t − (T + 1)/2)(rj,t − (T + 1)/2).
  • Calculate the average Spearman's correlation and
    the average squared Spearman's correlation,
  • Rave = [2 / (n(n − 1))] Σi<j srij and
    R²ave = [2 / (n(n − 1))] Σi<j srij².
  • Here, Σi<j means the sum over i = 1, ..., j − 1
    and j = 2, ..., n.
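A sketch of these rank statistics for a balanced panel, with the residuals assumed to be stored as an n × T array (rows are subjects):

```python
# Average Spearman correlation and average squared Spearman correlation.
import numpy as np
from scipy import stats

def average_spearman(resid):                    # resid has shape (n, T)
    n, T = resid.shape
    ranks = stats.rankdata(resid, axis=1)       # ranks r_{i,1}, ..., r_{i,T}
    dev = ranks - (T + 1) / 2.0                 # deviations from the average rank
    r_sum, r2_sum, pairs = 0.0, 0.0, 0
    for j in range(1, n):
        for i in range(j):                      # sum over i < j
            sr = 12.0 * (dev[i] @ dev[j]) / (T * (T**2 - 1))
            r_sum, r2_sum, pairs = r_sum + sr, r2_sum + sr**2, pairs + 1
    return r_sum / pairs, r2_sum / pairs        # R_ave and R^2_ave
```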

31
Calibration of cross-sectional correlation test
  • We compare R²ave to a distribution that is a
    weighted sum of chi-square random variables
    (Frees, 1995).
  • Specifically, define
  • Q = a(T)(χ1² − (T − 1)) + b(T)(χ2² − T(T − 3)/2).
  • Here, χ1² and χ2² are independent chi-square
    random variables with T − 1 and T(T − 3)/2
    degrees of freedom, respectively.
  • The constants are
  • a(T) = 4(T + 2) / (5(T − 1)²(T + 1))
  • and
  • b(T) = 2(5T + 6) / (5T(T − 1)(T + 1)).
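Quantiles of Q are simple to simulate from this definition; a sketch, which requires T ≥ 4 so that both chi-square degrees of freedom are positive:

```python
# Simulated quantiles of Q = a(T)(chi1^2 - (T-1)) + b(T)(chi2^2 - T(T-3)/2).
import numpy as np

def q_quantile(T, level=0.95, draws=100_000, seed=0):
    rng = np.random.default_rng(seed)
    df1, df2 = T - 1, T * (T - 3) // 2
    a = 4.0 * (T + 2) / (5.0 * (T - 1)**2 * (T + 1))
    b = 2.0 * (5 * T + 6) / (5.0 * T * (T - 1) * (T + 1))
    q = (a * (rng.chisquare(df1, draws) - df1)
         + b * (rng.chisquare(df2, draws) - df2))
    return np.quantile(q, level)
```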

32
Calculation short-cuts
  • Rules of thumb are available for cut-offs of the
    Q distribution.
  • To calculate R²ave:
  • Define the rank-based quantities Zi,t,u.
  • For each t and u, calculate Σi Zi,t,u and
    Σi Zi,t,u².
  • We then obtain R²ave from these sums.
  • Here, Σt,u means the sum over t = 1, ..., T and
    u = 1, ..., T.
  • Although more complex in appearance, this is a
    much faster computational form for R²ave.
  • Main drawback - the asymptotic distribution is
    only available for balanced data.

33
Heteroscedasticity
  • Carroll and Ruppert (1988) provide a broad
    treatment.
  • Here is a test due to Breusch and Pagan (1980).
  • Ha: Var(εit) = σ² + γ′wit, where wit is a known
    vector of weighting variables and γ is a
    p-dimensional vector of parameters.
  • H0: Var(εit) = σ². The procedure is:
  • Fit a regression model and calculate the model
    residuals, rit.
  • Calculate the squared standardized residuals,
    rit² / σ̂², where σ̂² is the average squared
    residual.
  • Fit a regression model of the squared
    standardized residuals on wit.
  • The test statistic is LM = (Regression SSw)/2,
    where Regression SSw is the regression sum of
    squares from the model fit in step 3.
  • Reject the null hypothesis if LM exceeds a
    percentile from a chi-square distribution with p
    degrees of freedom. The percentile is one minus
    the significance level of the test.
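The procedure in code, as a sketch; resid holds the model residuals and W is assumed to be the N × p matrix of weighting variables.

```python
# Sketch of the Breusch-Pagan test for heteroscedasticity.
import numpy as np
from scipy import stats

def breusch_pagan(resid, W, level=0.05):
    z = resid**2 / np.mean(resid**2)             # squared standardized residuals
    D = np.column_stack([np.ones(len(z)), W])    # regress z on w_it (with intercept)
    fitted = D @ np.linalg.lstsq(D, z, rcond=None)[0]
    regress_ss = np.sum((fitted - z.mean())**2)  # regression sum of squares
    LM = regress_ss / 2.0
    cutoff = stats.chi2.ppf(1 - level, W.shape[1])   # chi-square with p df
    return LM, cutoff, LM > cutoff
```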

34
2.5 Model extensions
  • In panel data, subjects are measured repeatedly
    over time. Panel data analysis is useful for
    studying subject changes over time.
  • Repeated measurements of a subject tend to be
    intercorrelated.
  • Up to this point, we have used time-varying
    covariates to account for the presence of time in
    the mean response.
  • However, as in time series analysis, it is also
    useful to measure the tendencies in time patterns
    through a correlation structure.

35
Timing of observations
  • We now specify the time periods when the
    observations are made.
  • We assume that we have at most T observations on
    each subject.
  • These observations are made at time periods t1,
    t2, ..., tT.
  • Each subject has observations made at a subset of
    these T time periods, labeled t1, t2, ..., tTi.
  • The subset may vary by subject and thus could be
    denoted by t1i, t2i, ..., tTii.
  • For brevity, we use the simpler notation scheme
    and drop the second i subscript.
  • This framework, although notationally complex,
    allows for missing data and incomplete
    observations.

36
Temporal covariance matrix
  • For a full set of observations, let R denote the
    T × T temporal (time) variance-covariance matrix.
  • This is defined by R = Var(εi).
  • Rrs = Cov(εir, εis) is the element in the
    rth row and sth column of R.
  • There are at most T(T + 1)/2 unknown elements of R.
  • Denote the dependence of R on parameters by
    R(τ). Here, τ is the vector of unknown parameters
    of R.
  • For the ith subject, we have Var(εi) =
    Ri(τ), a Ti × Ti matrix.
  • The matrix Ri(τ) can be determined by removing
    certain rows and columns of the matrix R(τ).
  • We assume that Ri(τ) is positive definite and
    depends on i only through its dimension.

37
Special cases of R
  • R = σ² I, where I is a T × T identity matrix.
    This is the case of no serial correlation.
  • R = σ² ((1 − ρ) I + ρ J), where J is a T × T
    matrix of ones. This is the uniform correlation
    model (also called compound symmetry).
  • Consider the model yit = αi + εit, where αi is a
    random cross-sectional effect.
  • This yields Rtt = Var(yit) = σα² + σε².
  • For r ≠ s, consider Rrs = Cov(yir, yis) = σα².
  • To write this in terms of σ², note that
    Corr(yir, yis) = σα² / (σα² + σε²) = ρ.
  • Thus, Rrs = σ² ρ.

38
More special cases of R
  • Rrs = σ² exp(−φ |tr − ts|).
  • In the case of observations equally spaced in
    time, we may assume that tr+1 − tr = 1.
    Thus, Rrs = σ² ρ^|r−s|, where ρ = exp(−φ).
  • This is the autoregressive model of order one,
    denoted by AR(1).
  • More generally, for observations equally spaced
    in time, assume
  • Cov(εir, εis) = Cov(εij, εik) for |r − s| = |j − k|.
  • This is a stationarity assumption.
  • It implies homoscedasticity.
  • There are only T unknown elements of R; this is a
    Toeplitz matrix.
  • Assume only homoscedasticity.
  • There are then 1 + T(T − 1)/2 unknown elements of
    R, corresponding to the variance and the
    correlation matrix.
  • Make no assumptions on R.
  • There are then T(T + 1)/2 unknown elements of R.
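These special cases are simple to construct; a sketch with assumed parameter names (sigma2, rho):

```python
# Special cases of the temporal covariance matrix R.
import numpy as np

def R_independent(T, sigma2):
    return sigma2 * np.eye(T)                    # no serial correlation

def R_uniform(T, sigma2, rho):
    # Compound symmetry: sigma^2 ((1 - rho) I + rho J).
    return sigma2 * ((1 - rho) * np.eye(T) + rho * np.ones((T, T)))

def R_ar1(T, sigma2, rho):
    # AR(1), equally spaced times: R_rs = sigma^2 rho^{|r - s|}.
    idx = np.arange(T)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])
```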

39
(No Transcript)
40
Subject-specific slopes
  • Let one or more slopes vary by subject.
  • The fixed effects linear panel data model is
  • yit = zit′αi + xit′β + εit.
  • The q explanatory variables are zit = (zit1,
    zit2, ..., zitq)′, a vector of dimension q × 1.
  • The subject-specific parameters are αi = (αi1,
    ..., αiq)′, a vector of dimension q × 1.
  • This is short-hand notation for the model
  • yit = αi1 zit1 + ... + αiq zitq + β1 xit1 + ... +
    βK xitK + εit.
  • The responses between subjects are independent.
  • We allow for temporal correlation through the
    assumption that Var(εi) = Ri(τ).

41
Assumptions of the Fixed Effects Linear
Longitudinal Data Model
  • E(yi) = Zi αi + Xi β.
  • xit,1, ..., xit,K and zit,1, ..., zit,q are
    nonstochastic.
  • Var(yi) = Ri(τ) = Ri.
  • yi are independent random vectors.
  • yit are normally distributed.

42
Least Squares Estimates
  • The estimates are derived in Appendix 2A.2.
  • They are weighted (generalized) least squares
    estimators of αi and β, with each subject's data
    weighted by Ri⁻¹.

43
Robust estimation of standard errors
  • It is common practice to ignore serial
    correlation and heteroscedasticity, so that one
    assumes Ri = σ² Ii.
  • The variance of b then has the sandwich form
  • Var(b) = (Σi Xi′Xi)⁻¹ (Σi Xi′ Ri Xi) (Σi Xi′Xi)⁻¹,
  • where Xi denotes the matrix of explanatory
    variables for subject i (with the fixed effects
    absorbed).
  • Huber (1967), White (1980) and Liang and Zeger
    (1986) suggested replacing Ri by ei ei′. Here,
    ei is the vector of residuals. Thus, a robust
    standard error of bj is the square root of the
    jth diagonal element of
  • (Σi Xi′Xi)⁻¹ (Σi Xi′ ei ei′ Xi) (Σi Xi′Xi)⁻¹.
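A sketch of the robust calculation, assuming per-subject design matrices Xi (with the fixed effects already absorbed, e.g. by demeaning) and residual vectors ei:

```python
# Robust (sandwich) standard errors with R_i replaced by e_i e_i'.
import numpy as np

def robust_se(X_list, e_list):
    bread = np.linalg.inv(sum(X.T @ X for X in X_list))
    # "Meat": sum_i X_i' e_i e_i' X_i, using each subject's residual vector.
    meat = sum(np.outer(X.T @ e, X.T @ e) for X, e in zip(X_list, e_list))
    cov = bread @ meat @ bread
    return np.sqrt(np.diag(cov))        # robust standard errors of the b_j
```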