Title: 4. Using panel data
- 4.1 The basic idea
- 4.2 Linear regression
- 4.3 Logit and probit models
- 4.4 Other models
4.1 The basic idea
- Panel data are data that are pooled for the same companies across time.
- In panel data, there are likely to be unobserved company-specific characteristics that are relatively constant over time.
- I have already explained that it is necessary to control for this time-series dependence in order to obtain unbiased standard errors.
- In STATA we can do this using the robust cluster() option.
4.1 The basic idea
- The first advantage of panel data is that we are using a larger sample compared to the case where we have only one observation per company.
- The larger sample permits greater estimation power, so the coefficients can be estimated more precisely.
- Since the standard errors are lower (even when they are adjusted for time-series dependence), we are more likely to find statistically significant coefficients.
- use "C:\phd\Fees.dta", clear
- gen fye=date(yearend, "mdy")
- format fye %d
- gen year=year(fye)
- sort year
- gen lnaf=ln(auditfees)
- gen lnta=ln(totalassets)
- by year: reg lnaf lnta, robust cluster(companyid)
- reg lnaf lnta, robust cluster(companyid)
4.1 The basic idea
- The second advantage of panel data is that we can estimate dynamic models.
- For example, suppose we believe that audit fees depend not only on the company's size but also on its rate of growth.
- sort companyid fye
- gen growth=lnta-lnta[_n-1] if companyid==companyid[_n-1]
- reg lnaf lnta growth, robust cluster(companyid)
- We find that audit firms offer lower fees to companies that are growing more quickly.
- If we had had only one year of data, we would not have been able to estimate this model.
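The growth variable can also be sketched with Stata's time-series operators. This is a minimal sketch, not the course's method: it assumes a Stata version that provides xtset and vce() (version 10 or later) and that each company has at most one observation per year. The variable name growth_d is hypothetical.

```stata
* Sketch only: declare the panel structure (assumes companyid-year is unique)
xtset companyid year

* D.lnta is lnta minus its value in the previous year for the same company;
* it is missing for a company's first year and wherever the prior year is absent
gen growth_d = D.lnta

* Same regression as above, with cluster-robust standard errors
reg lnaf lnta growth_d, vce(cluster companyid)
```

Unlike the [_n-1] approach, the D. operator differences against the previous calendar year rather than the previous row, so the two versions can differ when a company has gaps in its time series.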
4.1 The basic idea
- The third and most important advantage of panel data is that we are able to control for unobservable company-specific effects that are correlated with the observed explanatory variables.
- Let's start with a simple regression model.
- Let's assume that the error term has an unobserved company-specific component that does not vary over time and an idiosyncratic component that is unique to each company-year observation.
4.1 The basic idea
- Putting the two together
- Recall that the standard error of β will be biased if we do not adjust for time-series dependence.
- This adjustment is easy using the robust cluster() option.
- The OLS estimate of the β coefficient will be unbiased as long as the unobservable company-specific component (u_i) is uncorrelated with X_it.
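The equations on these slides did not survive in the transcript. A reconstruction consistent with the surrounding text is sketched here, assuming a single regressor and using α and β as assumed names for the intercept and slope. The first line is the simple regression model, the second splits the error term into its company-specific and idiosyncratic components, and the third puts the two together as eq. (1).

```latex
\begin{align*}
Y_{it} &= \alpha + \beta X_{it} + \varepsilon_{it} \\
\varepsilon_{it} &= u_i + e_{it} \\
Y_{it} &= \alpha + \beta X_{it} + u_i + e_{it} \tag{1}
\end{align*}
```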
4.1 The basic idea
- Unfortunately, the assumption that u_i is uncorrelated with X_it is unlikely to hold in practice.
- If u_i is correlated with X_it then ε_it is also correlated with X_it.
- The OLS estimate of β will be biased if ε_it is correlated with X_it (recall our previous discussion and notes on omitted variable bias).
4.1 The basic idea
- An example can illustrate this bias.
- Go to http://ihome.ust.hk/accl/Phd_teaching.htm
- use "C:\phd\beatles.dta", clear
- list
- This dataset is a panel of four individuals observed over three years (1968-70).
- In each year they were asked how satisfied they are with their lives.
- This is the lsat variable, which takes larger values for increasing satisfaction.
- You want to test how age affects life satisfaction.
- reg lsat age
- It appears that they became slightly more satisfied as they got older.
4.1 The basic idea
- Suppose you now include dummy variables for each individual.
- tab persnr, gen(dum_)
- Recall that you must omit one dummy variable or the intercept in order to avoid perfect collinearity (see the previous notes about multicollinearity).
- reg lsat age dum_1 dum_2 dum_4
- reg lsat age dum_1 dum_2 dum_3 dum_4, nocons
- There now appears to be a highly significant negative impact of age on life satisfaction.
- What's going on here?
4.1 The basic idea
- Recall that fitting a simple OLS model (lsat on age) is equivalent to plotting a line of best fit through the data.
- twoway (lfit lsat age) (scatter lsat age)
4.1 The basic idea
- I am now going to introduce a new command, separate, by()
- separate lsat, by(persnr)
- This creates four separate life satisfaction variables, one for each of the four individuals.
- Now graph the relationship between life satisfaction and age for each of the four people.
- twoway (lfit lsat1 age) (scatter lsat1 age)
- twoway (lfit lsat2 age) (scatter lsat2 age)
- twoway (lfit lsat3 age) (scatter lsat3 age)
- twoway (lfit lsat4 age) (scatter lsat4 age)
- It is clear that each of the four individuals became less satisfied as they got older.
- The simple OLS regression was biased because John and Ringo (who happened to be older) were generally more satisfied than Paul and George (who happened to be younger).
- The multiple OLS regression controlled for these idiosyncratic differences by including dummy variables for each person.
- We can see this by plotting the simple OLS results and the multiple OLS results.
- reg lsat age dum_1 dum_2 dum_3 dum_4, nocons
- predict lsat_hat
- separate lsat_hat, by(persnr)
- twoway (line lsat_hat1-lsat_hat4 age) (lfit lsat age) (scatter lsat1-lsat4 age)
4.1 The basic idea
- What does all this have to do with panel data being advantageous?
- Without panel data we would not have been able to control for the idiosyncrasies of the four individuals.
- If we had had data for only one year, we would not have known that the age coefficient was biased in the simple regression.
- We can demonstrate this by running a regression of lsat on age for each year in the sample.
- sort time
- by time: reg lsat age
- Without panel data, we would have incorrectly concluded that people get happier as they get older.
4.1 The basic idea
- In the multiple regression, we include dummy variables (dum_1 dum_2 dum_3 dum_4) which control for the individual-specific effects (u_i).
- Without including the person dummies, our estimate of β would be biased because the dummies are correlated with age.
- The person dummies explain all the cross-sectional variation in life satisfaction across the four individuals.
- The only variation that is left is the change in satisfaction within each person as he gets older.
- Therefore, the model with dummies is sometimes called the "within" estimator or the "fixed-effects" model.
4.1 The basic idea
- In small datasets like this, it is easy to create dummy variables for each person (or each company).
- In large datasets, we may have thousands of individuals or companies.
- The number of variables in STATA is restricted due to memory limits.
- Also, it is very inconvenient to have results for thousands of dummy variables (just imagine how long your log file would be!).
4.1 The basic idea
- Instead of including dummy variables, we can control for idiosyncratic effects by transforming the Y and X variables.
- Taking averages of eq. (1) over time gives
- Subtracting eq. (2) from eq. (1) gives
- The key thing to note here is that the individual-specific effects (u_i) have been differenced out, so they will not bias our estimate of β.
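The two equations referred to on this slide are missing from the transcript. A reconstruction under the same assumed notation (α, β, u_i, e_it, with bars denoting each individual's average over time) is sketched here. The first line is the time-averaged model, eq. (2), and the second is the result of subtracting it from eq. (1).

```latex
\begin{align*}
\bar{Y}_i &= \alpha + \beta \bar{X}_i + u_i + \bar{e}_i \tag{2} \\
Y_{it} - \bar{Y}_i &= \beta\,(X_{it} - \bar{X}_i) + (e_{it} - \bar{e}_i)
\end{align*}
```

Because u_i does not vary over time, its time average is u_i itself, which is why it cancels along with the intercept.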
4.1 The basic idea
- Another transformation that will do the same trick is to take differences rather than subtract means.
- Lagging by one period
- Subtracting eq. (2) from eq. (1) gives
- Again the individual-specific effects (u_i) have been differenced out, so they will not bias our estimate of β.
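The lagged and differenced equations are also missing from the transcript. Under the same assumed notation, a reconstruction would be as follows, where eq. (2) on this slide is eq. (1) lagged by one period.

```latex
\begin{align*}
Y_{i,t-1} &= \alpha + \beta X_{i,t-1} + u_i + e_{i,t-1} \tag{2} \\
Y_{it} - Y_{i,t-1} &= \beta\,(X_{it} - X_{i,t-1}) + (e_{it} - e_{i,t-1})
\end{align*}
```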
Class exercise 4a
- Estimate the following models, where Y = life satisfaction and X = age.
- Compare the age coefficients in these models to the age coefficient in the untransformed model with person dummies (ignore the standard errors of the age coefficients because they are biased).
Class exercise 4a
- You should find that the age coefficients are exactly the same.
- First, we create the variables.
- sort persnr time
- gen chlsat=lsat-lsat[_n-1] if persnr==persnr[_n-1]
- gen chage=age-age[_n-1] if persnr==persnr[_n-1]
- (NB: the chage variable is just a constant because each person gets older by one from one year to the next: list persnr time chage)
- by persnr: egen avlsat=mean(lsat)
- by persnr: egen avage=mean(age)
- gen difflsat=lsat-avlsat
- gen diffage=age-avage
- Next, we run the three regressions without constant terms (recall that the chage variable is a constant).
- reg chlsat chage, nocons
- reg difflsat diffage, nocons
- reg lsat age dum_1 dum_2 dum_3 dum_4, nocons
4.2 Linear regression using panel data (xtreg, fe i())
- Fortunately, STATA has a command that allows us to avoid creating dummy variables for each person and that corrects the standard errors.
- xt is a prefix that tells STATA we want to estimate a panel data model.
- The fe option tells STATA we want to estimate a fixed-effects model.
- In OLS this is equivalent to including dummy variables to control for person-specific effects.
- The i() term tells STATA the variable that identifies each unique person.
- xtreg lsat age, fe i(persnr)
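In Stata 10 and later the panel identifier is usually declared once with xtset rather than passed through i() on every command. The following is a minimal sketch under that assumption, not the syntax used in these slides.

```stata
* Sketch only: declare the panel (person identifier, then time variable)
xtset persnr time

* Fixed-effects (within) regression; i() is no longer needed after xtset
xtreg lsat age, fe

* areg absorbs the person dummies and should give the same age coefficient
areg lsat age, absorb(persnr)
```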
- Note that the age coefficient and t-statistic are exactly the same as in the OLS model that includes person dummies.
- reg lsat age dum_1 dum_2 dum_3 dum_4, nocons
- There are 12 person-years and 4 persons, and the minimum, average and maximum number of observations per person is 3.
- Since we are estimating a within-effects model, it is the within R² that is directly relevant (93.2%).
- If we used the same independent variables to estimate a between-effects model, we would have an R² of 88.4% (I will explain later what we mean by the between-effects model).
- If we used the same independent variables to estimate a simple OLS model, we would get an R² of 16.5% (reg lsat age).
- The F-statistic is a test that the coefficient(s) on the X variable(s) (i.e., age) are all zero.
- sigma_u is the standard deviation of the estimates of the fixed effects, u_i (σ_u).
- sigma_e is the standard deviation of the estimates of the residuals, e_it (σ_e).
- rho = σ_u² / (σ_u² + σ_e²) = 4.93² / (4.93² + 0.47²) = 0.99
- The correlation between u_i and X_it is -0.83.
- This correlation is high, confirming our prior finding that the fixed effects are correlated with age.
- The F-test allows us to reject the hypothesis that there are no fixed effects.
- If we had not rejected this hypothesis, we could estimate a simple OLS model instead of the fixed-effects model.
4.2 Linear regression (predict)
- After running the fixed-effects model, we can obtain various predicted statistics using the predict command.
- predict , xb
- predict , u
- predict , e
- predict , ue
4.2 Linear regression (predict)
- For example
- xtreg lsat age, fe i(persnr)
- drop lsat_hat
- predict lsat_hat, xb
- predict lsat_u, u
- predict lsat_e, e
- predict lsat_ue, ue
- Checking that lsat_ue = lsat_u + lsat_e
- list lsat_u lsat_e lsat_ue
- Checking that the correlation between u_i and X_it is -0.83
- corr lsat_hat lsat_u
4.2 Linear regression
- I have explained that there are three main advantages of panel data:
- The larger sample increases power, so the coefficients are estimated more precisely.
- We can estimate models that incorporate dynamic variables (e.g., the effect of growth on audit fees).
- We can control for unobservable fixed effects (e.g., company-specific or person-specific characteristics) by estimating fixed-effects models.
4.2 Linear regression
- Are there any disadvantages?
- Yes, unfortunately we cannot investigate the effect of explanatory variables that are held constant over time.
- From a technical point of view, this is because the time-invariant variable would be perfectly collinear with the person dummies.
- From an economic point of view, this is because fixed-effects models are designed to study what causes the dependent variable to change within a given person. A time-invariant characteristic cannot cause such a change.
4.2 Linear regression
- For example, suppose that the height of the four persons is constant over the three years.
- Let's create a height variable and test the effect of height on life satisfaction.
- gen height=185 if dum_1==1
- replace height=180 if dum_2==1
- replace height=175 if dum_3==1
- replace height=170 if dum_4==1
- list persnr height
- Note that the height variable is a constant for each person.
- We can estimate the effect of height as long as we do not control for unobservable person-specific effects.
- reg lsat age height
4.2 Linear regression
- Suppose we try to control for person-specific effects by including dummy variables.
- reg lsat age height dum_1 dum_2 dum_3 dum_4, nocons
- Note that STATA has to throw away either a dummy variable or the height variable.
- The reason is that the height variable is collinear with the four dummy variables.
- The only way we can include dummies for each person is if we do not include the height variable.
- reg lsat age dum_1 dum_2 dum_3 dum_4, nocons
- If we try to estimate the effect of height using the xtreg, fe i() command, STATA will inform us that there is a problem of perfect collinearity.
- xtreg lsat age height, fe i(persnr)
4.2 Linear regression
- Note that the height coefficient can be estimated if there is some variation over time for one or more persons.
- The fixed-effects estimator can exploit this time variation to estimate the effect of height on life satisfaction.
- For example, suppose that each person became 1cm taller in 1970.
- replace height=height+1 if time==1970
- xtreg lsat age height, fe i(persnr)
- The xtreg, fe i() command estimates the following fixed-effects model
- Recall that we derived this model by taking averages
- The averages model is sometimes called the "between" estimator because the comparison is cross-sectional between persons rather than over time.
- Like OLS, the between estimator provides unbiased estimates of β only if the unobservable company-specific component (u_i) is uncorrelated with X_it.
- If we wanted to estimate the between-effects model, the command in STATA is xtreg, be i()
- xtreg lsat age, be i(persnr)
- Note that the age coefficient is positive.
- The reason is that we are not controlling for person-specific effects, which are correlated with age.
- Therefore, the between-effects estimate of the age coefficient is biased.
- Since we are estimating a between-effects model, it is the between R² that is relevant (88.4%).
- Note that this is also the between-effects R² that was previously reported using the fixed-effects model.
- Note that the R² for the between-effects model is high even though the age coefficient is severely biased. Again, this reinforces the fact that a high R² does not imply that the model is well specified.
- The between estimator is also less efficient than simple OLS because it throws away all the variation over time in the dependent and independent variables.
- In fact the between estimator is equivalent to estimating an OLS model on the averages for just one year.
- Recall that we have already created averages for the lsat and age variables (avlsat avage).
- reg avlsat avage if time==1968
- reg avlsat avage if time==1969
- reg avlsat avage if time==1970
- xtreg lsat age height, be i(persnr)
- Since we actually have three years of data, it seems silly (and it is silly) to throw data away.
4.2 Linear regression (xtreg)
- Normally, then, we would never be interested in estimating a between-effects model.
- The estimates are biased if the person-specific effects are correlated with the X variables.
- The estimates are inefficient because we are ignoring any time-series variation in the data.
- The fixed-effects estimator is attractive because it controls for any correlation between u_i and X_it.
- An unattractive feature is that it is forced to estimate a fixed parameter for each person or company in the data.
- You can think of these parameters as being the coefficients on the person dummy variables.
4.2 Linear regression (xtreg)
- An alternative is the random-effects model, in which the u_i are assumed to be randomly distributed with a mean of zero and a constant variance (u_i ~ IID(0, σ_u²)) rather than fixed.
- Intuitively, the random-effects model is like having an OLS model where the constant term varies randomly across individuals i.
- Like simple OLS, the random-effects model assumes that there is zero correlation between u_i and X_it.
- If u_i and X_it are correlated, the random-effects estimates are biased.
4.2 Linear regression (xtreg)
- The random-effects model can be thought of as an intermediate case of OLS and the fixed-effects model.
4.2 Linear regression (xtreg)
- The OLS model corresponds to θ = 0.
- The fixed-effects model corresponds to θ = 1.
- The random-effects model (0 ≤ θ ≤ 1) is also known as the generalized least squares model (i.e., it is more general than OLS or the fixed-effects model).
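The equation that defines the weighting parameter is missing from the transcript. The standard textbook form of the random-effects (GLS) transformation is sketched here under the assumed notation used earlier, with θ as the assumed name of the parameter and T the number of periods per individual in a balanced panel.

```latex
\begin{align*}
Y_{it} - \theta \bar{Y}_i &= (1-\theta)\,\alpha + \beta\,(X_{it} - \theta \bar{X}_i)
  + \bigl[(1-\theta)\,u_i + (e_{it} - \theta \bar{e}_i)\bigr] \\
\theta &= 1 - \sqrt{\frac{\sigma_e^2}{T\sigma_u^2 + \sigma_e^2}}
\end{align*}
```

Setting θ = 0 leaves the untransformed model (pooled OLS), while θ = 1 subtracts the full individual means and reproduces the within transformation. θ moves towards 1 as σ_u² or T grows relative to σ_e².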
4.2 Linear regression (xtreg)
- If we want to estimate a random-effects model, the command is xtreg, re i()
- For example
- xtreg lsat age, re i(persnr)
- Note that because we have controlled for (random) unobserved person effects, the age coefficient is estimated with the correct negative sign.
- The rest of the output is similar to the fixed-effects model, except:
- We use a Wald statistic instead of an F statistic to test the significance of the independent variables. Here we can reject the hypothesis that age is insignificant.
- The Wald statistic is used because only the asymptotic properties of the random-effects estimator are known.
- The output explicitly tells us that we have imposed the assumption that u_i and X_it are uncorrelated.
- This is the key difference between the random-effects and fixed-effects models.
- We can test whether u_i and X_it are correlated.
- If they are correlated, we should use the fixed-effects model rather than OLS or the random-effects model (otherwise the coefficients are biased).
- If they are not correlated, it is better to use the random-effects model (because it is more efficient).
- The test was devised by Hausman.
- If u_i and X_it are correlated, the random-effects estimates are biased (inconsistent) while the fixed-effects coefficients are unbiased (consistent).
- In this case, there will be a large difference between the random-effects and fixed-effects coefficient estimates.
- If u_i and X_it are uncorrelated, the random-effects and fixed-effects coefficients are both unbiased (consistent); the fixed-effects coefficients are inefficient while the random-effects coefficients are efficient.
- In this case, there will not be a large difference between the random-effects and fixed-effects coefficient estimates.
- The Hausman test indicates whether the two sets of coefficient estimates are significantly different.
- Null hypothesis (H0): u_i and X_it are uncorrelated.
- The Hausman statistic is distributed as chi2 and is computed as
- If the chi2 statistic is positive and statistically significant, we can reject the null hypothesis. This would mean that the fixed-effects model is preferable because the coefficients are consistent.
- If the chi2 statistic is not positive and statistically significant, we cannot reject the null hypothesis. This would mean that the random-effects model is preferable because the coefficients are consistent and efficient.
- NB: The (V_c - V_e)^-1 matrix is guaranteed to be positive definite only asymptotically. In small samples, this asymptotic result may not hold, in which case the computed chi2 statistic may be negative.
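The formula for the statistic is missing from the transcript. The standard form of the Hausman statistic, written with the V_c and V_e notation that the slide uses (c for the consistent estimator, e for the efficient one), is sketched here. The symbols for the two coefficient vectors and the degrees of freedom k (the number of coefficients compared) are assumed names.

```latex
H = (\hat{\beta}_c - \hat{\beta}_e)'\,(V_c - V_e)^{-1}\,(\hat{\beta}_c - \hat{\beta}_e)
  \;\sim\; \chi^2(k) \quad \text{asymptotically under } H_0
```

With a single regressor such as age, H reduces to the squared difference between the two coefficients divided by the difference between their estimated variances, which makes it easy to see how a negative variance difference produces a negative statistic.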
4.2 Linear regression (estimates store, hausman)
- The procedure for executing a Hausman test is as follows:
- Save the coefficients that are consistent even if the null is not true.
- xtreg lsat age, fe i(persnr)
- estimates store fixed_effects
- Save the coefficients that are inconsistent if the null is not true.
- xtreg lsat age, re i(persnr)
- estimates store random_effects
- The command for the Hausman test is
- hausman name_consistent name_efficient
- hausman fixed_effects random_effects
- b is the fixed-effects coefficient while B is the random-effects coefficient.
- The (V_c - V_e)^-1 matrix has a negative value on the leading diagonal and, as a result, the square root of the leading diagonal is undefined. This is why the chi2 statistic is negative.
- Since the chi2 statistic is not significantly positive, we might decide that we cannot reject the null hypothesis (see p. 57 of the STATA reference manual for the Hausman test).
- On the other hand, this result is not very reliable because the asymptotic assumption fails to hold in this small sample.
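One option that may help with a negative statistic is hausman's sigmamore option, which bases both covariance matrices on the disturbance variance from the efficient estimator and so makes a non-positive-definite difference less likely. This is a sketch that assumes a Stata version in which the option is available. It is not part of the original slides and does not cure the lack of power in a 12-observation sample.

```stata
* Sketch only: reuse the stored estimates from the procedure above
hausman fixed_effects random_effects, sigmamore
```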
- If we reject the null hypothesis that u_i and X_it are uncorrelated, the fixed-effects model is preferable to the OLS and random-effects models.
- If we cannot reject the null hypothesis that u_i and X_it are uncorrelated, we need to determine whether the u_i are distributed randomly across individuals.
- Recall that the random-effects model is like having an OLS model where the constant term varies randomly across individuals i.
- Therefore, we need to test whether there is significant variation in u_i across individuals.
- rho = σ_u² / (σ_u² + σ_e²) = 1.03² / (1.03² + 0.47²) = 0.83
- σ_u² captures the variation in u_i across individuals.
- If σ_u² is significantly positive, the random-effects model is preferable to the OLS model.
- The Breusch and Pagan (1980) Lagrange multiplier test is used to investigate whether σ_u² is significantly positive.
- We perform the Breusch-Pagan test by typing xttest0 after xtreg, re
- Our estimate of σ_u² is 1.067 (note that σ_u is estimated to be 1.032, which is the same as sigma_u on the previous slide).
- We are unable to reject the hypothesis that σ_u² = 0. Therefore, we cannot conclude that the random-effects model is preferable to the OLS model.
- NB: Our Hausman and LM tests lack power because the sample consists of only 12 observations. In larger samples, we are more likely to reject the hypothesis that σ_u² = 0 and we are more likely to reject the hypothesis that u_i and X_it are uncorrelated.
Class exercise 4b
- Estimate models in which the dependent variable is the log of audit fees.
- Estimate the models using
- OLS without controlling for u_i
- Fixed-effects models
- Random-effects models
- How do the coefficient estimates vary across the different models?
- Which of these models is preferable?
Class exercise 4b
- The lnta coefficients are largest in the OLS model that does not control for u_i.
- The lnta coefficients are smallest in the fixed-effects model.
- The Hausman test rejects the hypothesis that u_i and X_it are uncorrelated. Therefore, the fixed-effects model is preferable.
- The LM test rejects the hypothesis that σ_u² = 0 (given that u_i and X_it are significantly correlated, we would not actually need to carry out this test).
Class exercise 4b
- use "C:\phd\Fees.dta", clear
- gen fye=date(yearend, "mdy")
- format fye %d
- gen year=year(fye)
- sort year
- gen lnaf=ln(auditfees)
- gen lnta=ln(totalassets)
- reg lnaf lnta
- xtreg lnaf lnta, fe i(companyid)
- estimates store fixed_effects
- xtreg lnaf lnta, re i(companyid)
- estimates store random_effects
- hausman fixed_effects random_effects
- xttest0
4.2 Linear regression
- Compared to economics and finance, there are not many accounting studies that exploit panel data in order to control for unobserved company-specific effects (u_i).
- Most studies simply report OLS estimates on the pooled data.
- Some studies even fail to adjust the OLS standard errors for time-series dependence.
- This can be a very serious mistake, especially when the panels are long (e.g., the sample period covers many years).
- If you use the xtreg command, STATA automatically recognizes that you are using panel data and it will give you the correct standard errors.
- Therefore, there is no need to use the robust cluster() option and, in fact, there is no robust cluster() option with xtreg.
- xtreg lnaf lnta, fe i(companyid) robust cluster(companyid)
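The statement that xtreg has no cluster option reflects the STATA versions used in this course. In Stata 10 and later, xtreg accepts the vce() option, so cluster-robust standard errors can be requested directly. A minimal sketch under that assumption:

```stata
* Sketch only, assuming Stata 10 or later
xtset companyid

* Fixed-effects model with standard errors clustered by company
xtreg lnaf lnta, fe vce(cluster companyid)
```

Comparing these standard errors with the default xtreg output is one way to see how much residual within-company dependence remains after the fixed effects are removed.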
4.2 Linear regression
- Ke and Petroni (2004) is an example of an accounting study that estimates fixed-effects regressions to control for unobservable company-specific effects.
- Their dependent variable is the change in the ownership of institutional investors in companies.
- They test whether there are significant changes in institutional ownership prior to a break in a string of consecutive quarterly earnings increases.
- Bhattacharya et al. (2003) is an example of an accounting study that estimates fixed-effects regressions to control for unobservable country-specific effects.
- Their dependent variable is the cost of equity for 34 countries between 1984 and 1998 (they are using a cross-country panel).
- They test how earnings opacity at the country level affects the cost of equity.
- They acknowledge that there is a potentially serious problem of omitted variable bias.
- Bhattacharya et al. (2003) argue that they largely avoid this problem because they control for fixed country-specific effects.
4.2 Linear regression
- It is important to recognize that the fixed-effects estimator relies only on the time-series variation in Y and X within a given company.
- If the extent of time-series variation is small, either (Y_it - Ȳ_i) or (X_it - X̄_i) will be close to zero.
- In this case, the fixed-effects estimator is not reliable because there is insufficient variation in either the dependent or treatment variable.
4.2 Linear regression
- As in any model, we require a reasonable amount of variation in the Y and X variables.
- If either variable displays little variation, the results may be unreliable.
- We saw an example of this previously.
- Except for one observation, the independent variable is a constant.
- As a result the fitted regression line is unreliable.
4.2 Linear regression
- This point was made by Zhou (JFE, 2001), who criticized the use of fixed-effects models when the treatment variable is management ownership.
- Because management ownership usually remains constant from one year to the next, the (X_it - X̄_i) term is typically equal to zero (or very small).
4.3 Logit and probit models
- When the dependent variable is continuous, it is easy to transform the model such that unobserved firm-specific effects are washed away.
- When the dependent variable is binary, the required transformation is different and more complicated.
- If you are interested in the derivation, see the Baltagi textbook (pages 178-180).
- In the fixed-effects logit, the fixed effects (u_i) are not actually estimated; instead they are conditioned out of the model.
- The fixed-effects logit model is not equivalent to logit + dummy variables.
4.3 Logit models (xtlogit)
- We can estimate a fixed-effects logit model using the command xtlogit, fe i()
- NB: Your version of STATA 9.0 may have a problem with estimating the fixed-effects logit model. You can instead use version 8.0 or 10.0.
- version 8.0
- Before we estimate the fixed-effects logit model, we need to understand a complication that arises because the dependent variable is binary.
- Suppose we have five annual observations on two companies.
- For company 1, there is no variation in the dependent variable over time (Y = 0 in every year).
- A fixed effect for this company will perfectly predict the outcome (Y = 0).
- Consequently, the first company will be dropped from the estimation sample.
- In fact, the fixed-effects logit model will drop all companies that exhibit no variation in the dependent variable over time.
4.3 Logit models (xtlogit)
- use "C:\phd\xtlogit.dta", clear
- list
- The sample consists of three companies.
- Company 1 exhibits no variation in the dependent variable over time, while companies 2 and 3 do exhibit time-series variation.
- There is no problem estimating this model on the full sample if we do not control for fixed effects.
- logit y x
- Running a fixed-effects logit model results in the first company being thrown away.
- xtlogit y x, fe i(id)
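The fixed-effects logit is a conditional logit in which each company forms a group, so the same model can be sketched with clogit. This is an illustration rather than part of the original slides.

```stata
* Sketch only: conditional (fixed-effects) logit, grouping by company.
* Groups with no variation in y contribute nothing to the conditional
* likelihood and are dropped, as with xtlogit, fe.
clogit y x, group(id)
```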
4.3 Logit models (xtlogit)
- In many empirical settings, we are likely to find a large number of companies that exhibit no variation in the binary dependent variable during the sample period.
- Example 1
- Y_it = 1 if company i is engaged in fraud in year t; Y_it = 0 otherwise.
- The vast majority of companies do not engage in fraud at any point in time (Y_it = 0 for all t).
- All such non-fraud companies would be dropped from the estimation sample.
- The estimation sample would include only the companies that commit fraud at some point during the sample period.
4.3 Logit models (xtlogit)
- Example 2
- Y_it = 1 if company i hires a Big 6 auditor in year t; Y_it = 0 if company i hires a non-Big 6 auditor in year t.
- The vast majority of companies keep the same auditor in the following year, and switches between Big 6 and non-Big 6 auditors are especially rare.
- All companies that do not switch between Big 6 and non-Big 6 auditors would be dropped from the sample.
- The estimation sample would include only the companies that switch between Big 6 and non-Big 6 auditors at some point during the sample period.
4.3 Logit models (xtlogit)
- Alternatively, we can estimate a random-effects logit model using the command xtlogit, re i()
- The company effects (u_i) are now assumed to be random rather than fixed.
- Consequently, the random-effects model does not throw away companies that lack time-series variation in the dependent variable.
- For example
- xtlogit y x, re i(id)
- The estimation sample is now 15 rather than 10 (i.e., all 3 companies are included in the sample).
- lnsig2u = ln(σ_u²) = -1.625
- sigma_u = σ_u = 0.444 = exp(-1.625)^0.5
- rho = σ_u² / (σ_u² + σ_e²) = 0.056
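The arithmetic behind these three numbers can be checked by hand. The sketch below assumes, as is standard for the logit model, that the idiosyncratic error follows a logistic distribution whose variance is π²/3; that value does not appear on the slide.

```latex
\begin{align*}
\sigma_u^2 &= \exp(-1.625) \approx 0.197 \\
\sigma_u &= \sqrt{0.197} \approx 0.444 \\
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
      = \frac{0.197}{0.197 + \pi^2/3}
      \approx \frac{0.197}{0.197 + 3.290} \approx 0.056
\end{align*}
```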
- If rho = σ_u² / (σ_u² + σ_e²) = 0, there would be no variation in the u_i across companies (i.e., each company would have the same u_i).
- In this case, there would be no need to control for company-specific effects, i.e., we could rely on logit instead of estimating xtlogit, re i()
- The likelihood-ratio statistic tests the null hypothesis that rho equals zero.
- If we reject this hypothesis, the random-effects model is preferable to ordinary logit.
- In our data, we are unable to reject, so we could use an ordinary logit model instead of the random-effects logit model. This would be a good idea because the ordinary logit is more efficient (fewer parameters need to be estimated).
4.3 Logit models (xtlogit)
- Recall that we previously used a Hausman test to determine whether the xtreg, fe i() or xtreg, re i() model is preferable.
- Fortunately, we can do the same test in STATA for deciding whether the fixed-effects or random-effects logit models are preferable.
- The only difference is that we have to use the equations() option with the Hausman test.
- Actually, this point is not explained in the STATA manual but a question and answer were posted about this topic on the statalist (www.stata.com/statalist/archive/2004-01/msg00669.html).
- The equations() option specifies, by number, the pairs of equations that are to be compared.
- Usually, we are estimating just one equation in each model, in which case the option is equations(1:1)
4.3 Logit models (xtlogit)
- For example
- xtlogit y x, fe i(id)
- estimates store fixed_effects
- xtlogit y x, re i(id)
- estimates store random_effects
- hausman fixed_effects random_effects
- STATA is telling us there is an error (we need to specify the equation numbers).
- hausman fixed_effects random_effects, eq(1:1)
- The chi2 statistic is negative (again there is a small-sample problem which causes the asymptotic assumption to fail).
Class exercise 4c
- Open the fee.dta data set.
- Estimate models in which big6 is the dummy dependent variable using
- ordinary logit
- fixed-effects logit
- random-effects logit
- Why is the estimation sample much smaller in the fixed-effects model?
- Which of the three models is preferable?
73Class exercise 4c
- use "C\phd\Fees.dta", clear
- gen lntaln(totalassets)
- logit big6 lnta, robust cluster(companyid)
- xtlogit big6 lnta, fe i(companyid)
- estimates store fixed_effects
- xtlogit big6 lnta, re i(companyid)
- estimates store random_effects
- hausman fixed_effects random_effects, eq(1:1)
- The estimation sample is much smaller in the
fixed-effects model because the majority of
companies do not switch between Big 6 and Non-Big
6 auditors during the sample period. These
companies have no within-company variation in
big6, so they drop out of the fixed-effects logit.
- The likelihood ratio test of rho = 0 indicates
that the random-effects model is preferable to
the ordinary logit.
- The Hausman test indicates that the fixed-effects
model is preferable to the random-effects logit.
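The point about the smaller estimation sample can be checked directly in the data. The following is a minimal sketch, assuming the Fees.dta variables used in this exercise (companyid, big6); the variable names min_big6, max_big6, switcher and first_obs are made up for illustration. It counts the companies that switch between Big 6 and Non-Big 6 auditors, since only those companies contribute to the fixed-effects logit.

```stata
* Sketch: how many companies have variation in big6 over time?
use "C:\phd\Fees.dta", clear
* Lowest and highest value of big6 observed for each company
bysort companyid: egen min_big6 = min(big6)
bysort companyid: egen max_big6 = max(big6)
* A company is a switcher if it has both a 0 and a 1 during the sample period
gen switcher = (min_big6 != max_big6)
* Flag one observation per company so that each company is counted once
egen first_obs = tag(companyid)
* Number of companies by switcher status
tabulate switcher if first_obs
* Number of observations that belong to switchers
count if switcher
```

The last count should be close to the number of observations reported by xtlogit, fe. It can differ slightly if lnta is missing for some observations.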
4.3 Probit models (xtprobit)
- Recall that there are two commands available when
the dependent variable is binary (ordinary
logit and probit).
- There is no command for a fixed-effects probit
model because no one has yet found a
transformation that will allow the fixed effects
to be washed out.
- If you type xtprobit big6 lnta, fe i(companyid)
you will get an error message.
- A random-effects probit model is available,
however
- xtprobit big6 lnta, re i(companyid)
- Just as with the random-effects logit model,
there is a likelihood ratio test that helps us to
choose between the random-effects probit and the
ordinary probit models.
- In our data, we can reject the hypothesis that
rho = 0, so we may decide not to use an ordinary
probit model.
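The comparison between the ordinary probit and the random-effects probit might be run as follows. This is only a sketch under the same assumptions as the class exercise (Fees.dta with companyid, big6 and totalassets). The stored-result names e(rho) and e(chi2_c) are what I would expect xtprobit to leave behind; confirm them with ereturn list before relying on them.

```stata
use "C:\phd\Fees.dta", clear
gen lnta = ln(totalassets)
* Ordinary (pooled) probit with standard errors clustered by company
probit big6 lnta, robust cluster(companyid)
* Random-effects probit; the last line of the output reports the
* likelihood ratio test of rho = 0
xtprobit big6 lnta, re i(companyid)
* List everything the command stored, then display the pieces of interest
ereturn list
* rho: share of the total error variance due to the company-specific effect
display "rho = " e(rho)
* Assumed name of the stored LR chi2 statistic for the test of rho = 0
display "LR chi2 (rho = 0) = " e(chi2_c)
```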
4.4 Other models
- If you look at the STATA manual for panel data
(Cross-Sectional Time-Series), you will find:
- Fixed-effects and random-effects models are
available for count data (xtpoisson and xtnbreg).
- We can test which model is preferable using a
Hausman test.
- Random-effects models are available for censored
data (xttobit and xtintreg).
- Fixed-effects models are not available for
censored data, so there is no need for a Hausman
test.
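For count data, the fixed-effects versus random-effects comparison could follow the same pattern as the xtlogit example. The sketch below uses the same generic placeholder names as that example (y for a count outcome, x for a regressor, id for the panel identifier); none of them refer to a real data set.

```stata
* Placeholder names: y (count outcome), x (regressor), id (panel identifier)
xtpoisson y x, fe i(id)
estimates store fixed_effects
xtpoisson y x, re i(id)
estimates store random_effects
* The consistent (fixed-effects) estimates are listed first
hausman fixed_effects random_effects
* Assumption: if hausman reports an equation-related error, as it did
* after xtlogit, try adding the option eq(1:1)
```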
4.4 Other models
- Duration data is, by its very nature, in the form
of panel data.
- What about the multinomial and ordered models
that we previously looked at (mlogit, mprobit,
ologit, oprobit)? It appears that STATA does not
have random- or fixed-effects versions of these
models.
4.4 Other models
- You can use the search command in STATA to find
out whether a command is available.
- The search command looks through official STATA
commands, frequently asked questions (on the
STATA website), the Stata Journal (SJ) and the
Stata Technical Bulletins (STBs).
- The SJ and STBs are where you can sometimes find
commands that will appear in future versions of
STATA.
- search multinomial logit
- We can find the multinomial logit command, but
there does not appear to be any command
specifically for the multinomial model with panel
data.
4.4 Other models
- Even if the command you want is not available
from STATA, you may be able to find a STATA user
who has already written the program that you
need.
- Statalist (www.stata.com/statalist/) is an email
listserver where over 2,500 Stata users discuss
all things statistical and Stata.
- Click on the archives provided by StataCorp and
search them.
4.4 Other models
- For example, suppose you want to estimate a
random-effects ordered probit.
- Typing this into the Statalist archive, I found
that someone has written a program with this
command (reoprob):
www.stata.com/statalist/archive/2006-02/msg00509.html
- The message tells us we can download it to STATA
by typing
- findit reoprob
4.4 Other models
- If you cannot find someone who has already
written the program, and it is a command that
you really do need, you will either have to write
the program yourself or wait for someone else to
do it.
- In fact, it is not too difficult to learn how to
write new programs in STATA.
- You would need to take a STATA programming course:
- www.stata.com/netcourse/
- NetCourses 151 and 152
Summary
- There are three advantages to using panel data:
- We can control for unobservable fixed effects
that might otherwise bias the coefficient
estimates.
- These unobservable fixed effects can be
company-specific, country-specific, or
person-specific.
- The larger sample means that the coefficients are
estimated more precisely.
- We can include lagged or change variables in our
models.
Summary
- The xtreg command is used to estimate
fixed-effects and random-effects models (where
the dependent variable is continuous).
- We can test whether the fixed-effects or
random-effects model is preferable using the
Hausman test.
- If there is a significant correlation between u_i
and X_it, the fixed-effects model is preferable to
the OLS and random-effects models.
- If there is no significant correlation between u_i
and X_it, we can test whether the OLS or
random-effects model is preferable using an LM
test.
Summary
- When the dependent variable is binary, we can
estimate fixed-effects or random-effects logit
models.
- Again, we can test which model is preferable
using a Hausman test.
- Only the random-effects model is available in the
case of the probit model.