1
4. Using panel data

4.1 The basic idea
4.2 Linear regression
4.3 Logit and probit models
4.4 Other models

2
4.1 The basic idea

Panel data are data that are pooled for the same
companies across time.
In panel data, there are likely to be unobserved
company-specific characteristics that are
relatively constant over time.
I have already explained that it is necessary to
control for this time-series dependence in order
to obtain unbiased standard errors.
In STATA we can do this using the robust cluster()
option

3
4.1 The basic idea

The first advantage of panel data is that we are
using a larger sample compared to the case where
we have only one observation per company.
The larger sample permits greater estimation
power, so the coefficients can be estimated more
precisely.
Since the standard errors are lower (even when
they are adjusted for time-series dependence), we
are more likely to find statistically significant
coefficients.
use "C:\phd\Fees.dta", clear
gen fye=date(yearend, "mdy")
format fye %d
gen year=year(fye)
sort year
gen lnaf=ln(auditfees)
gen lnta=ln(totalassets)
by year: reg lnaf lnta, robust cluster(companyid)
reg lnaf lnta, robust cluster(companyid)

4
4.1 The basic idea

The second advantage of panel data is that we can
estimate dynamic models.
For example, suppose we believe that audit fees
depend not only on the company's size but also
on its rate of growth
sort companyid fye
gen growth=lnta-lnta[_n-1] if companyid==companyid[_n-1]
reg lnaf lnta growth, robust cluster(companyid)
We find that audit firms offer lower fees to
companies that are growing more quickly
If we had had only one year of data, we would not
have been able to estimate this model.
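The growth variable can also be built with Stata's time-series operators once the panel structure has been declared. The sketch below uses a made-up toy panel (the numbers are hypothetical and are not taken from Fees.dta) and assumes each company appears at most once per year, which tsset requires.

```stata
* Hypothetical toy panel: two companies, three years each
clear
input companyid year lnta
1 2001 5.0
1 2002 5.2
1 2003 5.5
2 2001 7.0
2 2002 7.1
2 2003 7.1
end

* Declare the panel: company identifier first, then the time variable
tsset companyid year

* D.lnta is lnta minus its one-year lag within the same company;
* it is missing for the first year of each company
gen growth = D.lnta

list companyid year lnta growth, sepby(companyid)
```

Compared with the explicit [_n-1] subscripts, the D. operator never differences across two different companies and returns missing when the previous year is absent, so no if condition is needed.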

5
4.1 The basic idea

The third and most important advantage of
panel data is that we are able to control for
unobservable company-specific effects that are
correlated with the observed explanatory
variables
Let's start with a simple regression model
Let's assume that the error term has an
unobserved company-specific component that does
not vary over time and an idiosyncratic component
that is unique to each company-year observation

6
4.1 The basic idea

Putting the two together
Recall that the standard error of β will be
biased if we do not adjust for time-series
dependence
this adjustment is easy using the robust cluster()
option
The OLS estimate of the β coefficient will be
unbiased as long as the unobservable
company-specific component (ui) is uncorrelated
with Xit
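The model described on the last two slides can be written out as follows. This is a sketch that assumes the intercept is called α and the slope β; ui, eit and Xit are the symbols used in the text, and labelling the combined model as eq. (1) is an assumption that matches how later slides refer to it.

```latex
\begin{align*}
Y_{it} &= \alpha + \beta X_{it} + \varepsilon_{it} \\
\varepsilon_{it} &= u_i + e_{it} \\
Y_{it} &= \alpha + \beta X_{it} + u_i + e_{it} \tag{1}
\end{align*}
```

Here u_i is the company-specific component that does not vary over time and e_it is the idiosyncratic component of each company-year observation.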

7
4.1 The basic idea

Unfortunately, the assumption that ui is
uncorrelated with Xit is unlikely to hold in
practice.
If ui is correlated with Xit then εit is also
correlated with Xit
The OLS estimate of β will be biased if εit is
correlated with Xit (recall our previous
discussion and notes on omitted variable bias)
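The direction of the bias follows from the standard omitted-variable result. As a sketch, assuming the model is Y_it = α + βX_it + u_i + e_it and that e_it is uncorrelated with X_it, the pooled OLS slope converges to:

```latex
\operatorname{plim}\, \hat{\beta}_{OLS}
  = \beta + \frac{\operatorname{Cov}(X_{it},\, u_i)}{\operatorname{Var}(X_{it})}
```

The second term is zero only when ui and Xit are uncorrelated. A positive covariance pushes the estimate above β and a negative covariance pushes it below.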

8
4.1 The basic idea

An example can illustrate this bias.
Go to http://ihome.ust.hk/accl/Phd_teaching.htm
use "C:\phd\beatles.dta", clear
list
This dataset is a panel of four individuals
observed over three years (1968-70)
In each year they were asked how satisfied they
are with their lives
this is the lsat variable which takes larger
values for increasing satisfaction
You want to test how age affects life
satisfaction
reg lsat age
It appears that they became slightly more
satisfied as they got older.

9
4.1 The basic idea

Suppose you now include dummy variables for each
individual
tab persnr, gen(dum_)
Recall that you must omit one dummy variable or
the intercept in order to avoid perfect
collinearity (see the previous notes about
multicollinearity)
reg lsat age dum_1 dum_2 dum_4
reg lsat age dum_1 dum_2 dum_3 dum_4, nocons
There now appears to be a highly significant
negative impact of age on life satisfaction
What's going on here?

10
4.1 The basic idea

Recall that fitting a simple OLS model (lsat on
age) is equivalent to plotting a line of best fit
through the data
twoway (lfit lsat age) (scatter lsat age)

11
4.1 The basic idea

I am now going to introduce a new command,
separate , by()
separate lsat, by(persnr)
This creates four separate life satisfaction
variables for each of the four individuals
Now graph the relationship between life
satisfaction and age for each of the four people
twoway (lfit lsat1 age) (scatter lsat1 age)
twoway (lfit lsat2 age) (scatter lsat2 age)
twoway (lfit lsat3 age) (scatter lsat3 age)
twoway (lfit lsat4 age) (scatter lsat4 age)

13

It is clear that each of the four individuals
became less satisfied as they got older.
The simple OLS regression was biased because John
and Ringo (who happened to be older) were
generally more satisfied than Paul and George
(who happened to be younger)
The multiple OLS regression controlled for these
idiosyncratic differences by including dummy
variables for each person
We can see this by plotting the simple OLS
results and the multiple OLS results
reg lsat age dum_1 dum_2 dum_3 dum_4, nocons
predict lsat_hat
separate lsat_hat, by(persnr)
twoway (line lsat_hat1-lsat_hat4 age) (lfit lsat
age) (scatter lsat1-lsat4 age)

15
4.1 The basic idea

What does all this have to do with panel data
being advantageous?
Without panel data we would not have been able to
control for the idiosyncrasies of the four
individuals.
If we had had data for only one year, we would
not have known that the age coefficient was
biased in the simple regression.
We can demonstrate this by running a regression
of lsat on age for each year in the sample
sort time
by time: reg lsat age
Without panel data, we would have incorrectly
concluded that people get happier as they get
older

16
4.1 The basic idea

In the multiple regression, we include dummy
variables (dum_1 dum_2 dum_3 dum_4) which control
for the individual-specific effects (ui)
Without including the person dummies, our
estimate of β would be biased because the dummies
are correlated with age.
The person dummies explain all the
cross-sectional variation in life satisfaction
across the four individuals.
The only variation that is left is the change in
satisfaction within each person as he gets older.
Therefore, the model with dummies is sometimes
called the within estimator or the
fixed-effects model.
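Written out for the four-person example, the regression with person dummies has the form below. This is a sketch: D_{j,it} is a hypothetical name for the dummy that equals one when the observation belongs to person j (dum_1 to dum_4 in the commands above).

```latex
Y_{it} = \beta X_{it} + \sum_{j=1}^{4} u_j D_{j,it} + e_{it},
\qquad
D_{j,it} =
\begin{cases}
1 & \text{if } i = j \\
0 & \text{otherwise}
\end{cases}
```

Each u_j acts as that person's own intercept, which is why the regression is run without a separate constant term.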

17
4.1 The basic idea

In small datasets like this, it is easy to create
dummy variables for each person (or each
company).
In large datasets, we may have thousands of
individuals or companies.
The number of variables in STATA is restricted
due to memory limits.
Also, it is very inconvenient to have results
for thousands of dummy variables (just imagine
how long your log file would be!).

18
4.1 The basic idea

Instead of including dummy variables, we can
control for idiosyncratic effects by transforming
the Y and X variables.
Taking averages of eq. (1) over time gives
Subtracting eq. (2) from eq. (1) gives
The key thing to note here is that the
individual-specific effects (ui) have been
differenced out so they will not bias our
estimate of β.
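The two steps can be written out as follows. This is a sketch that takes eq. (1) to be Y_it = α + βX_it + u_i + e_it and uses bars for averages over each company's years (the bar notation is an assumption).

```latex
\begin{align*}
Y_{it} &= \alpha + \beta X_{it} + u_i + e_{it} \tag{1} \\
\bar{Y}_i &= \alpha + \beta \bar{X}_i + u_i + \bar{e}_i \tag{2} \\
Y_{it} - \bar{Y}_i &= \beta \left( X_{it} - \bar{X}_i \right) + \left( e_{it} - \bar{e}_i \right)
\end{align*}
```

Because α and u_i do not vary over time, they equal their own averages and cancel in the subtraction.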

19
4.1 The basic idea

Another transformation that will do the same
trick is to take differences rather than subtract
means
Lagging by one period
Subtracting eq. (2) from eq. (1) gives
Again the individual-specific effects (ui) have
been differenced out so they will not bias our
estimate of β.
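The first-difference version can be sketched in the same way, again taking eq. (1) to be Y_it = α + βX_it + u_i + e_it, with eq. (2) now being the same equation lagged by one period:

```latex
\begin{align*}
Y_{it} &= \alpha + \beta X_{it} + u_i + e_{it} \tag{1} \\
Y_{i,t-1} &= \alpha + \beta X_{i,t-1} + u_i + e_{i,t-1} \tag{2} \\
Y_{it} - Y_{i,t-1} &= \beta \left( X_{it} - X_{i,t-1} \right) + \left( e_{it} - e_{i,t-1} \right)
\end{align*}
```

The cost of this transformation is that the first observation of every company is lost, because it has no lagged value.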

20
Class exercise 4a

Estimate the following models, where Y = life
satisfaction and X = age.
Compare the age coefficients in these models to
the age coefficient in the untransformed model
with person dummies (ignore the standard errors
of the age coefficients because they are biased)

21
Class exercise 4a

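You should find that the age coefficients are
exactly the same. (With more than two periods the
first-difference and mean-deviation estimates
usually differ; they coincide here because age
rises by exactly one every year.)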
First, we create the variables
sort persnr time
gen chlsat=lsat-lsat[_n-1] if persnr==persnr[_n-1]
gen chage=age-age[_n-1] if persnr==persnr[_n-1]
(NB: the chage variable is just a constant
because each person gets older by one from one
year to the next: list persnr time chage)
by persnr: egen avlsat=mean(lsat)
by persnr: egen avage=mean(age)
gen difflsat=lsat-avlsat
gen diffage=age-avage
Next, we run the three regressions without
constant terms (recall that the chage variable is
a constant)
reg chlsat chage, nocons
reg difflsat diffage, nocons
reg lsat age dum_1 dum_2 dum_3 dum_4, nocons

22
4.2 Linear regression using panel data (xtreg, fe i())

Fortunately, STATA has a command that:
allows us to avoid creating dummy variables for
each person
corrects the standard errors
xt is a prefix that tells STATA we want to
estimate a panel data model
The fe option tells STATA we want to estimate a
fixed effects model
in OLS this is equivalent to including dummy
variables to control for person-specific effects
The i() term tells STATA the variable that
identifies each unique person
xtreg lsat age , fe i(persnr)

24

Note that the age coefficient and t-statistic are
exactly the same as in the OLS model that
includes person dummies
reg lsat age dum_1 dum_2 dum_3 dum_4, nocons
There are 12 person-years, 4 persons, and the
minimum, average and maximum number of
observations per person is 3.

Since we are estimating a within-effects model,
it is the within R2 that is directly relevant
(93.2%).
If we used the same independent variables to
estimate a between-effects model, we would have
an R2 of 88.4% (I will explain later what we mean
by the between-effects model).
If we used the same independent variables to
estimate a simple OLS model, we would get an R2
of 16.5% (reg lsat age).
The F-statistic is a test that the coefficient(s)
on the X variable(s) (i.e., age) are all zero.

sigma_u is the standard deviation of the
estimates of the fixed effects, ui (σu)
sigma_e is the standard deviation of the
estimates of the residuals, eit (σe)
rho = σu² / (σu² + σe²)
= 4.93² / (4.93² + 0.47²) = 0.99

The correlation between ui and Xit is -0.83.
This correlation appears to be high confirming
our prior finding that the fixed effects are
correlated with age.
The F-test allows us to reject the hypothesis
that there are no fixed effects.
If we had not rejected this hypothesis, we could
estimate a simple OLS instead of the
fixed-effects model.

28
4.2 Linear regression (predict)

After running the fixed-effects model, we can
obtain various predicted statistics using the
predict command
predict , xb
predict , u
predict , e
predict , ue

29
4.2 Linear regression (predict)

For example
xtreg lsat age , fe i(persnr)
drop lsat_hat
predict lsat_hat, xb
predict lsat_u, u
predict lsat_e, e
predict lsat_ue, ue
Checking that lsat_ue = lsat_u + lsat_e
list lsat_u lsat_e lsat_ue
Checking that the correlation between ui and Xit
is -0.83
corr lsat_hat lsat_u

30
4.2 Linear regression

I have explained that there are three main
advantages of panel data
The larger sample increases power, so the
coefficients are estimated more precisely
We can estimate models that incorporate dynamic
variables (e.g., the effect of growth on audit
fees)
We can control for unobservable fixed effects
(e.g., company-specific or person-specific
characteristics) by estimating fixed-effects
models.

31
4.2 Linear regression

Are there any disadvantages?
Yes, unfortunately we cannot investigate the
effect of explanatory variables that are held
constant over time.
From a technical point of view, this is because
the time-invariant variable would be perfectly
collinear with the person dummies.
From an economic point of view, this is because
fixed-effect models are designed to study what
causes the dependent variable to change within a
given person. A time-invariant characteristic
cannot cause such a change.
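The technical point can be seen directly in the mean-deviation transformation. As a sketch, let Z denote a hypothetical regressor that takes the same value Z_i in every year for person i:

```latex
Z_{it} = Z_i \ \text{for all } t
\;\Longrightarrow\;
\bar{Z}_i = Z_i
\;\Longrightarrow\;
Z_{it} - \bar{Z}_i = 0
```

The transformed regressor is a column of zeros, so its coefficient cannot be estimated.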

32
4.2 Linear regression

For example, suppose that the height of the four
persons is constant over the three years.
Let's create a height variable and test the
effect of height on life satisfaction
gen height=185 if dum_1==1
replace height=180 if dum_2==1
replace height=175 if dum_3==1
replace height=170 if dum_4==1
list persnr height
Note that the height variable is a constant for
each person.
We can estimate the effect of height as long as
we do not control for unobservable
person-specific effects
reg lsat age height

33
4.2 Linear regression

If we try to control for person-specific effects
by including dummy variables
reg lsat age height dum_1 dum_2 dum_3 dum_4,
nocons
Note that STATA has to throw away either a dummy
variable or the height variable.
The reason is that the height variable is
collinear with the four dummy variables.
The only way we can include dummies for each
person is if we do not include the height
variable.
reg lsat age dum_1 dum_2 dum_3 dum_4, nocons
If we try to estimate the effect of height using
the xtreg, fe i() command, STATA will
inform us that there is a problem of perfect
collinearity
xtreg lsat age height, fe i( persnr)

34
4.2 Linear regression

Note that the height coefficient can be estimated
if there is some variation over time for one or
more persons.
The fixed-effects estimator can exploit this time
variation to estimate the effect of height on
life satisfaction.
For example, suppose that each person became 1cm
taller in 1970.
replace height=height+1 if time==1970
xtreg lsat age height, fe i( persnr)

The xtreg, fe i() command estimates the following
fixed-effects model
Recall that we derived this model by taking
averages
The averages model is sometimes called the
between estimator because the comparison is
cross-sectional between persons rather than over
time.
Like OLS, the between estimator provides unbiased
estimates of β only if the unobservable
company-specific component (ui) is uncorrelated
with Xit
If we wanted to estimate the between effects
model, the command in STATA is xtreg , be i()
xtreg lsat age, be i( persnr)

37

Note that the age coefficient is positive
the reason is that we are not controlling for
person-specific effects, which are correlated
with age.
therefore, the between-effects estimate of the
age coefficient is biased.
Since we are estimating a between-effects model,
it is the between R2 that is relevant (88.4%).
Note that this is also the between-effects R2
that was previously reported using the
fixed-effects model.
Note that the R2 for the between-effects model is
high even though the age coefficient is severely
biased. Again, this reinforces the fact that a
high R2 does not imply that the model is well
specified.

The between estimator is also less efficient than
simple OLS because it throws away all the
variation over time in the dependent and
independent variables.
In fact the between estimator is equivalent to
estimating an OLS model on the averages for just
one year
Recall that we have already created averages for
the lsat and age variables (avlsat avage)
reg avlsat avage if time==1968
reg avlsat avage if time==1969
reg avlsat avage if time==1970
xtreg lsat age, be i(persnr)
Since we actually have three years of data, it
seems silly (and it is silly) to throw data away

39
4.2 Linear regression (xtreg)

Normally, then, we would never be interested in
estimating a between-effects model
The estimates are biased if the person-specific
effects are correlated with the X variables
The estimates are inefficient because we are
ignoring any time-series variation in the data
The fixed effects estimator is attractive because
it controls for any correlation between ui and
Xit
An unattractive feature is that it is forced to
estimate a fixed parameter for each person or
company in the data
you can think of these parameters as being the
coefficients on the person dummy variables

40
4.2 Linear regression (xtreg)

An alternative is the random effects model in
which the ui are assumed to be randomly
distributed with a mean of zero and a constant
variance (ui ~ IID(0, σu²)) rather than fixed.
Intuitively, the random effects model is like
having an OLS model where the constant term
varies randomly across individuals i.
Like simple OLS, the random effects model assumes
that there is zero correlation between ui and Xit
If ui and Xit are correlated, the random-effects
estimates are biased.

41
4.2 Linear regression (xtreg)

The random-effects model can be thought of as an
intermediate case of OLS and the fixed-effects
model

42
4.2 Linear regression (xtreg)

The OLS model corresponds to θ = 0.
The fixed-effects model corresponds to θ = 1.
The random-effects model (0 ≤ θ ≤ 1) is also
known as the generalized least squares model
(i.e., it is more general than OLS or the
fixed-effects model).
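A standard way to write the random-effects (GLS) transformation is given below. It is a sketch that assumes a balanced panel with T periods per company and uses θ as the name of the weight; bars denote company averages.

```latex
\begin{align*}
Y_{it} - \theta \bar{Y}_i
  &= (1 - \theta)\,\alpha
   + \beta \left( X_{it} - \theta \bar{X}_i \right)
   + \left[ (1 - \theta)\, u_i + e_{it} - \theta \bar{e}_i \right] \\
\theta &= 1 - \sqrt{\frac{\sigma_e^2}{\sigma_e^2 + T \sigma_u^2}}
\end{align*}
```

When σu² = 0 the weight is zero and the data are left untransformed, which is pooled OLS. As Tσu² grows large relative to σe² the weight approaches one and the equation approaches the mean-deviation form used by the fixed-effects estimator.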

43
4.2 Linear regression (xtreg)

If we want to estimate a random effects model,
the command is xtreg , re i()
For example
xtreg lsat age, re i( persnr)
Note that because we have controlled for (random)
unobserved person effects, the age coefficient is
estimated with the correct negative sign.

The rest of the output is similar to the
fixed-effects model except
We use a Wald statistic instead of an F statistic
to test the significance of the independent
variables. Here we can reject the hypothesis that
the age coefficient is zero.
The Wald statistic is used because only the
asymptotic properties of the random-effects
estimator are known.
The output explicitly tells us that we have
imposed the assumption that ui and Xit are
uncorrelated.
This is the key difference between the
random-effects and fixed-effects models.

We can test whether ui and Xit are correlated.
If they are correlated, we should use the
fixed-effects model rather than OLS or the
random-effects model (otherwise the coefficients
are biased).
If they are not correlated, it is better to use
the random-effects model (because it is more
efficient).
The test was devised by Hausman
if ui and Xit are correlated, the random-effects
estimates are biased (inconsistent) while the
fixed-effects coefficients are unbiased
(consistent)
In this case, there will be a large difference
between the random-effects and fixed-effects
coefficient estimates
if ui and Xit are uncorrelated, the
random-effects and fixed-effects coefficients are
both unbiased (consistent); the fixed-effects
coefficients are inefficient while the
random-effects coefficients are efficient.
In this case, there will not be a large
difference between the random-effects and
fixed-effects coefficient estimates
The Hausman test indicates whether the two sets
of coefficient estimates are significantly
different

Null hypothesis (H0): ui and Xit are uncorrelated
The Hausman statistic is distributed as chi2 and
is computed from the difference between the two
sets of coefficient estimates and the (Vc-Ve)^-1
matrix.
If the chi2 statistic is positive and
statistically significant, we can reject the null
hypothesis. This would mean that the
fixed-effects model is preferable because the
coefficients are consistent.
If the chi2 statistic is not significantly
positive, we cannot reject the
null hypothesis. This would mean that the
random-effects model is preferable because the
coefficients are consistent and efficient.
NB: The (Vc-Ve)^-1 matrix is guaranteed to be
positive definite only asymptotically. In small
samples, this asymptotic result may not hold, in
which case the computed chi2 statistic can be
negative.
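The usual form of the Hausman statistic is sketched below. The subscripts are an assumption chosen to match Vc and Ve in the text: c marks the estimator that is consistent under both hypotheses (fixed effects) and e the estimator that is efficient under the null (random effects); k is the number of coefficients being compared.

```latex
H = \left( \hat{\beta}_c - \hat{\beta}_e \right)'
    \left( V_c - V_e \right)^{-1}
    \left( \hat{\beta}_c - \hat{\beta}_e \right)
\;\sim\; \chi^2(k) \quad \text{under } H_0
```

With a single regressor this reduces to the squared difference between the two coefficients divided by the difference between their estimated variances, which shows why a negative variance difference produces a negative statistic.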

47
4.2 Linear regression (estimates store, hausman)

The procedure for executing a Hausman test is as
follows
Save the coefficients that are consistent even if
the null is not true
xtreg lsat age, fe i( persnr)
estimates store fixed_effects
Save the coefficients that are inconsistent if
the null is not true
xtreg lsat age, re i( persnr)
estimates store random_effects
The command for the Hausman test is
hausman name_consistent name_efficient
hausman fixed_effects random_effects

b is the fixed-effects coefficient while B is the
random-effects coefficient.
The (Vc-Ve)^-1 matrix has a negative value on the
leading diagonal and, as a result, the square
root of the leading diagonal is undefined. This
is why the Chi2 statistic is negative.
Since the Chi2 statistic is not significantly
positive, we might decide that we cannot reject
the null hypothesis (see p. 57 of the STATA
reference manual for the Hausman test).
On the other hand, this result is not very
reliable because the asymptotic assumption fails
to hold in this small sample.

If we reject the null hypothesis that ui and Xit
are uncorrelated, the fixed-effects model is
preferable to the OLS and random-effects models.
If we cannot reject the null hypothesis that ui
and Xit are uncorrelated, we need to determine
whether the ui are distributed randomly across
individuals.
Recall that the random-effects model is like
having an OLS model where the constant term
varies randomly across individuals i.
Therefore, we need to test whether there is
significant variation in ui across individuals.

rho = σu² / (σu² + σe²)
= 1.03² / (1.03² + 0.47²) = 0.83
σu² captures the variation in ui across
individuals.
If σu² is significantly positive, the
random-effects model is preferable to the OLS
model.
The Breusch and Pagan (1980) Lagrange multiplier
test is used to investigate whether σu² is
significantly positive.

We perform the Breusch-Pagan test by typing
xttest0 after xtreg, re
Our estimate of σu² is 1.067 (note that σu is
estimated to be 1.032 which is the same as
sigma_u on the previous slide).
We are unable to reject the hypothesis that σu²
= 0. Therefore, we cannot conclude that the
random-effects model is preferable to the OLS
model.
NB: Our Hausman and LM tests lack power because
the sample consists of only 12 observations. In
larger samples, we are more likely to reject the
hypothesis that σu² = 0 and we are more likely to
reject the hypothesis that ui and Xit are
uncorrelated.

52
Class exercise 4b

Estimate models in which the dependent variable
is the log of audit fees.
Estimate the models using
OLS without controlling for ui
Fixed-effects models
Random-effects models
How do the coefficient estimates vary across the
different models?
Which of these models is preferable?

53
Class exercise 4b

The lnta coefficients are largest in the OLS
model that does not control for ui
The lnta coefficients are smallest in the
fixed-effects model
The Hausman test rejects the hypothesis that ui
and Xit are uncorrelated. Therefore, the
fixed-effects model is preferable.
The LM test rejects the hypothesis that σu² = 0
(given that ui and Xit are significantly
correlated, we would not actually need to carry
out this test).

54
Class exercise 4b

use "C:\phd\Fees.dta", clear
gen fye=date(yearend, "mdy")
format fye %d
gen year=year(fye)
sort year
gen lnaf=ln(auditfees)
gen lnta=ln(totalassets)
reg lnaf lnta
xtreg lnaf lnta, fe i(companyid)
estimates store fixed_effects
xtreg lnaf lnta, re i(companyid)
estimates store random_effects
hausman fixed_effects random_effects
xttest0

55
4.2 Linear regression

Compared to economics and finance, there are not
many accounting studies that exploit panel data
in order to control for unobserved
company-specific effects (ui).
Most studies simply report OLS estimates on the
pooled data.
Some studies even fail to adjust the OLS standard
errors for time-series dependence
this can be a very serious mistake especially
when the panels are long (e.g., the sample period
covers many years).
If you use the xtreg command, STATA automatically
recognizes that you are using panel data and it
will give you the correct standard errors.
Therefore, there is no need to use the robust
cluster() option and, in fact, in the STATA
version used for these notes there is no robust
cluster() option with xtreg, so the following
command is not accepted
xtreg lnaf lnta, fe i(companyid) robust
cluster(companyid)
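More recent Stata releases do accept clustered standard errors with xtreg, through the vce(cluster ...) option, and let the panel be declared once with xtset instead of repeating i(). The sketch below runs on a made-up toy panel; the numbers are hypothetical and are not taken from Fees.dta.

```stata
* Hypothetical toy panel: three companies, three years each
clear
input companyid year lnaf lnta
1 2001 4.1 5.0
1 2002 4.3 5.2
1 2003 4.4 5.5
2 2001 5.9 7.0
2 2002 6.0 7.1
2 2003 6.2 7.3
3 2001 5.0 6.1
3 2002 5.1 6.0
3 2003 5.3 6.4
end

* Declare the panel once; later xt commands then need no i() option
xtset companyid year

* Fixed-effects regression with standard errors clustered by company
xtreg lnaf lnta, fe vce(cluster companyid)
```

Clustering by company allows the idiosyncratic errors of the same company to be correlated over time, which the default fixed-effects standard errors do not allow for.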

56
4.2 Linear regression

Ke and Petroni (2004) is an example of an
accounting study that estimates fixed-effects
regressions to control for unobservable
company-specific effects.
Their dependent variable is the change in the
ownership of institutional investors in
companies.
They test whether there are significant changes
in institutional ownership prior to a break in a
string of consecutive quarterly earnings
increases.
Bhattacharya et al. (2003) is an example of an
accounting study that estimates fixed-effects
regressions to control for unobservable
country-specific effects.
Their dependent variable is the cost of equity
for 34 countries from 1984 to 1998 (they are
using a cross-country panel)
They test how earnings opacity at the country
level affects the cost of equity
They acknowledge that there is a potentially
serious problem of omitted variable bias

Bhattacharya et al. (2003) argue that they
largely avoid this problem because they control
for fixed country-specific effects

58
4.2 Linear regression

It is important to recognize that the fixed
effects estimator relies only on the time-series
variation in Y and X within a given company
If the extent of time-series variation is small,
either Yit minus its company mean or Xit minus
its company mean will be close to zero.
In this case, the fixed effects estimator is not
reliable because there is insufficient variation
in either the dependent or treatment variable.

59
4.2 Linear regression

As in any model, we require a reasonable amount
of variation in the Y and X variables.
If either variable displays little variation, the
results may be unreliable.

We saw an example of this previously.
Except for one observation, the independent
variable is a constant.
As a result the fitted regression line is
unreliable.

60
4.2 Linear regression

This point was made by Zhou (JFE, 2001) who
criticized the use of fixed effects models when
the treatment variable is management ownership.
Because management ownership usually remains
constant from one year to the next, the term that
captures the within-company change in ownership
is typically equal to zero (or very small).

61
4.3 Logit and probit models

When the dependent variable is continuous, it is
easy to transform the model such that unobserved
firm-specific effects are washed away
When the dependent variable is binary, the
required transformation is different and more
complicated
if you are interested in the derivation, see the
Baltagi textbook (pages 178-180).
in the fixed-effects logit, the fixed effects
(ui) are not actually estimated, instead they are
conditioned out of the model.
the fixed-effects logit model is not equivalent
to logit + dummy variables.

62
4.3 Logit models (xtlogit)

We can estimate a fixed-effects logit model using
the command xtlogit , fe i()
NB Your version of STATA 9.0 may have a problem
with estimating the fixed effects logit model.
You can instead use version 8.0 or 10.0.
version 8.0
Before we estimate the fixed-effects logit model,
we need to understand a complication that arises
because the dependent variable is binary.

Suppose we have five annual observations on two
companies.
For company 1, there is no variation in the
dependent variable over time (Y = 0 in every
year).
A fixed effect for this company will perfectly
predict the outcome (Y = 0)
Consequently, the first company will be dropped
from the estimation sample.
In fact, the fixed-effects logit model will drop
all companies that exhibit no variation in the
dependent variable over time.
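Before running a fixed-effects logit it can be useful to count how many companies will be dropped for this reason. The sketch below uses a made-up toy panel in which company 1 never changes outcome; the variable names ymin, ymax, novar and onerow are hypothetical.

```stata
* Hypothetical toy panel: y is always 0 for company 1
clear
input id year y
1 1 0
1 2 0
1 3 0
2 1 0
2 2 1
2 3 1
3 1 1
3 2 0
3 3 1
end

* Within-company minimum and maximum of the binary outcome
bysort id: egen ymin = min(y)
bysort id: egen ymax = max(y)

* novar equals 1 when the outcome never changes for that company
gen byte novar = (ymin == ymax)

* Tag one row per company so that each company is counted once
egen byte onerow = tag(id)
tabulate novar if onerow
```

The companies with novar equal to 1 are the ones that the fixed-effects logit would exclude from the estimation sample.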

64
4.3 Logit models (xtlogit)

use "C:\phd\xtlogit.dta", clear
list
The sample consists of three companies.
Company 1 exhibits no variation in the dependent
variable over time while companies 2 and 3 do
exhibit time-series variation.
There is no problem estimating this model on the
full sample if we do not control for fixed
effects
logit y x
Running a fixed effects logit model results in
the first company being thrown away
xtlogit y x, fe i(id)

65
4.3 Logit models (xtlogit)

In many empirical settings, we are likely to find
a large number of companies that exhibit no
variation in the binary dependent variable during
the sample period.
Example 1
Yit = 1 if company i is engaged in fraud in year
t; Yit = 0 otherwise.
The vast majority of companies do not engage in
fraud at any point in time (Yit = 0 for all t).
All such non-fraud companies would be dropped
from the estimation sample.
The estimation sample would include only the
companies that commit fraud at some point during
the sample period.

66
4.3 Logit models (xtlogit)

Example 2
Yit = 1 if company i hires a Big 6 auditor in
year t; Yit = 0 if company i hires a non-Big 6
auditor in year t.
The vast majority of companies keep the same
auditor in the following year and switches
between Big 6 and non-Big 6 auditors are
especially rare.
All companies that do not switch between Big 6
and Non-Big 6 auditors would be dropped from the
sample.
The estimation sample would include only the
companies that switch between Big 6 and Non-Big 6
auditors at some point during the sample period.

67
4.3 Logit models (xtlogit)

Alternatively, we can estimate a random-effects
logit model using the command xtlogit , re i()
The company effects (ui) are now assumed to be
random rather than fixed.
Consequently, the random effects model does not
throw away companies that lack time-series
variation in the dependent variable.
For example
xtlogit y x, re i(id)

The estimation sample is now 15 rather than 10
(i.e., all 3 companies are included in the
sample).
lnsig2u = ln(σu²) = -1.625
sigma_u = σu = 0.444 = exp(-1.625)^0.5
rho = σu² / (σu² + σe²) = 0.056
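The three numbers can be checked by hand. The sketch below assumes that, as in the usual latent-variable formulation of the random-effects logit, the idiosyncratic variance is fixed at π²/3, the variance of the standard logistic distribution.

```latex
\begin{align*}
\sigma_u^2 &= \exp(-1.625) \approx 0.197 \\
\sigma_u &= \sqrt{0.197} \approx 0.444 \\
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}
      \approx \frac{0.197}{0.197 + 3.290} \approx 0.056
\end{align*}
```

So only about 6% of the total error variance is attributed to the company-specific component in this example.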

If rho = σu² / (σu² + σe²) = 0, there would be no
variation in the ui across companies (i.e., each
company would have the same ui).
In this case, there would be no need to control
for company-specific effects, i.e., we could rely
on logit instead of estimating xtlogit , re i()
The likelihood-ratio statistic tests the null
hypothesis that rho equals zero.
If we reject this hypothesis, the random effects
model is preferable to ordinary logit.
In our data, we are unable to reject, so we could
use an ordinary logit model instead of the random
effects logit model. This would be a good idea
because the ordinary logit is more efficient
(fewer parameters need to be estimated).

70
4.3 Logit models (xtlogit)

Recall that we previously used a Hausman test to
determine whether the xtreg, fe i() or xtreg, re
i() model is preferable.
Fortunately, we can do the same test in STATA for
deciding whether the fixed-effects or
random-effects logit models are preferable.
The only difference is that we have to use the
equations() option with the Hausman test
actually, this point is not explained in the
STATA manual but a question and answer were
posted about this topic on the statalist
(www.stata.com/statalist/archive/2004-01/msg00669.html)
the equations() option specifies, by number, the
pairs of equations that are to be compared.
usually, we are estimating just one equation in
each model, in which case the option is
equations(1:1)

71
4.3 Logit models (xtlogit)

For example
xtlogit y x, fe i(id)
estimates store fixed_effects
xtlogit y x, re i(id)
estimates store random_effects
hausman fixed_effects random_effects
STATA is telling us there is an error (we need to
specify the equation numbers)
hausman fixed_effects random_effects, eq(1:1)
The Chi2 statistic is negative (again there is a
small sample problem which causes the asymptotic
assumption to fail).

72
Class exercise 4c

Open the Fees.dta data set.
Estimate models in which big6 is the dummy
dependent variable using
ordinary logit
fixed-effects logit
random-effects logit
Why is the estimation sample much smaller in the
fixed effects model?
Which of the three models is most preferable?

73
Class exercise 4c

use "C:\phd\Fees.dta", clear
gen lnta=ln(totalassets)
logit big6 lnta, robust cluster(companyid)
xtlogit big6 lnta, fe i(companyid)
estimates store fixed_effects
xtlogit big6 lnta, re i(companyid)
estimates store random_effects
hausman fixed_effects random_effects, eq(1:1)
The estimation sample is much smaller in the
fixed effects model because the majority of
companies do not switch between Big 6 and Non-Big
6 auditors during the sample period.
The likelihood ratio test of rho = 0 indicates
that the random-effects model is preferable to
the ordinary logit.
The Hausman test indicates that the fixed-effects
model is preferable to the random-effects logit.

74
4.3 Probit models (xtprobit)

Recall that there are two commands available when
the dependent variable is binary (ordinary
logit and probit).
There is no command for a fixed-effects probit
model because no-one has yet found a
transformation that will allow the fixed effects
to be washed out.
If you type xtprobit big6 lnta, fe i(companyid)
you will get an error message.
A random-effects probit model is available,
however:
xtprobit big6 lnta, re i(companyid)
Just as with the random-effects logit model,
there is a likelihood ratio test that helps us to
choose between the random-effects probit and the
ordinary probit models.
In our data, we can reject the hypothesis that
rho = 0, so we may decide not to use an ordinary
probit model.

75
4.4 Other models

76
4.4 Other models

If you look at the STATA manual for panel data
(Cross-Sectional Time-Series), you will find:
Fixed-effects and random-effects models are
available for count data (xtpoisson and xtnbreg).
We can test which model is preferable using a
Hausman test.
Random-effects models are available for censored
data (xttobit and xtintreg).
Fixed-effects models are not available for
censored data, so there is no need for a Hausman
test.
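
For count data, the comparison follows the same pattern as the xtlogit example. This is a minimal sketch only: y (a count outcome), x and id are placeholder names, as in the earlier xtlogit slide, and do not refer to variables in Fees.dta.

```stata
* Placeholder names: y is a count outcome, x a regressor, id the panel identifier
xtpoisson y x, fe i(id)
estimates store fixed_effects

xtpoisson y x, re i(id)
estimates store random_effects

* If STATA complains about matching equations, as it did for xtlogit,
* add the option eq(1:1)
hausman fixed_effects random_effects
```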

77
4.4 Other models

Duration data is, by its very nature, in the form
of panel data.
What about the multinomial and ordered models
that we previously looked at (mlogit, mprobit,
ologit, oprobit)? It appears that STATA does not
have random- or fixed-effects versions of these
models.

78
4.4 Other models

You can use the search command in STATA to find
out if a command is available.
The search command looks through official STATA
commands, frequently asked questions (on the
STATA website), the STATA journal (SJ) and the
STATA technical bulletins (STBs)
The SJ and STBs are where you can sometimes find
commands that will appear in future versions of
STATA
search multinomial logit
We can find the multinomial logit command but
there does not appear to be any command
specifically for the multinomial model with panel
data

79
4.4 Other models

Even if the command you want is not available
from STATA, you may be able to find a STATA user
who has already written the program that you
need.
Statalist (www.stata.com/statalist/) is an email
listserver where over 2,500 Stata users discuss
all things statistical and Stata.
Click on Archives provided by Statacorp and
search the archives

80
4.4 Other models

For example, suppose you want to estimate a
random-effects ordered probit
Typing this into the statalist archive I found
that someone has written a program with this
command (reoprob):
www.stata.com/statalist/archive/2006-02/msg00509.html
The message tells us we can download it to STATA
by typing
findit reoprob
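
Before searching, it can be useful to check whether a user-written command is already installed on your machine. A small sketch using the built-in which command (the messages displayed are just illustrative):

```stata
* which reports where a command's ado-file lives; capture suppresses the error
capture which reoprob
if _rc {
    display "reoprob is not installed; try: findit reoprob"
}
else {
    display "reoprob is already installed"
}
```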

81
4.4 Other models

If you cannot find someone who has already
written the program and if it is a command that
you really do need, you will either have to write
the program yourself or wait for someone else to
do it.
In fact, it is not too difficult to learn how to
write new programs in STATA
You would need to take a STATA programming course
(www.stata.com/netcourse/, net courses 151 and 152).

82
Summary

There are three advantages to using panel data
We can control for unobservable fixed effects
that might otherwise bias the coefficient
estimates.
these unobservable fixed effects can be
company-specific, country-specific, or
person-specific.
The larger sample means that the coefficients are
estimated more precisely.
We can include lagged or change variables in our
models.

83
Summary

The xtreg command is used to estimate
fixed-effects and random-effects models (where
the dependent variable is continuous).
We can test whether the fixed-effects or
random-effects model is preferable using the
hausman test.
If there is a significant correlation between u_i
and X_it, the fixed-effects model is preferable to
the OLS and random-effects models.
If there is no significant correlation between u_i
and X_it, we can test whether the OLS or
random-effects model is preferable using an LM
test.
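
As a recap, the whole sequence might look like the sketch below. It assumes the Fees.dta variables used earlier (lnaf, lnta, companyid) and that xttest0, the Breusch-Pagan LM test, is the LM test meant here.

```stata
* Assumes lnaf, lnta and companyid exist, as in the earlier examples
xtreg lnaf lnta, fe i(companyid)
estimates store fixed_effects

xtreg lnaf lnta, re i(companyid)
estimates store random_effects

* LM test of random effects against pooled OLS
* (run it directly after the random-effects xtreg)
xttest0

* Hausman test of fixed effects against random effects
hausman fixed_effects random_effects
```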

84
Summary

When the dependent variable is binary we can
estimate fixed-effects or random-effects logit
models.
Again, we can test which model is preferable
using a Hausman test.
Only the random-effects model is available in the
case of the probit model.
