Title: xtreg and xtmixed: recap
1. xtreg and xtmixed recap
We have the standard regression model (here with
only one x), but think that the data are clustered,
and that the intercept (c0) might be different for
different clusters. One way to allow for that is to
add S-variables, one dummy per cluster.
Because the number of clusters k can be large, this
is not always feasible to estimate. Instead we
estimate a model in which a cluster-specific term
delta is added to the intercept, with delta normally
distributed with zero mean and a variance to be
estimated.
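Written out, the three models referred to above look roughly as follows (the i/j subscripts, for observation i in cluster j, and the gamma coefficients are my notation, not from the slides):

  y_{ij} = c_0 + c_1 x_{ij} + e_{ij}                                     (one common intercept)
  y_{ij} = c_0 + c_1 x_{ij} + \sum_{m=1}^{k} \gamma_m S_{m,ij} + e_{ij}  (dummies S, one per cluster)
  y_{ij} = c_0 + \delta_j + c_1 x_{ij} + e_{ij},  \delta_j \sim N(0, \sigma_\delta^2)   (random intercept)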
2. And this you can do with xtreg
- xtset <clustervariable>
- xtreg y x1
- and by doing this, we are trying to take into
account the fact that the errors are otherwise
not independent.
- But it might be that you also want to test
whether the coefficient of x1 varies across the
clusters.
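A minimal sketch of those two commands, assuming a dataset with outcome y, regressor x1 and a cluster identifier clus (the variable names are hypothetical):

  * declare the cluster structure, then fit the random-intercept model
  xtset clus
  * re (random effects) is the default here, written out for clarity
  xtreg y x1, re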
3. What if c1 varies as well?
The same argument applies. We already had the
random-intercept model, and now we also make the
c1 coefficient depend on the cluster (random
slopes). Estimating a separate slope per cluster
is not feasible either, so instead we model the
cluster-specific part of the slope as zeta, a
normally distributed variable with zero mean and
a variance to be estimated.
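Spelled out in the same (my own) notation as before, the random-slope model is

  y_{ij} = c_0 + \delta_j + (c_1 + \zeta_j) x_{ij} + e_{ij},  \zeta_j \sim N(0, \sigma_\zeta^2)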
4. And this you can do with xtmixed
- xtmixed y x1 || <clustervar>:
- this is just like the xtreg command, but if you want
random slopes for x1, you add x1 after the colon:
- xtmixed y x1 || <clustervar>: x1
- Your output then gives you estimates for the
variance (or standard deviation) of delta and
zeta.
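One common way to test whether the coefficient of x1 really varies across clusters (slide 2) is a likelihood-ratio test of the random-slope model against the random-intercept model; a rough sketch, with a hypothetical cluster variable clus:

  xtmixed y x1 || clus:
  estimates store ri
  xtmixed y x1 || clus: x1
  estimates store rs
  * a significant test favours the model with the random slope
  lrtest ri rs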
6. xtmixed can deal with nested clusters as well
(classes within schools)
Again the same kind of argument applies. We
already had the random-intercept model, and now
we want separate constant terms per class and
per school. So we estimate instead a model with
two random intercepts: delta, a normally
distributed variable at the school level with zero
mean and variance to be estimated, and tau, a
normally distributed variable at the class level
with zero mean and variance to be estimated.
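Written out (again my own notation, with s indexing schools, c classes within schools, and i pupils):

  y_{ics} = c_0 + \delta_s + \tau_{cs} + c_1 x_{ics} + e_{ics},  \delta_s \sim N(0, \sigma_\delta^2),  \tau_{cs} \sim N(0, \sigma_\tau^2)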
7. And this you can do with xtmixed as well
- xtmixed y x1 || school: || class:
- Remember to put the bigger cluster on the left!
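If you also want a random slope for x1 at one of the levels, it again goes after the relevant colon; for example (a sketch, with the slope varying across schools):

  xtmixed y x1 || school: x1 || class: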
9. Horrors
- xtmixed finds its estimates using an iterative
process.
- That can complicate matters:
  - it might not converge
  - it might converge, but to the wrong values
  - it might converge to different estimates under
different algorithms in the iterative process
- You have only a couple of weapons against that
(two of them are sketched below):
  - run again using a different algorithm (use the
option , mle)
  - allow estimation of correlations as well (use the
option , cov(unstr))
  - run the dummy-variant (with lots of dummies)
anyway
- In principle, I will try to make sure that such
issues will not occur, but I cannot be sure. This
is also something you can pre-check yourselves.
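As a rough sketch of what the last two weapons look like in Stata, with a hypothetical cluster variable clus:

  * allow the random intercept and random slope to be correlated
  xtmixed y x1 || clus: x1, cov(unstructured)
  * the dummy-variant: ordinary regression with one dummy per cluster
  regress y x1 i.clus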
10. Splitting up variables (within vs across clusters)
- Basically this is completely unrelated to the
previous point. The important thing is that it can be
done in clustered data, and can lead to different
interpretations (see before).
- There are two separate issues:
  - which coefficients do you want to vary per
cluster?
  - which variables do you want to include?
- Splitting up variables is related to the second
question, not the first (a sketch follows below).
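As a sketch of what splitting up means in practice (hypothetical names again): replace x1 by its cluster mean plus the within-cluster deviation from that mean, so that the two parts can get different coefficients:

  bysort clus: egen x1_between = mean(x1)
  generate x1_within = x1 - x1_between
  xtmixed y x1_between x1_within || clus: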