Title: Multilevel Models for Ordered Categorical Variables
Session 5
Damon Berridge
- Variables that have as outcomes a small number of ordered categories are quite common in the social and biomedical sciences. Examples are responses to questionnaire items (with outcomes such as 'completely disagree', 'disagree', 'agree', 'completely agree') and a test scored by a teacher as 'fail', 'satisfactory', or 'good'.
- When the number of categories is two, the dependent variable is binary.
- When the number of categories is rather large (10 or more), it may be possible to approximate the distribution by a normal distribution and apply the hierarchical linear model for continuous outcomes.
- The main issue in such a case is the homoscedasticity assumption: is it reasonable to assume that the variances of the random terms in the hierarchical linear model are constant?
- If in some groups, or for some values of the explanatory variables, the response variable assumes outcomes that are very skewed toward the lower or upper end of the scale, then the homoscedasticity assumption is likely to be violated.
- It is usual to assign numerical values to the ordered categories, remembering that the values are arbitrary.
- Let the C ordered response categories be coded as $c = 1, 2, \ldots, C$.
- The multilevel ordered models can also be formulated as threshold models. The real line is divided into C intervals by the thresholds, corresponding to the C ordered categories.
- The first threshold is $\gamma_1$. Threshold $\gamma_c$ defines the boundary between the intervals corresponding to observed outcomes $c$ and $c+1$ (for $c = 1, \ldots, C-1$).
- The latent response variable is denoted by $y^*_{ij}$ and the observed categorical variable $y_{ij}$ is related to $y^*_{ij}$ by the 'threshold model' defined as

$$y_{ij} = c \quad \text{if } \gamma_{c-1} < y^*_{ij} \le \gamma_c, \qquad c = 1, \ldots, C,$$

with $\gamma_0 = -\infty$ and $\gamma_C = +\infty$.
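The threshold model can be sketched in a few lines of Python. This is only an illustration, assuming categories coded 1 to C and C-1 strictly increasing thresholds; the function name `latent_to_category` and the numerical values are hypothetical and not part of the original slides.

```python
import numpy as np

def latent_to_category(y_star, thresholds):
    """Map latent responses y* to categories 1..C using C-1 increasing thresholds."""
    thresholds = np.asarray(thresholds, dtype=float)
    if np.any(np.diff(thresholds) <= 0):
        raise ValueError("thresholds must be strictly increasing")
    # side="left" counts the thresholds strictly below y*, so
    # y* <= gamma_1 gives 0 and y* > gamma_{C-1} gives C-1; add 1 for codes 1..C.
    return np.searchsorted(thresholds, y_star, side="left") + 1

# Hypothetical thresholds for a three-category outcome
example_thresholds = [0.0, 1.0]
example_latent = np.array([-0.5, 0.0, 0.5, 1.0, 2.0])
example_categories = latent_to_category(example_latent, example_thresholds)
```

A latent value exactly equal to a threshold is assigned to the lower category here, which matches a "less than or equal to" upper bound on each interval.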
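The Two-Level Ordered Logit Model
- The ordinal models can be written as

$$y^*_{ij} = \theta_{ij} + \varepsilon_{ij},$$

where

$$\theta_{ij} = \beta_{0j} + \sum_{p=1}^{P} \beta_{pj} x_{pij}.$$

In the absence of explanatory variables and random intercepts, the response variable $y_{ij}$ takes on the values of $c$ with probability

$$p_{ij}(c) = \Pr(y_{ij} = c), \qquad c = 1, \ldots, C.$$

As ordinal response models often utilize cumulative comparisons of the ordinal outcome, define the cumulative response probabilities for the C categories of the ordinal outcome $y_{ij}$ as

$$P_{ij}(c) = \Pr(y_{ij} \le c) = \sum_{k=1}^{c} p_{ij}(k), \qquad c = 1, \ldots, C.$$

If the cumulative distribution function of $\varepsilon_{ij}$ is $F$, these cumulative probabilities are given by

$$P_{ij}(c) = F(\gamma_c - \theta_{ij}), \qquad c = 1, \ldots, C-1.$$

Equivalently, we can write the model as a cumulative model

$$G[P_{ij}(c)] = \gamma_c - \theta_{ij},$$

where $G = F^{-1}$ is the link function.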
- If $\varepsilon_{ij}$ has the logistic distribution, this results in the multilevel ordered logistic regression model, also called the multilevel ordered logit model or multilevel proportional odds model.
- If $\varepsilon_{ij}$ has the standard normal distribution, this leads to the multilevel ordered probit model.
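Assuming the distribution of the error term $\varepsilon_{ij}$ of the latent response to be logistic, the cumulative probability function of $y_{ij}$, for linear predictor $\theta_{ij}$ and thresholds $\gamma_c$, will be written as

$$P_{ij}(c) = \Pr(\varepsilon_{ij} \le \gamma_c - \theta_{ij}) = \frac{\exp(\gamma_c - \theta_{ij})}{1 + \exp(\gamma_c - \theta_{ij})}.$$

The idea of cumulative probabilities leads naturally to the cumulative logit model

$$\log\left[\frac{P_{ij}(c)}{1 - P_{ij}(c)}\right] = \log\left[\frac{\Pr(y_{ij} \le c)}{\Pr(y_{ij} > c)}\right] = \gamma_c - \theta_{ij}, \qquad c = 1, \ldots, C-1.$$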
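As a rough numerical illustration of the cumulative logit idea, the sketch below turns thresholds and a linear predictor into category probabilities and then recovers the cumulative logits. It assumes the sign convention P(y <= c) = F(gamma_c - theta) with a logistic F; the function names and the parameter values are hypothetical.

```python
import numpy as np

def category_probabilities(thresholds, theta=0.0):
    """Category probabilities p(1), ..., p(C) under a cumulative logit model."""
    thresholds = np.asarray(thresholds, dtype=float)
    # Logistic CDF evaluated at gamma_c - theta gives P(y <= c), c = 1..C-1
    cumulative = 1.0 / (1.0 + np.exp(-(thresholds - theta)))
    # Pad with P(y <= 0) = 0 and P(y <= C) = 1, then take differences
    return np.diff(np.concatenate(([0.0], cumulative, [1.0])))

def cumulative_logits(probs):
    """Log odds of y <= c versus y > c for c = 1..C-1."""
    cumulative = np.cumsum(probs)[:-1]
    return np.log(cumulative / (1.0 - cumulative))

# Hypothetical thresholds and linear predictor
probs = category_probabilities([-1.0, 0.5], theta=0.3)
logits = cumulative_logits(probs)  # equals thresholds minus theta
```

Because theta carries no category subscript, changing it shifts every cumulative logit by the same amount, which is the proportional odds property.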
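Level-1 Model
With explanatory variables and random intercepts the level-1 model becomes

$$\log\left[\frac{\Pr(y_{ij} \le c \mid x_{ij}, \beta_{0j})}{1 - \Pr(y_{ij} \le c \mid x_{ij}, \beta_{0j})}\right] = \gamma_c - \left(\beta_{0j} + \sum_{p=1}^{P} \beta_{pj} x_{pij}\right).$$

The model is sometimes written as

$$\log\left[\frac{\Pr(y_{ij} \le c \mid x_{ij}, \beta_{0j})}{1 - \Pr(y_{ij} \le c \mid x_{ij}, \beta_{0j})}\right] = \gamma_c + \left(\beta_{0j} + \sum_{p=1}^{P} \beta_{pj} x_{pij}\right),$$

in which case the regression parameters are identical in magnitude but of opposite sign.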
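Level-2 Model
The level-2 model has the usual form

$$\beta_{pj} = \gamma_{p0} + \sum_{q=1}^{Q} \gamma_{pq} z_{qj} + u_{pj}.$$

Note that the model which includes the intercept parameter $\gamma_{00}$ and the threshold $\gamma_1$ is not identifiable. Let us consider a simple intercept model with no explanatory variables. For the first category we have

$$\log\left[\frac{\Pr(y_{ij} \le 1 \mid u_{0j})}{\Pr(y_{ij} > 1 \mid u_{0j})}\right] = \gamma_1 - (\gamma_{00} + u_{0j}),$$

so only the difference $\gamma_1 - \gamma_{00}$ enters the model and the two parameters cannot be estimated separately. For identification, one of them must be fixed, for example $\gamma_{00} = 0$.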
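The identifiability problem can be checked numerically. The sketch below assumes an intercept-only ordered logit in which every cumulative logit equals the threshold minus the sum of the intercept and the random effect; adding the same constant to all thresholds and to the intercept then leaves every category probability unchanged. All names and parameter values are hypothetical.

```python
import numpy as np

def category_probabilities(thresholds, gamma_00=0.0, u_0j=0.0):
    """Category probabilities for an intercept-only ordered logit model."""
    thresholds = np.asarray(thresholds, dtype=float)
    # P(y <= c) = F(gamma_c - (gamma_00 + u_0j)) with F the logistic CDF
    cumulative = 1.0 / (1.0 + np.exp(-(thresholds - (gamma_00 + u_0j))))
    return np.diff(np.concatenate(([0.0], cumulative, [1.0])))

# Hypothetical values: the second call adds 1.0 to both thresholds and intercept
base = category_probabilities([0.5, 1.5], gamma_00=0.0)
shifted = category_probabilities([1.5, 2.5], gamma_00=1.0)
same_fit = bool(np.allclose(base, shifted))
```

Since the two parameter sets imply identical probabilities, the data cannot distinguish between them, which is why one of the parameters has to be fixed.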
Dichotomization of Ordered Categories
Models for ordered categorical outcomes are more
complicated to fit and to interpret than models
for dichotomous outcomes. Therefore it can make
sense also to analyze the data after
dichotomizing the outcome variable. For example,
if there are 3 outcomes, one could analyze the
dichotomization 1 versus 2, 3 and also 1, 2
versus 3. Each of these analyses separately is
based, of course, on less information but may be
easier to carry out and to interpret than an
analysis of the original ordinal outcome.
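The two dichotomizations of a three-category outcome mentioned above might be coded as follows. This is a minimal sketch with made-up data; the helper name `dichotomize` is hypothetical.

```python
import numpy as np

def dichotomize(y, cut):
    """Return 1 where the ordinal outcome exceeds `cut`, and 0 otherwise."""
    return (np.asarray(y) > cut).astype(int)

# Made-up outcomes coded 1, 2, 3
y = np.array([1, 2, 3, 3, 1, 2])
one_vs_rest = dichotomize(y, 1)    # 1 versus 2, 3
rest_vs_three = dichotomize(y, 2)  # 1, 2 versus 3
```

Each binary variable can then be analysed with a multilevel model for dichotomous outcomes.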
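Likelihood

$$L(\gamma, \sigma^2_{u_0} \mid y, x, z) = \prod_{j} \int_{-\infty}^{+\infty} \prod_{i} \prod_{c=1}^{C} \Pr(y_{ij} = c)^{y_{ijc}} \, f(u_{0j}) \, du_{0j},$$

where

$$\Pr(y_{ij} = c) = F(\gamma_c - \theta_{ij}) - F(\gamma_{c-1} - \theta_{ij}),$$

and $y_{ijc} = 1$ if $y_{ij} = c$, 0 otherwise, where $F(\cdot)$ is the cumulative distribution function of $\varepsilon_{ij}$ and

$$f(u_{0j}) = \frac{1}{\sqrt{2\pi}\,\sigma_{u_0}} \exp\left(-\frac{u_{0j}^2}{2\sigma^2_{u_0}}\right).$$

- Sabre evaluates the integral $L(\gamma, \sigma^2_{u_0} \mid y, x, z)$ for the ordered response model using numerical quadrature (integration).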
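To give a feel for what numerical quadrature does here, the following sketch approximates the likelihood contribution of a single cluster with Gauss-Hermite quadrature. It is not Sabre's implementation: it assumes a normally distributed random intercept, a logistic error distribution, and the convention P(y <= c) = F(gamma_c - theta); the function name and all parameter values are hypothetical.

```python
import numpy as np

def cluster_likelihood(y, theta_fixed, thresholds, sigma_u, n_points=20):
    """Approximate one cluster's marginal likelihood in a random-intercept ordered logit.

    y           : responses coded 1..C for the units in the cluster
    theta_fixed : fixed part of the linear predictor for each unit
    thresholds  : C-1 increasing thresholds
    sigma_u     : standard deviation of the random intercept
    """
    y = np.asarray(y, dtype=int)
    theta_fixed = np.asarray(theta_fixed, dtype=float)
    # Interval bounds with gamma_0 = -inf and gamma_C = +inf
    bounds = np.concatenate(([-np.inf], np.asarray(thresholds, dtype=float), [np.inf]))
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    # Change of variable u = sqrt(2) * sigma_u * node for the weight exp(-x^2)
    u = np.sqrt(2.0) * sigma_u * nodes
    theta = theta_fixed[:, None] + u[None, :]  # shape: units x quadrature nodes

    def logistic_cdf(x):
        # Written with tanh so that infinite bounds give exactly 0 or 1
        return 0.5 * (1.0 + np.tanh(0.5 * x))

    upper = logistic_cdf(bounds[y][:, None] - theta)
    lower = logistic_cdf(bounds[y - 1][:, None] - theta)
    conditional = np.prod(upper - lower, axis=0)  # product over units at each node
    return float(np.sum(weights * conditional) / np.sqrt(np.pi))

# Hypothetical cluster with three units and a three-category outcome
example = cluster_likelihood([1, 3, 2], [0.1, -0.2, 0.0], [-0.5, 1.0], sigma_u=0.8)
```

The full likelihood would be the product of such terms over clusters. More quadrature points generally give a closer approximation at a higher computational cost.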
Ordered response model Example C4
- Rowan, Raudenbush, and Cheong (1993) analysed data from a 1990 survey of teachers working in 16 public schools in California and Michigan. The schools were specifically selected to vary in terms of size, organizational structure, and urban versus suburban location. The survey asked the following question: 'if you could go back to college and start all over again, would you again choose teaching as a profession?'

Rowan, B., Raudenbush, S., and Cheong, Y. (1993). Teaching as a non-routine task: implications for the organizational design of schools. Educational Administration Quarterly, 29(4), 479-500.
Number of observations (rows): 680
Number of variables (columns): 4

We use a subset of the data with the following variables:
tcommit: the three-category measure of teacher commitment
taskvar: teachers' perception of task variety; this assesses the extent to which teachers followed the same teaching routines each day, performed the same tasks each day, had something new happening in their job each day, and liked the variety present in their work
tcontrol: a school-level measure of teacher control. This variable was constructed by aggregating nine-item scale scores of teachers within a school; it indicates teacher control over school policy issues such as student behaviour codes, content of in-service programs, student grouping, school curriculum, and text selection, and control over classroom issues such as teaching content and techniques, and amount of homework assigned
schlid: school identifier
The response variable tcommit takes on the value of $k = 1, 2, 3$. In the absence of explanatory variables and random intercepts these values occur with probabilities

$$p_{ij}(k) = \Pr(y_{ij} = k), \qquad k = 1, 2, 3.$$
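To assess the magnitude of variation among schools in the absence of explanatory variables, we specify a simple level-1 model. This model has only the thresholds and the school-specific intercepts as fixed effects:

$$\log\left[\frac{\Pr(y_{ij} \le c \mid \beta_{0j})}{\Pr(y_{ij} > c \mid \beta_{0j})}\right] = \gamma_c - \beta_{0j}, \qquad c = 1, 2.$$

The level-2 model is

$$\beta_{0j} = \gamma_{00} + u_{0j}.$$

- Next, we consider the introduction of explanatory variables into this model.
The level-1 model is

$$\log\left[\frac{\Pr(y_{ij} \le c \mid x_{ij}, \beta_{j})}{\Pr(y_{ij} > c \mid x_{ij}, \beta_{j})}\right] = \gamma_c - (\beta_{0j} + \beta_{1j}\,\mathrm{taskvar}_{ij}),$$

while the level-2 model is

$$\beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{tcontrol}_{j} + u_{0j}, \qquad \beta_{1j} = \gamma_{10}.$$

The combined model is

$$\log\left[\frac{\Pr(y_{ij} \le c \mid x_{ij}, z_{j}, u_{0j})}{\Pr(y_{ij} > c \mid x_{ij}, z_{j}, u_{0j})}\right] = \gamma_c - (\gamma_{00} + \gamma_{10}\,\mathrm{taskvar}_{ij} + \gamma_{01}\,\mathrm{tcontrol}_{j} + u_{0j}).$$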
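- For the model without covariates, the results indicate that the estimated values of the threshold parameters are 0.217 ($\gamma_1$) and 1.248 ($\gamma_2$), and that the estimate of the variance of the school-specific intercepts, $\sigma^2_{u_0}$, is $(0.33527)^2 = 0.11241$.
The model formulation summarizes the two equations as

$$\log\left[\frac{\Pr(y_{ij} \le 1 \mid u_{0j})}{\Pr(y_{ij} > 1 \mid u_{0j})}\right] = 0.217 - u_{0j}, \qquad \log\left[\frac{\Pr(y_{ij} \le 2 \mid u_{0j})}{\Pr(y_{ij} > 2 \mid u_{0j})}\right] = 1.248 - u_{0j}.$$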
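As a rough check on the estimates reported for the model without covariates, the sketch below squares the estimated standard deviation of the school-specific intercepts and converts the two thresholds into category probabilities for a school whose random effect is zero. It assumes the cumulative logit equals the threshold minus the random effect, with a logistic link; the function name is hypothetical.

```python
import numpy as np

# Estimates quoted in the text for the model without covariates
thresholds = np.array([0.217, 1.248])
sd_u0 = 0.33527
var_u0 = sd_u0 ** 2  # variance of the school-specific intercepts

def category_probabilities(gammas, u_0j=0.0):
    """Category probabilities for tcommit = 1, 2, 3 given a school effect u_0j."""
    # Assumed convention: logit P(y <= c) = gamma_c - u_0j
    cumulative = 1.0 / (1.0 + np.exp(-(gammas - u_0j)))
    return np.diff(np.concatenate(([0.0], cumulative, [1.0])))

# Probabilities for a school with u_0j = 0, and one standard deviation above
probs_typical = category_probabilities(thresholds, u_0j=0.0)
probs_high = category_probabilities(thresholds, u_0j=sd_u0)
```

Comparing `probs_typical` with `probs_high` gives a sense of how much the category probabilities move across schools for a between-school variance of this size.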
- For the model with explanatory variables
included, the two equations summarizing these
results are
The results indicate that, within schools, taskvar is significantly related to commitment ($\gamma_{10} = 0.349$, z-test $= 3.98$); between schools, tcontrol is also strongly related to commitment ($\gamma_{01} = 1.541$, z-test $= 4.27$). Inclusion of tcontrol reduced the point estimate of the between-school variance to 0.000.
This suggests that the model without the random effect $u_{0j}$ will be