Title: Bivariate Mixed Discrete and Continuous Responses
1Bivariate Mixed Discrete and Continuous
Responses
2Articles referenced
- R1) Fitzmaurice and Laird (JASA, 1995). Regression models for a bivariate discrete and continuous outcome with clustering
- R2) Fitzmaurice and Laird (Biometrics, 1997). Regression models for mixed discrete and continuous responses with potentially missing values
- Other papers (not referenced): Cox (1972); Catalano and Ryan (1992)
3National Toxicology Program (NTP) Study
- Ethylene glycol (EG) is a high-volume industrial chemical
- EG (at different doses) was administered to pregnant lab mice over the period of major organogenesis, beginning just after implantation
- Each live fetus found (after sacrifice) was examined for evidence of malformations (a discrete response)
- Fetal weight (a continuous response) was also measured
- Primary question: effects of dose on fetal weight and malformation
4SimBaby Trial (A study at CHOP; Investigators: Aaron Donoghue and others)
- Randomized trial
- 2 groups: house staff performing mock resuscitation exercises on a patient simulator vs. a standard manikin
- Hypothesis: the patient simulator (SimBaby) group will have improved performance in test scenarios compared to the manikin group
- A list of tasks has to be performed in sequence by both groups
5SimBaby Trial - Responses
- Dichotomous variables indicating whether each task was performed or not
- If a task was performed, the time taken for that task is also measured
- Performance is evaluated on both of these variables
6SCCOR (A study at CHOP; Investigators: Elizabeth Goldmuntz and others)
- Retrospective case-control study
- Main objective: to test whether patients with a 22q11 deletion, a genetic mutation, have worse clinical outcomes than the non-deleted group
- A few of the outcomes are bivariate mixed discrete and continuous in nature
7SCCOR: a few specific outcomes
- Cardiopulmonary bypass (CPB): yes/no
- If yes, time taken for each run
- DHCA: yes/no
- If yes, time taken for each run
8A brief note on the examples
- There is correlation between the bivariate responses (since they are both measured on the same subjects), which needs to be accounted for
- The last two examples are qualitatively a bit different from the first one: the time variable gets switched on only if the dichotomous variable is "yes", so it is more than just correlation
9Likelihood representation (as given in R1)
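- $X_i$: continuous response; $Y_i$: binary response
- Assume the $1 \times P$ covariate vector $Z_i$ predicts both $Y_i$ and $X_i$
- The marginal distribution of $Y_i$ is Bernoulli,
- $f(y_i \mid Z_i) = \exp\{y_i\theta_i - \log[1 + \exp(\theta_i)]\}$, where
- $\theta_i = \log[\mu_{1i}/(1 - \mu_{1i})] = Z_i\beta_1$ and
- $\mu_{1i} = E(Y_i) = \Pr(Y_i = 1 \mid Z_i, \beta_1)$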
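As a quick numerical check of the exponential-family form above, the following minimal sketch evaluates the Bernoulli log-density under a logit link and recovers the mean from the linear predictor. The function names and the covariate and coefficient values are illustrative assumptions, not taken from R1.

```python
import math

def linear_predictor(z, beta1):
    """theta_i = Z_i * beta_1 for a 1 x P covariate vector z."""
    return sum(zk * bk for zk, bk in zip(z, beta1))

def expit(theta):
    """Inverse logit: mu_1i = Pr(Y_i = 1 | Z_i, beta_1)."""
    return 1.0 / (1.0 + math.exp(-theta))

def bernoulli_logpdf(y, theta):
    """log f(y | Z) = y * theta - log(1 + exp(theta))."""
    return y * theta - math.log1p(math.exp(theta))

# Hypothetical covariates (intercept, dose) and coefficients.
z = [1.0, 0.5]
beta1 = [-1.0, 2.0]
theta = linear_predictor(z, beta1)  # -1.0 + 0.5 * 2.0 = 0.0
mu1 = expit(theta)                  # 0.5
```

For any value of the linear predictor, `bernoulli_logpdf(y, theta)` agrees with the more familiar form y log(mu) + (1 - y) log(1 - mu), with mu = `expit(theta)`.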
10Likelihood representation (as given in R1)
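- The log-likelihood is
- $\ell = \sum_i \log f_{X_i,Y_i}(x_i, y_i)$,
- where $f_{X_i,Y_i}(x_i, y_i) = f_{Y_i}(y_i)\, f_{X_i \mid Y_i}(x_i \mid y_i)$ is the joint density
- We assume $f_{X_i \mid Y_i}(x_i \mid y_i) = (2\pi\sigma^2)^{-1/2} \exp\{-\tfrac{1}{2\sigma^2}[x_i - \mu_{2i} - \gamma(y_i - \mu_{1i})]^2\}$, where $\mu_{2i} = E(X_i) = Z_i\beta_2$
- $\gamma$ is a parameter for the regression of $X_i$ on $Y_i$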
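The factorization of the joint density translates directly into code. The sketch below sums log f(y_i) + log f(x_i | y_i) over independent subjects, assuming the conditional density of X_i given Y_i is normal with mean mu_2i + gamma (y_i - mu_1i), where mu_2i = Z_i beta_2, and variance sigma^2. The function names, data layout and numbers are hypothetical.

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def dot(z, beta):
    return sum(zk * bk for zk, bk in zip(z, beta))

def joint_loglik(data, beta1, beta2, gamma, sigma2):
    """Sum over subjects of log f(y_i) + log f(x_i | y_i).

    data: iterable of (z, x, y), with z a covariate list,
    x the continuous response and y the binary response (0 or 1).
    """
    total = 0.0
    for z, x, y in data:
        theta = dot(z, beta1)          # logit of mu_1i
        mu1 = expit(theta)             # marginal mean of Y_i
        mu2 = dot(z, beta2)            # marginal mean of X_i
        log_fy = y * theta - math.log1p(math.exp(theta))
        resid = x - mu2 - gamma * (y - mu1)
        log_fx_given_y = (-0.5 * math.log(2 * math.pi * sigma2)
                          - resid ** 2 / (2 * sigma2))
        total += log_fy + log_fx_given_y
    return total

# Hypothetical data: (covariates [intercept, dose], x, y).
data = [([1.0, 0.0], 1.2, 0), ([1.0, 1.0], 0.7, 1), ([1.0, 2.0], 0.4, 1)]
ll = joint_loglik(data, beta1=[0.0, 0.5], beta2=[1.0, -0.3],
                  gamma=-0.2, sigma2=0.25)
```

Setting `gamma=0` makes the two responses independent, and the result reduces to the sum of a logistic-regression log-likelihood and a normal linear-regression log-likelihood.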
11Likelihood representation (as given in R1)
- The continuous variable has a conditional mean that depends on the binary response:
- $E(X_i \mid y_i) = \mu_{2i} + \gamma(y_i - \mu_{1i})$
- This dependency induces association or correlation between $Y_i$ and $X_i$
- Also note that $E(X_i) = \mu_{2i} = Z_i\beta_2$, so that both $\beta_1$ and $\beta_2$ are regression parameters that have marginal interpretations
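The marginal interpretation of the second regression parameter follows from the law of iterated expectations. A short derivation, assuming the conditional mean takes the centered form $E(X_i \mid y_i) = \mu_{2i} + \gamma(y_i - \mu_{1i})$ with $\mu_{2i} = Z_i\beta_2$ and $\mu_{1i} = E(Y_i)$:

```latex
\begin{aligned}
E(X_i) &= E\{E(X_i \mid Y_i)\} \\
       &= \mu_{2i} + \gamma\,\{E(Y_i) - \mu_{1i}\} \\
       &= \mu_{2i} = Z_i\beta_2 .
\end{aligned}
```

Centering $y_i$ at its mean is what makes the $\gamma$ term vanish on averaging; with an uncentered term $\gamma y_i$ the marginal mean of $X_i$ would also involve $\gamma$ and $\beta_1$.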
12Parameter estimates (as given in R1)
- The parameter estimates for $(\beta_1, \beta_2, \gamma, \sigma^2)$ may be obtained by solving the score equations
- The covariance of the parameter estimates can be approximated by the inverse of the Fisher information matrix
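To make the estimation step concrete, here is a minimal sketch for the special case with an intercept only (Z_i = 1 for every subject), an assumption made purely so that the score equations have a closed-form solution; with covariates they are typically solved iteratively. It also assumes the conditional mean E(X_i | y_i) = mu_2 + gamma (y_i - mu_1). The function name and the toy data are hypothetical.

```python
import math

def fit_intercept_only(x, y):
    """Closed-form ML estimates when Z_i = 1 for every subject.

    Returns (beta1, beta2, gamma, sigma2). Both y = 0 and y = 1 must
    occur in the sample so that the logit of mean(y) is finite.
    """
    n = len(x)
    ybar = sum(y) / n
    xbar = sum(x) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    gamma = sxy / syy                     # slope of x on centered y
    beta1 = math.log(ybar / (1 - ybar))   # logit of the sample proportion
    beta2 = xbar                          # marginal mean of X
    sigma2 = sum((xi - xbar - gamma * (yi - ybar)) ** 2
                 for xi, yi in zip(x, y)) / n
    return beta1, beta2, gamma, sigma2

# Toy data (hypothetical): continuous response x, binary response y.
x = [1.0, 2.0, 3.0, 4.0]
y = [0, 0, 1, 1]
beta1, beta2, gamma, sigma2 = fit_intercept_only(x, y)
```

At these estimates the residuals x_i - beta2 - gamma (y_i - mu_1) sum to zero and are orthogonal to y_i - mu_1, which is exactly what the score equations for beta2 and gamma require.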
13Correlated Bivariate Model (as given in R1)
- Extension of the previous model to allow for clustering
- The responses for the $i$th cluster consist of $(X_i, Y_i)$, where $X_i = (X_{i1}, X_{i2}, \ldots, X_{in_i})'$ and $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})'$
- Let $Z_i = (z_{i1}, z_{i2}, \ldots, z_{in_i})'$ represent the covariates for the $i$th cluster
14Correlated Bivariate Model (as given in R1)
- The model for the mean is assumed to be
- where
-
- $\gamma_1 + \gamma_2$: association between binary and continuous responses made on the same unit within a cluster
- $\gamma_2$: association between binary and continuous responses made on different units within a cluster
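One conditional-mean specification consistent with the stated roles of the two association parameters is sketched below. It is an assumption inferred from those interpretations rather than a quotation from R1, and $\mu_{1ij}$ and $\mu_{2ij}$ are assumed to denote the marginal means of $Y_{ij}$ and $X_{ij}$ for unit $j$ in cluster $i$.

```latex
E(X_{ij} \mid Y_i) = \mu_{2ij} + \gamma_1\,(y_{ij} - \mu_{1ij})
                   + \gamma_2 \sum_{k=1}^{n_i} (y_{ik} - \mu_{1ik}),
\qquad
\operatorname{logit}(\mu_{1ij}) = z_{ij}\beta_1, \quad \mu_{2ij} = z_{ij}\beta_2 .
```

Under this form the coefficient on the same unit's residual $y_{ij} - \mu_{1ij}$ is $\gamma_1 + \gamma_2$, and the coefficient on each other unit's residual is $\gamma_2$.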
15Correlated Bivariate Model (as given in R1)
- Separate intracluster correlations, $\rho_Y$ and $\rho_X$, are also assumed for the binary and continuous responses, respectively
- GEE methodology is used for the estimation of $(\beta_1, \beta_2, \gamma_1, \gamma_2)$; method-of-moments estimators are used for $\sigma^2$, $\rho_Y$ and $\rho_X$
- Maximum likelihood estimation is quite complicated in the clustered data setting
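As an illustration of a method-of-moments step, the sketch below computes a common moment estimator of an exchangeable intracluster correlation: the average within-cluster cross-product of standardized residuals. This is a generic GEE-style estimator offered as an assumption; the exact estimators in R1 may differ (for example in degrees-of-freedom corrections), and the function name and residual values are hypothetical.

```python
def exchangeable_corr(residuals_by_cluster):
    """Moment estimate of a common within-cluster correlation.

    residuals_by_cluster: one list per cluster of standardized residuals
    (mean 0 and variance 1 under the fitted marginal model).
    """
    cross_sum = 0.0
    n_pairs = 0
    for r in residuals_by_cluster:
        for j in range(len(r)):
            for k in range(j + 1, len(r)):
                cross_sum += r[j] * r[k]   # product for one pair of units
                n_pairs += 1
    if n_pairs == 0:
        raise ValueError("need at least one cluster with two or more units")
    return cross_sum / n_pairs

# Hypothetical standardized residuals for three clusters.
clusters = [[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0, 1.0]]
rho_hat = exchangeable_corr(clusters)  # 1 / 5 = 0.2
```

The same function could be applied separately to the residuals of the binary and the continuous responses to obtain the two intracluster correlations.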
16A closer look at SimBaby and SCCOR examples
- The discrete variable (task performed yes/no) and the continuous variable (time taken for the task) are much more than correlated. The continuous variable gets switched on to a nonzero value only when the discrete variable is "yes". For these examples, maybe the joint distribution for the discrete and continuous variables should be reformulated to reflect this.
- If the continuous variable is, e.g., time taken, its range is $[0, \infty)$. Maybe a gamma distribution assumption is better than a normal distribution assumption.
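One way to express this idea is a two-part (hurdle-type) likelihood: a Bernoulli model for whether the task was performed, and a gamma density for the time only when it was. The sketch below is one possible reformulation under assumed shape and scale parameters; it is not a model proposed in R1 or R2, and all names and numbers are hypothetical.

```python
import math

def gamma_logpdf(t, shape, scale):
    """Log density of a gamma(shape, scale) distribution at t > 0."""
    return ((shape - 1) * math.log(t) - t / scale
            - math.lgamma(shape) - shape * math.log(scale))

def two_part_loglik(data, p, shape, scale):
    """Log-likelihood for pairs (y, t).

    y = 1 if the task was performed, 0 otherwise.
    t = time taken; used only when y = 1 and may be None otherwise.
    """
    total = 0.0
    for y, t in data:
        if y == 1:
            total += math.log(p) + gamma_logpdf(t, shape, scale)
        else:
            total += math.log(1 - p)   # no time contribution
    return total

# Hypothetical observations: (performed?, time taken).
data = [(1, 12.5), (0, None), (1, 30.0), (1, 8.2)]
ll = two_part_loglik(data, p=0.75, shape=2.0, scale=8.0)
```

Because the time is defined only when the task was performed, the time density enters the likelihood only for those observations. Covariates such as study group could enter through `p` (for example on the logit scale) and through the gamma mean.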
17Thank you!