Title: Likelihood Ratio Testing under Nonidentifiability: Theory and Biomedical Applications
1. Likelihood Ratio Testing under Non-identifiability: Theory and Biomedical Applications
- Kung-Yee Liang and Chongzhi Di
- Department of Biostatistics
- Johns Hopkins University
- July 9-10, 2009
- National Taiwan University
2. Outline
- Challenges associated with likelihood inference
- Nuisance parameters absent under null hypothesis
- Some biomedical examples
- Statistical implications
- Class I: alternative representation of LR test statistic
- Implications
- Class II
- Asymptotic null distribution of LR test statistic
- Some alternatives
- A genetic linkage example
- Discussion
3. Likelihood Inference
- Likelihood inference has been successful in a variety of scientific fields
- LOD score method for genetic linkage
- BRCA1 for breast cancer
- Hall et al. (1990) Science
- Poisson regression for environmental health
- Fine air particles (PM10) and increased mortality from all causes and from cardiovascular and respiratory causes
- Samet et al. (2000) NEJM
- ML image reconstruction estimate for nuclear medicine
- Diagnoses for myocardial infarction and cancers
4. Challenges for Likelihood Inference
- In the absence of sufficient substantive knowledge, the likelihood function may be difficult to fully specify
- Genetic linkage for complex traits
- Genome-wide association with thousands of SNPs
- Gene expression data for tumor cells
- There are computational issues as well for high-dimensional observations
- High-throughput data
5. Challenges for Likelihood Inference (cont)
- Impacts of nuisance parameters
- Inconsistency of MLE with many nuisance parameters (Neyman-Scott problem)
- Different scientific conclusions with different nuisance parameter values
- Ill-behaved likelihood function
- Asymptotic approximation not ready
8. Challenges for Likelihood Inference (cont)
- There are situations where some of the regularity conditions may be violated
- Boundary issue (variance components, genetic linkage, etc.)
- Self and Liang (1987) JASA
- Discrete parameter space
- Lindsay and Roeder (1986) JASA
- Singular information matrix (admixture model)
- Rotnitzky et al. (2000) Bernoulli
9. Nuisance Parameters Absent under Null
- Ex. I.1 (Polar coordinates for bivariate normal)
- Under $H_0: \delta = 0$, $\theta$ is absent
- Davies (1977) Biometrika
- Andrews and Ploberger (1994) Econometrica
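10. Examples (cont)
- Ex. I.2 (Stereotype model for ordinal categorical response)
- For $Y = 2, \ldots, C$,
- $\log\{\Pr(Y = j)/\Pr(Y = 1)\} = \alpha_j + \beta_j^{t} x$, $j = 2, \ldots, C$
- $= \alpha_j + \phi_j \beta^{t} x$
- $0 = \phi_1 \le \phi_2 \le \cdots \le \phi_C = 1$
- Under $H_0: \beta = 0$, the $\phi_j$'s are absent
- Anderson (1984) JRSSB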
11. Examples (cont)
- Ex. I.3 (Variance component models)
- In certain situations, the covariance matrix of continuous and multivariate observations could be expressed as
- $\delta M(\theta) + \gamma_1 M_1 + \cdots + \gamma_q M_q$
- A hypothesis of interest is $H_0: \delta = 0$ ($\theta$ is absent)
- Ritz and Skovgaard (2005) Biometrika
12. Examples (cont)
- Ex. I.4 (Gene-gene interactions)
- Consider the following logistic regression model
- $\operatorname{logit} \Pr(Y = 1 \mid S_1, S_2) = \alpha + \sum_k \delta_k S_{1k} + \sum_j \gamma_j S_{2j} + \theta \sum_k \sum_j \delta_k \gamma_j S_{1k} S_{2j}$
- To test genetic association with gene one ($S_1$) while taking into account potential interaction with gene two ($S_2$), the hypothesis of interest is
- $H_0: \delta_1 = \cdots = \delta_K = 0$ ($\theta$ is absent)
- Chatterjee et al. (2005) American Journal of Human Genetics
13. Examples (cont)
- Ex. II.1 (Admixture models)
- $f(y; \delta, \theta) = \delta\, p(y; \theta) + (1 - \delta)\, p(y; \theta_0)$
- $\delta$: proportion of linked families
- $\theta$: recombination fraction ($\theta_0 = 0.5$)
- Smith (1963) Annals of Human Genetics
- The null hypothesis of no genetic linkage can be cast as
- $H_0: \delta = 0$ ($\theta$ is absent) or $H_0: \theta = \theta_0$ ($\delta$ is absent)
14. Examples (cont)
- Ex. II.2 (Change point)
- $\operatorname{logit} \Pr(Y = 1 \mid x) = \beta_0 + \beta x + \delta (x - \theta)_+$
- $(x - \theta)_+ = x - \theta$ if $x - \theta > 0$ and 0 otherwise
- Alcohol consumption is protective for MI when consuming less than $\theta$, but harmful when exceeding the threshold
- Pastor et al. (1998) American Journal of Epidemiology
- The hypothesis that no threshold exists can be cast as
- $H_0: \delta = 0$ ($\theta$ is absent) or $H_0: \theta = \infty$ ($\delta$ is absent)
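The change-point example can be illustrated numerically. The sketch below is a minimal illustration on simulated data, not the analysis of Pastor et al.; the function names `hinge` and `log_lik`, the simulated covariate range, and all parameter values are assumptions chosen for the example. It shows that when the hinge coefficient is zero the log-likelihood is the same for every threshold, which is what "the threshold is absent under the null" means in practice.

```python
import numpy as np

def hinge(x, theta):
    """(x - theta)_+ : equals x - theta above the threshold, 0 otherwise."""
    return np.maximum(x - theta, 0.0)

def log_lik(y, x, beta0, beta, delta, theta):
    """Bernoulli log-likelihood for logit Pr(Y=1|x) = beta0 + beta*x + delta*(x - theta)_+."""
    eta = beta0 + beta * x + delta * hinge(x, theta)
    # log(1 + exp(eta)) is computed stably with logaddexp
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

rng = np.random.default_rng(0)            # simulated data, for illustration only
x = rng.uniform(0.0, 10.0, size=200)
y = rng.binomial(1, 0.3, size=200)

thresholds = (1.0, 4.0, 7.0)
# With delta = 0 the log-likelihood is identical for every threshold theta.
flat = [log_lik(y, x, -0.8, 0.0, 0.0, th) for th in thresholds]
# With delta != 0 the threshold changes the log-likelihood.
curved = [log_lik(y, x, -0.8, 0.0, 0.5, th) for th in thresholds]
print(flat)
print(curved)
```

The three values in `flat` coincide, while those in `curved` differ, so the data carry no information about the threshold under the null.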
15. Examples (cont)
- Ex. II.3 (Non-linear alternative)
- $\operatorname{logit} \Pr(Y = 1 \mid x) = \beta_0 + \beta x + \delta h(x; \theta)$
- e.g., $h(x; \theta) = \exp(x\theta) - 1$
- The effect of alcohol consumption on the risk of MI, through the log odds, is non-linear if $\theta \ne 0$
- The hypothesis of a linear relationship against a specific non-linear alternative can be cast as
- $H_0: \delta = 0$ ($\theta$ is absent) or $H_0: \theta = 0$ ($\delta$ is absent)
- Gallant (1977) JASA
16. Characteristics of Examples
- The majority of examples can be characterized as
- $f(y, x; \delta h_{y,x}(\theta, \beta), \beta)$
- Class I: $H_0: \delta = 0$
- Class II: $H_0: \delta = 0$ or $\theta = \theta_0$ (where $h_{y,x}(\theta_0, \beta) = 0$)
17. Figure: expected log likelihood function for three cases
18. Implications
- Under $H_0$, conventional asymptotic results may not apply
- Likelihood ratio test statistic distribution not $\chi^2$
- Difficult to assess evidence against $H_0$ (e.g. p-value)
- Maximum likelihood estimators not normally distributed
- Impact parameter space near $H_0$
- Q. How to deal with this phenomenon?
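19. Class I: Asymptotic
- For $H_0: \delta = 0$,
- 1. $\mathrm{LRT} = 2\{\log L(\hat\delta, \hat\theta) - \log L(0, \cdot)\}$
- $= \sup_\theta 2\{\log L(\hat\delta_\theta, \theta) - \log L(0, \cdot)\}$
- $= \sup_\theta \mathrm{LRT}(\theta)$
- 2. $\mathrm{LRT}(\theta) = S(\theta)^{t} I^{-1}(\theta) S(\theta) + o_p(1) = W^2(\theta) + o_p(1)$,
- where $S(\theta) = \partial \log L(\delta, \theta)/\partial\delta\,|_{\delta = 0}$, $I(\theta) = \operatorname{var}\{S(\theta)\}$,
- $W(\theta) = I^{-1/2}(\theta) S(\theta)$,
- and $W(\theta)$ is a Gaussian process in $\theta$ with mean 0, variance 1 and autocorrelation $\rho(\theta_1, \theta_2) = \operatorname{cov}\{W(\theta_1), W(\theta_2)\}$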
20. Class I: Asymptotic (cont)
- Results were derived previously by Davies (1977, Biometrika)
- No analytical form available in general
- Approximation, simulation or resampling methods
- Kim and Siegmund (1989) Biometrika
- Zhu and Zhang (2006) JRSSB
- Q. Can simplification take place for
- Asymptotic null distribution?
- Approximating the p-value?
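21. Class I: Principal Component Representation
- Principal component decomposition
- $W(\theta) = \sum_{k=1}^{K} \xi_k \psi_k(\theta)$
- $K$ could be finite or $\infty$
- $\xi_1, \ldots, \xi_K$ are independent r.v.s
- $\xi_k \sim N(0, \lambda_k)$, $k = 1, \ldots, K$
- $\rho(\theta, \theta) = \sum_k \lambda_k \psi_k(\theta)^2 = 1$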
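22. Class I: Principal Component Representation (cont)
- $W^2(\theta) = \{\sum_k \xi_k \psi_k(\theta)\}^2$
- $\le \{\sum_k \xi_k^2/\lambda_k\}\{\sum_k \lambda_k \psi_k(\theta)^2\} = \sum_k \xi_k^2/\lambda_k$ (Cauchy-Schwarz inequality)
- Consequently, one has
- $\sup_\theta W^2(\theta) = \sup_\theta \{\sum_k \xi_k \psi_k(\theta)\}^2 \le \sum_k \xi_k^2/\lambda_k$
- The asymptotic distribution of LRT under $H_0$ is bounded by that of $\sum_k \xi_k^2/\lambda_k \sim \chi^2_K$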
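23. Class I: Simplification
- Simplify to $\chi^2_K$ if $K < \infty$ and for almost every
- $(\xi_1, \ldots, \xi_K)$, there exists $\theta$ such that
- $\lambda_1 \psi_1(\theta)/\xi_1 = \cdots = \lambda_K \psi_K(\theta)/\xi_K$
- Ex. I.1 (Polar coordinates for bivariate normal)
- For any $(\xi_1, \xi_2)$, there exists $\theta \in [0, \pi)$ such that this condition holds
- LRT $\sim \chi^2_2$ instead of $\chi^2_1$
- $H_0: \delta = 0 \Leftrightarrow H_0: \mu_1 = \mu_2 = 0$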
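The polar-coordinate simplification can be checked by simulation. The sketch below assumes the model is $Y_i \sim N_2(\mu, I)$ with $\mu_1 = \delta\cos\theta$, $\mu_2 = \delta\sin\theta$ and $\delta$ unrestricted in sign; the model formula is not in the transcript, so this parameterization, the sample size and the grid are assumptions. Under that model $W(\theta) = \sqrt{n}(\bar y_1\cos\theta + \bar y_2\sin\theta)$, and the supremum of $W^2(\theta)$ over $[0, \pi)$ should match $n\lVert\bar y\rVert^2$, which is $\chi^2_2$ under the null.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_rep = 50, 2000
theta_grid = np.linspace(0.0, np.pi, 721, endpoint=False)
directions = np.stack([np.cos(theta_grid), np.sin(theta_grid)])  # shape (2, M)

sup_lrt = np.empty(n_rep)       # sup over the grid of W^2(theta)
closed_form = np.empty(n_rep)   # n * ||ybar||^2
for r in range(n_rep):
    y = rng.standard_normal((n, 2))       # data generated under H0: mu1 = mu2 = 0
    ybar = y.mean(axis=0)
    w = np.sqrt(n) * (ybar @ directions)  # W(theta) evaluated on the grid
    sup_lrt[r] = np.max(w ** 2)
    closed_form[r] = n * (ybar @ ybar)

print(np.mean(sup_lrt), np.mean(closed_form))  # both should be close to 2, the chi-square(2) mean
```

The grid supremum and the closed form agree up to grid resolution, which is the numerical counterpart of the bound being attained when $K = 2$.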
24. Class I: Simplification (cont)
- Simplify to $\chi^2_1$ if $S(\theta) = h(\theta)\, g(Y)$
- $\rho(\theta_1, \theta_2) = 1$
- Ex. Modified admixture models
- $\theta\, p(y; \delta) + (1 - \theta)\, p(y; \delta_0)$, $\theta \in [a, 1]$ with $a > 0$ fixed
- $H_0: \delta = \delta_0$ ($\theta$ is absent) and the score function for $\delta$ at $\delta_0$ is
- $S(\theta) = \theta\, \partial \log p(y; \delta_0)/\partial\delta$
- Known as restricted LRT for testing $H_0: \delta = \delta_0$
- Has been used in genetic linkage studies
- Lemdani and Pons (1993) Biometrics
- Shoukri and Lathrop (1993) Biometrics
25. Class I: Approximation for P-values
- When simplification fails
- Step 1: Calculate $W(\theta)$ and $\rho(\theta_1, \theta_2)$
- Step 2: Estimate eigenvalues $\lambda_1, \ldots, \lambda_K$ and eigenfunctions $\psi_1, \ldots, \psi_K$, where $K$ is chosen so that the first $K$ components explain more than 95% of the variation
- Step 3: Choose a dense grid $\theta_1, \ldots, \theta_M$ and for $i = 1, \ldots, N$, repeat the following steps
- Simulate $\xi_{ik} \sim N(0, \lambda_k)$ for $k = 1, \ldots, K$
- Calculate $W_i^2(\theta_m) = \{\sum_k \xi_{ik} \psi_k(\theta_m)\}^2$ for each $m$
- Find the maximum of $W_i^2(\theta_1), \ldots, W_i^2(\theta_M)$, $R_i$ say
- $R_1, \ldots, R_N$ approximates the null distribution of LRT
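The three steps above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: it assumes the autocorrelation has already been evaluated on the grid as an M-by-M matrix `rho`, and it uses the eigenvectors of that matrix as discretized eigenfunctions. The function names `simulate_sup_w2` and `p_value` and the cosine kernel in the usage lines are hypothetical choices for illustration.

```python
import numpy as np

def simulate_sup_w2(rho, n_sim=5000, var_explained=0.95, seed=0):
    """Return R_1..R_N, the grid maxima of {sum_k xi_ik psi_k(theta_m)}^2.

    rho: (M, M) autocorrelation of W evaluated on the grid theta_1..theta_M.
    """
    rng = np.random.default_rng(seed)
    eigval, eigvec = np.linalg.eigh(rho)              # eigenvalues in ascending order
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]    # reorder to descending
    eigval = np.clip(eigval, 0.0, None)               # remove tiny negative round-off
    share = np.cumsum(eigval) / np.sum(eigval)
    # smallest K whose leading components explain at least var_explained
    k = min(int(np.searchsorted(share, var_explained)) + 1, len(eigval))
    lam, psi = eigval[:k], eigvec[:, :k]
    xi = rng.standard_normal((n_sim, k)) * np.sqrt(lam)   # xi_ik ~ N(0, lambda_k)
    w = xi @ psi.T                                         # W_i(theta_m), shape (N, M)
    return np.max(w ** 2, axis=1)

def p_value(observed_lrt, draws):
    """Proportion of simulated maxima at least as large as the observed LRT."""
    return float(np.mean(draws >= observed_lrt))

# Usage with an assumed kernel rho(t1, t2) = cos(t1 - t2), which has K = 2.
theta = np.linspace(0.0, np.pi, 200, endpoint=False)
rho = np.cos(theta[:, None] - theta[None, :])
draws = simulate_sup_w2(rho)
print(p_value(5.99, draws))
```

For the cosine kernel the simulated maxima should behave like a chi-square variable with 2 degrees of freedom, which gives a simple sanity check before applying the routine to an autocorrelation estimated from data.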
26. Class II: Some New Results
- Consider the family
- $f(y, x; \delta h_{y,x}(\theta, \beta), \beta)$,
- where $h_{y,x}(\theta_0, \beta) = 0$ for all $y$ and $x$
- $H_0: \delta = 0$ or $\theta = \theta_0$
- Tasks: 1. Derive asymptotic distribution of LRT under $H_0$
- 2. Present alternative approaches
- Illustrate through Ex. II.1 (Admixture models)
- $\delta\, \mathrm{Binom}(m, \theta) + (1 - \delta)\, \mathrm{Binom}(m, \theta_0)$
- $\delta \in [0, 1]$ and $\theta \in [0, 0.5]$, $h_{y,x}(\theta, \beta) = p(y; \theta) - p(y; \theta_0)$
- For simplicity, assume $\beta$ is absent
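For the binomial admixture model, the LRT can be computed by brute force. The sketch below maximizes the mixture log-likelihood over a grid of mixing proportions in [0, 1] and recombination fractions in [0, 0.5]; the grid search, the function names and the toy data are assumptions made for illustration and are not taken from the slides. Inputs are integer arrays: `y` holds the recombinant counts and `m` the number of informative meioses per family.

```python
import numpy as np
from math import comb

def admixture_lrt(y, m, n_grid=101, theta0=0.5):
    """Grid-search LRT = 2{sup logL(delta, theta) - logL under H0}."""
    y, m = np.asarray(y), np.asarray(m)
    coef = np.array([comb(int(mi), int(yi)) for mi, yi in zip(m, y)], dtype=float)

    def pmf(p):
        # Binomial(m, p) probability of each observed count
        return coef * p ** y * (1.0 - p) ** (m - y)

    p0 = pmf(theta0)
    null = float(np.sum(np.log(p0)))                 # log-likelihood under H0
    deltas = np.linspace(0.0, 1.0, n_grid)[:, None]  # column of mixing proportions
    best = null
    with np.errstate(divide="ignore"):               # log(0) can occur at delta = 1, theta = 0
        for theta in np.linspace(0.0, theta0, n_grid):
            mix = deltas * pmf(theta) + (1.0 - deltas) * p0
            best = max(best, float(np.log(mix).sum(axis=1).max()))
    return 2.0 * (best - null)

# Toy data: 30 families with 10 meioses each and no recombinants (strong linkage).
lrt_linked = admixture_lrt(np.zeros(30, dtype=int), np.full(30, 10))
# Toy data: every family shows 5 recombinants out of 10 (no evidence of linkage).
lrt_unlinked = admixture_lrt(np.full(30, 5), np.full(30, 10))
print(lrt_linked, lrt_unlinked)
```

Because the statistic is a supremum over both parameters, its null distribution is not the usual chi-square, so the value returned here should not be referred to a chi-square table without further justification.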
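27. Class II: LRT Representation
- Under $H_0$, $f(y, x; \delta, \theta) = f(y, x; 0, \cdot) = f(y, x; \cdot, \theta_0)$,
- $\mathrm{LRT} = \sup_{\delta,\theta} 2\{\log L(\delta, \theta) - \log L(0, \cdot)\} = \sup_{\delta,\theta} \mathrm{LRT}(\delta, \theta)$
- $= \max\{\sup_{1,4} \mathrm{LRT}(a), \sup_{2,4} \mathrm{LRT}(b), \sup_{3} \mathrm{LRT}(a, b)\}$,
- here for fixed $a, b > 0$ and $\theta_0 = 0.5$
- Region 1: $\delta \in [a, 1]$, $\theta \in [0.5 - b, 0.5]$
- Region 2: $\delta \in [0, a]$, $\theta \in [0, 0.5 - b]$
- Region 3: $\delta \in [0, a]$, $\theta \in [0.5 - b, 0.5]$
- Region 4: $\delta \in [a, 1]$, $\theta \in [0, 0.5 - b]$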
28. Class II: Four Sub-Regions of the Parameter Space
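29. Class II: Regions 1 and 4
- With $\delta \in [a, 1]$, this reduces to Class I, and
- $\sup_{1,4} \mathrm{LRT}(a) = \sup_\delta$ [...] $+\, o_p(1)$
- $W_1(\delta) = I_1^{-1/2}(\delta) S_1(\delta)$
- $S_1(\delta) = \partial \log L(\delta, \theta)/\partial\theta\,|_{\theta = \theta_0}$, $I_1(\delta) = \operatorname{var}(S_1(\delta))$
- For the admixture models,
- $S_1(\delta) = \delta\, \partial \log p(y; \theta_0)/\partial\theta$, which is proportional to $\delta$
- $W_1(\delta)$ does not depend on $\delta$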
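30. Class II: Regions 2 and 4
- With $\theta \in [0, \theta_0 - b]$, this reduces to Class I, and
- $\sup_{2,4} \mathrm{LRT}(b) = \sup_\theta W_2(\theta)^2 + o_p(1)$
- $W_2(\theta) = I_2^{-1/2}(\theta) S_2(\theta)$
- $S_2(\theta) = \partial \log L(\delta, \theta)/\partial\delta\,|_{\delta = 0}$, $I_2(\theta) = \operatorname{var}(S_2(\theta))$
- For the admixture models,
- $S_2(\theta) = \{p(y; \theta) - p(y; \theta_0)\}/p(y; \theta_0)$
- $W_2(\theta) \propto \partial \log p(y; \theta_0)/\partial\theta \propto W_1$ as $\theta \to \theta_0$ (or $b \to 0$)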
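31. Class II: Region 3
- With $\delta \in [0, a]$, $\theta \in [0.5 - b, 0.5]$, expand at $(0, 0.5)$
- $\sup_3 \mathrm{LRT}(a, b) = \sup_{\delta,\theta}$ [...] $+\, o_p(1)$
- $W_3 = I_3^{-1/2} S_3$
- $S_3 = \partial^2 \log p(y; 0, 0.5)/\partial\delta\,\partial\theta$, $I_3 = \operatorname{var}(S_3)$
- For the admixture models,
- $S_3 = \partial \log p(y; \theta_0)/\partial\theta$ and $W_3 = W_1$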
32. Class II: Asymptotic Distribution of LRT
- Combining the three regions and letting $a, b \to 0$,
- $\mathrm{LRT} = \max\{\sup_{1,4} \mathrm{LRT}(a), \sup_{2,4} \mathrm{LRT}(b), \sup_3 \mathrm{LRT}(a, b)\}$
- $= \max\{$ [...] $, \sup W_2(\theta)^2\}$,
- $\to \sup W_2(\theta)^2$,
- where [...]
- The asymptotic null distribution of LRT is the supremum of a squared Gaussian process w.r.t. $\theta$
- Simplification (null distribution and approximation to p-values) can be adopted from Class I
33. Class II: Alternatives
- Question: Can one find alternative test statistics with conventional asymptotic null distributions?
- 1. Restricted LRT: limit the range of $\delta$ to $[a, 1]$ with $a > 0$ (Regions 1 and 4)
- $T_R(a) = \sup_{1,4} \mathrm{LRT}(a)$
- $T_R(a)$ decreases in $a$
- How to choose $a$?
- The smaller the $a$, the better
- Chi-square approximation may be in doubt
34. Class II: Alternatives (cont)
- 2. Smooth version (penalized LRT)
- Instead of excluding $(\delta, \theta)$ values in Regions 2 and 3, they are penalized toward $\delta = 0$ by considering the penalized log-likelihood
- $PL(\delta, \theta; c) = \log L(\delta, \theta) + c\, g(\delta)$,
- where $g(\delta) \le 0$ is a smooth penalty with [...], maximized at $\delta_0$, and $c > 0$ controls the magnitude of the penalty
- Bayesian interpretation
- $g(\delta)$ could be a prior on $\delta$
35. Class II: Penalized Likelihood
- Define the penalized LR test statistic for $H_0: \theta = \theta_0$
- $\mathrm{PLRT}(c) = 2\{\sup PL(\delta, \theta; c) - PL(\delta_0, \theta_0; c)\}$
- Under $H_0$: $\mathrm{PLRT}(c) \to$ [...] as $n \to \infty$
- Proof is more demanding
- Similar concerns to restricted LRT
- Decreasing in $c$
- Chi-square approximation may be invalid with small $c$
- Loses power if $\delta$ is small (the proportion of linked families is small)
- A different approach for mixture/admixture models was provided by Chen et al. (2001, 2004) and Fu et al. (2006)
36. Application: Genetic Linkage for Schizophrenia
- Conducted by Ann Pulver at Hopkins
- 486 individuals from 54 multiplex families
- Interested in marker D22S942 on chromosome 22
- Schizophrenia is relatively high in prevalence with strong evidence of genetic heterogeneity
- To take this phenomenon into account, consider
- $\delta\, \mathrm{Binom}(m, \theta) + (1 - \delta)\, \mathrm{Binom}(m, \theta_0)$,
- where $\delta \in [0, 1]$, $\theta \in [0, 0.5]$ with $\theta_0 = 0.5$
37. Genetic Linkage Study of Schizophrenia (cont)
- With this admixture model considered,
- LRT = 6.86, p-value = 0.007
- PLRT with [...]
- PLRT(3.0) = 5.36, p-value = 0.010
- PLRT(0.5) = 5.49, p-value = 0.009
- PLRT(0.01) = 6.84, p-value = 0.004
- Which p-value do we trust better?
38. Figure: asymptotic vs empirical distribution of the LRT for genetic linkage example
39. Discussion
- The issue considered, namely nuisance parameters absent under the null, is common in practice
- Examples can be classified into Class I and II
- Class I: $H_0$ can only be specified through $\delta$ (= 0)
- Class II: $H_0$ can be specified either in $\delta$ (= 0) or $\theta$ (= $\theta_0$)
- For Class I, the asymptotic distribution of LRT is well known, and through the principal component representation
- Deriving sufficient conditions for simple null distribution
- Proposing a means to approximate p-values
40. Discussion (cont)
- For Class II, less well developed
- Deriving asymptotic null distribution of LRT
- Through this derivation, we observe
- Connection with Class I
- Connection with RLRT and PLRT
- Proof on asymptotics of PLRT non-trivial
- Shedding light on why the penalty is applied to $\delta$ and not to $\theta$
- Pointing out some peculiar features and shortcomings of these two approaches
- A genetic linkage example on schizophrenia was presented for illustration
41. Discussion (cont)
- Some future work
- Constructing confidence intervals/regions
- Generalizing to partial/conditional likelihood
- Cox PH model with change point (nuisance function)
- Extending to the estimating function approach in the absence of a likelihood function to work with
- Linkage study of IBD sharing for affected sibpairs
- $E(S(t)) = 1 + (1 - 2\theta_{t,\tau})^2 \{E(S(\tau)) - 1\} = 1 + (1 - 2\theta_{t,\tau})^2\, \delta$
- $\theta_{t,\tau} = \{1 - \exp(-0.02\,|t - \tau|)\}/2$
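The IBD-sharing relation can be written as two small functions. This is a sketch under assumptions: map positions are taken to be in centimorgans (which is what the 0.02 rate suggests), the excess sharing at the trait locus is treated as a single scalar, and the function names are hypothetical. It shows that when the excess sharing is zero, the expected sharing equals 1 at every marker position regardless of where the trait locus sits, so the locus position is absent under the null.

```python
import math

def recombination_fraction(t, tau):
    """Recombination fraction between positions t and tau: (1 - exp(-0.02|t - tau|)) / 2."""
    return (1.0 - math.exp(-0.02 * abs(t - tau))) / 2.0

def expected_ibd(t, tau, delta):
    """Expected IBD sharing at t: 1 + (1 - 2*theta)^2 * delta, where delta = E(S(tau)) - 1."""
    theta = recombination_fraction(t, tau)
    return 1.0 + (1.0 - 2.0 * theta) ** 2 * delta

# At the trait locus itself the full excess sharing is recovered.
print(expected_ibd(10.0, 10.0, 0.4))
# With delta = 0 the position tau has no effect on the expected sharing.
print([expected_ibd(10.0, tau, 0.0) for tau in (0.0, 25.0, 80.0)])
```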
42. Figure: log likelihood and penalized log likelihood function for genetic linkage example