Likelihood Ratio Testing under Nonidentifiability: Theory and Biomedical Applications


Title: Likelihood Ratio Testing under Nonidentifiability: Theory and Biomedical Applications


1
Likelihood Ratio Testing under Non-identifiability:
Theory and Biomedical Applications
  • Kung-Yee Liang and Chongzhi Di
  • Department of Biostatistics
  • Johns Hopkins University
  • July 9-10, 2009
  • National Taiwan University

2
Outline
  • Challenges associated with likelihood inference
  • Nuisance parameters absent under null hypothesis
  • Some biomedical examples
  • Statistical implications
  • Class I: alternative representation of LR test
    statistic
  • Implications
  • Class II: asymptotic null distribution of LR test
    statistic
  • Class II: some alternatives
  • A genetic linkage example
  • Discussion

3
Likelihood Inference
  • Likelihood inference has been successful in a
    variety of scientific fields
  • LOD score method for genetic linkage
  • BRCA1 for breast cancer
  • Hall et al. (1990) Science
  • Poisson regression for environmental health
  • Fine air particle (PM10) for increased mortality
    in total cause and in cardiovascular and
    respiratory causes
  • Samet et al. (2000) NEJM
  • ML image reconstruction estimate for nuclear
    medicine
  • Diagnoses for myocardial infarction and cancers

4
Challenges for Likelihood Inference
  • In the absence of sufficient substantive
    knowledge, the likelihood function may be
    difficult to fully specify
  • Genetic linkage for complex traits
  • Genome-wide association with thousands of SNPs
  • Gene expression data for tumor cells
  • There are computational issues as well for
    high-dimensional observations
  • High throughput data

5
Challenges for Likelihood Inference (cont)
  • Impacts of nuisance parameters
  • Inconsistency of MLE with many nuisance
    parameters (Neyman-Scott problem)
  • Different scientific conclusions with different
    nuisance parameter values
  • Ill-behaved likelihood function
  • Asymptotic approximation not ready

8
Challenges for Likelihood Inference (cont)
  • There are situations where some of the
    regularity conditions may be violated
  • Boundary issue (variance components, genetic
    linkage, etc.)
  • Self and Liang (1987) JASA
  • Discrete parameter space
  • Lindsay and Roeder (1986) JASA
  • Singular information matrix (admixture model)
  • Rotnitzky et al. (2000) Bernoulli

9
Nuisance Parameters Absent under Null
  • Ex. I.1 (Polar coordinate for bivariate normal)
  • Under $H_0: \delta = 0$, $\theta$ is absent
  • Davies (1977) Biometrika
  • Andrews and Ploberger (1994) Econometrica

10
Examples (cont)
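  • Ex. I.2 (Stereotype model for ordinal categorical
    response)
  • For $Y = 1, 2, \ldots, C$,
  • $\log\{\Pr(Y = j)/\Pr(Y = 1)\} = \alpha_j + \beta_j^{T} x$, $j = 2, \ldots, C$
  • $= \alpha_j + \phi_j \beta^{T} x$
  • $0 = \phi_1 \le \phi_2 \le \cdots \le \phi_C = 1$
  • Under $H_0: \beta = 0$, the $\phi_j$'s are absent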
  • Anderson (1984) JRSSB

11
Examples (cont)
  • Ex. I.3 (Variance component models)
  • In certain situations, the covariance matrix of
    continuous and multivariate observations could be
    expressed as
  • $\delta M(\theta) + \lambda_1 M_1 + \cdots + \lambda_q M_q$
  • A hypothesis of interest is $H_0: \delta = 0$ ($\theta$ is
    absent)
  • Ritz and Skovgaard (2005) Biometrika

12
Examples (cont)
  • Ex. I.4 (Gene-gene interactions)
  • Consider the following logistic regression model
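  • $\operatorname{logit} \Pr(Y = 1 \mid S_1, S_2) = \alpha + \sum_k \delta_k S_{1k} + \sum_j \gamma_j S_{2j} + \theta \sum_k \sum_j \delta_k \gamma_j S_{1k} S_{2j}$
  • To test genetic association with gene one ($S_1$)
    while taking into account potential interaction with
    gene two ($S_2$), the hypothesis of interest is
  • $H_0: \delta_1 = \cdots = \delta_K = 0$ ($\theta$ is absent)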
  • Chatterjee et al. (2005) American Journal of
    Human Genetics

13
Examples (cont)
  • Ex. II.1 (Admixture models)
  • $f(y; \delta, \theta) = \delta\, p(y; \theta) + (1 - \delta)\, p(y; \theta_0)$
  • $\delta$: proportion of linked families
  • $\theta$: recombination fraction ($\theta_0 = 0.5$)
  • Smith (1963) Annals of Human Genetics
  • The null hypothesis of no genetic linkage can be
    cast as
  • $H_0: \delta = 0$ ($\theta$ is absent) or $H_0: \theta = \theta_0$ ($\delta$ is
    absent)
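To make the non-identifiability concrete, here is a minimal Python sketch of the admixture density with a binomial component, the form the talk uses later for Class II. The number of meioses `m = 4` and all parameter values are hypothetical choices for illustration; only the mixture form, delta * p(y; theta) + (1 - delta) * p(y; theta0), is taken from the slide.

```python
from math import comb

def binom_pmf(y, m, theta):
    """Binomial(m, theta) probability of y recombinants in m meioses."""
    return comb(m, y) * theta**y * (1 - theta)**(m - y)

def admixture_pmf(y, m, delta, theta, theta0=0.5):
    """f(y; delta, theta) = delta*p(y; theta) + (1 - delta)*p(y; theta0)."""
    return delta * binom_pmf(y, m, theta) + (1 - delta) * binom_pmf(y, m, theta0)

m = 4  # hypothetical family size, chosen only for illustration
null_pmf = [binom_pmf(y, m, 0.5) for y in range(m + 1)]

def max_gap(delta, theta):
    """Largest pointwise distance between f(.; delta, theta) and the null pmf."""
    return max(abs(admixture_pmf(y, m, delta, theta) - null_pmf[y])
               for y in range(m + 1))

# delta = 0: theta drops out of the model.
gap_delta0 = max(max_gap(0.0, 0.1), max_gap(0.0, 0.4))
# theta = theta0: delta drops out of the model.
gap_theta0 = max(max_gap(0.3, 0.5), max_gap(0.9, 0.5))
# A genuine alternative is distinguishable from the null.
gap_alt = max_gap(0.3, 0.1)
print(gap_delta0, gap_theta0, gap_alt)
```

Both null representations give the same distribution, which is why either parameter can be described as absent under the null.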

14
Examples (cont)
  • Ex. II.2 (Change point)
  • $\operatorname{logit} \Pr(Y = 1 \mid x) = \beta_0 + \beta x + \delta (x - \theta)_+$
  • $(x - \theta)_+ = x - \theta$ if $x - \theta > 0$, and 0 otherwise
  • Alcohol consumption is protective for MI when
    consumption is below $\theta$, but harmful when it
    exceeds the threshold
  • Pastor et al. (1998) American Journal of
    Epidemiology
  • The hypothesis that no threshold exists can be cast
    as
  • $H_0: \delta = 0$ ($\theta$ is absent) or $H_0: \theta = \infty$ ($\delta$ is
    absent)
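The change-point model can be sketched in a few lines of Python to show why either parameter disappears under the null. The coefficient values and the function names `hinge` and `prob_mi` are hypothetical and not from the talk; the model form, a logistic regression with an added term delta * (x - theta)_+, follows the slide.

```python
from math import exp, inf

def hinge(x, theta):
    """(x - theta)_+ : x - theta when positive, 0 otherwise."""
    return x - theta if x - theta > 0 else 0.0

def prob_mi(x, beta0, beta, delta, theta):
    """Pr(Y = 1 | x) under logit Pr = beta0 + beta*x + delta*(x - theta)_+."""
    eta = beta0 + beta * x + delta * hinge(x, theta)
    return 1.0 / (1.0 + exp(-eta))

x = 2.0                   # hypothetical exposure level
beta0, beta = -1.0, -0.2  # hypothetical baseline coefficients

# delta = 0: the threshold theta has no effect on the fitted probability.
p_delta0 = [prob_mi(x, beta0, beta, 0.0, th) for th in (0.5, 1.0, 3.0)]
# theta = infinity: the hinge never switches on, so delta has no effect.
p_theta_inf = [prob_mi(x, beta0, beta, d, inf) for d in (0.1, 0.5, 2.0)]
print(p_delta0, p_theta_inf)
```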

15
Examples (cont)
  • Ex. II.3 (Non-linear alternative)
  • $\operatorname{logit} \Pr(Y = 1 \mid x) = \beta_0 + \beta x + \delta h(x; \theta)$
  • e.g., $h(x; \theta) = \exp(x\theta) - 1$
  • The effect of alcohol consumption on the risk of MI,
    on the log-odds scale, is non-linear if $\theta \ne 0$
  • The hypothesis of a linear relationship against a
    specific non-linear alternative can be cast as
  • $H_0: \delta = 0$ ($\theta$ is absent) or $H_0: \theta = 0$ ($\delta$ is
    absent)
  • Gallant (1977) JASA

16
Characteristics of Examples
  • The majority of examples can be characterized as
  • $f(y, x; \delta h_{y,x}(\theta, \beta), \beta)$
  • Class I: $H_0: \delta = 0$
  • Class II: $H_0: \delta = 0$ or $\theta = \theta_0$ (with $h_{y,x}(\theta_0, \beta) = 0$)

17
Figure: expected log-likelihood function for
three cases
18
Implications
  • Under $H_0$, conventional asymptotic results may not
    apply
  • The likelihood ratio test statistic is not distributed
    as $\chi^2$
  • It is difficult to assess evidence against $H_0$ (e.g.
    p-value)
  • Maximum likelihood estimators are not normally
    distributed
  • Impact of the parameter space near $H_0$
  • Q. How to deal with this phenomenon?

19
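Class I: Asymptotic
  • For $H_0: \delta = 0$,
  • 1. $LRT = 2\{\log L(\hat\delta, \hat\theta) - \log L(0, \cdot)\}$
  • $= \sup_\theta 2\{\log L(\hat\delta_\theta, \theta) - \log L(0, \cdot)\}$
  • $= \sup_\theta LRT(\theta)$
  • 2. $LRT(\theta) = S(\theta)^{T} I^{-1}(\theta) S(\theta) + o_p(1) = W^2(\theta) + o_p(1)$,
  • where $S(\theta) = \partial \log L(\delta, \theta)/\partial\delta \,|_{\delta = 0}$, $I(\theta) = \operatorname{var}\{S(\theta)\}$,
  • $W(\theta) = I^{-1/2}(\theta) S(\theta)$,
  • and $W(\theta)$ is a Gaussian process in $\theta$ with mean 0,
    variance 1 and autocorrelation $\rho(\theta_1, \theta_2) = \operatorname{cov}\{W(\theta_1), W(\theta_2)\}$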

20
Class I: Asymptotic (cont)
  • Results were derived previously by Davies (1977,
    Biometrika)
  • No analytical form available in general
  • Approximation, simulation or resampling methods
  • Kim and Siegmund (1989) Biometrika
  • Zhu and Zhang (2006) JRSSB
  • Q. Can simplification take place for
  • the asymptotic null distribution?
  • approximating the p-value?

21
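Class I: Principal Component Representation
  • Principal component decomposition:
    $W(\theta) = \sum_{k=1}^{K} \xi_k \phi_k(\theta)$
  • $K$ could be finite or $\infty$
  • $\xi_1, \ldots, \xi_K$ are independent r.v.s
  • $\xi_k \sim N(0, \lambda_k)$, $k = 1, \ldots, K$
  • $\rho(\theta, \theta) = \sum_k \lambda_k \phi_k(\theta)^2 = 1$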

22
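Class I: Principal Component Representation
(cont)
  • $W^2(\theta) = \{\sum_k \xi_k \phi_k(\theta)\}^2$
  • $\le \{\sum_k \xi_k^2/\lambda_k\}\{\sum_k \lambda_k \phi_k(\theta)^2\} = \sum_k \xi_k^2/\lambda_k$ (Cauchy-Schwarz)
  • Consequently, one has
  • $\sup_\theta W^2(\theta) = \sup_\theta \{\sum_k \xi_k \phi_k(\theta)\}^2 \le \sum_k \xi_k^2/\lambda_k$
  • The asymptotic distribution of LRT under $H_0$ is
    bounded by $\chi^2_K$, since the $\xi_k^2/\lambda_k$ are
    independent $\chi^2_1$ variables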

23
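Class I: Simplification
  • Simplifies to $\chi^2_K$ if $K < \infty$ and for almost every
  • $(\xi_1, \ldots, \xi_K)$, there exists $\theta$ such that
  • $\lambda_1 \phi_1(\theta)/\xi_1 = \cdots = \lambda_K \phi_K(\theta)/\xi_K$
  • Ex. I.1 (Polar coordinate for bivariate normal)
  • For any $(\xi_1, \xi_2)$, there exists $\theta \in [0, \pi)$
    such that this condition holds
  • $LRT \sim \chi^2_2$ instead of $\chi^2_1$
  • $H_0: \delta = 0 \Leftrightarrow H_0: \mu_1 = \mu_2 = 0$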
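A small simulation can illustrate the practical consequence for Ex. I.1. The sketch below assumes the model is a bivariate normal with identity covariance and mean (delta*cos(theta), delta*sin(theta)); the slide's own model formula did not survive, so this form, the sample size `n = 50` and the seed are assumptions. Under that model the unrestricted MLE of the mean is the sample mean, so the LRT for delta = 0 equals n times the squared norm of the sample mean.

```python
import random

def lrt_polar(n, rng):
    """LRT of H0: delta = 0 from n draws of N((0, 0), I_2), i.e. simulated under H0."""
    s1 = sum(rng.gauss(0.0, 1.0) for _ in range(n))
    s2 = sum(rng.gauss(0.0, 1.0) for _ in range(n))
    # LRT = n * ||ybar||^2 = (s1^2 + s2^2) / n
    return (s1 * s1 + s2 * s2) / n

rng = random.Random(2009)  # arbitrary seed for reproducibility
stats = [lrt_polar(50, rng) for _ in range(4000)]

crit_chi2_1 = 3.841  # upper 5% point of chi-square with 1 df
mean_lrt = sum(stats) / len(stats)
reject_rate = sum(s > crit_chi2_1 for s in stats) / len(stats)
print(f"mean LRT = {mean_lrt:.2f}, rejection rate at 3.841 = {reject_rate:.3f}")
```

If the chi-square with 2 degrees of freedom is the right reference, the mean should be near 2 and the rejection rate near exp(-3.841/2), roughly 0.147, rather than the nominal 0.05 that a 1-df reference would suggest.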

24
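Class I: Simplification (cont)
  • Simplification also holds if $S(\theta) = h(\theta) g(Y)$, since then
  • $\rho(\theta_1, \theta_2) = 1$
  • Ex. Modified admixture models
  • $\theta\, p(y; \delta) + (1 - \theta)\, p(y; \delta_0)$, $\theta \in [a, 1]$ with
    $a > 0$ fixed
  • $H_0: \delta = \delta_0$ ($\theta$ is absent) and the score function
    for $\delta$ at $\delta_0$ is
  • $S(\theta) = \theta\, \partial \log p(y; \delta_0)/\partial\delta$
  • Known as the restricted LRT for testing $H_0: \delta = \delta_0$
  • Has been used in genetic linkage studies
  • Lemdani and Pons (1993) Biometrics
  • Shoukri and Lathrop (1993) Biometrics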

25
Class I: Approximation for P-values
  • When simplification fails
  • Step 1: Calculate $W(\theta)$ and $\rho(\theta_1, \theta_2)$
  • Step 2: Estimate eigenvalues $\lambda_1, \ldots, \lambda_K$ and
    eigenfunctions $\phi_1, \ldots, \phi_K$, where $K$ is chosen so
    that the first $K$ components explain more than 95%
    of the variation
  • Step 3: Choose a dense grid $\theta_1, \ldots, \theta_M$
    and for $i = 1, \ldots, N$, repeat the following steps
  • Simulate $\xi_{ik} \sim N(0, \lambda_k)$ for $k = 1, \ldots, K$
  • Calculate $W_i^2(\theta_m) = \{\sum_k \xi_{ik} \phi_k(\theta_m)\}^2$ for each $m$
  • Find the maximum of $W_i^2(\theta_1), \ldots, W_i^2(\theta_M)$, $R_i$ say
  • $R_1, \ldots, R_N$ approximates the null distribution of
    LRT
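Step 3 of this procedure can be sketched in plain Python, taking the eigenvalues and eigenfunctions from Step 2 as given inputs. The function names `simulate_sup_w2` and `p_value` are hypothetical, and the toy inputs (two unit eigenvalues with cosine and sine eigenfunctions on [0, pi)) are an assumption chosen because the supremum then has a known chi-square distribution with 2 degrees of freedom, which gives a sanity check.

```python
import math
import random

def simulate_sup_w2(lambdas, phis, grid, n_sim, seed=0):
    """Draw n_sim values of max over the grid of {sum_k xi_k * phi_k(theta_m)}^2.

    lambdas : eigenvalues lambda_1..lambda_K (variances of the xi_k)
    phis    : eigenfunctions, each a callable theta -> phi_k(theta)
    grid    : dense grid theta_1..theta_M
    """
    rng = random.Random(seed)
    # Evaluate every eigenfunction on the grid once, up front.
    phi_grid = [[phi(t) for t in grid] for phi in phis]
    draws = []
    for _ in range(n_sim):
        xi = [rng.gauss(0.0, math.sqrt(lam)) for lam in lambdas]
        best = 0.0
        for m in range(len(grid)):
            w = sum(x * pg[m] for x, pg in zip(xi, phi_grid))
            best = max(best, w * w)
        draws.append(best)
    return draws

def p_value(observed, draws):
    """Monte Carlo p-value: share of simulated suprema at or above the observed LRT."""
    return sum(r >= observed for r in draws) / len(draws)

# Toy check with K = 2: W(theta) = xi_1*cos(theta) + xi_2*sin(theta).
M = 100
grid = [math.pi * m / M for m in range(M)]
draws = simulate_sup_w2([1.0, 1.0], [math.cos, math.sin], grid, n_sim=2000, seed=1)
mean_draw = sum(draws) / len(draws)
print(f"mean of simulated suprema = {mean_draw:.2f}")
```

In an application, `lambdas` and `phis` would come from an eigen-decomposition of the estimated autocorrelation function, and `p_value(observed_lrt, draws)` would give the approximate p-value.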

26
Class II: Some New Results
  • Consider the family
  • $f(y, x; \delta h_{y,x}(\theta, \beta), \beta)$,
  • where $h_{y,x}(\theta_0, \beta) = 0$ for all $y$ and $x$
  • $H_0: \delta = 0$ or $\theta = \theta_0$
  • Tasks: 1. Derive the asymptotic distribution of LRT
    under $H_0$
  • 2. Present alternative approaches
  • Illustrate through Ex. II.1 (Admixture models)
  • $\delta\, \mathrm{Binom}(m, \theta) + (1 - \delta)\, \mathrm{Binom}(m, \theta_0)$
  • $\delta \in [0, 1]$ and $\theta \in [0, 0.5]$, $h_{y,x}(\theta, \beta) = p(y; \theta) - p(y; \theta_0)$
  • For simplicity, assume $\beta$ is absent

27
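Class II: LRT Representation
  • Under $H_0$, $f(y, x; \delta, \theta) = f(y, x; 0, \cdot) = f(y, x; \cdot, \theta_0)$,
  • $LRT = \sup_{\delta,\theta} 2\{\log L(\delta, \theta) - \log L(0, \cdot)\} = \sup_{\delta,\theta} LRT(\delta, \theta)$
  • $= \max\{\sup_{1,4} LRT(a), \sup_{2,4} LRT(b), \sup_3 LRT(a, b)\}$,
  • here for fixed $a, b > 0$ and $\theta_0 = 0.5$
  • Region 1: $\delta \in [a, 1]$, $\theta \in [0.5 - b, 0.5]$
  • Region 2: $\delta \in [0, a]$, $\theta \in [0, 0.5 - b]$
  • Region 3: $\delta \in [0, a]$, $\theta \in [0.5 - b, 0.5]$
  • Region 4: $\delta \in [a, 1]$, $\theta \in [0, 0.5 - b]$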

28
Class II: Four Sub-Regions of the Parameter Space
29
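Class II: Regions 1 & 4
  • With $\delta \in [a, 1]$, this reduces to Class I, and
  • $\sup_{1,4} LRT(a) = \sup_\delta W_1^2(\delta) + o_p(1)$
  • $W_1(\delta) = I_1^{-1/2}(\delta) S_1(\delta)$
  • $S_1(\delta) = \partial \log L(\delta, \theta)/\partial\theta \,|_{\theta = \theta_0}$, $I_1(\delta) = \operatorname{var}(S_1(\delta))$
  • For the admixture models,
  • $S_1(\delta) = \delta\, \partial \log p(y; \theta_0)/\partial\theta$, which is
    proportional to $\delta$
  • $W_1(\delta) = W_1$ is independent of $\delta$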

30
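Class II: Regions 2 & 4
  • With $\theta \in [0, \theta_0 - b]$, this reduces to Class I,
    and
  • $\sup_{2,4} LRT(b) = \sup_\theta W_2^2(\theta) + o_p(1)$
  • $W_2(\theta) = I_2^{-1/2}(\theta) S_2(\theta)$
  • $S_2(\theta) = \partial \log L(\delta, \theta)/\partial\delta \,|_{\delta = 0}$, $I_2(\theta) = \operatorname{var}(S_2(\theta))$
  • For the admixture models,
  • $S_2(\theta) = \{p(y; \theta) - p(y; \theta_0)\}/p(y; \theta_0)$
  • $W_2(\theta) \to W_1$, the standardized version of $\partial \log p(y; \theta_0)/\partial\theta$, as $\theta \to \theta_0$ (or $b \to
    0$)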

31
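Class II: Region 3
  • With $\delta \in [0, a]$, $\theta \in [0.5 - b, 0.5]$, expand at
    $(0, 0.5)$
  • $\sup_3 LRT(a, b) = W_3^2 + o_p(1)$
  • $W_3 = I_3^{-1/2} S_3$
  • $S_3 = \partial^2 \log p(y; 0, 0.5)/\partial\delta\,\partial\theta$, $I_3 = \operatorname{var}(S_3)$
  • For the admixture models,
  • $S_3 = \partial \log p(y; \theta_0)/\partial\theta$ and $W_3 = W_1$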

32
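Class II: Asymptotic Distribution of LRT
  • Combining the three regions and letting $a, b \to 0$,
  • $LRT = \max\{\sup_{1,4} LRT(a), \sup_{2,4} LRT(b), \sup_3 LRT(a, b)\}$
  • $= \max\{W_1^2, \sup_\theta W_2(\theta)^2, W_3^2\} + o_p(1)$
  • $\to \sup_\theta W_2(\theta)^2$,
  • where the supremum is over $\theta \in [0, \theta_0]$, with $W_2(\theta_0)$
    taken as the limit of $W_2(\theta)$ as $\theta \to \theta_0$
  • The asymptotic null distribution of LRT is the
    supremum of a squared Gaussian process w.r.t. $\theta$
  • Simplification (null distribution and
    approximation to p-values) can be adopted from
    Class I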

33
Class II: Alternatives
  • Question: Can one find alternative test
    statistics with conventional asymptotic null
    distributions?
  • 1. Restricted LRT: limit the range of $\delta$ to $[a, 1]$ with
    $a > 0$ (Regions 1 and 4)
  • $T_R(a) = \sup_{1,4} LRT(a)$
  • $T_R(a)$ decreases in $a$
  • How to choose $a$?
  • The smaller the $a$, the better
  • Chi-square approximation may be in doubt

34
Class II: Alternatives (cont)
  • 2. Smooth version (penalized LRT)
  • Instead of excluding $(\delta, \theta)$ values in Regions 2
    & 3, they are penalized toward $\delta = 0$ by
    considering the penalized log-likelihood
  • $PL(\delta, \theta; c) = \log L(\delta, \theta) + c\, g(\delta)$,
  • where $g(\delta) \le 0$ is a smooth penalty
    maximized at $\delta_0$, and $c > 0$
    controls the magnitude of the penalty
  • Bayesian interpretation
  • $g(\delta)$ could be a prior on $\delta$

35
Class II: Penalized likelihood
  • Define the penalized LR test statistic for $H_0: \theta =
    \theta_0$
  • $PLRT(c) = 2\{\sup_{\delta,\theta} PL(\delta, \theta; c) - PL(\delta_0, \theta_0; c)\}$
  • Under $H_0$, $PLRT(c) \to \tfrac{1}{2}\chi^2_0 + \tfrac{1}{2}\chi^2_1$ as $n \to \infty$
  • Proof is more demanding
  • Similar concerns to the restricted LRT
  • Decreasing in $c$
  • Chi-square approximation may be invalid with small
    $c$
  • Loses power if $\delta$ is small (the proportion of
    linked families is small)
  • A different approach for mixture/admixture models
    was provided by
  • Chen et al. (2001, 2004) and Fu et al. (2006)
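The penalized statistic can be sketched for the binomial admixture model with a crude grid search. Several things here are assumptions rather than the authors' choices: the penalty g(delta) = log(delta), which is non-positive and maximized at delta0 = 1; the grid maximizer in place of a proper optimizer; and the toy data, which are invented (y recombinants out of m meioses per family).

```python
from math import comb, log

def binom_pmf(y, m, theta):
    return comb(m, y) * theta**y * (1 - theta)**(m - y)

def log_lik(delta, theta, data, theta0=0.5):
    """Admixture log-likelihood; data is a list of (y, m) pairs, one per family."""
    total = 0.0
    for y, m in data:
        mix = delta * binom_pmf(y, m, theta) + (1 - delta) * binom_pmf(y, m, theta0)
        if mix <= 0.0:
            return float("-inf")  # parameter value impossible for this family
        total += log(mix)
    return total

def plrt(c, data, steps=40, theta0=0.5):
    """Grid approximation to 2*{sup PL(delta, theta; c) - PL(1, theta0; c)},
    with the assumed penalty g(delta) = log(delta)."""
    null_value = log_lik(1.0, theta0, data)  # g(1) = 0, so no penalty term
    best = null_value
    for i in range(1, steps + 1):
        delta = i / steps                # delta in (0, 1]
        for j in range(steps + 1):
            theta = theta0 * j / steps   # theta in [0, 0.5]
            best = max(best, log_lik(delta, theta, data) + c * log(delta))
    return 2.0 * (best - null_value)

# Invented data for illustration only.
data = [(0, 4), (1, 4), (2, 4), (0, 3), (1, 5), (2, 5), (3, 6), (0, 5), (2, 4), (1, 3)]
values = [plrt(c, data) for c in (0.01, 0.5, 3.0)]
print(values)
```

Because log(delta) is never positive, the statistic cannot increase as c grows, which matches the "decreasing in c" remark on the slide.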

36
Application: Genetic Linkage for Schizophrenia
  • Conducted by Ann Pulver at Hopkins
  • 486 individuals from 54 multiplex families
  • Interested in marker D22S942 on chromosome 22
  • Schizophrenia is relatively high in prevalence
    with strong evidence of genetic heterogeneity
  • To take this phenomenon into account, consider
  • $\delta\, \mathrm{Binom}(m, \theta) + (1 - \delta)\, \mathrm{Binom}(m, \theta_0)$,
  • where $\delta \in [0, 1]$, $\theta \in [0, 0.5]$ with $\theta_0 = 0.5$

37
Genetic Linkage Study of Schizophrenia (cont)
  • With this admixture model considered,
  • LRT = 6.86, p-value = 0.007
  • PLRT with penalty weight $c$:
  • PLRT(3.0) = 5.36, p-value = 0.010
  • PLRT(0.5) = 5.49, p-value = 0.009
  • PLRT(0.01) = 6.84, p-value = 0.004
  • Which p-value do we trust more?
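As a rough cross-check of the PLRT p-values, the sketch below evaluates the upper tail of an equal mixture of a point mass at zero and a chi-square with 1 degree of freedom. Using that mixture as the reference distribution is an assumption here (it is the usual boundary result in the spirit of Self and Liang, 1987), and `p_half_chi2_1` is a hypothetical helper name. The LRT p-value of 0.007 comes from a different reference distribution and is not reproduced.

```python
from math import erfc, sqrt

def p_half_chi2_1(stat):
    """Upper-tail probability of 0.5*chi2_0 + 0.5*chi2_1 at stat > 0."""
    # P(chi2_1 > x) = erfc(sqrt(x / 2)); chi2_0 is a point mass at zero.
    return 0.5 * erfc(sqrt(stat / 2.0))

reported = [(3.0, 5.36), (0.5, 5.49), (0.01, 6.84)]
p_values = [p_half_chi2_1(stat) for _, stat in reported]
for (c, stat), p in zip(reported, p_values):
    print(f"PLRT({c}) = {stat}: p = {p:.4f}")
```

Under this assumption the computed values agree with the reported p-values to within about 0.001.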

38
Figure: asymptotic vs. empirical distribution of
the LRT for the genetic linkage example
39
Discussion
  • Issue considered, namely, nuisance parameters
    absent under null, is common in practice
  • Examples can be classified into Class I and II
  • Class I H0 can only be specified through d ( 0)
  • Class II H0 can specified either in d ( 0) or ?
    ( ?0)
  • For Class I, asymptotic distribution of LRT is
    well known, and through principal component
    representation
  • Deriving sufficient conditions for simple null
    distribution
  • Proposing a means to approximate p-values

40
Discussion (cont)
  • For Class II, less well developed
  • Deriving asymptotic null distribution of LRT
  • Through this derivation, we observe
  • Connection with Class I
  • Connection with RLRT and PLRT
  • Proof on asymptotic of PLRT non-trivial
  • Shedding light on why penalty applied to d not to
    ??
  • Pointing out some peculiar features and
    shortcoming of these two approaches
  • A genetic linkage example on schizophrenia was
    presented for illustration

41
Discussion (cont)
  • Some future work
  • Constructing confidence intervals/regions
  • Generalizing to partial/conditional likelihood
  • Cox PH model with change point (nuisance
    function)
  • Extending to estimating function approach in the
    absence of likelihood function to work with
  • Linkage study of IBD sharing for affected
    sibpairs
  • E(S(t)) 1 (1 - 2?t,?)2 E(S(?)) 1 (1 -
    2?t,?)2 d
  • ?t,? (1 exp(0.02t ?)/2
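The IBD-sharing mean function can be written out as a short Python sketch to show the same absent-parameter structure. It assumes the reconstruction E(S(t)) = 1 + (1 - 2*theta_{t,tau})^2 * delta with theta_{t,tau} = (1 - exp(-0.02*|t - tau|))/2, and that t and tau are map positions in centimorgans; the absolute value, the units and the function names are assumptions, not stated on the slide.

```python
from math import exp

def recomb_fraction(t, tau):
    """theta_{t,tau} = (1 - exp(-0.02*|t - tau|)) / 2, positions assumed in cM."""
    return (1.0 - exp(-0.02 * abs(t - tau))) / 2.0

def expected_ibd(t, tau, delta):
    """E(S(t)) = 1 + (1 - 2*theta_{t,tau})^2 * delta, with delta = E(S(tau)) - 1."""
    return 1.0 + (1.0 - 2.0 * recomb_fraction(t, tau)) ** 2 * delta

# delta = 0: the trait locus position tau has no effect on E(S(t)).
flat = [expected_ibd(10.0, tau, 0.0) for tau in (0.0, 40.0, 90.0)]
# At the trait locus itself the excess sharing is exactly delta.
at_locus = expected_ibd(25.0, 25.0, 0.3)
# 50 cM away the excess decays by exp(-0.04 * 50) = exp(-2).
far = expected_ibd(0.0, 50.0, 0.4)
print(flat, at_locus, far)
```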

42
Figure: log-likelihood and penalized log-likelihood
function for the genetic linkage example