Multiple requirements for multiple test procedures

1
Multiple requirements for multiple test procedures
  • Paper presented at 4th International Conference
    on Multiple Comparisons, Shanghai, Aug. 17-19,
    2005
  • Juliet Popper Shaffer

2
Outline
  • Why multiple requirements?
  • Examples of possible additional requirements
  • FDR control with some level of FWER control: the
    Newman-Keuls multiple range procedure
  • A comparative example

3
Why multiple requirements?
  • Until recently, the usual standard in multiple
    hypothesis testing was control of the familywise
    error rate (FWER): keeping the probability of no
    errors greater than or equal to some specified
    level 1 − α, often set at .95 (equivalently,
    keeping the probability of one or more errors at
    most α).
  • The other extreme was to test each hypothesis at
    a specified level α, ignoring the multiplicity
    problem: control of the per-comparison error rate
    (PCER).

4
  • While control of PCER seems too lax, control of
    FWER sometimes seems too restrictive, especially
    when the number of hypotheses being tested is
    large, so other criteria have been introduced.
    However, it seems important to have some level of
    error control, at the very least PCER control.

5
  • A large number of criteria for controlling errors
    in multiple hypothesis testing have been proposed
    in recent years. Some of them, if considered in
    isolation, allow rejections of some hypotheses
    with very little or no evidence against them. It
    seems necessary in such cases to add other
    requirements to prevent declarations of
    significance in the scientific literature that
    are unsupported by evidence.

6
Example: Generalized FWER
  • Two recent publications consider a generalized
    FWER (g-FWER: van der Laan, Dudoit, and Pollard,
    2004; k-FWER: Lehmann and Romano, 2005),
    proposed by Victor (1982), with earlier proofs of
    some Lehmann-Romano results by Hommel and
    Hoffmann (1987).

7
g-FWER or k-FWER
  • Notations differ. Using the Lehmann-Romano
    notation, the k-FWER is the probability of k or
    more false rejections, to be controlled at some
    designated level α, i.e. P(number of false
    rejections ≥ k) ≤ α.
  • Note that this criterion, without additional
    stipulations, allows rejection of k-1 hypotheses
    with no empirical evidence relating to their
    truth or falsity.

8
  • In fact, the augmentation procedure proposed by
    van der Laan, Dudoit, and Pollard has this
    property. A test controlling the FWER is used,
    and k-1 additional rejections are added to those
    resulting from the use of the test.

9
  • Although the authors remark that there is no need
    to reject the additional hypotheses with the
    smallest p-values, they do include that
    requirement.
  • However, even with that additional requirement,
    the p-values still can be arbitrarily large.
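
The augmentation idea on the preceding slides can be sketched in a few lines. This is only an illustration, not the authors' implementation: it assumes Bonferroni as the FWER-controlling test, and the function name `augment_k_fwer` and the p-values are hypothetical.

```python
def augment_k_fwer(pvalues, alpha, k):
    """Bonferroni rejections plus the k-1 smallest remaining p-values.

    Bonferroni stands in for "a test controlling the FWER" (an assumption).
    Returns the sorted indices of the rejected hypotheses.
    """
    n = len(pvalues)
    # Step 1: FWER-controlling test (Bonferroni: reject if p <= alpha / n).
    base = [i for i, p in enumerate(pvalues) if p <= alpha / n]
    # Step 2: add the k-1 not-yet-rejected hypotheses with the smallest p-values.
    rest = sorted((i for i in range(n) if i not in base), key=lambda i: pvalues[i])
    return sorted(base + rest[: k - 1])


# Hypothetical p-values: one clear signal, the rest unremarkable.
pvals = [0.0001, 0.40, 0.55, 0.70, 0.90]
rejected = augment_k_fwer(pvals, alpha=0.05, k=3)
largest_rejected_p = max(pvals[i] for i in rejected)
print(rejected, largest_rejected_p)
```

With these made-up inputs, two of the three rejected hypotheses have p-values of .40 and .55, which shows how large the p-values of augmented rejections can be even when the smallest ones are chosen.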

10
  • In fact, a number of authors have mentioned the
    need for additional requirements in an informal
    way, but no formal consideration of the issue
    appears to exist.

11
  • For example, Lehmann and Romano assert:
    "Evidently, one can always reject the hypotheses
    corresponding to the smallest k-1 p-values
    without violating control of the k-FWER.
    However, it seems counterintuitive to consider a
    stepdown procedure whose corresponding αi are
    not monotone decreasing. In addition, automatic
    rejection of k-1 hypotheses, regardless of the
    data, appears at the very least a little too
    optimistic."

12
  • The Lehmann and Romano k-FWER-controlling
    single-step procedure rejects hypotheses only if
    their associated p-values are sufficiently small.
    The procedure rejects all hypotheses Hi with
    associated pi ≤ kα/n, where n is the total
    number of hypotheses tested.
  • Note that to control the k-FWER with k a small
    proportion of n, say k = βn, each hypothesis would
    be tested at level αβ, since kα/n = αβ.
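
A minimal sketch of the single-step rule just described (reject every hypothesis whose p-value is at most k times alpha divided by n). The function name and the p-values are hypothetical; with k = 1 the rule reduces to the ordinary Bonferroni test.

```python
def single_step_k_fwer(pvalues, alpha, k):
    """Indices i with p_i <= k * alpha / n (single-step k-FWER rule)."""
    n = len(pvalues)
    threshold = k * alpha / n
    return [i for i, p in enumerate(pvalues) if p <= threshold]


# Hypothetical p-values for n = 10 hypotheses.
pvals = [0.001, 0.012, 0.02, 0.04, 0.30, 0.45, 0.60, 0.75, 0.85, 0.95]
k1 = single_step_k_fwer(pvals, alpha=0.05, k=1)  # threshold .005 (Bonferroni)
k3 = single_step_k_fwer(pvals, alpha=0.05, k=3)  # threshold .015
print(k1, k3)
```

Unlike the augmentation approach, every rejection here is backed by a p-value below a fixed, data-independent threshold.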

13
  • As a second example, consider the FDP and FDR.
  • Letting V = number of false rejections and R =
    total number of rejections, the FDP is the
    proportion Q = V/R (with Q = 0 when R = 0) and the
    FDR = E(Q). We can define FDPR(γ) = P(Q > γ), and
    consider procedures for which
  • FDPR(γ) ≤ α, or, alternatively, FDR ≤ α.
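
To make the two quantities concrete, the sketch below computes the false discovery proportion for individual outcomes and then the empirical analogues of the FDR (mean of Q) and of the FDPR (fraction of outcomes whose Q exceeds a threshold, named `gamma` here). The (V, R) outcomes are hypothetical, and the convention Q = 0 when R = 0 is assumed.

```python
def fdp(v, r):
    """False discovery proportion Q = V/R, taken as 0 when R = 0."""
    return v / r if r > 0 else 0.0


def summarize(outcomes, gamma):
    """Empirical FDR = mean(Q) and empirical FDPR(gamma) = share with Q > gamma."""
    qs = [fdp(v, r) for v, r in outcomes]
    fdr = sum(qs) / len(qs)
    fdpr = sum(q > gamma for q in qs) / len(qs)
    return fdr, fdpr


# Hypothetical (V, R) outcomes from four repetitions of an experiment.
outcomes = [(0, 5), (1, 4), (0, 0), (2, 8)]
fdr_hat, fdpr_hat = summarize(outcomes, gamma=0.2)
print(fdr_hat, fdpr_hat)
```

The two summaries can differ substantially: here the average proportion is .125, yet half of the outcomes have a proportion above .2.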

14
  • FDPR and FDR
  • Both are subject to the same problem: hypotheses
    with arbitrarily large p-values can be rejected.
    For the FDPR, this clearly follows from the
    result for k-FWER, although at least some initial
    rejections must be based on sufficiently small
    p-values.

15
  • For the FDR, it has been remarked that by
    including hypotheses that are known to be false
    and can be expected to be rejected, the
    probability of rejecting true hypotheses can be
    substantially increased.

16
  • In fact, the probability of rejecting some true
    hypotheses can be made arbitrarily high. For
    example, with α = .05, if a procedure controlling
    the FWER rejects 19 hypotheses, another
    hypothesis can be rejected with any p-value
    while the FDR is still controlled at level α.
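
The arithmetic behind the 19-rejection example can be checked directly. This is a sketch under a simplifying assumption: all rejections made by the FWER-controlling procedure are correct, so the only possible false rejection is the added one. The function name is hypothetical.

```python
from fractions import Fraction


def worst_case_fdp(correct_rejections, added_rejections):
    """FDP if every added rejection is false and all the others are correct."""
    total = correct_rejections + added_rejections
    return Fraction(added_rejections, total)


alpha = Fraction(5, 100)
q19 = worst_case_fdp(19, 1)  # 19 supported rejections plus 1 arbitrary one
q18 = worst_case_fdp(18, 1)  # one fewer supported rejection
print(q19, q19 <= alpha)
print(q18, q18 <= alpha)
```

With 19 supported rejections the worst-case proportion is exactly 1/20 = .05, so the extra rejection fits within the level; with only 18 it would be 1/19, which exceeds .05.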

17
Possible Additional Requirements
  • 1. Set maximum p-value for rejection.
  • 2. Adding true hypotheses should not increase
    the probability of rejecting any true hypothesis.
  • 3. Adding false hypotheses should not increase
    the probability of rejecting any true hypothesis.

18
  • 4. Adding any hypotheses, true or false, should
    not increase the probability of rejecting any
    true hypothesis.
  • Some methods in use satisfy all of these
    additional criteria, while some satisfy a subset.
    It seems desirable that some criteria of this
    kind should be required for scientific
    declarations, but exactly what might be wanted
    would depend on the situation.
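
Requirement 1 is the simplest to layer on top of an existing procedure. The sketch below filters a proposed rejection set through a maximum p-value; the function name, the cap of .05, and the inputs are all hypothetical choices for illustration.

```python
def apply_pvalue_cap(rejected, pvalues, max_p):
    """Keep only the rejections whose p-value does not exceed max_p."""
    return [i for i in rejected if pvalues[i] <= max_p]


# Hypothetical p-values and a hypothetical rejection set from some procedure.
pvals = [0.0001, 0.03, 0.55, 0.70]
proposed = [0, 1, 2]
kept = apply_pvalue_cap(proposed, pvals, max_p=0.05)
print(kept)
```

Because the filter can only remove rejections, it cannot increase the number of false rejections made by the underlying procedure.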

19
Another approach to combining criteria
  • It would be good to have the strong error control
    of FWER but the power afforded by
    less-restrictive criteria. Although that isn't
    possible, if FWER control is too strict, perhaps
    some modified form of FWER control can be
    combined with less-restrictive approaches.

20
  • As an example of a modified form of FWER control,
    the Newman-Keuls test will be considered. It is
    one of the multiple range procedures that were
    developed to test all pairwise equality
    hypotheses µi = µj among a set of treatment
    means.

21
  • Assume n treatments, with associated means µ1,
    µ2, ..., µn, and normally-distributed sample means
    M1, M2, ..., Mn, each with common variance σ². The
    hypotheses of interest are
  • Hij: µi = µj, i, j = 1, ..., n, i ≠ j.
  • Any set of equal means will be called a cluster
    (Hartley, 1955).

22
  • The sample means are arranged in order of size,
    M1 ≤ M2 ≤ ... ≤ Mn. Assume the estimate of the
    common variance of a sample mean is s².
  • 1. The hypothesis H1n: µ1 = µn is rejected if
  • (Mn − M1) / s ≥ qα,n,df, where qα,n,df is the
    α-critical value of the distribution of the
    studentized range of n means with df degrees of
    freedom.

23
  • The hypothesis µj − µi = 0, 1 ≤ i < j ≤ n, is
    rejected iff
  • (Mj′ − Mi′) / s ≥ qα,(j′−i′+1),df for all j′ ≥ j,
    i′ ≤ i.
  • In other words, every subrange is tested using
    the α-level critical value of the studentized
    range, and a pair-equality hypothesis is rejected
    iff every equality hypothesis involving a
    subrange containing that pair is rejected.
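
The rejection rule can be sketched as follows. This is an illustrative reading of the rule stated above, not a reference implementation: the function name is hypothetical, the critical values are supplied by the caller through `crit`, and the example uses approximate table values of the studentized range for α = .05 with infinite degrees of freedom (known variance). The means are made up.

```python
def newman_keuls(means, se, crit):
    """Return the set of rejected pairs (i, j), as indices into the sorted means.

    means: sample means; se: standard error s of a single mean;
    crit(p): alpha-level studentized-range critical value for a range of p means.
    """
    m = sorted(means)
    n = len(m)
    # significant[(i, j)]: the range m[i]..m[j] reaches its own critical value.
    significant = {
        (i, j): (m[j] - m[i]) / se >= crit(j - i + 1)
        for i in range(n) for j in range(i + 1, n)
    }
    rejected = set()
    for (i, j) in significant:
        # Reject only if every subrange containing (i, j) is itself significant.
        if all(significant[(a, b)]
               for a in range(0, i + 1) for b in range(j, n)):
            rejected.add((i, j))
    return rejected


# Approximate critical values, alpha = .05, infinite df (assumed table values).
table = {2: 2.772, 3: 3.314, 4: 3.633}
two_clusters = newman_keuls([10.0, 10.5, 14.0, 14.2], 1.0, lambda p: table[p])
blocked = newman_keuls([0.0, 2.9, 3.0], 1.0, lambda p: table[p])
print(sorted(two_clusters), sorted(blocked))
```

In the second call the pair with means 0.0 and 2.9 exceeds the two-mean critical value on its own, but it is not rejected because the full range of three means (3.0) falls short of its critical value.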

24
The NK controls the within-cluster FWER
  • Hartley (1955) proved that the Newman-Keuls
    controls the FWER within each cluster, i.e., the
    probability of rejecting homogeneity of means
    within any cluster is ≤ α. In analogy with the
    per-comparison error rate, this could be called
    the per-cluster error rate: more restrictive than
    the PCER but less restrictive than the FWER for
    the whole set of treatments.

25
  • Proof of control of per-cluster error rate
  • Consider a cluster with m means. No hypothesis
    of pairwise equality within this cluster will be
    rejected unless the (studentized) range of values
    within this cluster is ≥ qα,m,df, which has
    probability α. Therefore the FWER within the
    cluster is ≤ α.

26
  • The NK test does not, however, control the FWER
    over all clusters at the nominal value α.
  • To see this, suppose µ(2i−1) = µ(2i), i = 1, 2, ...,
    n/2, n even, where the equal pairs (clusters)
    are so widely separated that all between-pair
    equality hypotheses are virtually certain to be
    rejected.

27
  • Then each hypothesis H: µ(2i−1) = µ(2i) will be
    rejected with probability α, so the probability
    of at least one false rejection (the FWER) will
    be close to 1 − (1−α)^(n/2), with exact equality
    for known variance.
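
A quick numerical look at this expression shows how fast the overall FWER grows with the number of well-separated pairs. The function name is hypothetical; the formula is the known-variance one from the slide.

```python
def fwer_separated_pairs(alpha, n):
    """FWER = 1 - (1 - alpha)**(n/2) for n/2 well-separated pairs of equal means."""
    if n % 2:
        raise ValueError("n must be even")
    return 1 - (1 - alpha) ** (n // 2)


for n in (2, 10, 20, 50):
    print(n, round(fwer_separated_pairs(0.05, n), 4))
```

With a single pair the FWER equals α, and it increases toward 1 as more pairs are added.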

28
  • Although the FWER of the NK test is > α under
    that scenario (widely separated pairs of equal
    means), the FDR is < α.
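
The FDR in the same scenario can be computed exactly under two idealizing assumptions, both labeled in the code: every between-pair hypothesis is rejected with probability one, and the within-pair tests are independent with rejection probability α each (the known-variance case). The function name is hypothetical.

```python
from math import comb


def fdr_separated_pairs(alpha, n):
    """Exact FDR for n/2 well-separated pairs of equal means.

    Assumes all between-pair hypotheses are rejected (probability one) and
    the number of false rejections V is Binomial(n/2, alpha).
    """
    pairs = n // 2
    # Between-pair (false) hypotheses, all assumed rejected.
    false_nulls = n * (n - 1) // 2 - pairs
    fdr = 0.0
    for v in range(1, pairs + 1):  # v = 0 contributes Q = 0
        prob = comb(pairs, v) * alpha**v * (1 - alpha) ** (pairs - v)
        fdr += prob * v / (false_nulls + v)
    return fdr


for n in (2, 4, 10, 20):
    print(n, fdr_separated_pairs(0.05, n))
```

For n = 2 there are no between-pair hypotheses and the FDR equals α; for larger n the many correct rejections dilute the false ones, so the FDR drops well below α while the FWER rises.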

29
  • In fact, I've proved that when there are clusters
    of equal true means of arbitrary size, and the
    clusters are well-separated, in the sense that
    the probability of rejecting all between-cluster
    pair-equality hypotheses is approximately one,
    the FDR is ≤ α.

30
  • The proof holds regardless of the distribution of
    the means, assuming equal variances, but requires
    independence of test statistics between clusters,
    so it holds with tests with known variance,
    tests with variance estimated using only the two
    means involved, or approximately with t tests
    with normal means and a common variance estimate
    based on large degrees of freedom.

31
  • I haven't been able to prove FDR control in
    general, but have done extensive simulations that
    suggest it is true, at least with
    normally-distributed, homoscedastic sample means.
    Oehlert (2000) similarly noted that his
    extensive simulations support FDR control for the
    NK test.

32
  • The NK test, then, in addition to controlling
    the per-cluster level at α, apparently controls
    the FDR. It can be compared with the
    FDR-controlling test of Yekutieli (2002) for
    pairwise comparisons.

33
  • Yekutieli (2002) proposed an FDR-controlling
    method for testing all pairwise equality
    hypotheses. This test has no control of the FWER
    of any kind, unlike the NK test, which controls
    the FWER within each cluster, a stronger
    error-control property than FDR control alone
    that might be desirable in some cases.

34
A comparative example
  • In his paper, Yekutieli has an example comparing
    his procedure for pairwise FDR control to the
    usual Benjamini-Hochberg FDR-controlling
    procedure, which doesn't assure FDR control in
    the pairwise comparison situation. I used this
    example to compare the NK procedure with
    Yekutieli's procedure and with several
    FWER-controlling procedures.

35
  • The example is due to Erdman (1946), who
    compared the nitrogen content of six groups of
    red clover plants, each group inoculated with
    cultures from different strains of Rhizobium
    bacteria.

36
  • With 6 groups, there are 15 pairwise equality
    hypotheses. Yekutieli shows that the
    Benjamini-Hochberg (1995) procedure, which does
    not guarantee FDR control at α, rejects 11
    hypotheses, while his FDR-controlling
    modification rejects 10 hypotheses.

37
  • The numbers of hypotheses rejected using several
    procedures are as follows:

38
  • Benjamini-Hochberg (p-value based; apparently
    controls FDR): 11
  • Yekutieli (p-value based; proven control of FDR):
    10
  • Newman-Keuls (multiple-range; controls
    within-cluster FWER and apparently controls FDR): 9
  • Welsch (multiple-range; controls FWER): 7
  • Bonferroni (p-value based; universal FWER
    control): 6
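
For readers unfamiliar with the two p-value-based procedures at the ends of this list, the sketch below counts rejections under Bonferroni and under the Benjamini-Hochberg step-up rule. The p-values are hypothetical and are not the Erdman data, so the counts do not reproduce the figures above; the function names are likewise illustrative.

```python
def bonferroni_count(pvalues, alpha):
    """Number of hypotheses with p <= alpha / n."""
    n = len(pvalues)
    return sum(p <= alpha / n for p in pvalues)


def benjamini_hochberg_count(pvalues, alpha):
    """Step-up rule: reject the k smallest p-values, k = max{i: p_(i) <= i*alpha/n}."""
    n = len(pvalues)
    k = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= i * alpha / n:
            k = i
    return k


# Hypothetical p-values (NOT the Erdman clover data).
pvals = [0.001, 0.004, 0.012, 0.02, 0.03, 0.2, 0.5, 0.8]
n_bonf = bonferroni_count(pvals, alpha=0.05)
n_bh = benjamini_hochberg_count(pvals, alpha=0.05)
print(n_bonf, n_bh)
```

As in the comparison above, the FDR-oriented rule rejects more hypotheses than the FWER-controlling one on the same p-values.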

39
  • Thus, in the Erdman example, the NK procedure,
    which apparently controls the FDR and controls
    the within-cluster FWER, is intermediate in number
    of rejections between the FDR-controlling
    procedure of Yekutieli and the FWER-controlling
    procedures of Welsch and Bonferroni.

40
Summary
  • Some proposed error-controlling procedures should
    be supplemented by other controls on error, and
    some formal consideration of desirable controls
    would be useful. Some possibilities have been
    enumerated.

41
Summary (continued)
  • An example is given of a procedure that
    apparently controls the FDR and also has a
    limited type of FWER control (the Newman-Keuls
    multiple range test) that might be desirable in
    some circumstances. It could be considered a
    compromise between an FDR-controlling and an
    FWER-controlling procedure.

42
Finally
  • An example was given in which the Newman-Keuls
    procedure gives results intermediate between an
    FDR-controlling procedure (Yekutieli) and a
    multiple-range FWER-controlling procedure
    (Welsch).