1
Some Basic Issues in Determining Sample Size for
Hypothesis Tests
Peter Richardson, Ph.D. Diana Urbauer, M.S.
2
Context
  • The context for the kind of sample size calculations we will be concerned with is null hypothesis significance testing, with the following structure:
  • a null hypothesis (typically no difference between treatments or groups)
  • an alternative hypothesis (specifying some kind of difference between treatments or groups)
  • Usually we aim to support the alternative hypothesis by showing that the data do not support the null hypothesis.

3
Some examples of null and alternative hypothesis pairs.
Example 1:
  null hypothesis: group means are equal (μ1 = μ2)
  alternative hypothesis: group means are not equal (μ1 ≠ μ2)
Example 2:
  null hypothesis: group means are equal (μ1 = μ2)
  alternative hypothesis: second group mean is larger than the first (μ2 > μ1)
4
Some examples of null and alternative hypothesis
pairs.

Example 2 (restated):
  null hypothesis: μ2 − μ1 = 0
  alternative hypothesis: μ2 − μ1 > 0
Example 3:
  null hypothesis: no difference in group means (μ2 − μ1 = 0)
  alternative hypothesis: second group mean is larger than the first by at least delta (μ2 − μ1 ≥ Δ)
5
Some examples of null and alternative hypothesis
pairs.
Example 4:
  null hypothesis: group means are equal (μ1 = μ2)
  alternative hypothesis: group means differ by at least delta (|μ1 − μ2| ≥ Δ)
6
How do we test a hypothesis?
First, we need to specify how data will be distributed under the two hypotheses. That is, full specification of the null and alternative hypotheses requires identification of the probability distributions of the data under these hypotheses. In a typical instance of example 4, the null hypothesis would specify that the difference in group sample means follows a normal distribution with mean 0.
7
How do we test a hypothesis?
But how would sample means be distributed under the alternative hypothesis in our example? In this case, the null hypothesis is a point hypothesis (it specifies one single value for the mean difference), whereas the alternative hypothesis is a composite hypothesis; i.e., this alternative hypothesis is equivalent to infinitely many point hypotheses combined by "or". For each of these point hypotheses, we need to specify a probability distribution for the difference in group means. In actual calculations, however, the particular point distribution(s) at the threshold(s) of the alternative hypothesis play a special role.
8
How do we test a hypothesis?
Once the distributions of the data under the two
hypotheses have been specified, the second step
is to specify a rejection region for the null
hypothesis. If our data's test statistic falls into the rejection region, then the null hypothesis is rejected. We often obtain the null hypothesis rejection region by specifying a confidence level for the test. The relationship between rejection region and confidence level is illustrated by this example: at a 95% confidence level, the probability under the null hypothesis of the difference of sample means falling into the rejection region is 0.05 (i.e., if the null hypothesis were true, there would be a 5% probability of the null hypothesis being rejected).
9
confidence level = 1 − α = P(do not reject H0 | H0 true)
power = 1 − β = P(reject H0 | H0 false)
so the confidence level and the power are the complements of the type I and type II error probabilities, respectively.
10
Sample Size
  • The test for example 4 involves sampling from the two groups and calculating the difference between the two sample means. The larger the sample size, the smaller the variances of the distributions of the sample mean difference under both the null hypothesis and the alternative hypothesis.
  • Fixing the confidence level, we observe two things with increasing sample size:
  • the rejection region gets larger, and
  • the power increases

11
Power equations for example 4
  • if the variance σ² is known (and equal for both groups; see the reconstruction below)
  • if the variances are known and unequal
  • if the variances are unknown and equal
  • where the t-distributions have N1 + N2 − 2 degrees of freedom and t* is noncentral.
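The three equations are slide images that did not survive extraction. A standard reconstruction of the first case (known, equal variance σ², two-sided test at level α; my reconstruction, not a verbatim restoration) is:

$$ \text{power} = \Phi\!\left(\frac{\Delta}{\sigma\sqrt{1/N_1 + 1/N_2}} - z_{1-\alpha/2}\right), $$

ignoring the negligible probability of rejecting in the wrong tail; the other two cases replace the pooled σ and the normal quantiles analogously.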

12
Let's solve for N
To make it easier, assume a balanced design where N1 = N2 = N. Then, with 95% confidence and 80% power, the first case gives
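The solved formula is likewise missing from the transcript; setting the reconstructed power expression above equal to 0.80 and solving (the standard result, assumed to match the slide) gives, per group:

$$ N = \frac{2\sigma^2\,(z_{0.975} + z_{0.80})^2}{\Delta^2} = \frac{2\sigma^2\,(1.96 + 0.84)^2}{\Delta^2} \approx \frac{15.7\,\sigma^2}{\Delta^2}. $$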
13
Sample Size and Effect Size
14
Power and Effect Size
15
SUMMARY Components of Sample Size Calculations
for Tests of 2 Independent Means
  • Statistical Significance (α) - the probability a researcher is willing to accept of rejecting the null hypothesis when that hypothesis is true
  • Power (1 − β) - the probability of rejecting the null hypothesis when that hypothesis is false
  • Minimum Detectable Difference (Δ) - the minimum detectable difference that the researcher is looking for; often considered the minimum clinical difference of interest
  • Variance (σ²) - the variance of the phenomenon of interest

16
E.g. Effect of Variance Upon Sample Size Holding
Everything Else Constant
17
E.g. Effect of Power Upon Sample Size Holding
Everything Else Constant
18
E.g. Effect of Minimum Detectable Difference
Upon Sample Size Holding Everything Else Constant
19
E.g. Effect of Significance Upon Sample Size
Holding Everything Else Constant
20
Tests of 2 Independent Means: Var(X) known, Var(X1) = Var(X2), n1 = n2
  • N per group is
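The formula image is not reproduced in the transcript; the general form, consistent with the derivation on slide 12 (an inference, not a verbatim restoration), is:

$$ N = \frac{2\sigma^2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2} $$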

21
Tests of 2 Independent Means: Var(X) known, Var(X1) ≠ Var(X2), n1 = n2
  • N per group is
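Again the image is missing; the standard unequal-variance analogue (assumed) is:

$$ N = \frac{(\sigma_1^2 + \sigma_2^2)\,(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2} $$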

22
Test of 2 Independent Means with Unequal Sample
Sizes
  • Now you need to know either the sample size in one of your two study groups or the ratio N1:N2.
  • Knowing the ratio can give a more efficient estimate of sample size.

23
Test of 2 Independent Means with Unequal Sample
Sizes
  • For variance known, Var(X1) = Var(X2), normal distribution

24
Test of 2 Independent Means with Unequal Sample
Sizes
  • For variance known, Var(X1) ≠ Var(X2), normal distribution
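Neither formula survives in the transcript. Standard forms for these two slides, writing k = N2/N1 for the allocation ratio (notation assumed), are:

$$ N_1 = \frac{\left(1 + \tfrac{1}{k}\right)\sigma^2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2} \ \ \text{(equal variances)}, \qquad N_1 = \frac{\left(\sigma_1^2 + \tfrac{\sigma_2^2}{k}\right)(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2} \ \ \text{(unequal variances)}, $$

with N2 = k·N1 in both cases.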

25
Tests of 2 Independent Means: Points to Ponder
  • What if data do not follow a normal distribution?
  • What if Var(X1) and Var(X2) are unknown?

26
Tests of 2 Independent Means: Points to Ponder (cont'd)
  • Note that N is on both sides of the equation and cannot be solved for except in an iterative fashion (see the sketch below).
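A runnable sketch of this iteration in Python (the function, its defaults, and the use of central t quantiles in place of the noncentral t the slides mention are my assumptions, not the presenters' code):

```python
# Iterative N per group for a two-sample t-test (variances unknown, equal).
# The t critical value depends on df = N1 + N2 - 2 = 2N - 2, which depends
# on N itself, so we start from the z-based (known-variance) solution and
# iterate to a fixed point.
import math
from scipy.stats import norm, t

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    n = 2 * (sigma / delta) ** 2 * (z_a + z_b) ** 2   # z-based starting value
    while True:
        df = 2 * math.ceil(n) - 2                     # df from current guess
        t_a, t_b = t.ppf(1 - alpha / 2, df), t.ppf(power, df)
        n_new = 2 * (sigma / delta) ** 2 * (t_a + t_b) ** 2
        if abs(n_new - n) < 1e-6:                     # converged
            return math.ceil(n_new)
        n = n_new

print(n_per_group(delta=5, sigma=10))  # approx. 64 per group
```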

27
Tests of 2 Means: Dependence
  • What if the means are not independent, e.g. a paired t-test?
  • Now we need an estimate of the correlation between X1 and X2.
  • The variance is adjusted by a factor of 1 − r, where r is the correlation between X1 and X2.

28
Tests of 2 Means: Dependence
  • So the independent-samples formula becomes the paired formula reconstructed below.
  • Note that when the two samples are positively correlated, smaller sample sizes will yield just as much power, everything else being held constant.
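A hedged reconstruction of the resulting paired-design formula (with r the correlation, as above; not a verbatim restoration of the slide image):

$$ N_{\text{pairs}} = \frac{2\sigma^2(1 - r)\,(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2} $$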

29
Power Determination for Analysis of Variance
Models
  • Need to know:
  • Significance level (α)
  • The number of groups, t
  • The number of observations per group, r
  • And therefore the degrees of freedom of the F-statistic, ν1 and ν2
  • The experimental error variance (σ²)
  • The treatment variance (σ²t)

30
Power Calculation for a One-Way ANOVA
  • where F is the ratio MST/MSE and λ is the estimate of the non-centrality parameter, defined below.
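The two displayed equations are images missing from the transcript; a standard reconstruction for a balanced one-way ANOVA (t groups, r observations per group, group means μi, error variance σ²) is:

$$ \text{power} = P\!\left(F'_{\nu_1,\nu_2,\lambda} > F_{1-\alpha;\,\nu_1,\nu_2}\right), \qquad \lambda = \frac{r\sum_{i=1}^{t}(\mu_i - \bar\mu)^2}{\sigma^2}, $$

where F' denotes the noncentral F distribution.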

31
Sample Size Calculation for a One-Way ANOVA
  • Note that knowledge of N is needed to solve for N, because N is a component of the degrees of freedom of the F-distribution and of the non-centrality parameter.
  • Therefore, solving for N is an iterative process (sketched below).
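A runnable sketch of the ANOVA iteration (Python with scipy; the example means, σ, and target power are illustrative assumptions, not values from the slides):

```python
# Power of a balanced one-way ANOVA via the noncentral F distribution,
# then iterate r (observations per group) until the target power is met.
import numpy as np
from scipy.stats import f, ncf

def anova_power(mu, sigma, r, alpha=0.05):
    t = len(mu)                           # number of groups
    nu1, nu2 = t - 1, t * (r - 1)         # F-statistic degrees of freedom
    lam = r * np.sum((np.asarray(mu) - np.mean(mu)) ** 2) / sigma ** 2
    f_crit = f.ppf(1 - alpha, nu1, nu2)   # central F critical value
    return ncf.sf(f_crit, nu1, nu2, lam)  # P(F > crit) under noncentral F

r = 2
while anova_power(mu=[10, 12, 15], sigma=4, r=r) < 0.80:
    r += 1
print(r)  # smallest r per group achieving 80% power
```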

32
E.g. Effect of Treatment Mean Square Upon Sample
Size Holding Everything Else Constant
33
E.g. Effect of Error Mean Square Upon Sample
Size Holding Everything Else Constant
34
E.g. Effect of Power Upon Sample Size Holding
Everything Else Constant
35
Power Calculations for Planned Comparisons in a
One-Way ANOVA
  • The procedure is the same as for the overall determination of power, but now the non-centrality parameter is computed from the planned contrast rather than from all of the group means.
  • The problems are still the same.

36
Detecting a difference between proportions
Here's an example of a null and alternative hypothesis pair concerning proportions in two groups. Our assumptions are that the data in each group consist of binary (0 or 1) variables which, upon repeated independent trials, follow a binomial distribution (the binomial variable being a count of 1's, or "successes", in N trials). The proportion p of successes in N trials is then the mean over those trials of the binomial count variable.
37
Detecting a difference between proportions
The difference in proportions between the two groups is the mean of the difference of the two binomial variables. Unfortunately, the difference of two binomial variables is not itself a binomial variable. Its distribution is actually given by a convolution of two binomial distributions, which can be somewhat computationally intensive for large n. But any binomial distribution B(n, p) can be approximated, for large n, by a normal distribution N(np, np(1−p)). Since differences of normal random variables are themselves normally distributed, the difference of two binomials can be (and commonly still is) given a normal approximation, as can the difference of two proportions.
38
Detecting a difference between proportions
Here's the basic power equation (using normal approximations), followed by the continuity corrections.
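Neither equation survives in the transcript. A standard reconstruction of the power equation, matching the pooled-null / unpooled-alternative structure described on the "common structure" slide later on, is:

$$ \text{power} \approx \Phi\!\left(\frac{\Delta - z_{1-\alpha/2}\sqrt{\bar p(1-\bar p)\left(\frac{1}{N_1}+\frac{1}{N_2}\right)}}{\sqrt{\frac{p_1(1-p_1)}{N_1}+\frac{p_2(1-p_2)}{N_2}}}\right), \qquad \bar p = \frac{N_1 p_1 + N_2 p_2}{N_1 + N_2}. $$

One common continuity correction (Fleiss; an assumption about which correction the slide used) replaces the solved N by N' = (N/4)(1 + √(1 + 4/(NΔ)))².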
39
Detecting a difference between proportions
For a balanced design (with N1 = N2) we can solve the power equation for N, obtaining the formula below (before the continuity correction). The highest power curve occurs approximately where p̄ = (1 − Δ)/2. Note that the formula includes reference to the two actual proportions p1 and p2, and not just their difference (as was the case with the continuous variable examples earlier on). Let's see some of these plotted.
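The solved formula is not reproduced in the transcript; the standard balanced-design form (per group, before the continuity correction) is:

$$ N = \frac{\left(z_{1-\alpha/2}\sqrt{2\bar p(1-\bar p)} + z_{1-\beta}\sqrt{p_1(1-p_1) + p_2(1-p_2)}\right)^2}{\Delta^2} $$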
40
Sample size curves for detecting difference in
proportions
41
What's different about the proportion case?
As we saw in the plots on the previous page, the
sample size curve for given power and given
difference to detect depends upon the actual
values of the proportion at the alternative
hypothesis threshold. This is different from the
case of detecting differences in means of
continuous variables in which the sample size
needed to detect a difference of a given size at
a given power was the same regardless of the
values of the means for the two groups at the
alternative hypothesis threshold. In this
respect, the proportion case represents the more
typical kind of sample size calculation scenario.
This dependence on the parameter values at the
alternative hypothesis threshold is seen, for
example, in calculations for odds ratios and
hazard ratios. In such situations, we cannot get
away with simply sizing up for an effect size
alone. An odds ratio example follows.
42
Illustrations
Odds Ratio (normal approximation): For a null hypothesis of p1 = p2 (odds ratio = 1) and an alternative hypothesis of log(odds ratio) greater than or equal to Δ*, the power equation is given below. Note how it is necessary to supply both p1 and p2 explicitly to this formula (the underlying assumption being that p1 and p2 are the alternative-hypothesis values for the proportions, whereas under the null they are both hypothesized to be equal to their weighted average).
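The equation itself is missing from the transcript; a reconstruction consistent with the description above (asymptotic variance of the log odds ratio, with the weighted average p̄ used in both groups under the null) is:

$$ \text{power} \approx \Phi\!\left(\frac{\Delta^* - z_{1-\alpha/2}\sqrt{\left(\frac{1}{N_1}+\frac{1}{N_2}\right)\frac{1}{\bar p(1-\bar p)}}}{\sqrt{\frac{1}{N_1 p_1(1-p_1)}+\frac{1}{N_2 p_2(1-p_2)}}}\right) $$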
43
Illustrations
Log Rank test for hazard ratios. Null hypothesis: λ1 = λ2 (HR = 1). Alternative hypothesis threshold: HR = Δ.
This formula requires specification of not only the ratio Δ but also the alternative-hypothesis hazard rates for the two groups (whose ratio is Δ). Moreover, estimates of the censoring rates e(λ) must be included for both hypotheses. R1 and R2 are the relative sizes of the two groups.
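The slide's formula does not survive extraction. A common approximation in the same spirit (Schoenfeld's formula; an assumption, not necessarily the slide's exact expression) gives the required number of events d, with R1 + R2 = 1:

$$ d = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{R_1 R_2\,(\log \Delta)^2}, $$

and the total N is then d divided by the overall expected event probability, which is where the censoring estimates e(λ) enter.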
44
Some Common Structure of These Sample Size
Formulae
Despite the diversity of statistical tests our survey has touched on, most of the sample size formulae derivable from the power equations we've seen share a fairly consistent structure, which we can summarize by the relation below, where CCV is the confidence-level critical value, PCV is the power critical value, NHSE is the standard error under the null hypothesis, and AHSE is the standard error under the alternative hypothesis threshold.
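The summarizing display is missing from the transcript; from the definitions, and consistent with each of the reconstructed power equations above, it is presumably:

$$ \Delta = \text{CCV}\cdot\text{NHSE} + \text{PCV}\cdot\text{AHSE}, $$

solved for the N appearing inside the two standard errors.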
45
Hierarchical Design and Sample Size
Our survey so far has assumed that in each case,
the sampling unit was the individual subject. By
contrast, a hierarchical or nested design results
if the sampling process involves two or more
levels. A two-level hierarchical design would
result if, for example, a sample of treatment
sites were randomly chosen from a population of
such sites, and then individual subjects were
sampled randomly within those sites. In general,
a hierarchical design will make a higher demand
on sample size than the corresponding
single-level design which ignores all structure
above that of the individual subject. It does so essentially by inflating the standard error components of the power and sample size formulae by a multiplicative factor of (1 + (Site Size − 1) × Intrasite Correlation).
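For a quick illustrative calculation (numbers mine, not the slides'): with 20 subjects per site and an intrasite correlation of 0.05, the factor is 1 + (20 − 1) × 0.05 = 1.95, so the required sample size roughly doubles relative to a single-level design; this factor is the usual "design effect".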
46
Other Approaches to Sample Size Calculation
  • Bayesian approaches
  • Simulation

47
In Conclusion
  • In calculating sample size it is essential:
  • to know what statistical test you are sizing up for
  • to know specifically what the null and alternative hypotheses are for the test
  • to have values or estimates ready for the inputs to the calculation for the particular test at issue (these always include confidence level, power, and detectable difference, but may also include variances, actual parameter values for the alternative hypothesis threshold, etc.)