Title: L1b.1
1. Lecture 1b: Some basic statistical principles II
- Statistical null hypotheses and the meaning of p
- Test statistics
- Statistical errors in hypothesis testing
- Power and effect size
- Statistical null hypotheses: problems and caveats
2. Statistical null hypotheses
- The default to which you compare your data.
- Usually, one sets up the analysis such that if you reject the null hypothesis, you have a pattern which is consistent with the biological prediction.
- So in many cases, the null hypothesis specifies a lack of pattern.
3. The meaning of p
- Informal (a loose shorthand, not strictly correct): the probability that the null hypothesis is true.
- Strictly correct: the probability of observing data at least as deviant (from the expected results) as the observed results if in fact the null hypothesis were true, assuming the data were properly collected and all statistical assumptions are met.
4. To reject or not reject?
- The decision to reject or accept the null hypothesis is based on p.
- This requires some agreement (convention) as to what p value we will consider significant. This threshold value is arbitrary!
5. Test statistics
- In standard statistical analysis, p is estimated by reference to the distribution of an appropriate test statistic.
- If we know the distribution of the test statistic, we can calculate the probability of getting a test statistic value at least as large (or small) as the calculated value if H0 were true, i.e., p.
6. An example
- Two samples (1, 2) with mean values that differ by some amount d.
- What is the probability p of observing a difference at least this large under H0 that the two means are in fact equal?
[Figure: frequency distributions of samples 1 and 2]
7. An example (contd)
- If H0 is true, the test statistic t follows the expected distribution shown in the figure.
[Figure: probability (p) plotted against t, for t from -3 to 3]
8. An example (contd)
- For the two samples, suppose t = 2.01.
- What is the probability of getting a value at least this large under H0 that the two means are in fact equal?
- Since p is small, a value this extreme would be unlikely if H0 were true.
- Therefore, reject H0.
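The calculation in this example can be sketched numerically. The slides give t = 2.01 but not the degrees of freedom, so `df = 58` below is purely an assumption (it would correspond to two samples of 30); the helper names are hypothetical. The tail area is obtained by integrating the Student t density with Simpson's rule, using only the standard library.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t_obs, df, steps=2000):
    """P(T >= t_obs) for t_obs >= 0, via Simpson's rule on [0, t_obs]."""
    h = t_obs / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t_obs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the probability lies above 0; subtract the area between 0 and t_obs.
    return 0.5 - total * h / 3

t_obs = 2.01  # value from the slide
df = 58       # ASSUMPTION: not given in the slides
p_one_tailed = t_upper_tail(t_obs, df)
p_two_tailed = 2 * p_one_tailed
print(f"one-tailed p = {p_one_tailed:.4f}, two-tailed p = {p_two_tailed:.4f}")
```

With a different (smaller) number of degrees of freedom the same t value would give a larger p, so the conclusion depends on the sample sizes as well as on t.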
9. Statistical errors in hypothesis testing
- Two types: a true null hypothesis may be rejected, or a false null hypothesis may be accepted.
- Type I error (α): the probability of rejecting a true null hypothesis.
- Type II error (β): the probability of accepting a false null hypothesis.
10. Errors in inference
Conclusion    Reality: H0 is true    Reality: H0 is false
Accept H0     no error               Type II error (β)
Reject H0     Type I error (α)       no error
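The two error rates can be illustrated by simulation. The sketch below is not from the slides: it assumes a two-tailed z test with known standard deviation, and the sample size, means and seed are arbitrary choices. It draws many samples once with H0 true and once with H0 false, and counts how often each wrong conclusion is reached.

```python
import random
from statistics import NormalDist

def z_test_p(sample, mu0, sigma):
    """Two-tailed p for H0: mu = mu0 when sigma is known (z test)."""
    sample_mean = sum(sample) / len(sample)
    z = (sample_mean - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=20, alpha=0.05,
                   trials=4000, seed=1):
    """Fraction of simulated samples for which H0: mu = mu0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if z_test_p(sample, mu0, sigma) < alpha:
            rejections += 1
    return rejections / trials

# H0 true: every rejection is a Type I error, so the rate should be near alpha.
type_i = rejection_rate(true_mu=0.0)
# H0 false (hypothetical true mean 0.5): every non-rejection is a Type II error.
type_ii = 1 - rejection_rate(true_mu=0.5)
print(f"Type I rate = {type_i:.3f}, Type II rate = {type_ii:.3f}")
```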
11. Errors in inference: an example
[Table: conclusion (seronegative or seropositive) against reality (no HIV, corresponding to H0, or HIV, corresponding to HA); cell values 5, 99, 1 and 95]
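12. One- and two-tailed null hypotheses
- For a 2-tailed H0, there are two rejection regions, each of size α/2.
- For a 1-tailed H0, there is one rejection region of size α.
[Figure: probability plotted against t, showing an acceptance region of size 1 - α with two rejection regions of size α/2 (2-tailed) or one rejection region of size α (1-tailed)]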
13. Example: 2-tailed H0
- No difference in populations.
- H0: μ1 = μ2
- Since H0 is 2-tailed, we would reject H0 if μ1 - μ2 > 0 or μ1 - μ2 < 0.
14. Example: 1-tailed H0
- Prediction: the average size of individuals in population 1 is greater than in population 2.
- H0: μ1 - μ2 ≤ 0
- Since H0 is 1-tailed, we would reject H0 only if μ1 - μ2 > 0.
15. One- versus two-tailed hypotheses
- 2-tailed hypothesis: reject if any non-random pattern is detected.
- 1-tailed hypothesis: reject only if a specified directional non-random pattern is detected.
- H0: μ1 = μ2 (2-tailed, reject)
- 1-tailed H0 on μ1 and μ2, with the observed difference in the direction H0 allows (accept)
[Figure: frequency distributions of Sample 1 and Sample 2]
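16. Important note!
- For a given directionality, a 1-tailed test is more powerful than a 2-tailed test.
- Therefore, always specify the nature of H0 before your analysis!
[Figure: probability plotted against t (ticks at 2 and 3), comparing rejection regions of size α (1-tailed) and α/2 (2-tailed)]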
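The gain in power from a 1-tailed test can be checked with a small calculation. This is a sketch under a simplifying assumption (a z test, i.e. a normally distributed test statistic with known standard error); `shift` is a hypothetical name for the true difference divided by its standard error.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_one_tailed(shift, alpha=0.05):
    """Power of an upper-tailed z test; shift = true difference / standard error."""
    crit = Z.inv_cdf(1 - alpha)          # single rejection region of size alpha
    return 1 - Z.cdf(crit - shift)

def power_two_tailed(shift, alpha=0.05):
    """Power of a two-tailed z test with rejection regions of size alpha/2."""
    crit = Z.inv_cdf(1 - alpha / 2)
    return (1 - Z.cdf(crit - shift)) + Z.cdf(-crit - shift)

shift = 2.0  # hypothetical true effect, in standard-error units
print(f"1-tailed power = {power_one_tailed(shift):.3f}")
print(f"2-tailed power = {power_two_tailed(shift):.3f}")
# If the true effect lies in the unpredicted direction, the 1-tailed test
# has almost no chance of rejecting H0.
print(f"1-tailed power, wrong direction = {power_one_tailed(-shift):.5f}")
```

The last line shows the cost of the extra power: a 1-tailed test cannot detect an effect in the opposite direction, which is why the direction has to be fixed before the data are examined.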
17. Parameters of statistical inference
- Type I error rate (α)
- Power (1 - Type II error rate = 1 - β)
- Sample size (N)
- Effect size (d)
- Each of the above is a function of the other three. Hence, if three are known, so is the fourth.
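As an illustration of this interdependence, consider one concrete case. The choice of test is an assumption made here for simplicity, since the slides do not fix one: a two-tailed, two-sample z test with N observations per group, d the difference between means divided by the common standard deviation, Φ the standard normal distribution function and z_q its q-quantile. Ignoring the negligible rejection probability in the far tail, each quantity can be written in terms of the other three:

```latex
\begin{align}
1 - \beta &\approx \Phi\!\left(d\sqrt{N/2} - z_{1-\alpha/2}\right) \\
N &\approx 2\left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{d}\right)^{2} \\
d &\approx \left(z_{1-\alpha/2} + z_{1-\beta}\right)\sqrt{\frac{2}{N}} \\
\alpha &\approx 2\left[1 - \Phi\!\left(d\sqrt{N/2} - z_{1-\beta}\right)\right]
\end{align}
```

All four lines are rearrangements of the single relation d√(N/2) = z_{1-α/2} + z_{1-β}.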
18. Power
- Power is the probability of rejecting the null hypothesis when it is false and a specified alternative hypothesis is true, i.e. 1 - β.
- Power can only be calculated when a specific alternative hypothesis is given. Therefore, power depends on the alternative hypothesis.
- Powerful tests can detect small differences; weak tests detect only large differences.
19. Calculating power: an example
- Expected distribution of means of samples of 5 housefly wing lengths from normal populations specified by μ as shown above the curves, with σ_Ȳ = 1.74. The centre curve represents the null hypothesis, H0: μ = 45.5; the curves at the sides represent the alternative hypotheses, μ = 37 or μ = 54. Vertical lines delimit the 5% rejection regions for the null hypothesis.
[Figure: three normal curves labelled H1: μ = 37, H0: μ = 45.5 and H1: μ = 54, on a horizontal axis from 35 to 55]
20. Power (contd)
- Increase in type II error, β, as the alternative hypothesis, H1, approaches the null hypothesis, H0; that is, as μ1 approaches μ. Shading represents β. Vertical lines mark off the 5% critical regions (2.5% in each tail) for the null hypothesis. To simplify the graph, the alternative distributions are shown for one tail only.
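The housefly example can be reproduced with a short calculation. The values 45.5, 1.74, 37 and 54 and the 5% level come from the slides; the intermediate alternatives (50, 48, 46.5) are added here only to show β growing as μ1 approaches 45.5. Since the populations are stated to be normal, sample means are treated as normally distributed with standard error 1.74.

```python
from statistics import NormalDist

mu0, se, alpha = 45.5, 1.74, 0.05      # values from the slides
null = NormalDist(mu0, se)
lower = null.inv_cdf(alpha / 2)        # left critical value (2.5% tail)
upper = null.inv_cdf(1 - alpha / 2)    # right critical value (2.5% tail)

def beta(mu1):
    """Type II error: P(sample mean falls between the critical values | mu = mu1)."""
    alt = NormalDist(mu1, se)
    return alt.cdf(upper) - alt.cdf(lower)

print(f"critical values: {lower:.2f} and {upper:.2f}")
for mu1 in (54, 50, 48, 46.5, 37):     # 50, 48 and 46.5 are illustrative additions
    print(f"mu1 = {mu1:5.1f}  beta = {beta(mu1):.4f}  power = {1 - beta(mu1):.4f}")
```

For the two alternatives in the figure (37 and 54), which lie equally far from 45.5, β is the same and very small, so power is close to 1.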
21. Effect size
- Every null hypothesis in any statistical test implies a value for some population parameter.
- E.g. if H0 states that two means are equal, the absolute value of the difference d between the two population means is zero.
[Figure: frequency plotted against X]
22. Effect size (contd)
- More generally, since H0 specifies a lack of some phenomenon, d quantifies the degree to which the phenomenon is present.
- So if H0 is false, it is false to some specific degree, quantified by d, the effect size.
[Figure: frequency plotted against X]
23. Types of power analysis I: power as a function of α, d and N
- Often done after a statistical test, where N (sample size) and effect size (d) are determined and the null hypothesis has been accepted.
- Then, for a specified α, we can calculate 1 - β (the power of the test).
- If 1 - β is low, then the Type II error rate is large, so there is a good chance we have accepted a false H0.
[Figure: frequency plotted against X]
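A power calculation of this first type might look like the sketch below. It assumes a two-tailed, two-sample z test (a normal approximation; a t-based calculation would give slightly lower power), with d expressed as the difference between means divided by the common standard deviation. The function name and the example values are hypothetical.

```python
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-sample z test.

    d is the standardized difference between means (difference / common SD).
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5   # expected value of the z statistic
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)

# Hypothetical study: observed d = 0.4 with 15 individuals per group, H0 accepted.
achieved = power_two_sample(d=0.4, n_per_group=15)
print(f"power = {achieved:.3f}, Type II error rate = {1 - achieved:.3f}")
```

In this made-up case the power is low, so accepting H0 says little about whether an effect of that size exists.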
24. Types of power analysis II: N as a function of α, d and power
- A certain effect size (d) is anticipated (perhaps based on a preliminary sample), with a desired α and 1 - β.
- Given α, β and d, we can calculate the minimum sample size Nmin required to achieve the desired specifications.
- This exercise can be very useful in planning experiments.
[Figure: frequency plotted against X]
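As a rough planning sketch, Nmin can be obtained from the normal approximation for a two-tailed, two-sample comparison. That choice of test, the function name and the example effect sizes are assumptions made here; an exact t-based calculation typically asks for one or two more individuals per group.

```python
import math
from statistics import NormalDist

def min_n_per_group(d, alpha=0.05, power=0.80):
    """Smallest n per group for a two-tailed, two-sample z test to reach the given power.

    d is the standardized difference between means (difference / common SD).
    The negligible rejection probability in the far tail is ignored.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

for d in (0.8, 0.5, 0.2):  # hypothetical anticipated effect sizes
    print(f"d = {d}: Nmin per group = {min_n_per_group(d)}")
```

Because Nmin grows with 1/d², halving the anticipated effect size roughly quadruples the required sample.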
25. Types of power analysis III: d as a function of α, N and power
- Given a desired α, 1 - β and N, what is the minimal detectable effect size dmin?
- If dmin is large, then only large deviations from H0 will be detected (i.e. will result in rejection of H0).
- Thus, we should be VERY VERY careful NOT to infer that some phenomenon does not exist if we accept H0.
[Figure: frequency plotted against X]
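The minimal detectable effect size can be sketched in the same way, again assuming a two-tailed, two-sample z test (an approximation chosen here, not specified in the slides) with d in units of the common standard deviation. The function name and sample sizes are hypothetical.

```python
from statistics import NormalDist

def min_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized difference detectable with the given power.

    Normal approximation for a two-tailed, two-sample test; the negligible
    rejection probability in the far tail is ignored.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * (2 / n_per_group) ** 0.5

for n in (10, 30, 100):  # hypothetical sample sizes per group
    print(f"n = {n:3d}: dmin = {min_detectable_d(n):.2f}")
```

With 10 individuals per group, only differences larger than about 1.25 standard deviations are reliably detected, so accepting H0 in such a study does not rule out smaller effects.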
26. Power: dependence on sample size
- Power curves for testing H0: μ = 45.5 against H1: μ ≠ 45.5, for n = 5 and for n = 35.
- For a given mean wing length under H1, the probability of rejecting a false null hypothesis decreases as N decreases.
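Points on these two power curves can be computed directly. The sketch assumes the same housefly setting as the earlier example: normal populations, a two-tailed test at the 5% level, and a standard deviation of individual wing lengths inferred from the standard error of 1.74 quoted for n = 5 (that is, 1.74 × √5). The alternative means tried are illustrative.

```python
from statistics import NormalDist

mu0, alpha = 45.5, 0.05
sigma = 1.74 * 5 ** 0.5   # SD of individual wing lengths implied by SE = 1.74 at n = 5

def power(mu1, n):
    """Power of the two-tailed test of H0: mu = mu0 when the true mean is mu1."""
    se = sigma / n ** 0.5
    null, alt = NormalDist(mu0, se), NormalDist(mu1, se)
    lower, upper = null.inv_cdf(alpha / 2), null.inv_cdf(1 - alpha / 2)
    return alt.cdf(lower) + (1 - alt.cdf(upper))

for mu1 in (46.5, 47.5, 48.5):  # illustrative true means
    print(f"mu1 = {mu1}: power(n=5) = {power(mu1, 5):.3f}, "
          f"power(n=35) = {power(mu1, 35):.3f}")
```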
27. Why power matters
- Two comparisons with identical means and variances, but differing in N.
- In the first case (N = 200), power is large, p < .05, therefore reject H0.
- In the second case (N = 30), power is low, p > .05, therefore accept H0.
[Figure: two panels (N = 200 and N = 30), each showing frequency against size with the means μ1 and μ2 marked]
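The contrast between the two cases can be made concrete. The sample sizes 200 and 30 are from the slide; the means and standard deviation below are invented for illustration, and the p value uses a normal (z) approximation rather than the t distribution.

```python
from statistics import NormalDist

def two_sample_p(mean1, mean2, sd, n):
    """Two-tailed p for H0: equal means, two samples of size n with common SD.

    Uses the z approximation: z = difference / (sd * sqrt(2 / n)).
    """
    z = (mean1 - mean2) / (sd * (2 / n) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical summary statistics, identical in both cases.
mean1, mean2, sd = 10.0, 10.4, 1.0
for n in (200, 30):
    print(f"N = {n:3d}: p = {two_sample_p(mean1, mean2, sd, n):.5f}")
```

The observed difference is the same in both runs; only N changes, and with it the decision about H0.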
28. Power: conclusions
- If sample sizes are small, the power of any test is usually low.
- So, unless one knows the power of the analysis, a decision to accept the null hypothesis is meaningless!
- Conversely, if power is very high, rejection of the null is very likely, even if deviations from null expectations are small (and perhaps biologically meaningless)!
29. Statistical hypothesis testing: problems and caveats
- Problem 1: many H0s are very unlikely to be true a priori, so that their rejection is not very informative.
[Figure: average yield by treatment (Control, Treatment 1, Treatment 2)]
30. Statistical hypothesis testing: problems and caveats
- Problem 2: the nominal type I error (e.g. α = 0.05) is entirely arbitrary, and may not bear any relationship to biological significance, and even less to decision-making.
[Figure: probability plotted against t (from -3 to 3), with a threshold for decision-making marked]
31. Statistical hypothesis testing: problems and caveats
- Problem 3: p is the probability of obtaining a test statistic at least as extreme as that observed if H0 is true, but often the actual (sampling) distribution of the test statistic does not match the (assumed) distribution under the null.
[Figure: probability plotted against t (from -3 to 3), comparing the sampled and null distributions]
32. Statistical hypothesis testing: problems and caveats
- Problem 4: for a fixed effect size, p depends on sample size (n), so that one can almost always reject H0 if the sample is sufficiently large, even if the observed effect is trivial.
[Figure: type I error plotted against sample size (n), with curves for a larger and a smaller effect size and a reference line at 0.05]
33. Statistical hypothesis testing: problems and caveats
- Problem 5: since p depends on sample size (n), using a fixed nominal α (e.g. α = 0.05) as n increases is logically inconsistent: even as n approaches infinity with a true H0, α = 0.05!
[Figure: nominal type I error (α) plotted against sample size (n), comparing a fixed α (e.g. 0.05) with an α that depends on n]
34. Statistical hypothesis testing: solutions
- Avoid testing trivial null hypotheses.
- Distinguish between biological (or other) significance and statistical significance.
- Always provide estimates of effect sizes and their precision, statistical significance (or lack thereof) notwithstanding.
- Consider using randomization and/or resampling methods to generate the actual distribution of test statistics.
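The last suggestion can be illustrated with a simple randomization (permutation) test for a difference between two means. This is one common resampling approach, sketched here with made-up data and hypothetical names, not a method prescribed by the slides. Instead of assuming a theoretical distribution for the test statistic, it builds the null distribution by repeatedly shuffling the group labels.

```python
import random
from statistics import mean

def permutation_p(sample1, sample2, n_perm=5000, seed=0):
    """Two-tailed randomization test for a difference between two means.

    Returns the proportion of label shufflings whose absolute difference in
    means is at least as large as the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = abs(mean(sample1) - mean(sample2))
    pooled = list(sample1) + list(sample2)
    n1 = len(sample1)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                       # reassign group labels at random
        diff = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Made-up measurements, for illustration only.
group_a = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
group_b = [10.2, 11.0, 10.7, 11.3, 10.1, 10.9]
print(f"randomization p = {permutation_p(group_a, group_b):.4f}")
```

Alongside the p value one would still report the effect size itself (here, the difference between the two group means) and its precision, in line with the third point above.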