Title: ESTIMATION
1ESTIMATION & HYPOTHESIS TESTING
Dr Liddy Goyder Dr Stephen Walters
2- At the end of the session, you should know about
  - The process of setting and testing statistical hypotheses
- At the end of the session, you should be able to
  - Explain
    - Null hypothesis
    - P-value, and what different values mean
    - Type I error
    - Type II error
  - Understand what is meant by the term Power
  - Demonstrate awareness that the p-value does not give the probability of the null hypothesis being true
  - Demonstrate awareness that p > 0.05 does not mean that we accept the null hypothesis
  - Distinguish between statistical significance and clinical significance
3Teenage Pregnancy
- Our young doctor has noticed that there are differences between the teenage pregnancy rates in the two general practices that she has worked in.
- The two practice populations are very different in terms of deprivation
- She is interested in investigating whether there is a statistically significant relationship between deprivation and teenage pregnancy.
4Teenage Pregnancy Example
- What is the research question (what is being investigated)?
  - Is there a relationship between teenage pregnancy rate and deprivation?
- What is the outcome variable (how will they measure this)?
  - Teenage pregnancy rate
5Statistical Analysis (1)
- Last session we discussed why we take samples rather than study the whole population
- We examine the behaviour of a sample as it is often not feasible to look at the entire population
- From a sample we want to make inferences about the population from which it is drawn.
- We do this by a process of statistical hypothesis testing: formulating a hypothesis and testing it
- This session we will look at how you formulate and test a hypothesis.
- You are not expected to know about individual tests, but need to understand the concept of setting and testing statistical hypotheses
6Statistical Analysis (2) Population and Sample
7Statistical Analysis (3)
- The main aim of statistical analysis is to use the information gained from a sample of individuals to make inferences about the population of interest
- There are two basic approaches to statistical analysis
  - Estimation (confidence intervals)
  - Hypothesis testing (p-values)
8Hypothesis testing: the main steps
Set null hypothesis
Set study (alternative) hypothesis
Carry out significance test
Obtain test statistic
Compare test statistic to hypothesized critical
value
Obtain p-value
Make a decision
9State your hypotheses (H0 and H1)
- State your null hypothesis (H0)
  - (statement you are looking for evidence to disprove)
- State your study (alternative) hypothesis (H1 or HA)
- Often statistical analyses involve comparisons between different treatments (e.g. standard and new)
  - we assume the treatment effects are equal until proven otherwise
- Therefore the null hypothesis is usually the negation of the research hypothesis that the new treatment will differ in effect from the standard treatment
- NB It is easier to disprove things than prove them
10Teenage pregnancy example
- What is the research question?
  - Is there a relationship between teenage pregnancy rate and deprivation?
- What is the outcome variable?
  - Teenage pregnancy rate
- What is the null hypothesis?
  - There is no relationship between teenage pregnancy rate and deprivation
- What is the alternative hypothesis?
  - There is a difference (i.e. a relationship between teenage pregnancy rate and deprivation)
11Teenage pregnancy example
Ref: www.empho.org.uk/whatsnew/teenage-pregnancy-presentation.ppt
12Carry out significance test
- Calculate a test statistic using your data (reduce your data down to a single value). The general formula for a test statistic is
  test statistic = (observed value - hypothesized value) / (standard error of the observed value)
- Compare this test statistic to a hypothesized critical value (using the distribution we expect if the null hypothesis is true, e.g. the Normal distribution) to obtain a p-value
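The calculation above can be sketched in a few lines of Python. This is only an illustration: it assumes a standard Normal reference distribution and a two-sided test, and the function name `z_test` and the example numbers are hypothetical, not taken from the teenage pregnancy data.

```python
from statistics import NormalDist


def z_test(observed, hypothesized, standard_error):
    """Return (test statistic, two-sided p-value) using a standard Normal reference."""
    # test statistic = (observed value - hypothesized value) / standard error
    z = (observed - hypothesized) / standard_error
    # Probability of a result at least this far from the hypothesized value,
    # in either direction, if the null hypothesis is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical numbers: observed difference 5.0, null value 0, standard error 2.0
z, p = z_test(observed=5.0, hypothesized=0.0, standard_error=2.0)
print(f"test statistic = {z:.2f}, p-value = {p:.4f}")
```

In practice the reference distribution depends on the test being used; the Normal distribution here is just the example named on the slide.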
13Teenage pregnancy example
14Teenage Pregnancy example
- We can quantify the relationship using a regression analysis
- This measures what the average change in the teenage pregnancy rate is for a given change in the deprivation score
- The null hypothesis is that there is no change in the teenage pregnancy rate as the deprivation score changes
- The alternative hypothesis is that the teenage pregnancy rate does change as the deprivation score changes
15Teenage pregnancy results
                        Coefficient value                         P-value for result
Regression coefficient  0.006 per 1,000 women aged 15-17 years    < 0.001
- Thus as the deprivation score increases by 1 unit there are an additional 0.006 pregnancies per 1,000 women aged 15-17 years.
- As deprivation score varies between about 1,000 and 8,000 the above expression can be rescaled
- Thus as deprivation score increases by 1,000 units there are an additional 6 pregnancies per 1,000 women
- A significance test for the regression coefficient also gives a p-value of less than 0.001
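The rescaling is simple arithmetic on the regression coefficient. A minimal sketch, using the coefficient of 0.006 quoted above; the function name `additional_pregnancies` is introduced here purely for illustration.

```python
# Regression coefficient from the slide: additional pregnancies per 1,000 women
# aged 15-17 years for each 1-unit increase in deprivation score.
COEFFICIENT_PER_UNIT = 0.006


def additional_pregnancies(change_in_score, coefficient=COEFFICIENT_PER_UNIT):
    """Predicted change in the rate (per 1,000 women) for a change in deprivation score."""
    return coefficient * change_in_score


for change in (1, 1000):
    extra = additional_pregnancies(change)
    print(f"Deprivation score +{change}: {extra:.3f} additional pregnancies per 1,000 women")
```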
16Making a decision (1)
- When making a decision you can either decide to reject the null hypothesis or not reject the null hypothesis.
- Whatever you decide, you may have chosen correctly and
  - rejected the null hypothesis, when in fact it is false
  - not rejected the null hypothesis, when in fact it is true
- Or you may have chosen incorrectly and
  - rejected the null hypothesis, when in fact it is true (false positive)
  - not rejected the null hypothesis, when in fact it is false (false negative)
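The four possible outcomes can be written out as a small lookup. This is a sketch for illustration only; in a real study we never know whether the null hypothesis is true, so `null_is_true` is a hypothetical input. A false positive is a Type I error and a false negative is a Type II error.

```python
def classify_decision(reject_null, null_is_true):
    """Label the outcome of a decision given the (unknowable) truth about H0."""
    if reject_null and null_is_true:
        return "Type I error (false positive)"
    if not reject_null and not null_is_true:
        return "Type II error (false negative)"
    # Either rejected a false null hypothesis or did not reject a true one.
    return "correct decision"


for reject in (True, False):
    for truth in (True, False):
        print(f"reject H0={reject}, H0 true={truth}: {classify_decision(reject, truth)}")
```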
17Making a decision (2)
18Making a decision (3)
19Making a decision (4)
The probability of rejecting the null hypothesis
when it is actually false is called the POWER of
the study (Power = 1 - β, where β is the probability
of a false negative, or Type II error). It is the
probability of concluding that there is a difference,
when a difference truly exists
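To make the idea of power concrete, the sketch below computes it for a two-sided test based on the Normal distribution. The Normal-based test, the function name `power_two_sided_z` and the example values are assumptions made for illustration; they are not part of the original slides.

```python
from statistics import NormalDist


def power_two_sided_z(true_difference, standard_error, alpha=0.05):
    """Power (1 - beta) of a two-sided Normal-based test at significance level alpha."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    shift = true_difference / standard_error    # where the test statistic is centred
    # Probability that the test statistic falls beyond either critical value.
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Hypothetical values: a true difference of 5 measured with different precision.
for se in (4.0, 2.5, 1.5):
    print(f"standard error {se}: power = {power_two_sided_z(5.0, se):.2f}")
```

When the true difference is zero the function returns the significance level itself, which is the false positive rate rather than power in any useful sense.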
20Making a decision (5)
21Making a decision (6)
A p-value is the probability of obtaining your
results or results more extreme, if the null
hypothesis is true. The probability of
committing a false positive error, i.e. of
rejecting the null hypothesis when in fact it is
true, is fixed by the significance level against
which the p-value is compared
22Making a decision (7)
- Use your p-value to make a decision about whether
to reject, or not reject your null hypothesis
- A p-value can range from 0 to 1
- But how small is small? The significance level is
usually set at 0.05. Thus if the p-value is less
than this value we reject the null hypothesis
23Statistical significance (1)
We say that our results are statistically
significant if the p-value is less than the
significance level (α), usually set at 5%
If the p-value is not less than the significance
level, we cannot say that the null hypothesis is
true, only that there is not enough evidence to
reject it
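The decision rule can be sketched as a small function. The name `decide` and the wording of the returned messages are illustrative; note that the function never reports that the null hypothesis has been accepted.

```python
def decide(p_value, alpha=0.05):
    """Apply the conventional decision rule at significance level alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis (statistically significant)"
    # Not rejecting is not the same as showing the null hypothesis is true.
    return "do not reject the null hypothesis (insufficient evidence)"


print(decide(0.0009))            # below the 5% level
print(decide(0.20))              # above the 5% level
print(decide(0.03, alpha=0.01))  # stricter 1% level
```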
24Statistical significance (2)
- The significance level is usually set at 5%
- The level is conventional rather than fixed
- Sometimes, for stronger proof we require a significance level of 1% (or P < 0.01)
25Misinterpretation of P-values (1)
- A common misinterpretation of the P-value is that it is
  - The probability of the data having arisen by chance
  - The probability that the observed effect is not a real one
- The distinction between this incorrect definition and the true definition is the absence of the phrase "when the null hypothesis is true"
26Misinterpretation of P-values (2)
- The omission of "when the null hypothesis is true" leads to the incorrect belief that it is possible to evaluate the probability of the observed effect being a real one
- The observed effect in the sample is genuine, but we do not know what is true in the population
- All we can do with this approach to statistical analysis is to calculate the probability of observing our data (or data more extreme) when the null hypothesis is true
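A small simulation can illustrate what "when the null hypothesis is true" means. The sketch below generates many hypothetical studies in which the null hypothesis really is true (population mean of zero, known standard deviation of 1) and tests each one; all of these settings are assumptions chosen for illustration. About 5% of the studies give p < 0.05 even though there is no real effect in any of them.

```python
import random
from statistics import NormalDist, mean


def simulate_null_p_values(n_studies=2000, n_per_study=30, seed=1):
    """Two-sided p-values from simulated studies in which H0 (mean = 0) is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_studies):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_per_study)]
        # Known standard deviation of 1, so the standard error is 1 / sqrt(n).
        z = mean(sample) / (1.0 / n_per_study ** 0.5)
        p_values.append(2 * (1 - std_normal.cdf(abs(z))))
    return p_values


p_values = simulate_null_p_values()
share_significant = sum(p < 0.05 for p in p_values) / len(p_values)
print(f"Share of studies with p < 0.05 when H0 is true: {share_significant:.3f}")
```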
27Teenage pregnancy: making a decision
                        Coefficient value                         P-value for result
Regression coefficient  0.006 per 1,000 women aged 15-17 years    < 0.001
- A p-value is the probability of obtaining your results or results more extreme, if the null hypothesis is true
- The P-value for the regression coefficient is < 0.001
- Thus we reject the null hypothesis and conclude that there is a statistically significant change in teenage pregnancy rates as the deprivation score changes.
- The result is statistically significant at the 5% level
28Teenage pregnancy example: making a decision
- If however the P-value had been greater than 0.05 we would have concluded that there is insufficient evidence to reject the null hypothesis
- The results would not be statistically significant at the 5% level
- We do not conclude that the null hypothesis is true, only that there is insufficient evidence to reject it
29Recap making a decision
Set study hypothesis
Set null hypothesis
Carry out significance test
Obtain test statistic
Compare test statistic to hypothesized critical
value
Obtain p-value
Make a decision
30Limitations of a hypothesis test
- All that we know from a hypothesis test is how likely the difference we observed is, given that the null hypothesis is true
- The results of a significance test do not tell us what the difference is or how large the difference is
- To answer this we need to supplement the hypothesis test with a confidence interval which will give us a range of values in which we are confident the true population mean difference will lie
31Statistical and Clinical Significance (1)
- A clinically significant difference is one that is big enough to make a worthwhile difference
- Statistical significance does not necessarily mean the result is clinically significant
- Supplementing the hypothesis test with an estimate of the effect with a confidence interval will indicate the magnitude of the result. This will help the investigators to decide whether the difference is of interest clinically
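One way to picture the distinction is to compare a confidence interval with both the null value and a smallest difference that would be clinically worthwhile. The sketch below is only an illustration: the threshold, the function name `interpret_interval` and the example intervals are hypothetical, and it assumes that larger positive differences are better.

```python
def interpret_interval(lower, upper, worthwhile_difference, null_value=0.0):
    """Describe a confidence interval in statistical and clinical terms."""
    # Statistically significant if the interval excludes the null value.
    statistically_significant = not (lower <= null_value <= upper)
    if lower >= worthwhile_difference:
        clinical = "clinically important"
    elif upper < worthwhile_difference:
        clinical = "not clinically important"
    else:
        clinical = "possibly clinically important"
    return statistically_significant, clinical


# Hypothetical 95% confidence intervals, with a worthwhile difference of 5 units.
for lower, upper in [(0.5, 1.5), (6.0, 12.0), (-1.0, 9.0)]:
    significant, clinical = interpret_interval(lower, upper, worthwhile_difference=5.0)
    print(f"CI ({lower}, {upper}): statistically significant={significant}, {clinical}")
```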
32Statistical and Clinical Significance (2)
38Statistical and Clinical Significance (3): 95% confidence intervals added
39Statistical and clinical significance (4)
- With a large enough sample the smallest of changes may be statistically significant but not clinically important.
- If the sample size of the study is too small and has low power, a clinically significant result may not be regarded as statistically significant.
- Therefore it is important that the size of the sample is adequate to detect the clinically significant result, at the 5% significance level with at least 80% power (something to look for in the methods section when reading the literature).
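As an illustration of how sample size, significance level and power fit together, the sketch below uses the standard Normal-approximation formula for comparing the means of two equal-sized groups, n per group = 2 (z for alpha + z for power)^2 sd^2 / difference^2. This formula and the example values are assumptions introduced here; the slides do not specify a method, and real studies often need a more tailored calculation.

```python
import math
from statistics import NormalDist


def sample_size_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate number per group to detect `difference` between two means."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = std_normal.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return math.ceil(n)  # round up to a whole participant


# Hypothetical example: clinically significant difference of 5 units, sd of 10.
print(sample_size_per_group(difference=5, sd=10))              # 5% level, 80% power
print(sample_size_per_group(difference=5, sd=10, power=0.90))  # 5% level, 90% power
```

Halving the difference to be detected roughly quadruples the required sample size, which is why very small effects need very large studies.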
40Relationship between confidence intervals and
statistical significance (1)
- There is a close relationship between hypothesis testing and confidence intervals
- If the 95% CI does not include zero (or more generally the value specified in the null hypothesis) then a hypothesis test will return a statistically significant result
- If the 95% CI does include zero then the hypothesis test will return a non-significant result
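The correspondence can be checked numerically. The sketch below assumes a Normal-based 95% confidence interval and a two-sided Normal-based test of the same estimate; the function name `ci_and_p_value` and the example estimates are hypothetical.

```python
from statistics import NormalDist


def ci_and_p_value(estimate, standard_error, null_value=0.0):
    """Return (lower, upper, p) for a 95% CI and a two-sided Normal-based test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.975)  # about 1.96
    lower = estimate - z_crit * standard_error
    upper = estimate + z_crit * standard_error
    z = (estimate - null_value) / standard_error
    p = 2 * (1 - std_normal.cdf(abs(z)))
    return lower, upper, p


# Hypothetical estimates, each with a standard error of 2.0.
for estimate in (1.0, 3.0, 5.0):
    lower, upper, p = ci_and_p_value(estimate, 2.0)
    includes_zero = lower <= 0.0 <= upper
    print(f"estimate {estimate}: 95% CI ({lower:.2f}, {upper:.2f}), "
          f"includes zero={includes_zero}, p = {p:.3f}")
```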
41Relationship between confidence intervals and
statistical significance (2)
- A 95% CI is constructed so that, in repeated sampling, 95% of such intervals include the true value
- Thus 5% of such intervals will not include the true value
- If the 95% CI does not include zero, a true value of zero is not consistent with the data at the 5% significance level (p < 0.05)
- The significance level represents the probability that you conclude there is a difference when in fact there is no difference
- Thus when testing at the 5% level there is a 5% probability that we conclude there is a difference when in fact there is no difference; this is not the probability that the true value is zero
42Relationship between confidence intervals and
statistical significance (3)
- The CI shows the most likely size of the difference given the data and the uncertainty or lack of precision around this difference. The p-value alone tells you nothing about the size of the difference or its precision. Thus the CI conveys more useful information than p-values alone
- e.g. whether a clinician will use a new treatment that reduces blood pressure will depend on the amount of that reduction and how consistent the effect is
- So, the presentation of both the p-value and the confidence interval is desirable
43Summary
- Research questions need to be turned into a statement for which we can find evidence to disprove - the null hypothesis.
- The study data are reduced down to a single probability - the probability of observing our result, or one more extreme, if the null hypothesis is true (P-value).
- We use this P-value to decide whether to reject or not reject the null hypothesis.
- But we need to remember that statistical significance does not necessarily mean clinical significance.
- Confidence intervals should always be quoted with a hypothesis test to give the magnitude and precision of the difference.
44- You should now know about
  - The process of setting and testing statistical hypotheses
- You should now be able to
  - Explain
    - Null hypothesis
    - P-value
    - Type I error
    - Type II error
    - Power
  - Demonstrate awareness that the p-value does not give the probability of the null hypothesis being true
  - Demonstrate awareness that p > 0.05 does not mean that we accept the null hypothesis
  - Distinguish between statistical significance and clinical significance
45Next week..
- In the next Critical Numbers session we are going
to look at risk!
46One-sided vs two-sided significance testing
- Two-sided does not specify the direction of any effect
  - There is a difference between treatment A and treatment B
- One-sided specifies the direction of the effect
  - Treatment A is better than treatment B
47One-sided significance testing
- One-sided tests are rarely appropriate, even when there is a strong prior belief as to the direction of the effect, as by doing a one-sided test you do not allow for the possibility of finding an effect in the opposite direction to the one you are testing
- This is similar to history taking, when it is important not to ask leading questions in case you miss the correct diagnosis
- The decision to do one-sided tests must be made before the data are analysed; it must not depend on the outcome of the study
- An example of when a one-sided test might be appropriate is in clinical trials looking at non-inferiority
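The difference between the two kinds of test can be seen by computing both p-values from the same test statistic. The sketch below assumes a Normal-based test statistic and a one-sided alternative of "A is better than B" (a positive statistic); the function name `p_values_from_z` and the example values are hypothetical.

```python
from statistics import NormalDist


def p_values_from_z(z):
    """Return (two-sided p, one-sided p for the alternative 'effect is greater than zero')."""
    std_normal = NormalDist()
    two_sided = 2 * (1 - std_normal.cdf(abs(z)))
    one_sided_greater = 1 - std_normal.cdf(z)
    return two_sided, one_sided_greater


# The same size of effect in the expected and in the opposite direction.
for z in (1.8, -1.8):
    two_sided, one_sided = p_values_from_z(z)
    print(f"test statistic {z:+.1f}: two-sided p = {two_sided:.3f}, one-sided p = {one_sided:.3f}")
```

With a test statistic of -1.8 the one-sided p-value is close to 1, so an effect in the opposite direction to the one being tested would go undetected, which is the concern raised above.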