Title: Statistical Tests
1. Statistical Tests
2. Statistical Inference
- Estimation
- Quantify the uncertainty in the estimation
- Hypothesis Tests
3. Estimation
- Before a statistical test can be performed,
  identify the particular quantities of interest.
- Example: comparing mean blood glucose levels between
  individuals with and without diabetes.
  Quantity estimated: mean blood glucose.
4. Confidence Intervals
- It is not sufficient to provide just an estimated
  quantity; we also need to quantify the extent of
  uncertainty involved in the estimation.
- Assuming the data have a bell-shaped / symmetric
  distribution, confidence intervals are calculated
  about the mean.
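As a rough illustration of an interval calculated about the mean, the sketch below uses a normal approximation (mean plus or minus a critical value times the standard error). The glucose readings are made-up numbers, and the function name `mean_confidence_interval` is hypothetical; for small samples a t-distribution critical value would normally replace the normal one.

```python
import math
import statistics

def mean_confidence_interval(data, confidence=0.95):
    """Return (lower, upper) for the mean, using a normal approximation."""
    n = len(data)
    mean = statistics.fmean(data)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    se = statistics.stdev(data) / math.sqrt(n)
    # Two-sided critical value from the standard normal distribution
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * se, mean + z * se

# Hypothetical blood glucose readings (mmol/L), invented for illustration
glucose = [5.1, 4.8, 5.6, 5.3, 4.9, 5.2, 5.0, 5.4, 5.7, 4.7]
lower, upper = mean_confidence_interval(glucose)
print(f"mean = {statistics.fmean(glucose):.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Asking for a higher confidence level (for example 0.99) widens the interval, while a larger sample shrinks the standard error and narrows it.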
5. Remarks on Confidence Intervals
- The interval is random; the parameter to be estimated is
  not.
- Width of the interval is a measure of precision.
  Confidence level is a measure of accuracy.
- Width of the CI depends on the magnitude of the
  uncertainty (standard error) and the level of
  confidence required.
- Assumptions must be satisfied before
  constructing CIs.
6. Interpreting Confidence Intervals
- If we were to
  - repeat the experiment 100 times
  - construct a 95% CI each time
- Then we would expect about 95 of the 100 CIs to cover or
  include the true population value.
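The repeated-experiment interpretation can be checked with a small simulation. This is only a sketch: the true mean, standard deviation, sample size and number of repetitions are arbitrary assumptions, and the interval uses a normal approximation.

```python
import math
import random
import statistics

def coverage(n_experiments=2000, n=50, true_mean=10.0, sd=2.0, seed=0):
    """Fraction of 95% CIs (normal approximation) that contain true_mean."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)
    hits = 0
    for _ in range(n_experiments):
        # One simulated experiment: draw a sample and build its interval
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if mean - z * se <= true_mean <= mean + z * se:
            hits += 1
    return hits / n_experiments

rate = coverage()
print(f"Fraction of intervals covering the true mean: {rate:.3f}")
```

The printed fraction should be close to 0.95. The true mean never changes between runs; only the intervals move.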
7. Hypothesis Testing
- Null hypothesis: a statement of the status quo, or
  of no change.
- Alternative hypothesis: the hypothesis which the
  researcher wishes to investigate.
- Commonly, the alternative hypothesis is formulated
  first, and the null hypothesis is the
  negation of the alternative hypothesis.
8. Pregnancy Test Kit
A woman buys a pregnancy test kit, and is
interested in finding out whether she is pregnant.
The null hypothesis in this case (status quo)
is that she is not pregnant. The alternative
hypothesis (hypothesis of interest) is that she
is pregnant. The test kit may show:
- +ve, indicating there is evidence to suggest pregnancy
- -ve, indicating lack of evidence to suggest pregnancy
9. Pregnancy Test Kit
The test kit may be either accurate or
inaccurate.

                     Actually pregnant        Actually not pregnant
Test kit shows +ve   Correct +ve diagnosis    Incorrect +ve diagnosis
Test kit shows -ve   Incorrect -ve diagnosis  Correct -ve diagnosis
11. Types of Errors
- Type 1 error (rate controlled by the p-value threshold):
  false +ve conclusion (+ve when the woman is in fact
  not pregnant).
- Type 2 error: false -ve conclusion (-ve when the
  woman is in fact pregnant).
- Power (sensitivity): true +ve conclusion (+ve when
  the woman is in fact pregnant).
- Specificity: true -ve conclusion (-ve when the
  woman is in fact not pregnant).
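To make the four quantities concrete, the sketch below computes them from a table of counts. The counts are invented for illustration and do not come from any real test kit.

```python
# Hypothetical counts for 200 test-kit users (invented for illustration)
true_pos = 90    # kit shows +ve, woman is pregnant
false_neg = 10   # kit shows -ve, woman is pregnant      (Type 2 error)
false_pos = 5    # kit shows +ve, woman is not pregnant  (Type 1 error)
true_neg = 95    # kit shows -ve, woman is not pregnant

sensitivity = true_pos / (true_pos + false_neg)   # power
specificity = true_neg / (true_neg + false_pos)
type1_rate = false_pos / (false_pos + true_neg)   # equals 1 - specificity
type2_rate = false_neg / (false_neg + true_pos)   # equals 1 - sensitivity

print(f"sensitivity (power) = {sensitivity:.2f}")
print(f"specificity         = {specificity:.2f}")
print(f"Type 1 error rate   = {type1_rate:.2f}")
print(f"Type 2 error rate   = {type2_rate:.2f}")
```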
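12. P-values
- The p-value is the probability of observing a result at
  least as extreme as the one obtained, assuming the null
  hypothesis is true. The threshold it is compared against
  (commonly 0.05) is the significance level of the test,
  i.e. the false positive rate we are willing to accept.
- If the p-value is small, we are more confident
  that the null hypothesis can be rejected.
- With a threshold of 0.05, expect on average 1 false
  positive result out of every 20 tests in which the null
  hypothesis is actually true.
- So if we perform a large study with 1 million
  variables and none of them has a real effect, we still
  expect about 50,000 variables with p-values < 0.05,
  all of them false!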
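The "1 in 20" behaviour of a 0.05 threshold can be seen by simulating many studies in which the null hypothesis is true. This is a sketch under simplifying assumptions: a two-sided z-test with known standard deviation stands in for whatever test a real study would use, and `z_test_p_value` is a hypothetical helper.

```python
import math
import random
import statistics

def z_test_p_value(sample, null_mean, sd):
    """Two-sided p-value for a z-test with known standard deviation sd."""
    z = (statistics.fmean(sample) - null_mean) / (sd / math.sqrt(len(sample)))
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))

rng = random.Random(1)
n_tests = 5000
false_positives = 0
for _ in range(n_tests):
    # The null hypothesis (mean 0) is true in every simulated study
    sample = [rng.gauss(0.0, 1.0) for _ in range(30)]
    if z_test_p_value(sample, null_mean=0.0, sd=1.0) < 0.05:
        false_positives += 1

fp_rate = false_positives / n_tests
print(f"Fraction of true-null studies with p < 0.05: {fp_rate:.3f}")
```

The fraction should come out near 0.05, even though no simulated study contains a real effect.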
13. Hypothesis Tests
- 1 sample: compare the mean against a hypothesized
  value (example: we believe the mean weight of
  all the male students in Oxford is 75 kg).
  Test: 1-sample t-test
- 2 samples: compare means between two
  groups (example: we want to compare the mean
  weight of male students and female students in
  Oxford).
  Test: 2 independent sample t-test
- Assumptions
  - Data has a symmetric distribution within each
    group.
  - Independence between different individuals.
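The two test statistics can be sketched with the standard library alone. The weights below are invented, the function names are hypothetical, and the two-sample version uses the pooled-variance form, which additionally assumes equal variances in the two groups. Turning the statistic into a p-value requires the t distribution, for example via `scipy.stats.ttest_1samp` and `scipy.stats.ttest_ind`.

```python
import math
import statistics

def one_sample_t(sample, hypothesized_mean):
    """t statistic and degrees of freedom for a 1-sample t-test."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.fmean(sample) - hypothesized_mean) / se, n - 1

def two_sample_t(a, b):
    """Pooled-variance t statistic and df for 2 independent samples."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.fmean(a) - statistics.fmean(b)) / se, na + nb - 2

# Hypothetical weights in kg, invented for illustration
male = [72.5, 80.1, 76.3, 69.8, 78.4, 74.0, 81.2, 75.5]
female = [61.2, 58.7, 65.4, 60.1, 63.8, 59.5, 62.9, 64.3]

t1, df1 = one_sample_t(male, 75.0)
t2, df2 = two_sample_t(male, female)
print(f"1-sample: t = {t1:.2f} on {df1} df")
print(f"2-sample: t = {t2:.2f} on {df2} df")
```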
14. Hypothesis Tests
- Paired data: compare the difference within each
  pair (example: we want to find the effect of a diet
  treatment, so we compare the weight before and
  after the treatment).
  Test: paired sample t-test
- Assumptions
  - The difference within each pair has a symmetric
    distribution.
  - Independence between pairs of observations.
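A paired test reduces to a 1-sample test on the within-pair differences, as the sketch below shows. The before/after weights are invented and `paired_t` is a hypothetical helper; `scipy.stats.ttest_rel` offers a ready-made version that also returns a p-value.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic and df for a paired t-test on (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / se, n - 1

# Hypothetical weights in kg for six people, invented for illustration
before = [82.0, 91.5, 77.3, 88.1, 95.4, 79.9]
after = [80.2, 89.0, 77.0, 85.6, 92.1, 79.5]

t, df = paired_t(before, after)
print(f"paired: t = {t:.2f} on {df} df")
```

A negative t here means weights were lower after the treatment than before.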
15. Hypothesis Tests
- ≥ 2 samples: compare the means across the
  groups (example: we want to find the difference in
  height between Africans, Asians and Europeans).
  Test: Analysis of Variance (ANOVA)
- Assumptions
  - Data within each population has a symmetric
    distribution.
  - Amount of spread in the data is identical across
    the different populations (same variance).
  - Independent observations between and within each
    population.
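The ANOVA test statistic compares the variation between group means with the variation within groups. The sketch below computes the one-way F statistic from scratch; the three groups of heights are invented and given neutral labels, and `one_way_anova_f` is a hypothetical helper. A p-value would come from the F distribution, for example via `scipy.stats.f_oneway`.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2
                     for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, len(all_values) - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical heights in cm for three groups, invented for illustration
group_a = [170.2, 168.5, 172.1, 169.8, 171.0]
group_b = [165.3, 167.9, 166.4, 164.8, 168.2]
group_c = [175.6, 173.9, 176.8, 174.2, 177.1]

f_stat, df_b, df_w = one_way_anova_f([group_a, group_b, group_c])
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) df")
```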
16. Practical Example
- Previous studies suggest that restricting caloric
  intake can increase life expectancy.
- Perform an experiment with mice, each randomly
  assigned to one of six diet treatments.
- Measure the time to death for each mouse (in
  months).
- Experimental Design
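Random assignment to treatments might be sketched as follows. The number of mice, the diet labels and the equal group sizes are all assumptions made for illustration; the actual design of the study is not given here.

```python
import random
from collections import Counter

def assign_treatments(subject_ids, treatments, seed=0):
    """Randomly assign each subject to one treatment, in balanced groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled subjects round-robin so group sizes differ by at most one
    return {s: treatments[i % len(treatments)] for i, s in enumerate(shuffled)}

mice = [f"mouse_{i:02d}" for i in range(1, 31)]   # hypothetical: 30 mice
diets = [f"diet_{k}" for k in "ABCDEF"]           # placeholder diet names

assignment = assign_treatments(mice, diets)
group_sizes = Counter(assignment.values())
print(dict(group_sizes))
```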
17. Visualising the Data
18. Practical Example
- Research Questions
  - Is there any difference in life expectancy
    across the different diet treatments?
  - If there is, which diet treatments contribute to
    this difference?
  - Which diet treatment significantly increases
    life expectancy?
19. Practical Example
20. Multiple Comparisons
- Can compare every possible pair of treatments.
  DANGER!
- More tests → more chances of making a
  false judgement.
- Remember: a p-value threshold of 0.05 → 1 out of 20
  judgements may be false.
- There are 15 possible pairings for the 6
  treatment groups → very likely to make a false
  judgement!
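The arithmetic behind the warning can be sketched as follows. The calculation treats the 15 tests as independent, which is only a rough approximation because pairwise comparisons share groups.

```python
import math

n_groups = 6
alpha = 0.05

# Number of distinct pairs among the treatment groups: "6 choose 2"
n_pairs = math.comb(n_groups, 2)

# Chance of at least one false positive across all pairwise tests,
# assuming every null hypothesis is true and the tests are independent
family_wise_error = 1 - (1 - alpha) ** n_pairs

print(f"pairwise comparisons: {n_pairs}")
print(f"chance of at least one false positive: {family_wise_error:.2f}")
```

Under these assumptions the chance of at least one false judgement is a little over one half.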
21. Bonferroni Correction
- Make it harder to define a result as
  significant.
- Do this by lowering the p-value threshold. But lower it
  by how much?
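The standard Bonferroni rule divides the overall threshold by the number of tests performed. A minimal sketch, with invented p-values and hypothetical function names:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold under the Bonferroni correction."""
    return alpha / n_tests

def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values remain significant after the correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]

# For the 15 pairwise comparisons among 6 treatment groups
print(f"per-test threshold: {bonferroni_threshold(0.05, 15):.4f}")

# Hypothetical p-values from 5 tests, invented for illustration
p_values = [0.001, 0.004, 0.020, 0.049, 0.300]
flags = bonferroni_significant(p_values)
print(flags)
```

With 5 tests the per-test threshold is 0.01, so only the first two hypothetical p-values stay significant, although four of them are below 0.05.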
22. Post-Hoc Analysis