Title: Statistical Tests
1Statistical Tests
2Methods of analysing a set of data
- Chapter 8. Hypothesis testing.
- Chapter 9. Comparing groups: continuous data
- Chapter 10. Comparing groups: categorical data
- Chapter 11. Relation between two continuous variables
3Hypothesis Testing
- Most statistical testing in medicine is based on hypothesis testing.
- Most statistical analyses involve comparison, most obviously between treatments or groups of subjects.
- The numerical value corresponding to the comparison of interest is often called the effect.
- We can state a hypothesis, called the null hypothesis, that the effect of interest is zero: for example, that two treatments for headache are equally effective, or that men and women on average have equal serum cholesterol levels.
- The statistical null hypothesis is often the negation of the research hypothesis that generated the data.
- We also have an alternative hypothesis, which is simply that the effect of interest is not zero.
4Interpretation of P values
- The P value is the probability of having observed our data (or more extreme data) when the null hypothesis is true.
- When P is below the chosen cut-off point, say 0.05, the result is called statistically significant.
- For this reason hypothesis tests are often called significance tests.
- Contrast statistical with clinical significance.
- E.g. a small difference of 1 mmHg in both diastolic and systolic blood pressure was found (Gould et al., 1985) between the right and left arms: highly significant statistically, but of no clinical significance.
- Similarly, it is not reasonable to take a non-significant result as indicating no effect just because we cannot rule out the null hypothesis.
5Type I and Type II errors
- Type I error: we obtain a significant result, and thus reject the null hypothesis, when the null hypothesis is in fact true. A false positive result.
- Type II error: we obtain a non-significant result when the null hypothesis is not true. A false negative finding.
6Standard error of a sample mean
- The mean of a sample is only an approximation of the mean of the population as a whole.
- The larger the sample, the better the approximation.
- The standard deviation of many sample means is s/√n (the standard deviation divided by the square root of the number in the sample).
- This is the standard error of the mean (SE).
7SE for serum albumin example
- Figure 4.5 showed that the distribution of the observed serum albumin values in 216 patients with PBC (primary biliary cirrhosis) was close to a normal distribution.
- The mean was 34.46 g/l and the standard deviation was 5.84 g/l.
- The SE of the sample mean serum albumin is thus 5.84 / √216 = 0.397 g/l.
- The SE is a measure of the uncertainty of a sample mean as an estimate of the population mean.
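The serum albumin arithmetic can be checked with a few lines of Python. This is only a sketch: the function name `standard_error` is an illustrative choice, and the inputs are the SD and sample size quoted on the slide.

```python
import math


def standard_error(sd, n):
    """Standard error of a sample mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


# Serum albumin example: SD = 5.84 g/l in n = 216 patients
se_albumin = standard_error(5.84, 216)
print(f"SE = {se_albumin:.3f} g/l")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.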
8Standard error of the difference between two
sample means (1)
- The required standard error is obtained from a more complicated formula than for the one sample case, but involves only the variance (s1² and s2²) and sample size (n1 and n2) for each group.
- First we calculate the pooled variance, s² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
9Standard error of the difference between two
sample means (2)
- If s is the pooled standard deviation (the square root of the pooled variance), then the standard error of the difference between the two sample means is s × √(1/n1 + 1/n2)
10Comment
- The approach given here for the calculation of standard error is for small samples; see 8.4.1 and 8.4.2 for the corresponding large sample formulas.
11Confidence Intervals for the Mean
- The standard error may be used to calculate a confidence interval for the mean of a set of data.
- For example, the average daily energy intake (kJ) over 10 days for 11 healthy women was 6753.6, with a standard deviation of 1142.1.
- The standard error was 1142.1 / √11 = 344.4 kJ.
- For a 95% confidence interval for the mean daily intake, we need the value of t corresponding to a two-tailed area of 0.05, with 11 − 1 = 10 degrees of freedom.
- From table B4 (see 0.05 column) the value of t we need is 2.228.
- The 95% confidence interval for the mean intake is
- mean ± (t × standard error)
- = 6753.6 ± (2.228 × 344.4) = 5986 to 7521 kJ.
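The interval can be reproduced in Python as a rough check. This sketch takes the critical value 2.228 from the slide rather than computing it, and uses 6753.6 for the mean (the pre-menstrual column mean in the intake table); the variable names are illustrative.

```python
import math

mean_intake = 6753.6  # kJ, mean daily intake for the 11 women
sd_intake = 1142.1    # kJ, standard deviation
n = 11
t_crit = 2.228        # t for a two-tailed area of 0.05 with 10 df (from the slide)

se = sd_intake / math.sqrt(n)
lower = mean_intake - t_crit * se
upper = mean_intake + t_crit * se
print(f"SE = {se:.1f} kJ, 95% CI = {lower:.0f} to {upper:.0f} kJ")
```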
12Matched-pairs t-test
- Paired data arise when the same individuals are studied more than once, usually in different circumstances.
- They also arise when we have two groups of subjects who have been individually matched, for example in a matched pair case-control study.
- In such cases we are interested in the within-subject differences rather than the between-subject differences, which may obscure the effect we are interested in.
13Mean daily dietary intake (kJ) over 10 pre-menstrual
and 10 post-menstrual days (Manocha et al., 1986).
Subject   Pre-menstrual   Post-menstrual
1         5260            3910
2         5470            4220
3         5640            3885
4         6180            5160
5         6390            5645
6         6515            4680
7         6805            5265
8         7515            5975
9         7515            6790
10        8230            6900
11        8770            7335
Mean      6753.6          5433.2
SD        1142.1          1216.8
14General and specific forms of the test statistic
15Calculation for dietary intake example.
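The formulas on these two slides were not transcribed. As a hedged sketch, the paired calculation can be written out assuming the usual form of the test statistic: (observed value − hypothesised value) / standard error, applied to the within-subject differences with n − 1 degrees of freedom. The data are the 11 pairs from the intake table; the variable names are illustrative.

```python
import math
import statistics

# Mean daily intake (kJ) for the 11 women in the intake table
pre = [5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770]
post = [3910, 4220, 3885, 5160, 5645, 4680, 5265, 5975, 6790, 6900, 7335]

# The paired test works on the within-subject differences
diffs = [a - b for a, b in zip(pre, post)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)      # sample SD (divisor n - 1)
se_diff = sd_diff / math.sqrt(n)

# Null hypothesis: the mean difference is zero
t_stat = (mean_diff - 0) / se_diff
df = n - 1
print(f"mean difference = {mean_diff:.1f} kJ, SE = {se_diff:.1f}, "
      f"t = {t_stat:.2f} on {df} df")
```

The resulting t is then compared with the t distribution on 10 degrees of freedom, the same row of the table used for the confidence interval.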
1624-hour total energy expenditure (MJ/day) in
groups of lean and obese women (Prentice et al.,
1986)
- Lean (n = 13): 6.13, 7.05, 7.48, 7.48, 7.53, 7.58, 7.90, 8.08, 8.09, 8.11, 8.40, 10.15, 10.88. Mean 8.066, SD 1.238.
- Obese (n = 9): 8.79, 9.19, 9.21, 9.68, 9.69, 9.97, 11.51, 11.85, 12.70. Mean 10.298, SD 1.398.
17T-test for independent samples
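The slide content here was not transcribed. The sketch below assumes the standard pooled-variance t-test: pooled variance s² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2), standard error s√(1/n1 + 1/n2), and n1 + n2 − 2 degrees of freedom. It works from the summary statistics quoted for the lean and obese groups; the function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic using a pooled variance (equal-variance assumption)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    t_stat = (mean1 - mean2) / se_diff
    return pooled_var, se_diff, t_stat, n1 + n2 - 2


# Obese (n = 9) versus lean (n = 13) energy expenditure, MJ/day
pooled_var, se_diff, t_stat, df = pooled_t(10.298, 1.398, 9, 8.066, 1.238, 13)
print(f"pooled variance = {pooled_var:.3f}, SE = {se_diff:.3f}, "
      f"t = {t_stat:.2f} on {df} df")
```

Using the summary statistics keeps the sketch tied to the figures on the slide; the same function could equally be fed means and SDs computed from the raw values.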
18One Way ANOVA example
- An extension of the t-test for independent samples for three or more independent groups of observations.
- 22 patients undergoing cardiac bypass surgery were randomised to one of three ventilation groups:
- Group I: patients received a 50% nitrous oxide and 50% oxygen mixture continuously for 24 hours.
- Group II: patients received a 50% nitrous oxide and 50% oxygen mixture only during the operation.
- Group III: patients received no nitrous oxide but received 35-50% oxygen for 24 hours.
- The red cell folate levels were recorded for each patient after 24 hours ventilation.
- We wish to compare the three groups, and test the null hypothesis that the three groups have the same red cell folate levels.
19Red cell folate levels (µg/l) in three groups of
cardiac bypass patients given different levels of
nitrous oxide ventilation (Amess et al., 1978)
- Group I (n = 8): 243, 251, 275, 291, 347, 354, 380, 392. Mean 316.6, SD 58.7.
- Group II (n = 9): 206, 210, 226, 249, 255, 273, 285, 295, 309. Mean 256.4, SD 37.1.
- Group III (n = 5): 241, 258, 270, 293, 328. Mean 278.0, SD 33.8.
20Red cell folate levels (µg/l) in three groups of
cardiac bypass patients given different levels of
nitrous oxide
21One way ANOVA calculation (1)
- Are the differences between the groups greater than the differences within groups?
- The total variability of the data set is measured by the total sum of squares, which is based on the sum of the squares of the difference of each of the 22 observations from the overall mean. This total is partitioned into
- A) the within groups sum of squares (the sum of squares of the difference between each observation and the mean of its relevant group), and
- B) the between groups sum of squares, which is based on the sum of squares of the difference between the mean of each group and the overall mean.
- Each sum of squares is converted into an estimated variance (known as a mean square) by dividing by its degrees of freedom.
- There are 3 − 1 = 2 df between groups, and (8 − 1) + (9 − 1) + (5 − 1) = 19 df within groups.
22ANOVA table for nitrous oxide ventilation data
Source of variation   Degrees of freedom   Sums of squares   Mean squares   F      P
Between groups        2                    15515.9           7757.9         3.71   0.04
Within groups         19                   39716.1           2090.3
Total                 21                   55232.0
23One Way ANOVA calculation (2)
- Under the null hypothesis that all the groups have the same mean and variance, we expect the between groups and within groups variance to be the same, so we expect the ratio of the variances to be 1.
- In the example the two variances (mean squares) are 7757.9 and 2090.3, so their ratio is 3.71.
- Comparing 3.71 with the F distribution with 2 and 19 degrees of freedom given in table T6, we find P < 0.05.
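The partition of the sums of squares can be reproduced from the red cell folate values. The sketch below follows the description in the slides (between groups and within groups sums of squares, each divided by its degrees of freedom); the variable names are illustrative and the P value is left to the F table.

```python
import statistics

# Red cell folate levels (µg/l) in the three ventilation groups
groups = {
    "I": [243, 251, 275, 291, 347, 354, 380, 392],
    "II": [206, 210, 226, 249, 255, 273, 285, 295, 309],
    "III": [241, 258, 270, 293, 328],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = statistics.mean(all_values)

# Between groups: group means about the overall mean, weighted by group size
ss_between = sum(
    len(values) * (statistics.mean(values) - grand_mean) ** 2
    for values in groups.values()
)
# Within groups: each observation about its own group mean
ss_within = sum(
    (x - statistics.mean(values)) ** 2
    for values in groups.values()
    for x in values
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within
print(f"F = {f_ratio:.2f} on {df_between} and {df_within} df")
```

Small differences in the last digit relative to the ANOVA table are rounding effects.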
24One Way and Two Way ANOVA
- One way ANOVA corresponds to the t-test for independent samples when more than two groups are being compared.
- Two way ANOVA corresponds to the t-test for matched pairs when more than two groups are being compared.
25Chapter 10. Comparing groups: categorical data
- Categorical data are common in medical research, arising when individuals are categorised into two or more mutually exclusive groups.
- The number falling into a particular group is called the frequency.
- When two or more groups are compared the data are often shown in the form of a frequency table, also called a contingency table.
- A frequency table can be considered as a cross-tabulation of two variables, either or both of which can be ordinal.
26Comparison of number of hours swimming by
swimmers with or without erosion of dental
enamel: observed frequencies (O).
Amount of swimming per week   Erosion of enamel (cases)   No erosion of enamel (controls)   Total
≥ 6 hours                     32                          118                               150
< 6 hours                     17                          127                               144
Total                         49                          245                               294
27Expected frequencies (E) = row total × column
total / grand total
Amount of swimming per week   Erosion of enamel (cases)   No erosion of enamel (controls)   Total
≥ 6 hours                     32 (25.0)                   118 (125.0)                       150
< 6 hours                     17 (24.0)                   127 (120.0)                       144
Total                         49                          245                               294
28X² = Σ (O − E)²/E = 4.802; degrees of freedom =
(rows − 1) × (columns − 1) = 1. Using table B4, The t
distribution, P < 0.05.
Amount of swimming per week   Erosion of enamel (cases)   No erosion of enamel (controls)
≥ 6 hours                     32 (25.0) 1.960             118 (125.0) 0.392
< 6 hours                     17 (24.0) 2.042             127 (120.0) 0.408
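The expected frequencies and X² for the swimming data can be recomputed directly from the observed table. This is a minimal sketch; `chi_squared` is an illustrative helper, not a library function, and it applies no continuity correction.

```python
def chi_squared(observed):
    """Expected frequencies, X² and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # E = row total × column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # X² = sum over all cells of (O − E)² / E
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, x2, df


# Rows: 6 hours or more, under 6 hours; columns: erosion, no erosion
swimming = [[32, 118], [17, 127]]
expected, x2, df = chi_squared(swimming)
print(expected)
print(f"X² = {x2:.3f} on {df} df")
```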
29Relation between frequency of Caesarian section
and maternal shoe size
        <4    4    4½   5    5½   6     Total
Yes     5     7    6    7    8    10    43
No      17    28   36   41   46   140   308
Total   22    35   42   48   54   150   351
30Observed, (Expected), contributions to X²
      <4                4                 4½                5                 5½                6
Yes   5 (2.70) 1.97     7 (4.29) 1.72     6 (5.15) 0.14     7 (5.88) 0.21     8 (6.62) 0.29     10 (18.38) 3.82
No    17 (19.31) 0.28   28 (30.71) 0.24   36 (36.86) 0.02   41 (42.12) 0.03   46 (47.39) 0.04   140 (131.62) 0.53
31Results
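- X² = 9.287 (add together the contributions to X² in each cell)
- Degrees of freedom = 5
- 0.05 < P < 0.1 (the tabulated 10% and 5% points of the chi-squared distribution with 5 degrees of freedom are 9.24 and 11.07)
- At least 80% of the cells should have expected values of at least 5.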
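The shoe size calculation, including the check on expected values, can be sketched as follows. The data come from the Caesarean section table; the variable names are illustrative, and the 80% figure is the rule of thumb stated on the slide.

```python
# Caesarean section (yes/no) by shoe size: <4, 4, 4½, 5, 5½, 6
observed = [
    [5, 7, 6, 7, 8, 10],         # Yes
    [17, 28, 36, 41, 46, 140],   # No
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Contribution of each cell to X²: (O − E)² / E
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
x2 = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(col_totals) - 1)

# Rule-of-thumb check: share of cells with an expected value of at least 5
flat_expected = [e for row in expected for e in row]
cells_below_5 = sum(e < 5 for e in flat_expected)
share_at_least_5 = 1 - cells_below_5 / len(flat_expected)

print(f"X² = {x2:.3f} on {df} df")
print(f"{cells_below_5} of {len(flat_expected)} cells have E < 5 "
      f"({share_at_least_5:.0%} have E of at least 5)")
```

Here two of the twelve cells fall below 5, so the 80% guideline is met.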
32Correlation and Regression
- Pearson's r
- Spearman's ρ (rho)
33Reg Fasting blood glucose (mmol/l) and mean
circumferential shortening velocity of the left
ventricle (%/sec) derived from echocardiography
in 24 type I diabetic patients.
Patient   FBG    Vcf     Patient   FBG    Vcf
1         15.3   1.76    13        19.0   1.95
2         10.8   1.34    14        15.1   1.28
3         8.1    1.27    15        6.7    1.52
4         19.5   1.47    16        8.6    -
5         7.2    1.27    17        4.2    1.12
6         5.3    1.49    18        10.3   1.37
7         9.3    1.31    19        12.5   1.19
8         11.1   1.09    20        16.1   1.05
9         7.5    1.18    21        13.3   1.32
10        12.2   1.22    22        4.9    1.03
11        6.7    1.25    23        8.8    1.12
12        5.2    1.19    24        9.5    1.70
34(No Transcript)
35(No Transcript)
36Correlation coefficient r
37Estimating a and b
- Without going into mathematical detail, the estimate of the gradient b is b = Sxy / Sxx,
- where Sxx = Σ(x − mean x)²,
- Sxy = Σ(x − mean x)(y − mean y).
- Sxx is proportional to the variance of x, and Sxy to the covariance between x and y.
- An estimate of the intercept a is given by
- a = mean y − b(mean x)
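As a hedged illustration, the estimates can be computed for the blood glucose data. The sketch assumes x = FBG and y = Vcf, drops patient 16 (whose Vcf is missing in the table), and also computes Pearson's r as Sxy / √(Sxx × Syy), the standard definition, since the slide giving the formula for r was not transcribed. `fit_line` is an illustrative name.

```python
import math

# (FBG, Vcf) for the 23 patients with both values; patient 16 has no Vcf
data = [
    (15.3, 1.76), (10.8, 1.34), (8.1, 1.27), (19.5, 1.47), (7.2, 1.27),
    (5.3, 1.49), (9.3, 1.31), (11.1, 1.09), (7.5, 1.18), (12.2, 1.22),
    (6.7, 1.25), (5.2, 1.19), (19.0, 1.95), (15.1, 1.28), (6.7, 1.52),
    (4.2, 1.12), (10.3, 1.37), (12.5, 1.19), (16.1, 1.05), (13.3, 1.32),
    (4.9, 1.03), (8.8, 1.12), (9.5, 1.70),
]


def fit_line(pairs):
    """Least-squares intercept a, gradient b and Pearson's r for (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = sxy / sxx                  # gradient
    a = mean_y - b * mean_x        # intercept
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


a, b, r = fit_line(data)
print(f"Vcf = {a:.3f} + {b:.4f} × FBG, r = {r:.3f}")
```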
38(No Transcript)
39
Subject   Age   Rank (age)   Fat    Rank (fat)
1         23    1.5          9.5    2
2         23    1.5          27.9   7
3         27    3.5          7.8    1
4         27    3.5          17.8   3
5         39    5            31.4   11
6         41    6            25.9   5
7         45    7            27.4   6
8         49    8            25.2   4
9         50    9            31.1   10
10        53    10.5         34.7   16
11        53    10.5         42.0   18
12        54    12           29.1   8
13        56    13           32.5   12
14        57    14           30.3   9
15        58    15.5         33.0   13
16        58    15.5         33.8   14
17        60    17           41.1   17
18        61    18           34.5   15
40(No Transcript)
41Formula for Spearman's ρ
- Table B8 gives the relation between ρ and P.
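The formula itself was not transcribed. The sketch below assumes the widely used shortcut ρ = 1 − 6Σd² / (n³ − n), where d is the difference between the two ranks for each subject. With tied ranks, as for age here, this is an approximation to Pearson's r calculated on the ranks. The helper `average_ranks` is illustrative; it reproduces the ranks in the age and fat table.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


age = [23, 23, 27, 27, 39, 41, 45, 49, 50, 53, 53, 54, 56, 57, 58, 58, 60, 61]
fat = [9.5, 27.9, 7.8, 17.8, 31.4, 25.9, 27.4, 25.2, 31.1, 34.7, 42.0, 29.1,
       32.5, 30.3, 33.0, 33.8, 41.1, 34.5]

age_ranks = average_ranks(age)
fat_ranks = average_ranks(fat)

n = len(age)
sum_d2 = sum((ra - rf) ** 2 for ra, rf in zip(age_ranks, fat_ranks))
rho = 1 - 6 * sum_d2 / (n ** 3 - n)
print(f"sum of d² = {sum_d2}, rho = {rho:.3f}")
```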