Title: Comparing Two Samples: Part I
1Comparing Two Samples Part I
2Measurements (data)
Descriptive statistics
Data transformation
Normality Check Frequency histogram (Skewness
Kurtosis) Probability plot, K-S test
YES
NO
Mean, SD, SEM, 95 confidence interval
Median, range, Q1 and Q3
Data transformation
Check the Homogeneity of Variance
Non-Parametric Test(s) For 2 samples
Mann-Whitney For 2-paired samples Wilcoxon For
gt2 samples Kruskal-Wallis Sheirer-Ray-Hare
NO
YES
Parametric Tests Students t tests for 2 samples
ANOVA for ? 2 samples post hoc tests for
multiple comparison of means
3Procedures for comparing two samples
- Test normality of the data if passed, then
- Compare the measurements both of location
(central value) and dispersion (spread or range)
of the thickness indices between the two sites - If there is a difference only in the measure of
location, a parametric statistical test based
upon the difference between the two sample means
4One sample t test t (X - ?)/SX
- The 2-tailed t test for significant difference
between a mean longevity of horses and a
hypothesized population mean of ? 22 yr - Ho ? 22 year
- Age at death (in year) of 25 horses
t (24.23 - 22)/ (4.25/?25) 2.624 ? n - 1
25 - 1 24 t 0.05 (2), 24 2.064 Thus
reject Ho 0.01ltplt0.02
5Confidence interval for mean
- A researcher often needs to know how close a
sample mean (x) is to the population mean (?). - For example, the amount of dissolved/dispersed
petroleum hydrocarbons (DDPH) in the upper ocean. - It is impossible to sample the entire ocean so
the population mean at any time is unknown. If
data of the population follow the normality, the
sample mean will be normally distributed. - The standard deviation (or standard error ) of
the mean can be obtained by - SX S.E.M. S/?n
- In general, the population mean lies within one
standard deviation (S) of the sample mean with
about 68 confidence. It is usual to quote 95
confidence intervals by multiplying S.E.M. with
the (1) appropriate value of z, namely 1.96 (for
n gt30) or (2) appropriate value of t, df n -1
(for n lt 30).
6Confidence interval for mean
- e.g. 50 measurements of DDPH in the upper ocean
were made. The sample mean 4.75 ppb and S
3.99 ppb - Then SX 3.99/ ?50 0.5643
- The 95 confidence intervals are obtained as
- X ? z 0.05 SX 4.75 ? (1.96)(0.5643) 4.75 ?
1.105 ppb - i.e. 5.855 and 3.633 ppb
- e.g. Only 10 measurements of DDPH in the upper
ocean were made. The sample mean 78.2 ppb and S
38.6 ppb - Then SX 38.6/ ?10 12.206
- The 95 confidence intervals are obtained as
- X ? t 0.05, n-1 SX 78.2 ? (2.262)(12.206)
78.2 ? 27.61 ppb -
7Two-tailed and one-tailed tests
- If we do not know which of the means is greater
than the other, we reject extreme values in
either direction (i.e. HA ?a ? ?b) this
procedure is known as a two-tailed test. - However, if sufficient information is available
for us to specify the direction of the population
means, we could frame the alternative hypothesis
as HA ?a gt ?b or ?a lt ?b - For example, comparison of total organic carbon
(TOC) concentrations in the sewage effluent
before and after being upgraded to an advance
treatment. In this case, a reduced in TOC would
be expected after improvement of the sewage
treatment, and therefore, HA ?before gt ?after - The corresponding critical value of z for
1-tailed tests is slightly different from those
for 2-tailed tests (see Table B3).
8The difference between two sample means with
limited data
- If nlt30, the above method gives an unreliable
estimate of z - This problem was solved by Student who
introduced the t-test early in the 20th century - Similar to z-test, but instead of referring to z,
a value of t is required (Table B3) - df 2n - 2 for n1 n2
- For all degrees of freedom below infinity, the
curve appears leptokurtic compared with the
normal distribution, and this property becomes
extreme at small degrees of freedom.
9Comparison of two samples
Mean measured unit
A
B
10Two-sample t test
Measured unit
A
B
Error bar ?2 SD
11Two-sample t test
Measured unit
A
B
Error bar ?2 SD
12Measured unit
A
B
Error bar ?2 SD
13Importance of Equal Variance
- If the measure of dispersion (i.e. homogeneity of
variance) also differs significantly between the
samples - data transformation procedure if there was no
significant difference between the variances of
transformed data, then a parametric test could be
applied otherwise you should consider - non-parametric test or Welchs approximate t
(Zars p. 128-129)
14Assumptions for z and t tests
- For the t and z tests to be valid
- the measurements should be at least approximately
normally distributed and - with similar sample variances (i.e.,
homoscedastic or homogeneity of variance) - Need to check Normality of both datasets
- Homoscedasticity can be checked by a F-test
- The largest sample variance
- F ratio
- The smallest variance
15Check for homogeneity of variance
- Ho equal variance between the two samples
- HA unequal variances
- Given that Sa2 0.2898 (n 15) and Sb2 0.1701
(n 12), - Then, F ratio 0.28929/0.1701 1.703
- Use the F table (Table B4, Zar 99, App.34),
- Critical F 0.05(2), 14, 11 3.36 gt 1.703
- Accept Ho
- If Ho rejected, then transform data and redo F
test - If still failed, non-parametric or Welchs
approximate t
16An example
- Birds eggshells are thought to be influenced by
acid rain which reduces the egg thickness index
(egg shell mass/ surface area mg cm-2) - We investigate gulls eggs at two nesting sites
a control site and a site affected by acid rain - Taking a single egg at random from a number of
nests at each site determining the egg thickness
index
17Two sample Z test
- Ho no significant difference in the the
thickness indices between the two sites - i.e. the thickness indices belong to the same
population (or probability density function).
Then the population means for the two sites will
be the same and their difference will be zero. - Based on the central limit theorem, a collection
of sample means will follow a normal
distribution. It can also be shown that the
distribution of the difference of two means (Xa -
Xb) will also be normally distributed. - Recall the distribution of z if (Xa - Xb) can
be divided by an appropriate standard deviation,
a value of z can be calculated
18Two sample Z test
- The standard deviation of (Xa - Xb)
- ?(1/n)(Sa2 Sb2)
- Then, Z (Xa - Xb)/?(1/n)(Sa2 Sb2)
- This equation can be used provided the number of
measurements in each sample is reasonably large
(n gt 30), the measurements are approximately
normally distributed and n in each sample is the
same. - Ho ?a ?b HA ?a ? ?b
- If the sample means are similar, the z value will
be small and Ho will be accepted. The critical
values of z can be obtained from the t table,
Table B3, (df ?).
19Two sample Z test
- The following data were obtained at two gull
nesting sites. 50 eggs were taken at random from
each site, with 1 egg taken randomly from each
nest. - Egg thickness index results (mg cm-2)
- Site a (control) b
- n 50 50
- mean 33.2 31.89
- S2 16.41 17.39
- Ho ?a ?b HA ?a ? ?b
- z (Xa - Xb)/?(1/n)(Sa2 Sb2)
(33.2-31.89)/?(1/50)(16.41 17.39) - z 1.593
- As critical z?0.05, df ? 1.96, accept Ho.
Remember to always check the homogeneity of
variance before running the t test.
20Exercise
- The following data were obtained at two gull
nesting sites. 50 eggs were taken at random from
each site, with 1 egg taken randomly from each
nest. - Egg thickness index results (mg cm-2)
- Site a (control) b
- n 50 50
- mean 33.2 27.8
- S2 16.41 15.21
- Test the Ho ?a ?b (HA ?a ? ?b) using the two
sample z test - z (Xa - Xb)/?(1/n)(Sa2 Sb2)
Please do it later !
Remember to always check the homogeneity of
variance before running the t test.
21A t test with equal measurements in each sample
(na nb)
Example
- e.g. The chemical oxygen demand (COD) is measured
at two industrial effluent outfalls, a and b, as
part of consent procedure. Test the null
hypothesis Ho ?a ?b while HA ?a ? ?b (Given
that the data are normally distributed)
SS sum of square S2 ?
sp2 (SS1 SS2) / (?1 ?2) (0.257 11)
(0.366 11)/(1111) 0.312 sX1 X2 v(sp2/n1
sp2/n2) v(0.312/12) 2 0.228 t (X1 X2)
/ sX1 X2 (3.701 3.406) / 0.228 1.294 df
2n - 2 22 t? 0.05, df 22, 2-tailed 2.074
gt ?t observed? 1.294, p gt 0.05 The calculated
t-value is less than the critical t value. Thus,
accept Ho.
n 12 12 mean 3.701 3.406 S2 0.257 0.366
Remember to always check the homogeneity of
variance before running the t test.
22Example
- Growth of 30-weeks old non-transgenic and
transgenic tilapia was determined by measuring
the body mass (wet weight). Since transgenic fish
cloned with growth hormone (GH) related gene
OPAFPcsGH are known to grow faster in other fish
species (Rahman et al. 2001), it is hypothesized
that HA ?transgenic gt ?non-transgenic while the
null hypothesis is given as Ho ?transgenic ?
?non-transgenic
23Example
- Ho ?transgenic ? ?non-transgenic
- HA ?transgenic gt ?non-transgenic
- Given that mass (g) of tilapia are normally
distributed.
sp2 (SS1 SS2) / (?1 ?2) 5913.4 sX1 X2
v(sp2/n1 sp2/n2) 38.45 t (X1 X2) / sX1
X2 8.29 df 2n - 2 14 t? 0.05, df 14,
1-tailed 1.761 ltlt 8.29 p lt 0.001 The t-value
is greater than the critical t value. Thus,
reject Ho.
n 8 8 mean 625.0 306.25 S2
6028.6 5798.2
Remember to always check the homogeneity of
variance before running the t test.
24Example
- e.g. The data are human blood-clotting time (in
minutes) of individuals given one of two
different drugs. It is hypothesized that Ho ?a
?b while HA ?a ? ?b (Given that the data are
normally distributed)
sp2 (SS1 SS2) / (?1 ?2) 0.519 sX1 X2
v(sp2/n1 sp2/n2) 0.401 t (X1 X2) / sX1
X2 -2.470 t? 0.05, df 6 7 -2 11,
2-tailed 2.201 lt 2.470 0.02ltplt0.05 Thus,
reject Ho but accept HA.
n 6 7 mean 8.75 9.74 S2 0.339 0.669
25Power of the two-sample t test
- Power of two-sample t test is greatest when the
number of measurements in each sample is the same
(n1 n2) - Power of two-sample t test for different numbers
of measurements in each sample (n1 ? n2) is
smaller - When n1 ? n2, effective n 2n1n2 /(n1n2)
- e.g. n16, n27,
- effective n 2(6 7)/ (67) 6.46,
- which is smaller than the average of 6 and 7
(6.5) - Therefore, we should always use balanced design
where possible.
26Measurements (data)
For comparison of two independent samples
Descriptive statistics
Data transformation
Normality Check Frequency histogram (Skewness
Kurtosis) Probability plot, K-S test
NO
YES
Mann-Whitney test
Mean, SD, SEM, 95 confidence interval
Median, range, Q1 and Q3
Data transformation
F-test
Check the Homogeneity of Variance
Non-Parametric Test(s) For 2 samples
Mann-Whitney For 2-paired samples Wilcoxon For
gt2 samples Kruskal-Wallis Sheirer-Ray-Hare
NO
YES
z test t tests
Parametric Tests Students t tests for 2 samples
ANOVA for ? 2 samples post hoc tests for
multiple comparison of means
27Non-parametric methods for comparison of two
independent samples
- Why do we need to transform the data into normal
distribution for parametric tests when we can
apply non-parametric tests? - Parametric tests are more powerful and more
sensitive than non-parametric ones.
28Non-parametric tests for two samples - the
Mann-Whitney test
- Non-parametric tests are also called
distribution-free test because they are
independent of the underlying population
distribution, the only assumption being
independence of observations and continuity of
the variable which is being measured. Most of
these tests involve a ranking procedure - Mann-Whitney test is useful for two independent
samples and can be used as an alternative to the
t-test, particularly where the assumptions for
t-test cannot be demonstrated - It can also be applied to ordinal data
- It is usually assumed that the distribution of
the measurements in the two samples are of the
same general form (shown by frequency histogram
or stem-and-leaf plot) - This test can be undertaken with any sample size.
But the method of calculating the test statistic
(U) depends upon the size of n.
29Mann-Whitney test - basic principle
35
24
1
23
21
19
17
15
9
4
3
Case 1
35
24
1
23
21
19
17
15
9
4
3
Case 2
35
24
1
23
21
19
17
15
9
4
3
Case 3
35
24
1
23
21
19
17
15
9
4
3
Case 4
Unequal medians
Group A
Group B
30Non-parametric tests for two samples - the
Mann-Whitney test
Example
- In Mann-Whitney test, the null hypothesis cannot
be stated in terms of population parameters, but
is defined as the equality of the medians of the
populations from which the two samples are drawn. - Example Mann-Whiney test where n is small
- Sample A Sample B
- 9 10
- 12 15
- 6 11
- 7 13
- 18
- These measurements are combined and ranked in
descending order - B B B A B B A A A
- 18 15 13 12 11 10 9 7 6
31Example
- Example I Mann-Whiney test (2-tailed test)
- These measurements are combined and ranked in
descending order - B B B A B B A A A
- 18 15 13 12 11 10 9 7 6
- Ho equal medians
- Summation of ranks for A score
- ? R2 4 7 8 9 28
- Summation of ranks for B score
- ? R1 1 2 3 5 6 17
- U n1n2 n1(n1 1)/2 - ?R1
- where the greatest measurement (smaller sum of
ranks) in either of the two groups is given as
rank 1 (R1) Thus group B is R1 in this case. - U (5)(4) (5)(5 1)/2 - 17 18
- U0.05(2), 5, 4 U0.05(2), 4, 5 19 gt 18 thus
accept Ho i.e. equal medians
32Example
Sample A Sample B 9 10 12 15 6 11
7 13 18
Sample A Sample B 12 18 9 15 7 13
6 11 10
Sample A Rank Sample B Rank 12 4 18 1
9 7 15 2 7 8 13 3 6 9 11 5 10 6
18 15 13 12 11 10 9 7 6
Ho equal medians U n1n2 n1(n1 1)/2 - ?
R1 where the greatest measurement in either of
the two groups is given as rank 1 (R1) Thus
group B is R1 in this case. U (5)(4) (5)(5
1)/2 - 17 18 U0.05(2), 5, 4 U0.05(2), 4, 5
19 gt 18 thus accept Ho i.e. equal medians
R1 17
R2 28
33Example
Sample A Sample B 9 10 11 15 6 12
7 13 18
Sample A Sample B 11 18 9 15 7 13
6 12 10
Sample A Rank Sample B Rank 11 5 18 1
9 7 15 2 7 8 13 3 6 9 12 4 10 6
18 15 13 12 11 10 9 7 6
R1 16
R2 29
U n1n2 n1(n1 1)/2 - ? R1 U (5)(4)
(5)(5 1)/2 - 16 19 U0.05(2), 5, 4
U0.05(2), 4, 5 19 thus reject Ho at p 0.05
i.e. unequal medians
34Example
Sample A Sample B 9 11 10 15 6 12
7 13 18
Sample A Sample B 10 18 9 15 7 13
6 12 11
Sample A Rank Sample B Rank 11 6 18 1
9 7 15 2 7 8 13 3 6 9 12 4 11 5
18 15 13 12 11 10 9 7 6
R1 15
R2 32
U n1n2 n1(n1 1)/2 - ? R1 U (5)(4)
(5)(5 1)/2 - 15 20 U0.05(2), 5, 4
U0.05(2), 4, 5 19 lt 20 thus reject Ho at p
0.02 i.e. unequal medians
35Example
- Example II Mann-Whiney test (2-tailed test)
- The following data provide representative soil
moisture contents on south and north-facing
slopes under grassland in June. The Mann-Whitney
test is used to test the null hypothesis that the
population medians of the two samples are the
same.
- A variance ratio test will show that the
measurements are heteroscedastic. - Also, the measurements are percentages, which
would not be expected to be normally distributed.
36Example
- A variance ratio test will show that the
measurements are heteroscedastic. - Also, the measurements are percentages, which
would not be expected to be normally distributed. - U n1n2 n1(n1 1)/2 - ?R1
- U (14)(17) (14)(14 1)/2 - 116 227
- U0.05(2), 14, 17 169 lt 227
- p lt 0.001, thus reject Ho
- The medians of the two samples are significantly
different
n 14 ?R1 116 n 17 ?R2
380
37Non-parametric tests for two samples - the
Mann-Whitney test for n gt 20
- Critical tables for U become unwieldy as n
increases above 20 and Mann Whitney obtained z
as a function of U and n as shown below - z (U - n1n2/2)/?n1n2(n1 n2 1)/12
- Suppose that the test value of U was found to be
148 with n1 16 and n2 29. Then - z (148 - (16 ? 29/ 2)/?(16)(29)(16 29
1)/12 -84/42.174 -1.99 - For a 2-tailed test, the modulus (1.99) is taken.
- Referring Table B3, df infinity, at p 0.05, z
1.96 lt 1.99 - Thus, reject Ho
38Important Notes
- Comparisons between two samples can be made with
reference to the difference between the sample
means or medians - Comparison between sample means are made by
calculating their difference and dividing by an
appropriate sample standard deviation - Where the sample size is large (e.g. n gt 50) a
two-sample z test can be used. For small samples,
a Students t-test can be used - The parametric t and z tests should be applied to
independent interval/ ratio measurements which
are at least approximately normally distributed - A simple test of homoscedasticity (F ratio test)
should be applied prior to application of t and z
tests - For non-normally distributed or heteroscedastic
data, the non-parametric Mann-Whitney test can be
utilized