Comparison Among Groups more than 2 groups, 1 factor - PowerPoint PPT Presentation

About This Presentation
Title:

Comparison Among Groups more than 2 groups, 1 factor

Slides: 31
Provided by: thomasm66

Transcript and Presenter's Notes



1
Comparison Among Groups (more than 2 groups, 1 factor)
2
Comparison Among k Groups (k > 2) With One Factor
  • The test to be considered determines if all k
    groups have the same central value (median or
    mean, depending on the test), or at least one of
    the groups differs from the others.

Parametric Approach
  • When data within each of the groups are normally
    distributed and possess identical variances -
    Analysis of Variance (ANOVA).
  • ANOVA tests whether all group means are
    identical.
  • If k = 2, ANOVA is equivalent to the t-test.

3
Nonparametric Approach
  • If the assumptions of normality and equal
    variance are not met - KRUSKAL-WALLIS test.
  • K-W test is much like the rank-sum test extended
    to more than two groups.
  • The K-W test compares the medians of groups
    differentiated by one explanatory variable or one
    factor e.g. months, seasons, locations.

4
Null Hypothesis Tests
  • All the one-factor tests have as their null
    hypothesis that each group's median (or mean) is
    identical, with the alternative hypothesis that
    at least one is different. This is the same as
    saying all groups have the same distribution vs.
    at least one distribution differs.
  • However, when the null hypothesis is rejected,
    these tests do not tell which group or groups are
    different!
  • To tell which groups are different, use a
    multiple comparison test.
  • Multiple comparison tests are performed only
    after the ANOVA or K-W null hypothesis has been
    rejected, to determine which groups differ
    from the others.

5
Graphical Displays
  • As usual, before any formal comparison is carried
    out, draw side-by-side boxplots. These will
    indicate at a glance whether
  • 1. data in each group are normally distributed.
  • 2. variances are approximately equal.
  • 3. a parametric or nonparametric test should be
    used.

The Kruskal-Wallis Test
  • Like other nonparametric tests, the K-W test may
    be computed by an exact method used for small
    sample sizes, by a large sample approximation
    (computer packages), or by ranking the data and
    performing a parametric test on the ranks.
  • Luckily, the exact method is rarely required.
    Large sample approximations give p-values very
    close to their exact values. Exact values are
    needed only when k = 3 with sample sizes of 5 or
    less per group, or k ≥ 4 with sample sizes of 4
    or less per group.

6
Large Sample Approximation for the K-W Test
  • Situation: Several groups of data are to be
    compared, to determine if their medians are
    significantly different. For a total sample
    size of N, the overall average rank will equal
    (N+1)/2. If the average rank within a group
    (average group rank) differs considerably from
    this overall average, not all groups can be
    considered similar.
  • Computation: All N observations are jointly ranked
    from 1 to N, smallest to largest. These ranks
    Rij are then used for computation of the test
    statistic. Within each group of size nj, the
    average group rank is computed as
    $\bar{R}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} R_{ij}$
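
The joint ranking step can be sketched in a few lines of Python. This is only an illustration, not the presenter's own procedure: it uses the fecal coliform counts from the example later in this presentation, the helper name `joint_ranks` is made up for the sketch, and tied values receive the average of the ranks they occupy.

```python
# Joint ranking of all N observations; tied values share the average rank.
# Data: fecal coliform counts from the example later in this presentation.
groups = {
    "Summer": [100, 220, 300, 430, 640, 1600],
    "Fall":   [65, 120, 210, 280, 500, 1100],
    "Winter": [28, 58, 120, 230, 310, 500],
    "Spring": [22, 53, 110, 140, 320, 1300],
}

def joint_ranks(groups):
    """Rank all observations together (1 = smallest), averaging tied ranks."""
    pooled = sorted(v for values in groups.values() for v in values)
    # Collect the 1-based sorted positions occupied by each distinct value.
    positions = {}
    for pos, value in enumerate(pooled, start=1):
        positions.setdefault(value, []).append(pos)
    avg_rank = {value: sum(p) / len(p) for value, p in positions.items()}
    return {name: [avg_rank[v] for v in values] for name, values in groups.items()}

ranks = joint_ranks(groups)
mean_ranks = {name: sum(r) / len(r) for name, r in ranks.items()}
n_total = sum(len(r) for r in ranks.values())
overall_mean_rank = (n_total + 1) / 2  # (N+1)/2

for name in groups:
    print(name, ranks[name], round(mean_ranks[name], 1))
print("overall mean rank:", overall_mean_rank)
```

The two tied pairs in the data (120 and 500) come out with ranks 8.5 and 19.5, and the overall mean rank is 12.5 for N = 24.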

7
  • Tied data: When observations are tied, assign the
    average of their ranks to each.
  • Test Statistic: The average group rank
    $\bar{R}_j$ is compared to the overall average
    rank $\bar{R} = (N+1)/2$, squaring and weighting
    by sample size, to form the test statistic K:
    $K = \frac{12}{N(N+1)} \sum_{j=1}^{k} n_j \left[ \bar{R}_j - \frac{N+1}{2} \right]^2$
  • Decision Rule: To reject Ho: all groups have
    identical distributions, vs.
    H1: at least one distribution differs,
  • Reject Ho if $K \ge \chi^2_{1-\alpha,(k-1)}$, the
    1-α quantile of the chi-square distribution with
    (k-1) degrees of freedom; otherwise do not reject
    Ho.

8
Example
  • Fecal coliforms, in organisms per 100ml, were
    measured in the Waterton River. Do all four
    seasons exhibit similar values, or do one or more
    seasons differ?

Selected fecal coliform data (from Lin and Evans,
1980), counts in organisms per 100 ml

                Summer    Fall   Winter   Spring
                   100      65       28       22
                   220     120       58       53
                   300     210      120      110
                   430     280      230      140
                   640     500      310      320
                  1600    1100      500     1300
PPCC p-value      0.05    0.06     0.50    0.005
9
[Figure: side-by-side boxplots of fecal coliform counts (organisms/100 ml, axis from 0 to 2000) for Summer, Fall, Winter, and Spring]
Boxplots of Fecal Coliform data. Data from the
Illinois River
10
Answer
  • Should a parametric or nonparametric test be
    performed on these data? If even one of the four
    groups exhibits non-normality, the assumptions of
    parametric ANOVA are violated.
  • The consequence of this violation is an
    inability to detect differences which are truly
    present (a lack of power).
  • Judging from the boxplots, a nonparametric test
    should be used on these data.
  • Computation of the K-W test is shown below. The
    computed K value (H in Minitab) is compared to
    the chi-square distribution.

11
Ranks Rij of the selected fecal coliform data (from
Lin and Evans, 1980)

                Summer    Fall   Winter   Spring
                     6       5        2        1
                    12     8.5        4        3
                    15      11      8.5        7
                    18      14       13       10
                    21    19.5       16       17
                    24      22     19.5       23
Mean rank Rj        16    13.3     10.5     10.2

Overall mean rank R = 12.5
K = 2.69, χ²0.95,(3) = 7.815, p = 0.44, so do
not reject equality of distributions.
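
As a check on the arithmetic, the large-sample statistic can be recomputed from the ranks listed above. This is a minimal sketch: no tie correction is applied, the critical value 7.815 is the one quoted in the slides, and the helper `chi2_sf_3df` is a made-up name for a closed-form upper-tail probability that is valid only for 3 degrees of freedom.

```python
import math

# Ranks Rij from the table above.
ranks = {
    "Summer": [6, 12, 15, 18, 21, 24],
    "Fall":   [5, 8.5, 11, 14, 19.5, 22],
    "Winter": [2, 4, 8.5, 13, 16, 19.5],
    "Spring": [1, 3, 7, 10, 17, 23],
}

n_total = sum(len(r) for r in ranks.values())
overall_mean_rank = (n_total + 1) / 2

# K = 12 / (N(N+1)) * sum over groups of n_j * (mean rank_j - (N+1)/2)^2
k_stat = 12 / (n_total * (n_total + 1)) * sum(
    len(r) * (sum(r) / len(r) - overall_mean_rank) ** 2 for r in ranks.values()
)

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

CHI2_CRIT = 7.815  # 0.95 quantile, chi-square with k-1 = 3 df (from the slides)
p_value = chi2_sf_3df(k_stat)

print(round(k_stat, 2), round(p_value, 2))
print("reject Ho" if k_stat >= CHI2_CRIT else "do not reject Ho")
```

The sketch gives K = 2.69 and p = 0.44, so Ho is not rejected, in agreement with the values on the slide.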
12
The Rank Transform Approximation to the K-W Test
  • Computed by performing a one-factor ANOVA on the
    ranks Rij. This approximation compares the mean
    rank within each group to the overall mean rank,
    using an F-distribution for the approximation of
    the distribution of K.
  • The F and chi-square approximations will result
    in very similar p-values.
  • The rank transform method should properly be
    called an ANOVA on the ranks.

13
  • This approach becomes useful when we want to
    perform multiple comparison tests using Tukey's
    method, which is a parametric approach. The
    reason for this is that there are no good
    nonparametric multiple comparison tests available
    at present.
  • Using the previous data, the p-value is 0.47 when
    the rank transform approximation is used. This
    p-value is essentially identical to that for the
    large sample approximation (p = 0.44).
  • The details of the computations will follow after
    ANOVA.
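
The rank transform approximation amounts to running the one-factor ANOVA arithmetic on the ranks instead of the raw counts. The sketch below assumes the ranks from the fecal coliform example; the function name `one_way_f` is hypothetical, and only the F statistic and its degrees of freedom are computed (the p-value would come from an F table or a statistics package).

```python
# One-factor ANOVA F statistic computed on the ranks (the "ANOVA on the ranks").
ranks = {
    "Summer": [6, 12, 15, 18, 21, 24],
    "Fall":   [5, 8.5, 11, 14, 19.5, 22],
    "Winter": [2, 4, 8.5, 13, 16, 19.5],
    "Spring": [1, 3, 7, 10, 17, 23],
}

def one_way_f(groups):
    """Return the F statistic and its (treatment, error) degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups.values())
    grand_mean = sum(sum(g) for g in groups.values()) / n_total
    # Between-group (treatment) and within-group (error) sums of squares.
    sst = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g)
    return (sst / (k - 1)) / (sse / (n_total - k)), (k - 1, n_total - k)

f_stat, df = one_way_f(ranks)
print(round(f_stat, 2), df)
```

For these ranks F is about 0.88 with 3 and 20 degrees of freedom, which is consistent with the p-value of 0.47 quoted above.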

14
ANOVA (One Factor)
  • Parametric equivalent to the K-W test. It
    compares the mean values of each group with the
    overall mean for the entire data set.
  • If the group means are dissimilar, some of them
    will differ from the overall mean. See following
    figure.
  • If the group means are similar, they will also
    be similar to the overall mean.
  • Why should a test of differences between means be
    named an analysis of variance?

15
  • In order to determine if the differences between
    group means (the signal) can be seen above the
    variation within groups (the noise), the total
    noise in the data as measured by the total sum of
    squares is split into two parts.
  • Total sum of squares = treatment sum of
    squares + error sum of squares
  • (overall variation) = (group means - overall
    mean) + (variation within groups)
    $\sum_{j=1}^{k} \sum_{i=1}^{n_j} (y_{ij} - \bar{y})^2 = \sum_{j=1}^{k} n_j (\bar{y}_j - \bar{y})^2 + \sum_{j=1}^{k} \sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2$

16
Computation
  • If the total sum of squares is divided by N-1,
    where N is the total number of observations, it
    equals the variance of the yij's. Thus ANOVA
    partitions the variance of the data into two
    parts, one measuring the signal and the other the
    noise. These are then compared to determine if
    the means are significantly different.
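
The partition of the total sum of squares, and its link to the variance, can be verified numerically. A minimal sketch, assuming the fecal coliform counts from the example and using only the Python standard library:

```python
from statistics import mean, variance

# Fecal coliform counts from the example (organisms per 100 ml).
groups = {
    "Summer": [100, 220, 300, 430, 640, 1600],
    "Fall":   [65, 120, 210, 280, 500, 1100],
    "Winter": [28, 58, 120, 230, 310, 500],
    "Spring": [22, 53, 110, 140, 320, 1300],
}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = mean(all_values)

# Overall variation, signal (between groups), and noise (within groups).
total_ss = sum((y - grand_mean) ** 2 for y in all_values)
treatment_ss = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
error_ss = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)

print(round(treatment_ss), round(error_ss), round(total_ss))
# The two parts add up to the total (up to floating-point rounding).
print(abs(total_ss - (treatment_ss + error_ss)) < 1e-6)
# Total SS divided by N-1 is the sample variance of all observations.
print(abs(total_ss / (len(all_values) - 1) - variance(all_values)) < 1e-6)
```

Both checks print True: the treatment and error sums of squares add up to the total, and the total divided by N-1 matches the sample variance.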

One Factor Analysis of Variance
Situation: Several groups of data are to be
compared, to determine if their means are
significantly different. Each group is assumed
to have a normal distribution around its mean.
All groups have the same variance.
17
  • Computation: The treatment mean square and error
    mean square are computed as their sums of squares
    divided by their degrees of freedom (df).
    When the treatment mean square is sufficiently
    larger than the error mean square, as measured by
    an F-test, the group means are significantly
    different.
  • MST = SST/(k-1), where k-1 = treatment degrees of
    freedom
  • MSE = SSE/(N-k), where N-k = error degrees of
    freedom
  • Tied data: No alterations necessary.

18
  • Test Statistic: The test statistic F:
  • F = MST/MSE
  • Decision Rule: To reject Ho: the mean of every
    group is identical, vs.
    H1: at least one mean differs,
  • Reject Ho if $F \ge F_{1-\alpha,\,k-1,\,N-k}$, the 1-α
    quantile of an F distribution with k-1 and N-k
    degrees of freedom; otherwise, do not reject
    Ho.
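
A short sketch of the mean squares, the F statistic, and the decision rule, applied to the fecal coliform counts from the example. The critical value 3.10 is an assumption taken from standard F tables (0.95 quantile with 3 and 20 degrees of freedom); it does not appear in the slides.

```python
# Mean squares, F = MST/MSE, and the decision rule for the example data.
groups = {
    "Summer": [100, 220, 300, 430, 640, 1600],
    "Fall":   [65, 120, 210, 280, 500, 1100],
    "Winter": [28, 58, 120, 230, 310, 500],
    "Spring": [22, 53, 110, 140, 320, 1300],
}

k = len(groups)
n_total = sum(len(ys) for ys in groups.values())
grand_mean = sum(sum(ys) for ys in groups.values()) / n_total

sst = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values())
sse = sum((y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys)

mst = sst / (k - 1)        # treatment mean square, k-1 df
mse = sse / (n_total - k)  # error mean square, N-k df
f_stat = mst / mse

F_CRIT = 3.10  # assumed tabulated 0.95 quantile of F with 3 and 20 df
print(round(mst), round(mse), round(f_stat, 2))
print("reject Ho" if f_stat >= F_CRIT else "do not reject Ho")
```

With these data F is about 0.67, well below the critical value, so Ho is not rejected.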

22
  • The computations and results of an ANOVA are
    usually organized into an ANOVA table. For a
    one-way ANOVA, the table looks like

    Source             df     SS         MS     F         p-value
    Treatment          k-1    SST        MST    MST/MSE   p
    (between-group)
    Error              N-k    SSE        MSE
    (within-group)
    Total              N-1    Total SS

  • where MST = SST/(k-1)
  • and MSE = SSE/(N-k)

23
  • Using the data of the example, the ANOVA table is
    given below

    Source    df         SS        MS      F    p-value
    Season     3     361397    120466   0.67    0.58
    Error     20    3593088    179654
    Total     23    3954485

  • In general, a large F value (F >> 1) gives a small
    p-value, and Ho is rejected.
  • We should not be using ANOVA here because of
    non-normality.

24
Assumptions of the ANOVA
  • Since the t-test is a special case of the ANOVA,
    all the assumptions pertaining to the t-test
    apply to the ANOVA.
  • 1. All samples are random samples from their
    respective populations.
  • 2. All samples are independent of one another.
  • 3. Departures from group mean are normally
    distributed for all groups.
  • 4. All groups have equal variance.
  • Violation of either the normality or constant
    variance assumption results in a loss of ability
    to see differences between means (a loss of
    power).

25
  • The ANOVA suffers from the same five problems as
    did the t-test
  • 1. lack of power when applied to non-normal data
  • 2. dependence on an additive model
  • 3. lack of applicability for censored data
  • 4. assumption that the mean is a good measure of
    central tendency for skewed data
  • 5. difficulty in assessing whether the normality
    and equality of variance assumptions are valid
    for small sample sizes.

26
Multiple Comparison Tests (MCTs)
  • MCTs compare all possible pairs of treatment
    group means or medians, and are performed only
    after the null hypothesis of all medians or
    means identical has been rejected.
  • Many MCTs are available in the literature, e.g.
    Scheffé, Bonferroni, Fisher, Tukey, Duncan's
    multiple range test, REGWQ, REGWF, etc.
  • Of all the methods available, Tukey's method is
    the most generally applicable and powerful MCT
    for a variety of situations. (This test is
    available in Minitab.)

27
  • Tukey's method is a parametric procedure. For
    non-normal data, use the ANOVA on the rank
    transform, then use Tukey's method to test for
    differences in the means of the ranks.
  • For more details see
  • Steel and Torrie (1980) Principles and Procedures
    of Statistics: A Biometrical Approach,
    McGraw-Hill.
  • We only need to learn how to interpret the
    results given by Minitab.
  • Use a family error rate of 0.05.
  • If the interval for a pairwise difference does
    not include zero, then the difference is
    statistically significant.
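
The interpretation rule for the pairwise intervals can be written as a one-line check. In this sketch the interval endpoints are made-up illustrative numbers, not Minitab output for the example data, and the helper name `differs` is hypothetical.

```python
# Hypothetical simultaneous confidence intervals (lower, upper) for pairwise
# differences in group means; the numbers are invented for illustration only.
intervals = {
    ("group 1", "group 2"): (-4.1, 9.3),
    ("group 1", "group 3"): (2.0, 15.4),
    ("group 2", "group 3"): (-0.6, 12.8),
}

def differs(lower, upper):
    """True when the interval excludes zero, i.e. the difference is significant."""
    return lower > 0 or upper < 0

for pair, (lower, upper) in intervals.items():
    verdict = "significant" if differs(lower, upper) else "not significant"
    print(pair, verdict)
```

Only the pair whose interval lies entirely on one side of zero is flagged as significantly different.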

28
Presentation of Multiple Comparison Tests
  • The results are often presented in one of the two
    following formats
  • 1. Letters
  • y1 > y2 > y3 > y4

        y1    y2    y3    y4
        A     AB    BC    C

29
  • Treatment group means are ordered, and those
    having the same letter underneath them are not
    significantly different. The convenience of this
    presentation format is that letters can easily be
    positioned within side-by-side boxplots.

[Figure: side-by-side boxplots labeled A, AB, BC, and C. MCT results: boxes with the same letter are not significantly different.]
Boxplots with letters showing the result of a MCT.
30
  • 2. Lines
  • y1 y2 y3 y4
  • In this presentation format, group means
    connected by a single unbroken line are not
    significantly different. This format is suited
    for inclusion in a table listing group means or
    medians.