Provided by: eafr

1
COMPARISON OF MEANS

AED 616
SPRING 2007

2
Introduction

We will consider a frequently encountered
problem: how to compare the means of two samples
for statistical significance.
Let's consider two examples.

3
Example 1

A researcher wants to determine whether there are
differences between men and women voters in their
attitudes toward welfare. Samples of men and
women are drawn at random and administered an
attitude scale to obtain a score for each
subject. Means for the two samples were computed.
Women had a mean of 40.00 (on a scale of 0 to
50, where 50 is the most favorable). Men had a mean
of 35.00. The researcher wants to determine
whether there is a significant difference between
men and women. What accounts for the 5-point
difference? One possible explanation is the null
hypothesis, which states that there is no true
difference between men and women; that is, the
observed difference is due to sampling errors
created by random sampling.
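The null hypothesis in Example 1 can be made concrete with a small simulation. The Python sketch below draws two random samples from one and the same population, so the null hypothesis is true by construction, and prints the difference between the two sample means. The population mean (37.5), standard deviation (8), and sample size (30) are hypothetical values chosen for illustration, not data from the study.

```python
import random
import statistics

def null_mean_difference(pop_mean, pop_sd, n, rng):
    """Draw two samples of size n from the SAME normal population and
    return the difference between their means (sampling error only)."""
    sample_a = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
    sample_b = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
    return statistics.fmean(sample_a) - statistics.fmean(sample_b)

rng = random.Random(616)  # fixed seed so the run is reproducible
# Hypothetical population of attitude scores: mean 37.5, SD 8.
diffs = [null_mean_difference(37.5, 8.0, 30, rng) for _ in range(5)]
for d in diffs:
    print(f"difference between sample means: {d:+.2f}")
```

None of the printed differences is exactly zero, even though both groups come from the same population: this is the sampling error the null hypothesis points to.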

Example 1 illustrates that two means may be
obtained from a survey.

5
Example 2

A random sample of kittens is fed a vitamin
supplement from birth to see if the supplement
increases their visual acuity. Another random
sample is fed a placebo that looks like the
supplement but contains no vitamins. At the end of
the study, both samples are tested for visual
acuity and an average acuity score is calculated
for each sample. The kittens that took the
supplement score 4 points higher on the average
than did the control group. What accounts for the
4-point difference? One possible explanation is
the null hypothesis, which states that there is
no true difference between the two samples of
kittens; that is, the observed difference is due to
sampling errors created by random sampling.

Example 2 illustrates that two means may be
obtained from an experiment: a study in which
treatments are given in order to observe
their effects.
Surveys and experiments are very frequently
conducted, and they often yield two means each,
so you can see how important it is to be able to
test the null hypothesis for the difference
between the two sample means.
A statistician named William Gosset developed the
t test for exactly the situations we are
considering.
As a test of the null hypothesis, it yields a
probability: the probability of obtaining a
difference at least as large as the one observed
if the null hypothesis were true.
When this probability is low, say .05 (5%) or
less, we usually reject the null hypothesis.
What leads the t test to give us a low
probability?
Here are three basic factors:

7
Factor 1

The larger the samples, the less likely the
difference between two means was created by
sampling errors. Larger samples have less
sampling error than smaller ones. Thus, when
large samples are used, the t test is more likely
to yield a probability low enough to allow us to
reject the null hypothesis than when small
samples are used.
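Factor 1 can be checked by simulation. In this Python sketch both samples are drawn from the same normal population (a hypothetical standard deviation of 10), and the code estimates how much the difference between the two sample means varies from one pair of samples to the next at several sample sizes.

```python
import random
import statistics

def sd_of_mean_differences(n, pop_sd=10.0, reps=2000, seed=1):
    """Draw two samples of size n from the same normal population `reps` times
    and return the standard deviation of the differences between their means."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    diffs = []
    for _ in range(reps):
        a = [rng.gauss(0.0, pop_sd) for _ in range(n)]
        b = [rng.gauss(0.0, pop_sd) for _ in range(n)]
        diffs.append(statistics.fmean(a) - statistics.fmean(b))
    return statistics.stdev(diffs)

for n in (5, 20, 80):
    print(f"n = {n:3d}: spread of mean differences = {sd_of_mean_differences(n):.2f}")
```

The spread shrinks as n grows (in theory it is 10 * sqrt(2/n), roughly 6.3, 3.2, and 1.6 for these sizes), so a difference of a given size is harder to blame on sampling error when the samples are large.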

8
Factor 2

The larger the difference between the two means,
the less likely the difference was created by
sampling errors. Random sampling tends to create
many small differences and few large ones. Thus,
when large differences between means are
obtained, the t test is more likely to yield a
probability low enough to allow us to reject the
null hypothesis than when small differences are
obtained.
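The claim that random sampling creates many small differences and few large ones can be illustrated with a short simulation. In this Python sketch (hypothetical settings: 25 cases per group, population mean 50, standard deviation 10) both samples come from one population, and the absolute differences between their means are sorted into size bands.

```python
import random
import statistics

rng = random.Random(2007)          # fixed seed for reproducibility
n, pop_sd, reps = 25, 10.0, 5000   # hypothetical sample size, population SD, repetitions

# Under the null hypothesis both samples come from the same population.
diffs = []
for _ in range(reps):
    a = [rng.gauss(50.0, pop_sd) for _ in range(n)]
    b = [rng.gauss(50.0, pop_sd) for _ in range(n)]
    diffs.append(abs(statistics.fmean(a) - statistics.fmean(b)))

# Share of the simulated differences that fall in each size band.
small = sum(d < 2 for d in diffs) / reps
medium = sum(2 <= d < 5 for d in diffs) / reps
large = sum(d >= 5 for d in diffs) / reps
print(f"|diff| < 2: {small:.1%}   2 to 5: {medium:.1%}   >= 5: {large:.1%}")
```

Most chance differences land in the small band and only a few reach 5 points or more, which is why a large observed difference is less plausibly a product of sampling error.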

9
Factor 3

The smaller the variance among the subjects, the
less likely that the difference between the two
means was created by sampling errors. To
understand this, consider a population in which
everyone is identical: they all look alike, think
alike, and speak and act in unison. How many do
you have to sample to get a good sample? Only
one, because they are all the same. Thus, when
there is no variation among subjects, it is not
possible to have sampling errors. As the
variation increases, sampling errors are more and
more likely to occur.
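The thought experiment in Factor 3 translates directly into code. This Python sketch measures the average gap between a sample mean and the true population mean for three hypothetical populations: one where everyone has the same score, one with scores spread over a narrow range, and one with scores spread over a wide range.

```python
import random
import statistics

def sampling_error_of_mean(population, n, reps=2000, seed=3):
    """Average absolute gap between the mean of a random sample of size n
    (drawn with replacement) and the true population mean."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    true_mean = statistics.fmean(population)
    gaps = [abs(statistics.fmean(rng.choices(population, k=n)) - true_mean)
            for _ in range(reps)]
    return statistics.fmean(gaps)

identical = [42.0] * 1000                       # everyone has the same score
narrow = [float(x) for x in range(450, 550)]    # hypothetical: little variation
wide = [float(x) for x in range(0, 1000)]       # hypothetical: much variation

for name, pop in (("identical", identical), ("narrow", narrow), ("wide", wide)):
    print(name, round(sampling_error_of_mean(pop, n=1), 2))
```

For the identical population a sample of one already matches the population mean exactly (an average gap of 0.0); the more the scores vary, the larger the typical sampling error.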

10
Types of t tests

There are two types of t tests:
one is for independent data (sometimes called
uncorrelated data);
the other is for dependent data (sometimes called
correlated data).
Examples 1 and 2 have independent data.
Example 3 (to follow) describes a study with
dependent data.

11
Example 3

In a study of visual acuity, same-sex siblings
(two brothers or two sisters) were identified for
a study. For each pair of siblings, a coin was
tossed to determine which one received a vitamin
supplement and which one received a placebo.
Thus, in the control group, there is a subject
who is a same-sex sibling of each subject in the
experimental group.

The means that result from the study in Example
3 are subject to less error than the means from
Example 2.
Remember that in Example 2, there was no matching
or pairing of subjects before assignment to
conditions.
In Example 3, the matching of subjects assures us
that the two groups are more similar than if just
two independent samples were used.
To the extent that genetics and gender are
associated with visual acuity, the two groups in
Example 3 will be more similar at the onset of
the experiment than the two groups in Example 2.
The t test for dependent data takes this possible
reduction in error into account. Thus, it is
important to select the right t test.
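The effect of pairing can be illustrated by running both kinds of t test on the same numbers. The slides do not give the formula for dependent data, so `paired_t` below assumes the standard textbook paired t (the mean of the within-pair differences divided by its standard error); `independent_t` assumes the pooled-variance formula for independent samples. The five sibling pairs are invented data, built so that families differ a great deal from one another while each supplement sibling scores a little higher than the placebo sibling.

```python
import math
import statistics

def independent_t(x, y):
    """Pooled-variance t for two independent samples (no pairing used)."""
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * statistics.variance(x)
              + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / se

def paired_t(x, y):
    """Dependent (paired) t: work with the difference within each pair."""
    d = [a - b for a, b in zip(x, y)]
    se = statistics.stdev(d) / math.sqrt(len(d))
    return statistics.mean(d) / se

# Hypothetical acuity scores for five sibling pairs (same index = same family).
supplement = [12, 23, 31, 43, 52]
placebo = [10, 20, 30, 40, 50]
print(f"independent t = {independent_t(supplement, placebo):.2f}")
print(f"paired t      = {paired_t(supplement, placebo):.2f}")
```

On these made-up scores the independent t is about 0.22 while the paired t is about 5.88: pairing removes the large family-to-family differences from the error term, which is the reduction in error the dependent-data test takes into account.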

Independent data are obtained when there is no
matching or pairing of subjects across groups.
Here is how to compute t for independent data and
how to interpret it using the t table.
The formula for t is simple:
$$t = \frac{m_1 - m_2}{S_{D_m}}$$
where
$m_1$ is the mean of the group with the higher mean,
$m_2$ is the mean of the group with the lower mean, and
$S_{D_m}$ is the standard error of the difference between means.
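The formula translates into a one-line function. Because the slide defines m1 as the higher mean, this Python sketch orders the two means itself, so the result is never negative. The means are those of Example 1; the standard error of 1.5 is a made-up value for illustration only, and the function name `observed_t` is hypothetical.

```python
def observed_t(mean_a, mean_b, sd_m):
    """t = (m1 - m2) / SDm, where m1 is the higher of the two means and
    sd_m is the standard error of the difference between means."""
    if sd_m <= 0:
        raise ValueError("the standard error must be positive")
    m1, m2 = max(mean_a, mean_b), min(mean_a, mean_b)
    return (m1 - m2) / sd_m

# Example 1 means (men 35.00, women 40.00) with a hypothetical SDm of 1.5.
print(observed_t(35.00, 40.00, 1.5))
```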

The numerator is the difference between the two
means: the larger the difference, the larger the
value of t.
The denominator starts with the symbol S (for
standard deviation). The subscripts (D for
difference and m for means) tell us that it is
the standard deviation of the difference between
means; this standard deviation is called the
standard error of the difference between means.
We want to know whether the difference between
two means is an unlikely event. If it is unlikely
(for example, likely to occur less than 5 times
in 100 due to chance alone), then we will declare
the difference to be statistically significant,
that is, unlikely to be the result of random
errors.

It is impractical to obtain $S_{D_m}$ directly for
a given t test. Instead, we estimate it from
what we know about the sample sizes and the
variances of the samples (the variance, whose
symbol is $s^2$, is simply the square of the
standard deviation) using the following formula
(check out the board).
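The board formula itself is not in the transcript. Because the degrees of freedom used on this slide are n1 + n2 - 2, a plausible reading is the usual pooled-variance estimate for two independent samples; this Python sketch assumes that formula, so treat it as an illustration rather than the exact expression used in class. The function name and input values are hypothetical.

```python
import math

def standard_error_of_difference(var1, n1, var2, n2):
    """Estimate SDm from the two sample variances (s^2) and group sizes.

    Assumes the pooled-variance formula for independent samples:
        SDm = sqrt( ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2) * (1/n1 + 1/n2) )
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two cases")
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Hypothetical inputs: both variances 4.0, eight cases per group.
print(standard_error_of_difference(4.0, 8, 4.0, 8))  # 1.0
```

Pooling weights each group's variance by its n - 1, which is where the n1 + n2 - 2 in the degrees of freedom comes from.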

We call the result the observed value of t.
In this example, we observed a value of 3.300. To
evaluate its meaning, we need to take account of
the number of cases that underlie it, using this
formula for the degrees of freedom (df):
$$df = n_1 + n_2 - 2$$
where
$n_1$ is the number of cases in Group 1,
$n_2$ is the number of cases in Group 2, and
2 is a constant for this type of problem.

For our example:
$$df = 12 + 11 - 2 = 23 - 2 = 21$$
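The degrees-of-freedom rule and the worked example can be checked in a couple of lines (the function name is hypothetical):

```python
def degrees_of_freedom(n1, n2):
    """df = n1 + n2 - 2 for the t test on two independent groups."""
    return n1 + n2 - 2

print(degrees_of_freedom(12, 11))  # 21, as in the worked example
```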
To evaluate t, we use the appropriate critical
value of t from the t table found in the back of
statistics books.
Examining Table 4, we look up the degrees of
freedom (21 for our example).
Look up 21 in the first column, then look to
the right, under the .05 column. There you find the
critical value of 2.080.
We have found that for 21 degrees of freedom,
only values at least as extreme as 2.080 are
unlikely events at the .05 level.
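As a cross-check on the table, the tail probability can also be computed directly. This Python sketch (our own illustration, not the method taught on the slides) integrates the density of Student's t distribution with Simpson's rule to obtain a two-tailed probability. The critical values the slides quote for 21 degrees of freedom, 2.080 at .05 and 2.831 at .01, are two-tailed values, so they should come out close to .05 and .01. If SciPy is installed, `2 * scipy.stats.t.sf(t, df)` computes the same quantity.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t_value, df, steps=2000):
    """P(|T| >= |t_value|), using Simpson's rule on [0, |t_value|] and symmetry."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    b = abs(t_value)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_b = total * h / 3
    return 1 - 2 * area_0_to_b

for t_value in (2.080, 2.831, 3.300):
    print(f"t = {t_value:.3f}, df = 21 -> two-tailed p = {two_tailed_p(t_value, 21):.4f}")
```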
Our observed value is 3.300. Is this an unlikely
event?
Yes. The observed value of 3.300 exceeds not only
the .05 critical value but also the critical value
for the .01 level for 21 degrees of freedom, which
is 2.831.
Thus we can reject the null hypothesis and
declare the result to be statistically
significant at the .01 level.
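The decision rule on this slide amounts to comparing the observed t with each critical value in turn and reporting the strictest level reached. A minimal Python sketch, assuming a two-entry table that holds only the df = 21 values quoted above (the names `CRITICAL_T_DF21` and `strictest_significance_level` are hypothetical):

```python
# Two-tailed critical values of t for df = 21, as quoted on the slides.
CRITICAL_T_DF21 = {0.05: 2.080, 0.01: 2.831}

def strictest_significance_level(observed_t, critical_values):
    """Return the smallest alpha whose critical value the observed |t| reaches
    or exceeds, or None if it is not significant at any listed level."""
    reached = [alpha for alpha, crit in critical_values.items()
               if abs(observed_t) >= crit]
    return min(reached) if reached else None

print(strictest_significance_level(3.300, CRITICAL_T_DF21))  # 0.01
print(strictest_significance_level(2.500, CRITICAL_T_DF21))  # 0.05
print(strictest_significance_level(1.500, CRITICAL_T_DF21))  # None
```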

18
Reporting the Results of t Tests

We are considering the use of the t test to
test the difference between two sample means
for significance.
Obviously, you should report the values of the
means before reporting the results of the test on
them.
In addition, you should report the values of the
standard deviations and number of cases in each
group.
This may be done within the context of a sentence
or in a table.
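The reporting advice can be followed with a small helper that produces each group's mean, standard deviation, and number of cases. This is a sketch: the "M, SD, n" layout is one common style assumed here rather than prescribed by the slides, the function name is hypothetical, and the scores are invented for illustration.

```python
import statistics

def describe_group(label, scores):
    """One-line summary: mean (M), sample standard deviation (SD), cases (n)."""
    return (f"{label}: M = {statistics.mean(scores):.2f}, "
            f"SD = {statistics.stdev(scores):.2f}, n = {len(scores)}")

# Hypothetical scores, for illustration only.
print(describe_group("Supplement", [12, 23, 31, 43, 52]))
print(describe_group("Placebo", [10, 20, 30, 40, 50]))
```

Lines like these can go into a sentence or a table row ahead of the t value, its degrees of freedom, and its significance level.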