Hypothesis Testing - PowerPoint PPT Presentation

Slides: 34
Provided by: edward81
1
Hypothesis Testing
2
Hypothesis Testing Introduction
Much of what follows has been motivated by the
chapters on z-tests and t-tests, and even by the
section on binomial distributions. The problem
we've seen over and over goes something like this:
The mean diastolic blood pressure (DBP) of a group
of 42 local junior high school gym teachers is
observed to be 110 mm Hg. However, we know that
according to the United Nations Task Force on
World Wide Gym Teachers Health, junior high
school gym teachers around the globe tend to have
DBP which is normally distributed, with mean 120
mm Hg and standard deviation 8 mm Hg. Does
the sample of Union County gym teachers have a
significantly lower blood pressure than the
worldwide average among Jr. HS gym teachers?
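
As a rough sketch of the computation this question calls for, the snippet below finds the z-score of the sample mean under the worldwide figures quoted above, and the chance of seeing a sample mean that low. Python and the variable names are choices made here for illustration; they are not part of the original slides.

```python
from math import erfc, sqrt

# Figures quoted in the gym-teacher problem
n = 42          # sample size
xbar = 110.0    # observed mean DBP (mm Hg)
mu_0 = 120.0    # worldwide mean DBP (mm Hg)
sigma = 8.0     # worldwide standard deviation (mm Hg)

# Standard error of the mean and z-score of the observed sample mean
se = sigma / sqrt(n)
z = (xbar - mu_0) / se

# Lower-tail probability P(Z <= z); erfc stays accurate far out in the tail
p_lower = 0.5 * erfc(-z / sqrt(2))

print(f"z = {z:.2f}, lower-tail probability = {p_lower:.2e}")
```

A z-score this far below zero corresponds to a vanishingly small tail probability, which is what "significantly lower" is getting at.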

3
Hypothesis Testing Introduction
Alternatively, we've seen something like this:
You observe 20 cases of the disease stenchus
horrendus at the annual meeting of the American
Mathematical Society, whose attendance this year
is 500. But the incidence of s. horrendus in
the general population is known from previous
large-scale studies to be about 1.5%. How
likely is it to observe this many (or more) cases
of s. horrendus, assuming its prevalence among
mathematicians is the same as that of the general
population? Can we say on the basis of this
information that mathematicians are more likely
to be stinky than the general public?
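
The "this many or more" question is a binomial upper-tail probability. The sketch below sums it directly, assuming the quoted rate of 1.5 is a percentage (p = 0.015); the function name is hypothetical and chosen here for illustration.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n = 500        # attendance at the meeting
k = 20         # observed cases
p_0 = 0.015    # assumption: general-population rate, reading "1.5" as 1.5%

expected = n * p_0                      # cases expected if H0 holds
tail = binomial_upper_tail(k, n, p_0)   # chance of 20 or more cases under H0

print(f"expected cases = {expected:.1f}, P(X >= {k}) = {tail:.2e}")
```

Only 7.5 cases are expected under the general-population rate, so 20 or more is a rare event if the null hypothesis holds.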

4
Hypothesis Testing Introduction
The questions we are really interested in are
these:
1. Do the gym teachers from Union County really
have lower blood pressure in comparison to
worldwide levels?
2. Are math professors more likely to be stinky
than the general population? (Note: we cannot ask
anything about the level of stinkiness among
this group, only the prevalence of stinkiness.)
To answer questions like these, so far we have
just computed z-scores (or t-scores, when n is
small and the standard deviation is unknown), and
then compared the resulting tail probabilities
with some threshold value, usually 5%. We are
basically going to do the same thing here, with
some added vocabulary.

5
Hypothesis Testing Introduction
Assume that the sample of 42 gym teachers that
we're looking at comes from a larger population of
Union County gym teachers, and that their DBP has
some unknown mean and standard deviation, mu and
sigma. We can estimate that mean by the sample
mean, Xbar. We also assume that, according to
the UN study, the mean DBP of gym teachers
worldwide is mu_0. The question is this: Is mu
(unknown) smaller than mu_0? Or is it equal?
Pose two hypotheses:
H0: mu = mu_0 (null hypothesis)
H1: mu < mu_0 (alternative hypothesis)
The null hypothesis is always that there's no
difference between the population your sample
comes from and the general population. The
alternative hypothesis is that there is. In this
case, the alternative is that the mean DBP of the
population behind our sample is smaller than that
of the general population.

6
Hypothesis Testing Introduction
Either the null hypothesis is actually true, or
it is false. And you do some test and decide
either to accept or reject it. So there are four
possible outcomes:
1. You accept H0 and it turns out to be, in fact,
true (good!)
2. You accept H0 and it turns out to be false
(oops)
3. You reject H0 and it turns out to be, in fact,
false (OK, good)
4. You reject H0 and it turns out to have been
true (oops)
Type I error: rejecting the null hypothesis when
it is in fact true.
Type II error: accepting the null hypothesis when
it is false.

7
Hypothesis Testing Introduction

The null hypothesis is kind of like the
presumption of innocence. We assume it to be
true unless there is sufficient evidence (beyond
a reasonable doubt?) to make us decide to do
otherwise. The burden of proof is always on the
alternative hypothesis's team. Type I error -
analogous to falsely convicting an innocent
person. Type II error - analogous to letting a
guilty offender go free.

8
Hypothesis Testing Introduction

Significance level of a test: alpha = P(Type I error)
No name: beta = P(Type II error)
Power of a test: 1 - beta

9
Example Birth weights
Ex 7.2, 7.8: Suppose we want to test the
hypothesis that mothers with low socioeconomic
status (SES) deliver babies whose birthweights
are lower than normal. To test this
hypothesis, a list is obtained of birthweights
from 100 consecutive, full-term, live-born
babies from the maternity ward in a hospital in a
low-SES area. The mean birthweight of this
sample is 115 oz, with a sample standard
deviation of 24 oz. Suppose further that from
large studies the mean birthweight in the US is
120 oz. Develop an appropriate hypothesis, and
test it at the 0.05 significance level. Notice
that the hypothesis is posed first. Only then are
the data collected to test the hypothesis.


10
Example Birth weights
Ex 7.2, 7.8, 7.10 (continued)

What you would probably do is compute a
t-score for your sample mean, and compare it with
the t-score associated with .05 significance. Those
two t-scores are called the test statistic and
the critical value, respectively.
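
The comparison described here can be sketched in a few lines. The test statistic is computed from the figures in the problem; the critical value is an assumption read from a t table (about -1.66 for 99 degrees of freedom at the lower 5% point) rather than something computed by the code.

```python
from math import sqrt

# Birth-weight example: summary figures from the problem statement
n = 100
xbar = 115.0    # sample mean (oz)
s = 24.0        # sample standard deviation (oz)
mu_0 = 120.0    # national mean birth weight (oz)

# One-sample t statistic with n - 1 = 99 degrees of freedom
t_stat = (xbar - mu_0) / (s / sqrt(n))

# Assumption: lower-tail critical value for alpha = 0.05 and 99 df,
# taken from a t table (about -1.66); it is not computed here.
t_crit = -1.66

# Lower-tailed test: reject H0 when the statistic falls below the critical value
reject_h0 = t_stat < t_crit

print(f"t = {t_stat:.3f}, critical value = {t_crit}, reject H0: {reject_h0}")
```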

11
Example Birth weights
Test statistic: t = (Xbar - mu_0) / (s / sqrt(n))
= (115 - 120) / (24 / sqrt(100)) = -2.08
Critical value: t_99,.05 = -1.66
t-values can be obtained from Excel, but not the
way Rosner claims! TINV(.05,99) gives the t-score
for a two-sided test, not a one-sided one; for a
one-sided test at .05, use TINV(.10,99), which
gives 1.66.

Because the test statistic is below the critical
value, we reject the null hypothesis. The shaded
area (the lower tail beyond the critical value)
is called the rejection region, also
sometimes called the critical region. We can now
say the difference in means is significant at the
0.05 level. This is called the critical-value
method.

12
Example Birth weights
Test statistic: t = -2.08. p-value = 0.02.

Rather than just checking whether the test is
significant at the .05 level or at the .01 level
or whatever arbitrary level someone else sets, you
can just find what we call the p-value yourself.
That's the area of the shaded region, the tail
beyond the test statistic. Stated another way,
it's the probability of getting a value at least
as extreme as the test statistic if the null
hypothesis is true. Now we can say HOW
significant the result is.
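
One way to get this tail area without a table is to integrate the t density numerically. The sketch below does that with Simpson's rule using only the standard library; the helper names `t_pdf` and `t_cdf` are hypothetical, and a statistics package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

# Birth-weight example: lower-tailed test with 99 degrees of freedom
t_stat = (115 - 120) / (24 / math.sqrt(100))
p_value = t_cdf(t_stat, df=99)

print(f"t = {t_stat:.3f}, one-sided p-value = {p_value:.3f}")
```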

13
Guidelines for p-values
Typically, we say that
14
Another example of a one-tailed test
Ex 7.13 Cardiology: A topic of recent clinical
interest is the possibility of using drugs to
reduce infarct size in patients who have had a
myocardial infarction within the past 24 hours.
Suppose we know that in untreated patients the
mean infarct size is 25 (ck-g-EQ/m^2).
Furthermore, in 8 patients treated with a drug the
mean infarct size is 16 with a standard deviation
of 10. Is the drug effective in reducing infarct
size?
16
Another example of a one-tailed test
Ex 7.1 and 7.18 Cardiovascular Disease A current
area of research interest is the familial
aggregation of cardiovascular risk factors in
general and lipid levels in particular. Suppose
the average cholesterol level in children is
175 mg/dL. A group of men who have died from
heart disease within the past year are
identified, and the cholesterol levels of their
offspring are measured. What hypotheses do you
pose, in order to test whether children of
heart-attack fathers have high cholesterol?
Suppose that the mean cholesterol level of 10 children
whose fathers died from heart disease is 200
mg/dL and the standard deviation is 50 mg/dL.
Test the hypothesis that the mean cholesterol
level is higher for this group than the general
population.
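
A minimal sketch of this upper-tailed test follows. The slide does not state a significance level for this example, so alpha = 0.05 is an assumption, as is the critical value of about 1.833 for 9 degrees of freedom (read from a t table, not computed).

```python
from math import sqrt

def t_statistic(xbar, mu_0, s, n):
    """One-sample t statistic for a sample mean."""
    return (xbar - mu_0) / (s / sqrt(n))

# Cholesterol in children whose fathers died of heart disease
t_stat = t_statistic(xbar=200.0, mu_0=175.0, s=50.0, n=10)

# Assumption: upper-tail critical value for alpha = 0.05 and 9 df,
# taken from a t table (about 1.833); it is not computed here.
t_crit = 1.833

# Upper-tailed test: reject H0 only when the statistic exceeds the critical value
reject_h0 = t_stat > t_crit

print(f"t = {t_stat:.3f}, critical value = {t_crit}, reject H0: {reject_h0}")
```

With only 10 children the statistic falls short of the critical value, so under these assumptions the difference is not significant at the 0.05 level even though the sample mean is 25 mg/dL higher.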
17
Two tailed test
The earlier one-tailed tests all relied on the
fact that you knew (or suspected) that the mean
of the sampled population was larger (or smaller)
than the general population mean. Sometimes you
go into the study having no idea which it is.
Remember that people form a hypothesis first,
then collect data (or at least that's what the
scientific method that you learned in 7th grade
says you're supposed to do). In that case, the
alternative hypothesis is only that mu is not
equal to mu_0. The null hypothesis remains the
same.
19
Example of a two tailed test
Example 7.20 Cardiovascular Disease: Suppose we
want to compare fasting serum-cholesterol levels
among recent Asian immigrants to the US with
typical levels found in the general US
population. Suppose we assume cholesterol levels
in women aged 21-40 in the US are approximately
normally distributed with mean 190 mg/dL. It is
unknown whether cholesterol levels among recent
Asian immigrants are higher or lower than in the
general population. Assume that the cholesterol
levels of our sample are also normally
distributed, with unknown mean mu. Pose the null
and alternative hypotheses. Blood tests are
performed on 100 Asian immigrant women aged
21-40, and the mean level (Xbar) is 181.52 mg/dL
with standard deviation 40 mg/dL. What can be
concluded?
20
Example of a two tailed test
Example 7.20 Cardiovascular Disease: mu_0 = 190,
Xbar = 181.52, s = 40, n = 100
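
With these four numbers, a two-sided test can be sketched as follows. The p-value doubles the tail area beyond the test statistic, and the t distribution is integrated numerically with hypothetical helpers `t_pdf` and `t_cdf` written here for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

mu_0, xbar, s, n = 190.0, 181.52, 40.0, 100

t_stat = (xbar - mu_0) / (s / math.sqrt(n))

# Two-sided p-value: both tails beyond |t| count as "at least as extreme"
p_value = 2 * t_cdf(-abs(t_stat), df=n - 1)

print(f"t = {t_stat:.2f}, two-sided p-value = {p_value:.3f}")
```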
21
Power of a test
We've tried pretty hard to avoid making Type I
errors. How? By demanding a high level of
significance (a small alpha), which makes it
difficult to reject the null hypothesis unless
we're really sure. What about Type II errors? We
would like to avoid making them as well. How do
we go about doing that? What is a Type I error?
What is a Type II error? Ex: You want to test the
same old hypothesis that women of low
socioeconomic status have children with low
birthweight. What would be a Type I error, and
what would be a Type II error?
22
Power of a test
We know that if, for instance, the sample size is
small, then the standard error is large, so the
critical values for the rejection region go way
out to the left and right, which means it's
unlikely that the null hypothesis is ever
rejected, even if there actually is a difference!
The way we measure this is called the power of
a test, given by 1 - beta. What does it mean,
intuitively? A powerful test is one which has a
high probability of detecting a significant
difference, if such a difference actually
exists. A not-so-powerful test is one which is
unlikely to detect a significant difference, even
if the difference is real. What does it mean
mathematically?
23
Power of a test
What does it mean mathematically? Power = 1 - beta
= P(reject H0 | H0 is false).
26
Power of a test
Example 7.26: What is the power of the test for
the birthweight example from example 7.2, using
an alternative mean of 115 oz and alpha = 0.05,
assuming a true standard deviation of 24 oz?
Recall n = 100. What does this mean? Example
7.27: How does this change if n = 10? What does
this mean?
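
A sketch of both calculations, assuming the usual normal-approximation formula for a one-sided test, Power = Phi(z_alpha + |mu_0 - mu_1| sqrt(n) / sigma), where z_alpha is the alpha-th percentile of the standard normal. The formula is supplied here as an assumption, since the slide's own equation did not survive in the transcript.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided(mu_0, mu_1, sigma, n, alpha):
    """Approximate power of a one-sided z-test of H0: mu = mu_0 when mu = mu_1."""
    z = NormalDist()
    z_alpha = z.inv_cdf(alpha)                   # negative for small alpha
    shift = abs(mu_0 - mu_1) * sqrt(n) / sigma   # distance between means in SE units
    return z.cdf(z_alpha + shift)

power_n100 = power_one_sided(mu_0=120, mu_1=115, sigma=24, n=100, alpha=0.05)
power_n10 = power_one_sided(mu_0=120, mu_1=115, sigma=24, n=10, alpha=0.05)

print(f"n = 100: power = {power_n100:.3f}")
print(f"n = 10:  power = {power_n10:.3f}")
```

Read the result as the chance of detecting a true 5 oz deficit: roughly two chances in three with 100 births, and far lower with only 10.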
27
Power of a test
  • If the significance level is made smaller (alpha
    decreases), then the critical value moves further
    out into the tail, and hence the power decreases.
  • If the alternative mean is shifted further away
    from the null mean, then the power increases.
  • If the standard deviation of the distribution of
    individual observations increases, then the power
    decreases.
  • If the sample size increases, then the power
    increases.
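
These four rules can be checked numerically. The sketch below changes one input at a time around the birthweight example, using the normal-approximation power formula for a one-sided test; that formula and the particular alternative values tried are assumptions chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power(mu_0, mu_1, sigma, n, alpha):
    """Approximate one-sided power: Phi(z_alpha + |mu_0 - mu_1| sqrt(n) / sigma)."""
    z = NormalDist()
    return z.cdf(z.inv_cdf(alpha) + abs(mu_0 - mu_1) * sqrt(n) / sigma)

# Baseline: the birthweight example
base = dict(mu_0=120.0, mu_1=115.0, sigma=24.0, n=100, alpha=0.05)

# Each variant changes exactly one input (values are illustrative)
variants = {
    "baseline": base,
    "smaller alpha (0.01)": {**base, "alpha": 0.01},
    "alternative further away (110)": {**base, "mu_1": 110.0},
    "larger sigma (30)": {**base, "sigma": 30.0},
    "larger n (200)": {**base, "n": 200},
}

results = {name: power(**kwargs) for name, kwargs in variants.items()}
for name, value in results.items():
    print(f"{name:32s} power = {value:.3f}")
```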

28
More Examples
Exercises 7.7, 7.8, 7.9: Plasma-glucose levels are
used to determine the presence of diabetes.
Suppose the mean ln(plasma-glucose) concentration
(mg/dL) in 35-44 year olds is 4.86 with standard
deviation 0.54. A study of 100 sedentary people
in this age group is planned to test if they have
a higher or lower level of plasma glucose than the
general population.
7.7: If the expected difference is 0.10 ln units,
then what is the power of such a study if a
two-sided test is to be used with alpha = 0.05?
7.8: What if the expected difference is 0.20 ln
units?
7.9: How many people must be studied to have 80%
power under the assumptions of 7.7?
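
One possible way to set these exercises up is sketched below. It assumes the usual normal-approximation formulas for a two-sided test, Power = Phi(-z_(1-alpha/2) + |delta| sqrt(n) / sigma) and n = sigma^2 (z_(1-beta) + z_(1-alpha/2))^2 / delta^2, and reads "80 power" as 80%. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()

def power_two_sided(delta, sigma, n, alpha):
    """Approximate two-sided power, ignoring the tiny far-tail rejection chance."""
    return Z.cdf(-Z.inv_cdf(1 - alpha / 2) + abs(delta) * sqrt(n) / sigma)

def sample_size_two_sided(delta, sigma, alpha, power):
    """Smallest n giving at least the requested power for a two-sided test."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return ceil(sigma**2 * z_sum**2 / delta**2)

sigma, n, alpha = 0.54, 100, 0.05

power_010 = power_two_sided(delta=0.10, sigma=sigma, n=n, alpha=alpha)   # 7.7
power_020 = power_two_sided(delta=0.20, sigma=sigma, n=n, alpha=alpha)   # 7.8
n_needed = sample_size_two_sided(delta=0.10, sigma=sigma,
                                 alpha=alpha, power=0.80)                # 7.9

print(f"7.7 power = {power_010:.3f}")
print(f"7.8 power = {power_020:.3f}")
print(f"7.9 n     = {n_needed}")
```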
29
Sample Size Determination
Type I error: light blue, alpha. Type II error:
light green, beta. Relative to each mean, the
blue area is determined by z_alpha or by
z_1-beta.
30
Sample Size Determination
Example: birthweights again. Consider the
birthweight data from example 7.2: mu_0 = 120
oz, mu_1 = 115 oz, sigma = 24 oz, alpha = 0.05, and
we'd like 1 - beta = .80. This is a one-sided
test. How big a sample must you take?
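
A sketch of the calculation, assuming the standard normal-approximation formula for a one-sided test, n = sigma^2 (z_(1-beta) + z_(1-alpha))^2 / (mu_0 - mu_1)^2, rounded up to a whole subject. The formula is supplied here as an assumption; the slide's own equation is not in the transcript.

```python
from math import ceil
from statistics import NormalDist

def sample_size_one_sided(mu_0, mu_1, sigma, alpha, power):
    """Smallest n giving at least the requested power for a one-sided z-test."""
    z = NormalDist()
    z_sum = z.inv_cdf(power) + z.inv_cdf(1 - alpha)   # z_(1-beta) + z_(1-alpha)
    return ceil(sigma**2 * z_sum**2 / (mu_0 - mu_1)**2)

n_needed = sample_size_one_sided(mu_0=120, mu_1=115, sigma=24,
                                 alpha=0.05, power=0.80)

print(f"required sample size: {n_needed}")
```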
31
One Sample Test Binomial Distribution
Recall that if p and q = 1 - p are moderately
sized, and n is large enough, then a binomial
distribution tends to look like a normal
distribution, with mean mu = np and variance npq.
Then we can use the same kind of methods to test
hypotheses about the true proportion of a
population having some trait. Now the test
statistic will be given by
z = (phat - p_0) / sqrt(p_0 q_0 / n),
where phat is the sample proportion, p_0 is the
proportion under the null hypothesis, and
q_0 = 1 - p_0. These will typically be two-sided
tests.
32
Example Binomial Distribution
Example 7.47 Cancer: Consider the breast cancer
data from example 6.48. In that example, we were
interested in the effect of having a family
history of breast cancer on the incidence of
breast cancer. Suppose that out of 10,000 women
aged 50-54 whose mothers had breast
cancer, 400 had breast cancer at one time in
their lives. Given large studies, assume the
prevalence rate of breast cancer in the US is
2%. Restate this in terms of hypothesis
testing, where p (unknown) is the true prevalence
of breast cancer among women whose mothers had
breast cancer.
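
Once the hypotheses are posed as H0: p = p_0 against p != p_0, the normal-approximation test can be sketched as below. It assumes the quoted prevalence of 2 means 2% (p_0 = 0.02) and uses the statistic z = (phat - p_0) / sqrt(p_0 q_0 / n); the variable names are chosen here for illustration.

```python
from math import erfc, sqrt

n = 10_000       # women sampled
cases = 400      # women with breast cancer at some time in their lives
p_0 = 0.02       # assumption: US prevalence, reading "2" as 2%

p_hat = cases / n                  # sample proportion
se_0 = sqrt(p_0 * (1 - p_0) / n)   # standard error under H0
z = (p_hat - p_0) / se_0

# Two-sided p-value from the standard normal; erfc stays accurate in the far tail
p_value = erfc(abs(z) / sqrt(2))

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, two-sided p-value = {p_value:.1e}")
```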
33
Example Binomial Distribution
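Recall that with problems about the mean, the
power of the (one-sided) test was given by
Power = Phi(z_alpha + |mu_0 - mu_1| sqrt(n) / sigma),
where Phi is the standard normal CDF and z_alpha
is its alpha-th percentile. The first
term comes from the significance level. The
second part just says how far apart the
null mean is from the alternative mean, in
standard-error units. We
can do something similar with the binomial
distribution (which is a two-sided test).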