Title: Math 3680
1Math 3680 Lecture 7: The Sign Test and
the Binomial Exact Test
2The Sign Test
3- Example: For this data set, there are 14 pairs in
which the two measured amounts differ. Let K be the
number of pairs in which the first method returned a
higher amount. We choose α = 0.05. We observe that
there are 12 pairs where the first method returns a
higher value. This gives the value of the test
statistic k_s = 12.
- Data (first method, second method):
- (79.2, 74.0) (96.8, 95.8)
- (105.8, 97.8) (76.0, 75.0) (99.2, 98.0)
- (99.5, 96.2) (69.5, 67.5) (99.2, 99.0)
- (100.0, 101.8) (23.5, 21.2) (91.0, 100.2)
- (93.8, 88.0) (95.2, 94.8) (72.0, 67.5)
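As a rough illustration, the test statistic can be tallied directly from the data. This sketch assumes that consecutive numbers in the table form (first method, second method) pairs; the variable names are chosen here for illustration only.

```python
# Measurements, assumed to be (first method, second method) pairs.
pairs = [
    (79.2, 74.0), (96.8, 95.8),
    (105.8, 97.8), (76.0, 75.0), (99.2, 98.0),
    (99.5, 96.2), (69.5, 67.5), (99.2, 99.0),
    (100.0, 101.8), (23.5, 21.2), (91.0, 100.2),
    (93.8, 88.0), (95.2, 94.8), (72.0, 67.5),
]

# The sign test only uses pairs in which the two amounts differ.
differing = [(first, second) for first, second in pairs if first != second]
n = len(differing)

# K counts the pairs in which the first method gave the higher amount.
k_s = sum(1 for first, second in differing if first > second)

print(n, k_s)  # 14 12
```

Only the sign of each difference matters; the size of the difference is ignored.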
4- Let's play devil's advocate and assume the null
hypothesis is correct. That is, let's assume
that p = 0.5, and let's work through the logical
ramifications of this assumption.
- If the null hypothesis is correct, what's the
probability of obtaining a value of K at least
as extreme as the observed test statistic? This
probability is called the P-value, or the
observed level of significance:
- P-value = P(K ≥ 12) = 1 - P(K ≤ 11)
- In Excel: 1 - BINOMDIST(11,14,0.5,1)
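For readers without Excel, the same tail probability can be sketched in Python. The helper name binom_cdf is made up for this example; it is meant to mirror what BINOMDIST(k, n, p, 1) returns.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(K <= k) for K ~ Binomial(n, p), like Excel's BINOMDIST(k, n, p, 1)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# One-sided P-value: P(K >= 12) = 1 - P(K <= 11) with n = 14 and p = 0.5.
p_value = 1 - binom_cdf(11, 14, 0.5)
print(round(p_value, 8))
```

The result is 106/16384, since 91 + 14 + 1 = 106 of the 2^14 equally likely outcomes have 12 or more successes.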
5- In other words, if we assume the null hypothesis,
we also have to accept the fact that there's
less than 1 chance in 150 of obtaining a test
statistic this large or larger.
- We then ask the question: which is more
plausible? In this case, the P-value is less than
the stipulated significance level of α = 0.05.
It's more plausible to reject the null hypothesis
in favor of the alternative hypothesis.
- Conclusion (written in plain English):
- We reject the null hypothesis. There is good
reason to believe that the first method returns a
higher amount than the second method.
6- Summary of algorithm for hypothesis testing
- H0: The first method does not return higher
values than the second method (p = 0.5)
- Ha: The first method returns higher values than
the second method (p > 0.5) (one-sided test)
- We choose α = 0.05
- Test statistic: k_s = 12 for a sample size of
n = 14
- P-value: 0.00646973
- Conclusion: We reject the null hypothesis. There
is good reason to believe that the first method
returns a higher amount than the second method.
7- Notes.
- Note 1. Notice we have not proven beyond a
shadow of a doubt that the first method returns a
higher value than the second method. Is it
possible for 14 fair coins to land so that 12 or
more are heads? Yes. In other words, we may
simply have had a run of luck.
- However, we can reasonably justify our rejection
of the null hypothesis.
8- Note 2. The alternative hypothesis is p > 0.5.
It is not that p = 12/14. Good practice is to
state the null and alternative hypotheses (and
select α) before looking at the data.
- Note 3. Small P-values are evidence against the
null hypothesis: they indicate that something
besides chance is at work.
9- Note 4. If P < 5%, the result is often called
statistically significant.
- If P < 1%, the result is called highly
statistically significant.
- These phrases are often used in media reports on
scientific progress, especially breakthroughs in
medical research.
10After dropping for years, teen smoking in the
U.S. has leveled off. Monday, June 12, 2006.
Posted 10:14 a.m. EDT (14:14 GMT). ATLANTA,
Georgia (AP) -- The long, steady decline in teen
smoking in the United States since the late
1990s appears to have come to a standstill,
health officials said Friday. A survey released
this week showed that smoking among high school
students held steady at around one in four
teenagers between 2003 and 2005. Two other
surveys in the past year or so found that teen
smoking has apparently plateaued since 2002. "We
were making good progress, and now it looks like
we're not," said Dr. Corinne Husten, acting
director of the Office on Smoking and Health at
the Centers for Disease Control and
Prevention. The trend was outlined in the CDC's
National Youth Risk Behavior Survey, which is
conducted every other year and involves about
14,000 high school students across the country.
The results of the latest survey were released
last week. The survey had been showing a steady
and pronounced decline in youth smoking since
1997, when more than 36 percent of students said
they had smoked in the previous 30 days. The
percentages dropped to about 35 in 1999, 28.5 in
2001 and 22 in 2003. But when students were
asked the question last spring, 23 percent said
they had smoked. The increase from the 2003
survey was not considered statistically
significant, but it was disturbing news, health
advocates said.
11Study shows fliers are out of
breath. Wednesday, April 27, 2005. Posted 7:23 AM
EDT (11:23 GMT). LONDON, England -- Airline
passengers are putting up with "significant"
drops in the supply of oxygen while flying at
high altitude, according to researchers. Just
over half of all fliers analyzed had oxygen
levels 6 percent lower than usual when the
airplane was at maximum altitude -- a level at
which doctors normally administer extra oxygen
for hospital patients. "We believe that these
falling oxygen levels, together with factors such
as dehydration, immobility and low humidity,
could contribute to illness during and after
flights," said Susan Humphreys of the Royal
Group of Hospitals in Belfast, whose group
conducted the research. "This has become a
greater problem in recent years as modern
airplanes are able to cruise at much higher
altitudes." A drop in oxygen levels can be a
contributing factor to deep vein thrombosis
(DVT), a potentially fatal blood clot which is
also called "economy class syndrome." Low oxygen
levels also can lead to headaches, fatigue and
impaired mental performance.
12"We should be giving people with ill health more
advice about things they can do, such as
drinking more water when they fly, to avoid
problems," researcher Rachel Deyermond told the
UK's Daily Telegraph newspaper. The researchers
from Belfast, Northern Ireland published their
results in the May issue of Anaesthesia, a
British medical journal. They recorded the blood
oxygen levels and the pulse rate of 84
passengers, aged 1 to 78, at both ground level
and at peak altitude during a flight. The
research shows a "statistically significant"
reduction in oxygen levels in all passengers
traveling on both long- and short-haul
flights. On average, oxygen levels in passengers
dropped by 4 percent by the time the plane had
reached cruising altitude. A total of 54 percent
of passengers had oxygen levels below this
level. Of the 84 passengers who were analyzed,
55 were on flights lasting more than two hours,
while the rest were on short-haul journeys.
Similar results were obtained from both groups.
None of them had severe cardio-respiratory
problems or required permission from their
doctor to fly.
14- Note 5. We are NOT saying that there is 1 chance
in 150 for the null hypothesis to be correct.
- Instead, the P-value is used as a tool to
determine whether or not to reject the null
hypothesis.
- Note 6. The significance level α should be
chosen before inspecting the data. Seeing the
evidence before deciding on the value of α is
called data snooping, which may bias our decision.
15- Note 7. When computing the P-value, we found
P(K ≥ 12) and not P(K = 12). The idea is
that, assuming the null hypothesis is true, we
want to compute the probability of getting an
observed value either this extreme or even more
extreme.
- Why does this make sense? Suppose a fair coin is
flipped 1000 times and lands heads 501 times. We
should retain the null hypothesis, and the chance
of getting 501 or more heads is quite large
(48.7%). However, the chance of getting exactly
501 heads is very small (2.5%); using the latter
figure would have led us to incorrectly reject
the null hypothesis.
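The two coin-flip probabilities in Note 7 can be checked with a short sketch. It uses exact integer counts of head/tail sequences; the variable names are illustrative only.

```python
from math import comb

n = 1000        # flips of a fair coin
total = 2**n    # number of equally likely head/tail sequences

# P(K >= 501): every outcome at least as extreme as the observed 501 heads.
p_at_least = sum(comb(n, k) for k in range(501, n + 1)) / total

# P(K = 501): the single observed outcome only.
p_exactly = comb(n, 501) / total

print(f"P(K >= 501) = {p_at_least:.3f}")
print(f"P(K = 501)  = {p_exactly:.3f}")
```

The first value is close to one half, while the second is small simply because any single count out of 1001 possibilities is unlikely.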
16- Example: Ten children (ages 8 to 14) with a
history of severe learning and behavioral
disorders were recruited for a six-week study.
For three weeks, each child was given a placebo;
for the other three weeks, each child was given
ethosuximide, a drug widely prescribed for epilepsy.
Five of the children received the placebo first;
the other five received the placebo last.
- After each three-week period, each child was
given an IQ test. The table shows the two verbal
IQ scores (placebo, ethosuximide) for each child.
Was the medication effective for increasing IQ
scores?
- (97, 113) (102, 111) (104, 106)
- (106, 113) (111, 122) (90, 110)
- (106, 101) (115, 121) (96, 126)
- (95, 119)
17- Solution.
- H0: The IQ scores after ethosuximide were the
same as the scores after placebo (p = 0.5)
- Ha: The IQ scores after ethosuximide were
different from the scores after placebo (p ≠ 0.5)
(two-sided test)
- We choose α = 0.05
- Before continuing, why isn't Ha written as
p > 0.5?
18- Test statistic: k_s = 9 for a sample size of
n = 10.
- P-value. Assuming H0, we must find the chance of
obtaining a test statistic at least this extreme.
For this problem, that means (why?)
- P-value = P(K ≤ 1) + P(K ≥ 9)
- In Excel: BINOMDIST(1,10,0.5,1) + 1 -
BINOMDIST(8,10,0.5,1)
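A minimal sketch of the two-sided calculation follows. It assumes the table lists (placebo, ethosuximide) scores in that order, and the helper prob_exactly is a name invented for this example.

```python
from math import comb

# (placebo, ethosuximide) verbal IQ scores for the ten children.
scores = [(97, 113), (102, 111), (104, 106), (106, 113), (111, 122),
          (90, 110), (106, 101), (115, 121), (96, 126), (95, 119)]

n = len(scores)
k_s = sum(1 for placebo, drug in scores if drug > placebo)

def prob_exactly(k: int, n: int) -> float:
    """P(K = k) for K ~ Binomial(n, 0.5)."""
    return comb(n, k) / 2**n

# Two-sided: add up every count at least as far from n/2 as k_s is,
# in either direction (here K <= 1 or K >= 9).
distance = abs(k_s - n / 2)
p_value = sum(prob_exactly(k, n) for k in range(n + 1)
              if abs(k - n / 2) >= distance)

print(k_s, p_value)
```

This gives k_s = 9 and a P-value of 22/1024, roughly 0.0215, which is twice the one-sided tail because the binomial distribution with p = 0.5 is symmetric.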
19- Conclusion: We reject the null hypothesis. There
is good reason to believe that ethosuximide
does affect verbal IQ scores.
20- Notes
- Note 8. The form of the alternative hypothesis,
which is based on the context of the problem,
determines how the P-value is computed.
21- Secondhand smoke is classified as a known
carcinogen by the Environmental Protection Agency
(EPA). This classification is based on many
scientific studies which investigated the
question of whether secondhand smoke was
associated with a higher incidence of cancer.
- The EPA conducted its study using a 5%
significance level and a one-tailed test. A
one-tailed test was used because it was already
independently determined that first-hand smoke
caused cancer and the preliminary studies
indicated that second-hand smoke was a probable
cause of cancer. However, the tobacco industry
argued that a one-tailed test was inappropriate
and that a two-tailed test should be used. They
claimed that by using a one-tailed test at the 5%
significance level, the EPA was essentially using
a two-tailed test at the 10% significance level,
since each tail would then have an area of 5%. The
tobacco industry argued that this doubled the
probability of a type I error.
- Nevertheless, since there was good reason to
think that secondhand smoke was a carcinogen, the
EPA followed the usual scientific convention of
using a one-tailed test. Reference: "Secondhand
Smoke: Is it a Hazard?", Consumer Reports,
January 1995
22Testing a Population Median
23- The sign test may also be used as a test for the
value of a population median. Recall the
definition of a median: half the data should lie
below the median, while the other half lies
above.
24- Example: A bank will open a new branch in a
community only if it can be established that the
median family income in the community is greater
than $50,000. To obtain information, a random
sample of 75 families is chosen. Of these, 44 had
incomes over $50,000, while the other 31 had
incomes below $50,000.
- Is this evidence statistically significant
enough to establish that the median family income
is more than $50,000?
25- Solution.
- H0: The median income is $50,000 (or less)
(m ≤ 50,000)
- Ha: The median income is more than $50,000
(m > 50,000)
- Alternatively, let p be the probability that a
randomly selected family has an income of less
than $50,000. Then we may write (why?)
- H0: p ≥ 0.5
- Ha: p < 0.5
26- We choose α = 0.05.
- Test statistic: k_s = 31 for a sample size of
n = 75.
- P-value. Assuming H0, we must find the chance of
obtaining a test statistic at least this extreme.
For this problem, that means (why?)
- P-value = P(K ≤ 31)
- In Excel: BINOMDIST(31, 75, 0.5, 1)
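The lower-tail probability can also be sketched in Python. The calculation uses p = 0.5, the boundary case of H0; variable names are illustrative.

```python
from math import comb

n = 75        # families sampled
k_s = 31      # families with income below $50,000
alpha = 0.05  # significance level chosen above

# Lower-tail P-value P(K <= 31) under p = 0.5, the boundary case of H0.
p_value = sum(comb(n, k) for k in range(k_s + 1)) / 2**n

print(f"P-value = {p_value:.4f}; reject H0: {p_value < alpha}")
```

The lower tail is used because small values of K (few families below $50,000) are what would support the alternative p < 0.5.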
27- Conclusion: We fail to reject the null
hypothesis. There is not enough evidence to think
that the median family income is more than
$50,000.
- Notice why the phrase "fail to reject" is
important. With a larger sample, it's conceivable
that the null hypothesis would then be rejected.
28Conceptual Questions
- 1) True or False
- a) The observed significance level of 8% depends
on the data (i.e. the sample).
- b) There are 92 chances out of 100 for the
alternative hypothesis to be correct.
29Conceptual Questions
- 2) True or False
- a) A highly statistically significant result
cannot possibly be due to chance.
- b) If a sample difference is highly
statistically significant, there is less than a
1% chance for the null hypothesis to be correct.
30Conceptual Questions
- 3) True or False
- a) If P = 43%, then the null hypothesis looks
plausible.
- b) If P = 0.43%, then the null hypothesis looks
implausible.
31Binomial Exact Test
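32- Example: A die is rolled 180 times; it lands on
six 45 times. Is this evidence statistically
significant enough to conclude that the die is
not fairly balanced?
- Solution.
- H0: The die is fairly balanced (p = 1/6, where p
is the chance of rolling a six)
- Ha: The die is not fairly balanced (p ≠ 1/6)
- We choose α = 0.05.
- The test statistic is k_s = 45 for a sample
size of n = 180.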
33- P-value. Assuming H0, we must find the chance of
obtaining a test statistic at least this extreme.
For this problem, that means that (why?)
- P-value = P(K ≤ 15) + P(K ≥ 45)
- In Excel: BINOMDIST(15,180,1/6,1) + 1 -
BINOMDIST(44,180,1/6,1)
- Conclusion
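As a sketch, the two-tailed sum in the Excel formula can be reproduced in Python. A fair die is expected to show 30 sixes in 180 rolls, and 45 is 15 above that, so the matching lower tail is K ≤ 15. The helper binom_pmf is a name invented for this example.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 180, 1 / 6
alpha = 0.05

# Counts at least 15 away from the expected 30: K <= 15 or K >= 45.
lower_tail = sum(binom_pmf(k, n, p) for k in range(0, 16))
upper_tail = sum(binom_pmf(k, n, p) for k in range(45, n + 1))
p_value = lower_tail + upper_tail

print(f"P-value = {p_value:.5f}; below alpha: {p_value < alpha}")
```

Unlike the p = 0.5 case, the two tails are not equal here, so the P-value is not simply double one tail.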
34- Example: There is a social theory that states
that people tend to postpone their deaths until
after some meaningful event: birthdays,
anniversaries, the World Series.
- In 1978, social scientists investigated
obituaries that appeared in a Salt Lake City
newspaper. Among the 747 obituaries examined, 60
of the deaths occurred in the three-month period
preceding the person's birth month. However, if
the day of death is independent of birthday, we
would expect that 25% of these deaths (about 187)
would occur in this three-month period.
- Does this study provide statistically significant
evidence to support this theory?
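One way to approach this question is a binomial exact test. The sketch below assumes H0: p = 0.25 against the one-sided alternative Ha: p < 0.25, where p is the chance that a death falls in the three months before the birth month. That choice of hypotheses is an assumption made here, not something stated in the example.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(K <= k) for K ~ Binomial(n, p), like Excel's BINOMDIST(k, n, p, 1)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n = 747      # obituaries examined
k_s = 60     # deaths in the three months before the birth month
p0 = 0.25    # proportion expected if death date and birthday are independent

# Assumed one-sided test: chance of 60 or fewer such deaths under H0.
p_value = binom_cdf(k_s, n, p0)

print(f"expected count = {n * p0:.2f}, P-value = {p_value:.3g}")
```

The lower tail is the relevant one under this assumption, because postponing death until after a birthday would mean fewer deaths than expected just before it.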
36- Example: The following table summarizes the
findings of a 1971 observational study of 5466
women who gave birth, categorized by both smoking
status and low birthweight:
-              Low birthweight   Normal   Total
- Smokers            185          1891     2076
- Nonsmokers         193          3197     3390
- Total              378          5088     5466
- Does this show that smoking is associated with
low birthweight? (Notice we don't say "causes,"
since this is not a randomized, controlled,
double-blind experiment.)
37- Method of attack: Suppose that smoking and low
birthweight are not associated. Then we would
expect the proportion of smoking mothers among
the 378 low birthweight babies to be roughly the
same as the proportion of smoking mothers among
all 5466 (2076/5466, roughly 38%).
38- Solution.
- H0
- Ha
- We choose α = 0.05.
- The test statistic is k_s = 185 for a sample
size of n = 378 (roughly 49%).
39- P-value. Assuming H0, we must find the chance of
obtaining a test statistic at least this extreme.
For this problem, that means
- Conclusion
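A sketch of how this might be computed follows. It takes p0 = 2076/5466 as the null proportion of smokers and, as an assumption made here (the slide leaves the hypotheses blank), uses the one-sided upper tail P(K ≥ 185).

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

smokers, mothers = 2076, 5466
p0 = smokers / mothers   # overall proportion of smokers, about 0.38

n = 378                  # mothers of low birthweight babies
k_s = 185                # smokers among them

# Assumed one-sided test: chance of 185 or more smokers among the 378.
p_value = sum(binom_pmf(k, n, p0) for k in range(k_s, n + 1))

print(f"p0 = {p0:.4f}, observed = {k_s / n:.4f}, P-value = {p_value:.3g}")
```

A two-sided version would also add the lower-tail counts that are at least as far below n * p0 as 185 is above it.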
40Data Call Into Question HIV Study Results. By
Gautam Naik and Mark Schoofs, The Wall Street
Journal, October 10, 2009. Researchers from
the U.S. Army and Thailand announced last month
they had found the first vaccine that provided
some protection against HIV. But a second
analysis of the $105 million study suggests the
results may have been a fluke. The second
analysis, which is considered a vital component
of any vaccine study, shows the results weren't
statistically significant, these scientists
said. In other words, it indicates that the
results could have been due to chance and that
the vaccine may not be effective. The
incomplete disclosure raises the question of
whether the Army, the Thai government and the
U.S. National Institutes of Health -- which
helped fund the study -- rushed to give a
positive spin to what may turn out to be another
inconclusive AIDS-vaccine effort.
41The first analysis announced last month
was based on a "modified-intent-to-treat
analysis," which includes virtually everyone who
enrolled in the study, regardless of whether they
ended up getting the full course of the vaccine.
It is a good stand-in for the real world, where
people don't always follow instructions properly.
By this measure, the vaccine tested in Thailand
reduced by 31% the chance of infection with HIV.
42But the result was derived from a small number of
actual HIV cases: New infections occurred in 51
of the 8,197 people who got the vaccine, compared
with 74 of the 8,198 volunteers who got placebo
shots. [NB: 51 is about a 31% reduction from
74.] Statistical calculations showed there was
a 3.9% probability that chance accounted for the
difference. In drug and vaccine trials, anything
above a 5% probability of a chance result is
deemed statistically insignificant.
43            Infections   No infections   Total
Treatment         51          8146          8197
Control           74          8124          8198
Total            125        16,270        16,395
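The table can be examined with the same binomial exact idea used for the birthweight data. This is only a sketch under assumptions made here: if the vaccine had no effect, each of the 125 infections would fall in the treatment group with probability 8197/16395, and the test is taken as one-sided. It is not the study's own calculation and is not expected to reproduce the 3.9% figure quoted in the article.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(K <= k) for K ~ Binomial(n, p), like Excel's BINOMDIST(k, n, p, 1)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

treated, enrolled = 8197, 16395
p0 = treated / enrolled   # share of participants who got the vaccine

n = 125                   # total infections
k_s = 51                  # infections in the vaccine group

# Assumed one-sided test: chance of 51 or fewer of the 125 infections
# landing in the vaccine group if the vaccine had no effect.
p_value = binom_cdf(k_s, n, p0)

print(f"p0 = {p0:.5f}, one-sided P-value = {p_value:.4f}")
```

Because the two groups are almost exactly the same size, p0 is very close to 0.5.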
45 The second analysis is called "per protocol"
and adheres strictly to how the trial was
designed by only including the study participants
who got the full regimen of vaccine shots at the
right time. Because it excludes study
participants who didn't get the full vaccine
regimen, it usually provides corroboration to the
looser "intent to treat" findings. Two AIDS
scientists, who have seen the "per protocol"
analysis, said it indicates there is a 16%
chance the study results were a fluke -- a far
greater probability than is considered
statistically acceptable. This analysis included
86 people who received either the vaccine or a
placebo and were infected. The "per protocol"
analysis also showed that the supposed
effectiveness was lower, at 26.2%. Dr. Kim, of
the U.S. Army, declined to comment on the data.
It isn't clear why the vaccine was seemingly
ineffective among participants who followed the
guidelines to the letter.