Title: Hypothesis Testing
Hypothesis Testing
- In statistics, a hypothesis is a claim about a property of a population.
- A common guideline for statistical reasoning is to analyze a sample in an attempt to distinguish between results that can easily occur and results that are highly unlikely.
- Statistical hypothesis testing is a decision-making process for evaluating claims about a population.
Objectives
- Understand the definitions used in hypothesis testing.
- State the null and alternative hypotheses.
- Find critical values for the z test.
- State the five steps used in hypothesis testing.
- Test means for large samples using the z test.
- Test means for small samples using the t test.
Objectives (contd.)
- Test proportions using the z test.
- Test variances or standard deviations using the chi-square test.
- Test hypotheses using confidence intervals.
- Explain the relationship between type I and type II errors and the power of a test.
Introduction
- In hypothesis testing, the researcher must:
- define the population under study,
- state the particular hypotheses that will be investigated,
- give the significance level,
- select a sample from the population,
- collect the data,
- perform the calculations required for the statistical test, and
- reach a conclusion.
Methods to Test Hypotheses
- The three methods used to test hypotheses are:
- 1. The traditional method.
- 2. The P-value method.
- 3. The confidence interval method.
Statement of a Hypothesis
- A statistical hypothesis is a conjecture about a population parameter, which may or may not be true.
- There are two types of statistical hypotheses for each situation: the null hypothesis and the alternative hypothesis.
The Null Hypothesis H0
- The null hypothesis, symbolized by H0, is a statistical hypothesis that states that there is no difference between a parameter and a specific value, or that there is no difference between two parameters.
- The null hypothesis must contain the condition of equality:
- written with the symbols =, ≤, or ≥
- stated in three possible forms: H0: μ = k, H0: μ ≤ k, or H0: μ ≥ k, where k is the specific value
The Alternative Hypothesis H1
- The alternative hypothesis, symbolized by H1, is a statistical hypothesis that states the existence of a difference between a parameter and a specific value, or states that there is a difference between two parameters.
- The alternative hypothesis is a statement that must be true if the null hypothesis is false.
- The alternative hypothesis may be written in three possible forms: H1: μ ≠ k, H1: μ < k, or H1: μ > k.
- The alternative hypothesis determines whether the statistical test is a one-tailed or a two-tailed test.
The Hypotheses Form a Logical Pair
- The null hypothesis must contain the condition of equality:
- written with the symbols =, ≤, or ≥
- The alternative hypothesis is a statement that must be true if the null hypothesis is false:
- written with the symbols ≠, <, or >
Design of the Study
- After stating the hypotheses, the researcher's next step is to design the study.
- The researcher selects the correct statistical test, chooses an appropriate level of significance, and formulates a plan for conducting the study.
Statistical Test
- A statistical test uses the data obtained from a sample to make a decision about whether or not the null hypothesis should be rejected.
- The numerical value obtained from a statistical test is called the test value.
Levels of Significance
- The probability of rejecting the null hypothesis when it is true is called the significance level.
- It is denoted by the Greek letter α (alpha).
- Typical choices are α = 0.05, α = 0.01, or α = 0.10.
- α is the area in the tail(s) of the distribution that forms the critical region.
- P(type I error) = α
- The probability of a type II error is denoted by β (beta).
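The claim that α equals P(type I error) can be checked by simulation. The sketch below draws many samples from a normal population in which H0 really is true, runs a two-tailed z test on each, and reports the fraction of (wrong) rejections. The function name and the population settings (μ0 = 50, σ = 10, n = 36) are arbitrary illustrative assumptions; only Python's standard library is used.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def simulate_type_i_error_rate(mu0=50.0, sigma=10.0, n=36, alpha=0.05,
                               trials=10_000, seed=1):
    """Estimate P(reject H0 | H0 true) for a two-tailed z test by simulation."""
    rng = random.Random(seed)                     # seeded for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a normal population in which H0 (mu = mu0) is true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) >= z_crit:                      # test value in the critical region
            rejections += 1
    return rejections / trials

print(simulate_type_i_error_rate(alpha=0.05))
```

The printed rate should sit close to 0.05; calling the function with `alpha=0.01` should give a rate close to 0.01.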
Critical Region
- The critical (or rejection) region is the set of all values of the test statistic that would cause us to reject the null hypothesis.
- Values in the critical region indicate a significant difference between the sample data and the hypothesized parameter value.
- The remaining region is the noncritical region, which indicates that the difference is probably due to chance; there, we fail to reject the null hypothesis.
Controlling Type I and Type II Errors
- Mathematically, it can be shown that α, β, and the sample size n are related.
- For a fixed α, an increase in sample size will cause a decrease in β.
- For a fixed n, a decrease in α causes an increase in β.
- To decrease both α and β, increase the sample size.
- Thus, for a fixed sample size, α and β are related in that decreasing one increases the other.
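The trade-offs above can be made concrete for one case: a right-tailed z test of H0: μ = μ0 when the true mean is some μ1 > μ0. H0 is wrongly retained when the test value falls below the critical value z_α, which gives β = Φ(z_α − (μ1 − μ0)/(σ/√n)), and the power of the test is 1 − β. The function name and the numbers in the usage loop (μ0 = 100, μ1 = 103, σ = 10) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu1, sigma, n, alpha):
    """Return beta for a right-tailed z test of H0: mu = mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)   # right-tailed critical value
    shift = (mu1 - mu0) / (sigma / sqrt(n))   # distance of the true mean from mu0, in standard errors
    # H0 is (wrongly) not rejected when the test value stays below z_alpha.
    return std_normal.cdf(z_alpha - shift)

# Hypothetical setting: mu0 = 100, true mean 103, sigma = 10.
for alpha in (0.05, 0.01):
    for n in (30, 100):
        beta = type_ii_error(100, 103, 10, n, alpha)
        print(f"alpha={alpha}, n={n}: beta={beta:.3f}, power={1 - beta:.3f}")
```

Reading the output row by row shows both bullets at once: raising n lowers β for a fixed α, and lowering α raises β for a fixed n.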
Setting Significance Levels
- Consider a package of M&Ms:
- contains 1498 candies
- package weight labeled as 1361 g, so 1361 g / 1498 ≈ 0.9085 g per candy
- Consider a package of Bufferin:
- contains 30 tablets
- 325 mg per tablet
- What are the consequences if the M&Ms don't have a mean population weight of 0.9085 g?
- What are the consequences if the Bufferin tablets have too much aspirin?
Setting Significance Levels (contd.)
- What are the consequences if the M&Ms don't have a mean population weight of 0.9085 g?
- Not critical; to test the claim that μ = 0.9085 g, we choose n = 100 and α = 0.05.
- What are the consequences if the Bufferin tablets have too much aspirin?
- More critical; we choose n = 500 and α = 0.01.
Conclusions in Hypothesis Testing
- The initial conclusion will always be one of the following:
- 1. Fail to reject the null hypothesis H0.
- 2. Reject the null hypothesis H0.
- Wording is very important.
- Notice that we never prove the null hypothesis; at most, we fail to reject it.
Tailed Tests
- Tails in a distribution are the extreme regions bounded by critical values.
- A two-tailed test is used when H1 contains ≠.
- A one-tailed test is either right-tailed (H1 contains >) or left-tailed (H1 contains <), depending on the direction of the inequality in H1.
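Finding critical values for the z test amounts to inverting the standard normal CDF: all of α goes in one tail for a one-tailed test, and α/2 goes in each tail for a two-tailed test. This sketch uses `statistics.NormalDist` from Python's standard library (3.8+) in place of a printed z table; the function name and the "left"/"right"/"two" labels are illustrative choices.

```python
from statistics import NormalDist

def z_critical_values(alpha, tail):
    """Return the critical z value(s) that bound the rejection region."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":         # H1 contains <: all of alpha in the left tail
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":        # H1 contains >: all of alpha in the right tail
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":          # H1 contains ≠: alpha split between both tails
        c = std_normal.inv_cdf(1 - alpha / 2)
        return (-c, c)
    raise ValueError("tail must be 'left', 'right', or 'two'")

for alpha, tail in [(0.05, "right"), (0.05, "two"), (0.01, "left")]:
    print(alpha, tail, [round(c, 3) for c in z_critical_values(alpha, tail)])
# 0.05 right [1.645]
# 0.05 two [-1.96, 1.96]
# 0.01 left [-2.326]
```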
Hypothesis-Testing (Traditional Method)
- Step 1: State the hypotheses, and identify the claim.
- Step 2: Find the critical value(s) from the appropriate table.
- Step 3: Compute the test value.
- Step 4: Make the decision to reject or not reject the null hypothesis.
- Step 5: Summarize the results with appropriate wording.
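Steps 2 to 4 of the traditional method can be sketched for a z test of a mean, using the standard test value z = (X̄ − μ)/(σ/√n). Step 1 and Step 5 remain the researcher's job; here the direction of H1 is passed in as `tail`. The function name and the sample numbers are hypothetical, and the standard normal quantile function replaces the printed table.

```python
from math import sqrt
from statistics import NormalDist

def traditional_z_test(xbar, mu0, sigma, n, alpha, tail):
    """Steps 2-4 of the traditional method for a z test of a population mean.

    Step 1 (stating H0 and H1) and Step 5 (wording the conclusion) are left
    to the researcher; `tail` encodes the direction of H1.
    """
    std_normal = NormalDist()
    # Step 2: find the critical value.
    if tail == "right":
        crit = std_normal.inv_cdf(1 - alpha)
    elif tail == "left":
        crit = std_normal.inv_cdf(alpha)
    elif tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)  # the region is beyond ±crit
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    # Step 3: compute the test value.
    test_value = (xbar - mu0) / (sigma / sqrt(n))
    # Step 4: reject H0 if the test value falls in the critical region.
    if tail == "right":
        reject = test_value >= crit
    elif tail == "left":
        reject = test_value <= crit
    else:
        reject = abs(test_value) >= crit
    return test_value, crit, reject

# Hypothetical data: H0: mu = 50, H1: mu > 50, sample mean 52.1, sigma 6, n 36.
z, crit, reject = traditional_z_test(52.1, 50, 6, 36, alpha=0.05, tail="right")
print(f"test value {z:.2f}, critical value {crit:.3f}, reject H0: {reject}")
# test value 2.10, critical value 1.645, reject H0: True
```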
Testing Claims About the Mean of a Population
- Assumptions for the z test:
- 1. The sample size is large (n ≥ 30).
- When applying the Central Limit Theorem, use σ, or use the sample standard deviation s as an estimate of σ.
- 2. If the sample size is small, then the parent population must be normally distributed, and σ must be known.
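The z Test Formula
- The z test is a statistical test for the mean of a population:
$$ z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} $$
- where X̄ is the sample mean, μ is the hypothesized population mean, σ is the population standard deviation, and n is the sample size.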
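The z Test When σ Is Unknown
- When the population standard deviation σ is unknown, the sample standard deviation s can be used in the formula in place of σ as long as the sample size is 30 or more. The justification is the central limit theorem, which makes the sampling distribution of the mean approximately normal for large samples.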
The P-value
- The P-value (or probability value) is the probability of getting a sample statistic that is at least as extreme as the one found from the sample data, assuming that the null hypothesis is true.
- When using a P-value:
- Reject the null hypothesis H0 if the P-value ≤ α.
- Fail to reject the null hypothesis if the P-value > α.
- Or simply report the P-value.
The P-value (contd.)
- The P-value is the actual area under the standard
normal distribution curve (or other curve
depending on what statistical test is being used)
representing the probability of a particular
sample statistic or a more extreme sample
statistic occurring if the null hypothesis is
true.
Hypothesis-Testing (P-Value Method)
- Step 1: State the hypotheses, and identify the claim.
- Step 2: Compute the test value.
- Step 3: Find the P-value from the appropriate table.
- Step 4: Make the decision to reject or not reject the null hypothesis.
- Step 5: Summarize the results with appropriate wording.
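For a z test, Step 3 of the P-value method is the area under the standard normal curve beyond the test value: one tail for a one-tailed test, both tails for a two-tailed test. The sketch below computes that area with `statistics.NormalDist` instead of a table and applies the decision rule P-value ≤ α. Function names and the sample numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_p_value(test_value, tail):
    """P-value for a z test value; tail is 'left', 'right', or 'two'."""
    phi = NormalDist().cdf
    if tail == "left":
        return phi(test_value)                 # area to the left
    if tail == "right":
        return 1 - phi(test_value)             # area to the right
    if tail == "two":
        return 2 * (1 - phi(abs(test_value)))  # area in both tails
    raise ValueError("tail must be 'left', 'right', or 'two'")

def p_value_decision(p_value, alpha):
    """Step 4: reject H0 when the P-value is less than or equal to alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

# Hypothetical example: sample mean 52.1, H0: mu = 50, sigma = 6, n = 36.
z = (52.1 - 50) / (6 / sqrt(36))               # Step 2: test value (about 2.1)
for tail in ("right", "two"):
    p = z_p_value(z, tail)                     # Step 3: P-value
    print(tail, round(p, 4), p_value_decision(p, 0.05))
# right 0.0179 reject H0
# two 0.0357 reject H0
```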
Statistical vs. Practical Significance
- The researcher should distinguish between statistical significance and practical significance.
- When the null hypothesis is rejected at a specific significance level, it can be concluded that the difference is probably not due to chance and thus is statistically significant. However, the results may or may not have any practical significance.
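One way to see the distinction: hold a small observed difference fixed and let the sample size grow. The two-tailed P-value shrinks toward zero, so the difference eventually becomes statistically significant, yet whether a 0.2-unit difference matters is a practical question the P-value cannot answer. The function name and all numbers here are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_p_value(xbar, mu0, sigma, n):
    """Two-tailed P-value for a z test of H0: mu = mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical: sample mean 100.2 vs. H0: mu = 100, sigma = 15.
for n in (25, 2_500, 250_000):
    print(n, f"{two_tailed_z_p_value(100.2, 100, 15, n):.3g}")
```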
Interpretations of P-Values
- P-value less than 0.01: highly statistically significant; very strong evidence against the null hypothesis
- P-value from 0.01 to 0.05: statistically significant; adequate evidence against the null hypothesis
- P-value greater than 0.05: insufficient evidence against the null hypothesis
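The interpretation scale above can be written as a small lookup. This sketch treats 0.01 and 0.05 as belonging to the middle band, which is one reading of "0.01 to 0.05"; the function name is hypothetical.

```python
def interpret_p_value(p):
    """Map a P-value to the verbal interpretations listed above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    if p < 0.01:
        return "Highly statistically significant: very strong evidence against H0"
    if p <= 0.05:
        return "Statistically significant: adequate evidence against H0"
    return "Insufficient evidence against H0"

for p in (0.003, 0.03, 0.2):
    print(p, interpret_p_value(p))
```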
Confidence Intervals and Hypothesis Testing
- There is a relationship between confidence intervals and hypothesis testing.
- When the null hypothesis is rejected in a two-tailed test at significance level α, the (1 − α) confidence interval for the mean will not contain the hypothesized mean.
- Likewise, when the null hypothesis is not rejected, the confidence interval computed will contain the hypothesized mean.
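The agreement between the two methods can be checked numerically for a z test with σ known: build the (1 − α) interval X̄ ± z_{α/2}·σ/√n, see whether it contains μ0, and compare with the two-tailed test decision at the same α. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, alpha):
    """(1 - alpha) confidence interval for a mean when sigma is known."""
    c = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    margin = c * sigma / sqrt(n)             # maximum error of the estimate
    return xbar - margin, xbar + margin

def compare_methods(xbar, mu0, sigma, n, alpha):
    """Return (CI method rejects H0, two-tailed z test rejects H0)."""
    lo, hi = z_confidence_interval(xbar, sigma, n, alpha)
    ci_rejects = not (lo <= mu0 <= hi)       # hypothesized mean outside the interval
    c = NormalDist().inv_cdf(1 - alpha / 2)
    z = (xbar - mu0) / (sigma / sqrt(n))
    test_rejects = abs(z) > c                # test value beyond ±c
    return ci_rejects, test_rejects

# Hypothetical data: H0: mu = 50, sample mean 52.1, sigma 6, n 36.
for alpha in (0.05, 0.01):
    print(alpha, compare_methods(52.1, 50, 6, 36, alpha))
# 0.05 (True, True)
# 0.01 (False, False)
```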
The t Test
- The t test is a statistical test of the mean of a population and is used when the population is normally or approximately normally distributed, σ is unknown, and the sample size is less than 30.
- The formula for the t test is
$$ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} $$
- where s is the sample standard deviation.
- The degrees of freedom are d.f. = n − 1.
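The t test value and its degrees of freedom can be computed with the standard library alone; the critical value for the resulting d.f. still has to come from a t table (or from `scipy.stats.t.ppf` if SciPy happens to be available). The function name and the eight measurements are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def t_test_value(sample, mu0):
    """Return the t test value and degrees of freedom for H0: mu = mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s = stdev(sample)                  # sample standard deviation (n - 1 divisor)
    t = (mean(sample) - mu0) / (s / sqrt(n))
    return t, n - 1                    # d.f. = n - 1

# Hypothetical sample of 8 measurements; H0: mu = 12.
t, df = t_test_value([12, 15, 11, 14, 13, 16, 12, 15], 12)
print(f"t = {t:.3f}, d.f. = {df}")
# t = 2.393, d.f. = 7
```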
z Test for a Proportion
- A hypothesis test involving a population proportion can be considered as a binomial experiment when:
- there are only two outcomes, and
- the probability of a success does not change from trial to trial.
Binomial Experiment
- A binomial experiment is a probability experiment in which:
- Each trial has only two outcomes, which we consider as success (yes) or failure (no).
- There is a fixed number of trials.
- The outcomes of each trial are independent.
- The probability of success remains the same for each trial.
Examples of Proportions Viewed as Binomial Experiments
- The percentage of late-night viewers who watch The Late Show with David Letterman is equal to 18%.
- Based on a sample survey, fewer than 1/4 of all college graduates smoke.
- If a fatal car crash occurs, there is a 0.44 probability that it involves a driver who has been drinking.
Assumptions Used When Testing a Claim About a Population Proportion
1. The conditions for a binomial experiment are met.
2. The conditions np ≥ 5 and nq ≥ 5 are both satisfied, so the binomial distribution can be approximated by a normal distribution with μ = np and σ = √(npq).
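Formula for the z Test for Proportions
$$ z = \frac{\hat{p} - p}{\sqrt{pq/n}} $$
- where p̂ = X/n is the sample proportion (X successes in n trials), p is the hypothesized population proportion, and q = 1 − p.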
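A sketch of the proportion test value, with the np ≥ 5 and nq ≥ 5 check from the assumptions built in. The function name and the survey counts (38 smokers out of 200 graduates, testing the "fewer than 1/4" claim) are hypothetical.

```python
from math import sqrt

def proportion_z_test_value(x, n, p0):
    """z test value for H0: p = p0, given x successes in n trials."""
    q0 = 1 - p0
    # The normal approximation requires np >= 5 and nq >= 5.
    if n * p0 < 5 or n * q0 < 5:
        raise ValueError("np >= 5 and nq >= 5 must both hold")
    p_hat = x / n                        # sample proportion
    return (p_hat - p0) / sqrt(p0 * q0 / n)

# Hypothetical survey: 38 of 200 graduates smoke; H0: p = 0.25, H1: p < 0.25.
z = proportion_z_test_value(38, 200, 0.25)
print(f"z = {z:.3f}")
# z = -1.960
```

With these made-up counts, z falls beyond the left-tailed critical value of −1.645 for α = 0.05, so H0 would be rejected.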
Assumptions for Chi-Square Test for a Single Variance
- 1. The sample must be randomly selected from the population.
- 2. The population must be normally distributed for the variable under study.
- 3. The observations must be independent of each other.
Robustness of Inferences
- Tests of claims about standard deviations are not as robust as tests of other claims: the inferences can be very misleading if the population does not have a normal distribution.
- For tests of variances and standard deviations, the condition of a normally distributed population is therefore a much stricter requirement.
- Compare this to the Student t distribution, where we required the population to be only approximately normal.
Robust Inferences
- For the Student t distribution, we required the population to be only approximately normal.
- We say that inferences for the mean are fairly robust: non-extreme departures from normality still lead to reasonable conclusions.
Point Estimates for Variance and Standard Deviation
- The sample variance s² is the best (unbiased) point estimate of the population variance σ².
- However, the sample standard deviation s is a biased estimate of the population standard deviation σ (it tends to underestimate σ); the bias is small only when the sample size is large.
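A quick simulation illustrates the point: averaging s² over many small samples recovers σ², while averaging s falls short of σ. The population (normal, σ = 2), the sample size n = 5, and the function name are arbitrary assumptions for illustration.

```python
import random
from math import sqrt
from statistics import fmean, variance

def average_estimates(sigma=2.0, n=5, trials=10_000, seed=7):
    """Average s^2 and s over many samples drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    s2_values, s_values = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = variance(sample)        # sample variance (n - 1 divisor)
        s2_values.append(s2)
        s_values.append(sqrt(s2))    # sample standard deviation
    return fmean(s2_values), fmean(s_values)

mean_s2, mean_s = average_estimates()
print(f"average s^2 = {mean_s2:.3f} (sigma^2 = 4)")
print(f"average s   = {mean_s:.3f} (sigma = 2)")
```

With these settings the average of s² should land close to 4, while the average of s should come out noticeably below 2 (theory puts it near 1.88 for n = 5); rerunning with a larger `n` shrinks that gap.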
The 3 Properties of the Chi-Square Distribution
- Not symmetric (skewed to the right)
- Non-negative
- Shape depends on degrees of freedom
- d.f. = n − 1
- As n increases, the shape approaches a normal distribution.
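Chi-Square Test for Single Variance
$$ \chi^2 = \frac{(n - 1)s^2}{\sigma^2} $$
- with d.f. = n − 1, where n is the sample size, s² is the sample variance, and σ² is the hypothesized population variance.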
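The chi-square test value χ² = (n − 1)s²/σ² can be sketched with the standard library; the critical values for the resulting d.f. would still be read from a chi-square table (or `scipy.stats.chi2.ppf`, if SciPy is available). The function name, the sample, and the hypothesized variance are hypothetical.

```python
from statistics import variance

def chi_square_test_value(sample, sigma0_squared):
    """Chi-square test value and d.f. for H0: sigma^2 = sigma0_squared."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s2 = variance(sample)            # sample variance (n - 1 divisor)
    return (n - 1) * s2 / sigma0_squared, n - 1

# Hypothetical sample of 8 measurements; H0: sigma^2 = 2.
chi2, df = chi_square_test_value([12, 15, 11, 14, 13, 16, 12, 15], 2.0)
print(f"chi-square = {chi2:.2f}, d.f. = {df}")
# chi-square = 11.00, d.f. = 7
```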
Summary
- A statistical hypothesis is a conjecture about a population.
- There are two types of statistical hypotheses: the null hypothesis states that there is no difference, and the alternative hypothesis specifies a difference.
Summary (contd.)
- The z test is used when the population standard deviation is known and the variable is normally distributed, or when σ is not known and the sample size is greater than or equal to 30.
- When the population standard deviation is not known and the variable is normally distributed, the sample standard deviation is used, but a t test should be conducted if the sample size is less than 30.
Summary (contd.)
- Researchers compute a test value from the sample data in order to decide whether the null hypothesis should or should not be rejected.
- Statistical tests can be one-tailed or two-tailed, depending on the hypotheses.
Summary (contd.)
- The null hypothesis is rejected when the difference between the hypothesized population parameter and the sample statistic is significant.
- The difference is significant when the test value falls in the critical region of the distribution.
- The critical region is determined by α, the level of significance of the test.
Summary (contd.)
- The significance level of a test is the probability of committing a type I error.
- A type I error occurs when the null hypothesis is rejected when it is true.
- A type II error occurs when the null hypothesis is not rejected when it is false.
- One can test a single variance by using a chi-square test.
Conclusions
- Researchers are interested in answering many types of questions. For example:
- Will a new drug lower blood pressure?
- Will seat belts reduce the severity of injuries caused by accidents?
- These types of questions can be addressed through statistical hypothesis testing, which is a decision-making process for evaluating claims about a population.