Hypothesis Testing - PowerPoint PPT Presentation

1 / 51

About This Presentation

Title:

Hypothesis Testing

Description:

9. Significance Level ... The value of a is called the significance level of the test. 8/15/09 ... Avoid this by using a significance level of a/k when doing k tests ... – PowerPoint PPT presentation

Number of Views:20

Avg rating:3.0/5.0

Slides: 52

Provided by: SteveF6

Category:

more less

Transcript and Presenter's Notes

Title: Hypothesis Testing

1
Hypothesis Testing

Confidence intervals establish the value of a
parameter (population mean or proportion) based
on a sample or the difference between parameters
based on two samples
Hypothesis testing asks whether the difference
between a sample and a population, or between two
samples, is due to chance (i.e., how likely the
difference is, if its due to chance)
Two different ways of looking at same data
Math is very similar compute SE, appropriate
value of Z or t, and corresponding value of p

2
Hypothesis Testing

When comparing a sample mean to a population
mean, we hypothesize that the sample was randomly
drawn from the given population, and that any
difference we observe is due to chance.
When comparing two sample means, we hypothesize
that both samples were randomly drawn from the
same population, and that any difference we
observe is due to chance.
Based on this null hypothesis (i.e., that the
observed differences are due to chance, we
calculate the probability of observing a larger
difference

3
Testing the Null Hypothesis

Assume that any differences we observe between
the means are due to chance
Calculate on this basis the probability of a
difference larger than that observed
If the probability is below some threshold (e.g.,
5), reject the null hypothesis and conclude
that the difference is statistically
significant (i.e., probably not due to chance)
If the probability is above the threshold, fail
to reject the null hypothesis and conclude that
the observed difference could be due to chance

4
Alternative Hypotheses

The alternative hypothesis is our theory of why
the observed difference is meaningful and not
simply the result of chance
The alternative hypothesis is usually the theory
that we are attempting to gather evidence for. It
is sometimes called the research hypothesis
Often we dont have a specific alternative
hypothesis (i.e., how big the difference should
be) we want to know if the difference is real
The null and alternative hypotheses are labeled
Ho and HA

5
One-tailed v. Two-tailed Tests

HA can be either one-tailed or two-tailed
In a one-tailed hypothesis, only differences in
one direction can lead to rejection of H0
In a two-tailed test, results in either direction
can lead to rejection of H0
One-tailed HA use gt or lt two-tailed HA use
?
Should HA be one- or two-tailed? It depends on
the problem and on what we are trying to prove.
Decision should not be based on the sample data!
When in doubt, use two-tailed (harder to reject
H0)

6
Examples of H0 and HA
7
Types of Errors

Either decisionto accept the null hypothesis or
to reject itmight be incorrect.
We might reject the null hypothesis when it is
true if our sample is unlucky and the observed
difference is large simply by chance. This is
called a type I error or a false positive.
We might accept the null hypothesis when it is
false if the true difference is small or if the
sample is not large enough to detect it. This is
a type II error or a false negative.

8
Type-I and Type-II Errors

Type-I errors usually are considered more serious
than type-II errors
You choose the probability of a type-I error by
choosing the threshold or significance level
for rejecting the null hypothesis
Decreasing probability of type-I error increases
probability of type-II error, and vice-versa

9
Significance Level

The real question is how strong the evidence must
be to reject the null hypothesis.
The analyst determines the probability of a
type-I error that he is willing to tolerate. The
value is denoted by a and is most commonly equal
to 0.05, although a 0.01 and a 0.1 are also
frequently used.
The value of a is called the significance level
of the test.

10
Type-I and Type-II Errors
You choose a decreasing a increases b. Often b
is not known b also depends on the size of the
true difference and the size of the sample.
11
Sometimes type-II errors are more costly
In this case, you want to choose a very high
value of a, because you want to minimize type-II
errors
12
Significance from Rejection Region

Construct confidence interval for parameter based
on a confidence level 1 a
For a one-tailed test, a is the probability in
the right-hand tail (if HA lt) or the left-hand
tail (if HA lt)
For a two-tailed test, a/2 is the probability in
each tail
If sample mean or sample proportion is outside
the confidence interval (i.e., in the rejection
region) then reject the null hypothesis at the
a significance level
Sample evidence that falls in the rejection
region is called statistically significant at
the a level

13
Significance from p-values

The p-value is the probability of seeing a sample
at least as extreme as the observed sample, given
that the null hypothesis is true
If p lt a, reject the null hypothesis
Smaller values of p indicate more evidence in
support of the alternative hypothesis
If p is sufficiently smallif the observed
difference is highly unlikely to have occurred by
chancealmost anyone would reject the null
hypothesis

14
Significance from p-values

How small is a small p-value? It depends on the
problem, and on the consequences and relative
costs of type-I and type-II errors.
if p lt 0.01, there is convincing evidence
against H0. Only 1 chance in 100 of p lt 0.01 if
H0 is true. Unless the consequences of a type-I
are very serious, reject H0.
if 0.01 lt p lt 0.05, there is strong evidence
against H0 (and in favor of HA)
if p gt 0.10, little or no evidence in support of
the alternative hypothesis.

15
Multiple Comparisons

The preceding guidelines are for a single
hypothesis test using a particular sample
If we do a large number of hypothesis tests, the
likelihood of a type-I error will increase
If we do 100 tests with a 0.05, we will (on
average) commit 5 type-I errors if H0 was true in
most cases
Avoid this by using a significance level of a/k
when doing k tests
100 tests with a 0.0005 gives less than 5
chance of making a single type-I error overall

16
Practical v. Statistical Significance

Statistically significant means that a
difference is discernable, not necessarily that
the difference is importance
The acceptance rate for male undergraduates at
the UMCP is 56, compared to 55 for women
Because the sample is so large (21,000) the
difference between the acceptance rates is
statistically significant (p 0.005)
Nevertheless, the difference is so small that it
is of no practical or policy importance

17
One-sample v. Two-sample Tests

One-sample Tests
Compare sample mean to known population mean
are test scores of sample below national average?
Compare sample proportion to population
proportion
is proportion of girls in Choice program
different from proportion in the general
population?
Two-sample Tests
Compare two sample means
are this years test scores higher than last
years?
are test scores of Choice students higher than
MPS?
Compare two sample proportions
is proportion of white students in Choice and MPS
samples different?

18
Computational Methods

Manual
determine , s, and n (or and n) for each
sample using Excel formulas or Pivot tables
calculate t or Z
calculate p value using Excel formulas
TDIST(t,df,tails) or NORMSDIST(Z) or tables
Analyse-It
Data Analysis
Two sample paired or independent

19
Milwaukee Data Set
20
One-sample Test for Population Mean

Is the average reading score of Choice students
below the national average?
H0 mChoice mUS HA mChoice lt mUS
Population mUS 50
Sample

year 91 choice 1
21
p-value is the area under the curveto the left of
22
One-tailed or Two-tailed Test?

In the this example, the alternative hypothesis
was one-tailed
This assumed that if Choice students were
different from the population of US students,
that they would be below average
This is valid if the presumption is based on
other evidence (e.g., low family income or a
long-standing trend) not valid if based on
sample
In this case, a two-tailed test would also be
appropriate

23
Using a Pivot Table
Drop year in page field, choice in column
field, and drop read three times in data field.
c6/SQRT(c7)
c5-c9
TINV(0.05,c7-1)
TDIST(c12,c7-1,2)
24
Using Analyse-It

Sort to isolate data of interest (observations
with year 91 and choice 1)
Select Analyse/Parametric/One Sample t-test
Select variable (read)
Enter population mean for hypothesized mean
(50)
Select two- or one-tailed alternative hypothesis
(? ? 50, ? lt 50, ? gt 50)
Enter desired confidence interval (0.95)
Output on new worksheet

25
A one-sided confidence interval for difference
between 50 and mean of population from which
sample was drawn. Doesnt include 0, so we
conclude difference is real, not due to chance.
26
One-sample Test for Population Proportion

Are girls more or less likely to be in the Choice
program?
H0 pgirl 0.5 (same as general population)
HA pgirl ? 0.5 (different from general
population)
By 1993, 157 of 282 Choice students were girls
A two-tailed test is appropriate, unless there is
some a priori reason (other than the proportion
of girls in the sample 157/282 0.557) to
believe that girls would be over- or
under-represented in the Choice program

27
One-sample Test for Population Proportion
Note use of p in formula for SE. Thats because
were testing null hypothesis. Also note that np
gt 5, n(1-p) gt 5
28
p-value is the shaded area p 0.057
29
Using a Pivot Table
30
Using Analyse-It

Sort to isolate data of interest (year 93,
choice 1)
Select Analyse/Parametric/One-sample z-test
Select variable (e.g., female)
Enter hypothesized mean (e.g., 0.5)
Enter population SD (sqrt(0.50.5) 0.5)
Select one- or two-tailed alternative hypothesis
Enter desired confidence interval (e.g., 0.95)
Output on new worksheet

31
(No Transcript)
32
Continuity Correction

The calculation is more accurate if we calculate
the probability of 156.5 or more girls out of 282
Why? We are approximating binomial (discrete)
distribution with normal (continuous)
distribution.

33
Binomial Distribution n 282, p 0.5
34
Analyse-It can also calculate binomial confidence
intervals and hypothesis tests, but the data must
either be categorical or summarized into a table
(well do this later).
35
Difference in Sample Proportions

Is the proportion of girls in the Choice program
different from the proportion in MPS sample?
H0 pC pM p HA pC ? pM

36
Difference in Population Proportions
37
Why Use Pooled p?

Here we used the pooled p to compute SE
In confidence intervals, we used
We test the null hypothesis if H0 is true, then
samples were drawn from the same population
pooled p is the best estimate of the population
proportion. In confidence intervals, we assume
samples are drawn from different populations.

38
Using a Pivot Table
39
Two-sample Test for Difference Between
Population Means

If the samples are paired, compute the difference
for each member of the sample, then compute the
mean difference and its standard error, and use
the one-sample test for a population mean
If the samples are independent, then we compute
the probability of a difference in sample means
larger than that observed, under the null
hypothesis that both samples were drawn from the
same population

40
Matched Pairs Change in Test Score

Did the average reading test score of Choice
students change from 1990 to 1991?
H0 D0 0 HA D0 ? 0

Fail to reject H0 change isnt significant.
41
Matched Pairs with Data Analysis

Sort Data (by year and choice)
Select Tools/Data Analysis/
t-Test Paired Two Sample for Means
Enter both data ranges (read and pread)
Enter hypothesized mean difference (e.g., 0)
Enter labels, alpha, and output

42
Matched Pairs with Data Analysis
43
Sort, isolate data Select Analyse/Parametric Paire
d Samples t Test Select variables, HA, CL
44
Two-sample Difference in Test Scores

Are the test scores of Choice students different
from those of low-income MPS students?
H0 (mC mM) 0 HA (mC mM) ? 0

45
Using a Pivot Table
46
Difference in Test Scores (2)
47
Difference in Test Scores (3)
48
Difference in Test Scores (4)

When in doubt, use larger SE (a judgment call)

Reject the null hypothesis the average test
scores of the two groups are significantly
different (i.e., a difference this large is
unlikely to occur by chance, if the two groups
have equal reading abilities.
49
Using Data Analysis

Sort Data (by year, choice, and lowinc)
Select Tools/Data Analysis/
t-Test Two Sample Assuming Unequal
Variances t-Test Two Sample Assuming Equal
Variances
Enter both data ranges (read for choice 0, 1)
Enter hypothesized mean difference (e.g., 0)
Enter labels, alpha, and output

50
Data Analysis (unequal variance)
51
Data Analysis (equal variance)

Write a Comment

User Comments (0)