Chapter 26: Comparing Counts - PowerPoint PPT Presentation

Provided by: cohs
Transcript and Presenter's Notes

1
Chapter 26: Comparing Counts
2
Goodness-of-Fit Test
  • Involves testing a hypothesis.
  • There is no single parameter to estimate.
  • Considers all categories to give an overall idea
    of whether the observed distribution differs from
    the hypothesized one.
  • "All creatures have their determined time for
    giving birth and carrying fetus, only a man is
    born all year long, not in determined time, one
    in the seventh month, the other in the eighth,
    and so on till the beginning of the eleventh
    month." (Aristotle)

3
Assumptions and Conditions
  • Counted Data Condition
  • Check that the data are counts for the categories
    of a categorical variable.
  • Independence Assumption
  • Check that the individuals counted in the cells
    are sampled independently from some population.
  • If not, check the randomization condition: the
    individuals who have been counted should be a
    random sample from some population.
  • Sample Size Assumption
  • Expected cell frequency condition: we should
    expect to observe at least 5 individuals in each
    cell.

4
A Chi-Square Test for Goodness-of-Fit
  • Compare the observed counts in each cell with the
    expected counts.
  • Look at the differences between the observed and
    expected counts.
  • The test is always one-sided.
  • There is no direction to the rejection of the
    null model; we know only that it doesn't fit.
  • The chi-square statistic follows one of a family
    of sampling distribution models, indexed by
    degrees of freedom.
  • The number of degrees of freedom is n - 1, where
    n is the number of categories.
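
In symbols, the comparison of observed and expected counts described above is usually written as follows. This is a sketch of the standard form; the cell index i is notation assumed here, and n is the number of categories as on the slide.

```latex
\chi^2 = \sum_{i=1}^{n} \frac{(\mathit{Obs}_i - \mathit{Exp}_i)^2}{\mathit{Exp}_i},
\qquad \mathrm{df} = n - 1
```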

5
What's Your Sign?
  • Check Conditions
  • Counted data condition: there are counts of the
    number of executives in categories.
  • Randomization condition: this is a convenience
    sample, but there is no reason to expect bias.
  • Expected cell frequency condition: the null
    hypothesis expects that 1/12 of the 256 executives
    (about 21.3) should occur in each sign.
  • Hypothesis
  • H0: Births are uniformly distributed over zodiac
    signs (pAries = pTaurus = ...).
  • HA: Births are not uniformly distributed over
    zodiac signs.
  • The sampling distribution of the test statistic
    is χ² with 12 - 1 = 11 degrees of freedom.
  • Use a Chi-Square goodness-of-fit test.

6
What's Your Sign?
  • The chi-square procedure
  • Find the expected values.
  • Values come from the null hypothesis.
  • Multiply the total number of observations by the
    hypothesized proportion.
  • Compute the residuals, Observed - Expected.
  • Square the residuals.
  • Compute the component for each cell,
    (Observed - Expected)² / Expected.
  • Find the sum of the components.
  • Find the degrees of freedom, the number of cells
    minus 1.
  • Test the hypothesis: find the P-value.
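
The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator method the slides use: the function name `goodness_of_fit` and the four-category counts are hypothetical and are not data from the presentation.

```python
def goodness_of_fit(observed, proportions):
    """Return (expected counts, chi-square statistic, degrees of freedom)."""
    total = sum(observed)
    # Expected counts come from the null hypothesis:
    # total number of observations times the hypothesized proportion.
    expected = [total * p for p in proportions]
    # Component for each cell: squared residual divided by the expected count.
    components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    chi_square = sum(components)
    df = len(observed) - 1  # number of cells minus 1
    return expected, chi_square, df


# Hypothetical counts for four equally likely categories (not from the slides).
observed = [18, 22, 27, 13]
expected, chi_square, df = goodness_of_fit(observed, [0.25] * 4)
print(expected, round(chi_square, 2), df)
```

The P-value is then the upper-tail area of the chi-square model with `df` degrees of freedom above `chi_square`.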

7
What's Your Sign? TI-84 Calculator for
chi-square goodness-of-fit test
  • Enter counts in L1 and expected percentages in
    L2.
  • Convert expected percentages to expected counts.
  • Calculate the chi-square components in L3.

8
What's Your Sign? TI-84 Calculator for
chi-square goodness-of-fit test
  • Find the sum of L3.
  • Find the P-value
  • The probability of finding a χ² value at least as
    high as the one calculated from the data.
  • DISTR menu, χ²cdf

9
What's Your Sign?
  • P-value
  • Test is one-sided; only consider the right tail.
  • Large χ² values correspond to small P-values,
    leading to rejection of the null hypothesis.
  • The P-value is the area in the upper tail of the
    χ² model for 11 degrees of freedom above the
    computed χ² value.
  • Conclusion
  • The P-value of 0.926 means that, if the null
    hypothesis were true, a chi-square value of 5.08
    or higher would occur about 93% of the time.
  • There is virtually no evidence that the
    distribution of zodiac signs among executives is
    not uniform.
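
The upper-tail area quoted above can be checked without a calculator. The sketch below uses only the Python standard library and the closed-form finite series for the chi-square tail with whole-number degrees of freedom; the function name `chi2_upper_tail` is an assumption made here. In practice a statistics library would normally be used instead, for example `scipy.stats.chi2.sf(x, df)`.

```python
import math


def chi2_upper_tail(x, df):
    """Area above x (x >= 0) under the chi-square model with df degrees of freedom."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^i / i! for i = 0 .. df/2 - 1.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: a normal-tail part plus a finite series.
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # first series term
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return tail + total


# Statistic and degrees of freedom taken from the slide.
p_value = chi2_upper_tail(5.08, 11)
print(round(p_value, 3))
```

A result near 0.93 agrees with the slide: a statistic this small is entirely ordinary under the null hypothesis.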

10
Comparing Observed Distributions
  • Chi-square test for homogeneity
  • Assumptions and Conditions
  • Counted data condition
  • Check that the data are counts for the categories
    of a categorical variable.
  • Independence Assumption (Randomization condition)
  • When we test for homogeneity, we often are not
    interested in some larger population, so we don't
    need to check the randomization condition.
  • Sample Size Assumption
  • Expected cell frequency condition: the expected
    count in each cell must be at least 5
    individuals.

11
Post-Graduation Plans
  • Who: High school graduates
  • What: Post-graduation activities
  • When: 1980, 1990, 2000
  • Why: Regular survey for general information

12
Post-Graduation Plans
  • Hypothesis
  • Have the choices made by high school graduates in
    what they do after graduation changed?
  • H0: The post-high school choices made by the
    classes of 1980, 1990, and 2000 have the same
    distribution (homogeneous).
  • HA: The post-high school choices made by the
    classes of 1980, 1990, and 2000 do not have the
    same distribution.
  • Check the conditions
  • Counted data condition: there are counts of the
    number of students in categories.
  • Randomization condition: no inference will be
    drawn to other high schools or other classes, so
    there is no need to check for a random sample.
  • Expected cell frequency condition: the expected
    values are all at least 5 (see table, later).
  • Under these conditions, the sampling distribution
    of the test statistic is χ² with (4 - 1) × (3 -
    1) = 6 degrees of freedom.
  • Perform a chi-square test of homogeneity.

13
Post-Graduation Plans
  • TI-84 Steps
  • Enter data in a matrix.
  • Do the chi-square test of homogeneity.
  • MATRIX, EDIT, [B] (view the expected counts)
  • Note that all expected counts are at least 5.
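
For readers without a TI-84, the same calculation can be sketched in Python. The expected count for each cell is (row total × column total) / grand total, and the degrees of freedom are (rows - 1) × (columns - 1). The function name `chi_square_table` and the 4 × 3 counts below are hypothetical; they are not the survey data from the slides.

```python
def chi_square_table(table):
    """Return (expected counts, chi-square statistic, df) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count for a cell: (row total x column total) / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi_square = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, chi_square, df


# Hypothetical 4 x 3 table: four activities (rows) by three class years (columns).
table = [
    [60, 70, 80],
    [20, 25, 15],
    [10, 5, 10],
    [30, 20, 15],
]
expected, chi_square, df = chi_square_table(table)
print(df)
# Expected cell frequency condition: every expected count at least 5.
print(all(e >= 5 for row in expected for e in row))
```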

14
Post-Graduation Plans
  • Conclusion
  • The P-value is very small.
  • Observed pattern is very unlikely to occur by
    chance.
  • Reject the null hypothesis.
  • The choices made by high school graduates have
    changed over the two decades examined.

15
Post-Graduation Plans
  • Examine the Residuals
  • Standardized Residuals
  • Divide the cell's residual by the square root of
    its expected value.
  • Values are the square root of the components
    calculated for each cell, with a + or - sign to
    show whether we observed more or fewer cases than
    expected.
  • What trends do you see?
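
The standardized residuals described above can be computed as in the following sketch. The function name and the 2 × 2 counts are hypothetical illustrations, not the post-graduation data.

```python
import math


def standardized_residuals(observed, expected):
    """(Observed - Expected) / sqrt(Expected) for each cell of a two-way table."""
    return [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]


# Hypothetical 2 x 2 counts and their expected counts under the null hypothesis.
observed = [[30, 10], [20, 40]]
expected = [[20.0, 20.0], [30.0, 30.0]]
residuals = standardized_residuals(observed, expected)
for row in residuals:
    print([round(value, 2) for value in row])
```

The sign shows the direction of each departure, and the squared residuals add up to the chi-square statistic.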

16
Independence
  • Chi-Square Test for Independence
  • Data categorize subjects from a single group on
    two categorical variables.
  • Contingency Tables
  • Categorize counts on two or more variables.
  • Decide whether the distribution of counts on one
    variable is contingent on the other.
  • Assumptions and Conditions
  • Counted data condition
  • Check that the data are counts for the categories
    of a categorical variable.
  • Independence Assumption (Randomization condition)
  • When we test for independence, we are interested
    in generalizing to some larger population.
  • Sample Size Assumption
  • Expected cell frequency condition: the expected
    count in each cell must be at least 5
    individuals.

17
Hepatitis C Related to Tattoos?
  • Who: Patients being treated for non-blood-related
    disorders
  • What: Tattoo status and hepatitis C status
  • When: 1991, 1992
  • Where: Texas

18
Hepatitis C Related to Tattoos?
  • Hypothesis
  • Are the categorical variables tattoo status and
    hepatitis C status statistically independent?
  • H0: Tattoo status and hepatitis C status are
    independent.
  • HA: Tattoo status and hepatitis C status are not
    independent.
  • Check the conditions
  • Counted data condition: there are counts of
    individuals in categories of two categorical
    variables.
  • Randomization condition: although not an SRS, the
    data were selected to avoid biases and should be
    representative of the general population.
  • Expected cell frequency condition: the expected
    values do not meet the condition that all are
    at least 5. Continue with caution; be sure
    to check the residuals.
  • Under these conditions, the sampling distribution
    of the test statistic is χ² with (3 - 1) × (2 -
    1) = 2 df.
  • Perform a chi-square test for independence.

19
Hepatitis C Related to Tattoos?
  • TI-84 Steps
  • Enter data in a matrix.
  • Do the chi-square test of independence.
  • MATRIX, EDIT, [B] (view the expected counts)
  • Note that not all expected counts are at least 5.

20
Hepatitis C Related to Tattoos?
  • Conclusion
  • The P-value is very small, indicating that if
    these variables were independent, the pattern
    seen would be very unlikely to occur by chance.
  • The hepatitis C status is not independent of the
    tattoo status.
  • HOWEVER, check the two cells with the small
    expected counts to determine whether they
    influenced the result too greatly.
  • Remember: a complete solution must include
    additional analysis, recalculation, and a final
    conclusion.

21
Hepatitis C Related to Tattoos?
  • Analysis of Residuals
  • Too small an expected frequency can arbitrarily
    inflate the residual, leading to an inflated
    chi-square statistic.
  • In this case, the standardized residual for the
    Hepatitis C and Tattoo, Parlor cell is large.
    Is the chi-square statistic inflated?
  • Standardized Residuals

22
Hepatitis C Related to Tattoos?
  • Options
  • Based upon concerns, choose not to report the
    results.
  • Include a warning when reporting the results.
  • Combine appropriate categories to obtain larger
    cell counts and expected frequencies.
  • Recalculation
  • Recalculation (continued)
  • Conclusion
  • The tattoo status and hepatitis C status are not
    independent. The data suggest that tattoo parlors
    may be a particular problem, but we do not have
    enough data to draw that conclusion.
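
The option of combining categories can be sketched as follows: two rows of the table are added together and the expected counts and degrees of freedom are recomputed. The helper names and the counts are hypothetical; they are not the study's data, and which categories it makes sense to combine is a judgment about the subject matter, not something the code decides.

```python
def expected_counts(table):
    """Expected counts under independence: (row total x column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def combine_rows(table, first, second):
    """Return a new table in which rows `first` and `second` are added together."""
    merged = [a + b for a, b in zip(table[first], table[second])]
    rest = [row for i, row in enumerate(table) if i not in (first, second)]
    return [merged] + rest


# Hypothetical 3 x 2 table: tattoo from a parlor, tattoo from elsewhere, no tattoo.
table = [[10, 30], [5, 45], [20, 490]]
combined = combine_rows(table, 0, 1)  # a single "any tattoo" row plus "no tattoo"
smallest = min(e for row in expected_counts(combined) for e in row)
df = (len(combined) - 1) * (len(combined[0]) - 1)
print(combined, round(smallest, 2), df)
```

Combining rows costs a degree of freedom and loses the distinction between the merged categories, which is why the slide's conclusion about tattoo parlors stays tentative.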

23
What Can Go Wrong?
  • A failure of independence between two categorical
    variables does not show a cause-and-effect
    relationship between them.
  • There is no way to differentiate the direction of
    any possible causation from one variable to
    another.
  • Lurking variables could be responsible for the
    observed lack of independence.
  • Don't use chi-square methods unless the data are
    counts.
  • Data reported as proportions or percentages can
    be used if they are converted to counts.
  • Just because data are reported in a two-way table
    does not mean they are suitable for chi-square
    procedures.
  • Beware large samples.
  • The degrees of freedom for the chi-square tests
    do not grow with sample size.
  • With a sufficiently large sample size, a
    chi-square test can always reject the null
    hypothesis.
  • There are no confidence intervals to help in
    determining the effect size.
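
The warning about large samples can be illustrated numerically. In the sketch below, a hypothetical 2 × 2 table with a very slight imbalance is scaled up; the proportions never change and the degrees of freedom stay at 1, yet the chi-square statistic grows in proportion to the sample size. The function name and the counts are assumptions made for the illustration.

```python
def chi_square_statistic(table):
    """Chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            total += (observed - expected) ** 2 / expected
    return total


# Hypothetical counts: 52% versus 48% in each row, a very small difference.
table = [[52, 48], [48, 52]]
for factor in (1, 10, 100):
    scaled = [[count * factor for count in row] for row in table]
    print(factor, round(chi_square_statistic(scaled), 2))
```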