Title: Testing Hypothesis with Categorical Data
Introduction
- Categorical variables are measured at either the nominal or the ordinal level, and the values of these variables consist of distinct categories.
- Chi-square goodness-of-fit test (one-variable test): are the observed frequencies consistent with the null hypothesis?
- Two-variable chi-square test (test of independence): is there a difference between groups, that is, are the two variables related?
One-Variable Goodness of Fit Chi-Square Test
- $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$, summed over the $k$ categories, where
- $f_o$ = the observed frequencies from our sample data,
- $f_e$ = the expected frequencies we should get under the null hypothesis, and
- $k$ = the number of categories for the variable.
One-Variable Goodness of Fit Chi-Square Test
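- For each category, subtract the expected frequency from the observed frequency, square this difference, and then divide by the expected frequency. The sum of these values across all $k$ categories is our obtained value of the chi-square statistic.
- Degrees of freedom depend on $k$: $df = k - 1$. Use the degrees of freedom to find the critical value of chi-square in Table E-4.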
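The procedure above can be sketched in a few lines of Python. This is a minimal illustration, assuming the observed and expected frequencies are listed in the same category order; the function name `goodness_of_fit` and the frequencies are hypothetical, not from the text. It returns only the obtained chi-square and df = k - 1; the critical value still has to be looked up in the table.

```python
def goodness_of_fit(observed, expected):
    """Return (chi_square, df) for a one-variable goodness-of-fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per category")
    # Sum of (fo - fe)^2 / fe over the k categories
    chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    df = len(observed) - 1  # k - 1 degrees of freedom
    return chi_square, df

# Hypothetical data: 60 cases in k = 3 categories, with the null
# hypothesis predicting an equal split of 20 cases per category.
observed = [30, 20, 10]
expected = [20, 20, 20]
chi_square, df = goodness_of_fit(observed, expected)
print(chi_square, df)  # 10.0 2
```

Each category contributes its own squared, scaled difference, so the middle category (observed equals expected) adds nothing, while the two outer categories add 5.0 each.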
Two-Variable Chi-Square Test of Independence
- Independent variable (cause)
- Dependent variable (effect)
- Are the IV and DV related?
- If so, how strong is that relationship?
Two-Variable Chi-Square Test of Independence
- Contingency table: shows the joint distribution of two categorical variables. A contingency table is described by its number of rows and number of columns; a contingency table with 3 rows and 2 columns is a 3 × 2 contingency table.
Two-Variable Chi-Square Test of Independence
- There are row marginals (R1 and R2) and column marginals (C1 and C2).
- The row marginals correspond to the number of cases in each row of the table.
- The column marginals correspond to the number of cases in each column of the table.
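As an illustration, a contingency table and its marginals can be built from raw case-level data. This is a sketch with hypothetical data and a hypothetical function name (`contingency_table`); placing one variable in the rows and the other in the columns is an arbitrary choice made here.

```python
from collections import Counter

def contingency_table(pairs, row_levels, col_levels):
    """Cross-tabulate (row category, column category) pairs.

    Returns the cell counts plus the row and column marginals.
    """
    counts = Counter(pairs)
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    row_marginals = [sum(row) for row in table]
    col_marginals = [sum(col) for col in zip(*table)]
    return table, row_marginals, col_marginals

# Hypothetical data for 10 cases: (group, outcome)
pairs = ([("A", "yes")] * 4 + [("A", "no")] * 2
         + [("B", "yes")] * 1 + [("B", "no")] * 3)
table, row_m, col_m = contingency_table(pairs, ["A", "B"], ["yes", "no"])
print(table)  # [[4, 2], [1, 3]]
print(row_m)  # [6, 4]
print(col_m)  # [5, 5]
```

Both sets of marginals sum to the total number of cases (here 10), which is a quick check that the cross-tabulation lost no cases.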
Two-Variable Chi-Square Test of Independence
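- $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$, summed over the $k$ cells of the table, where
- $f_o$ = the observed cell frequencies from our sample data,
- $f_e$ = the expected cell frequencies we should get under the null hypothesis, and
- $k$ = the number of cells in the table.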
Two-Variable Chi-Square Test of Independence
- The observed frequencies are the joint distribution of the two categorical variables that we actually observed in our sample data.
- The expected frequencies are the joint frequency distribution we would expect to see if the two categorical variables were in fact independent of each other.
Two-Variable Chi-Square Test of Independence
- Multiplication rule: if events A and B are independent, $P(A \text{ and } B) = P(A) \times P(B)$.
- Expected frequency: multiply this joint probability by the total number of cases.
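Applying the multiplication rule to a single cell shows where the expected-frequency formula comes from. Treating the row proportion $RM_i/n$ and the column proportion $CM_j/n$ as the probabilities of the two events, and assuming independence (which is exactly the null hypothesis):

$$P(\text{row } i \text{ and column } j) = \frac{RM_i}{n} \times \frac{CM_j}{n}$$

$$f_e = n \times \frac{RM_i}{n} \times \frac{CM_j}{n} = \frac{RM_i \times CM_j}{n}$$

For example, with a hypothetical row marginal of 50, a column marginal of 45, and $n = 75$, the expected cell frequency is $f_e = (50 \times 45)/75 = 30$.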
Two-Variable Chi-Square Test of Independence
- Expected cell frequency: $f_e = \frac{RM_i \times CM_j}{n}$, where
- $RM_i$ = the row marginal frequency for row $i$,
- $CM_j$ = the column marginal frequency for column $j$, and
- $n$ = the total number of cases.
- See p. 330.
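A minimal Python sketch of the expected-frequency calculation, assuming the observed table is given as a list of rows; the function name and the 2 × 2 frequencies are hypothetical. Each expected count is the row marginal times the column marginal divided by n, which is the multiplication-rule probability scaled back up by n.

```python
def expected_frequencies(table):
    """Expected cell frequencies under independence: (RM_i * CM_j) / n."""
    row_marginals = [sum(row) for row in table]
    col_marginals = [sum(col) for col in zip(*table)]
    n = sum(row_marginals)
    return [[rm * cm / n for cm in col_marginals] for rm in row_marginals]

# Hypothetical 2 x 2 table of observed frequencies (n = 10)
observed = [[4, 2], [1, 3]]
expected = expected_frequencies(observed)
print(expected)  # [[3.0, 3.0], [2.0, 2.0]]
```

The expected table keeps the same marginals as the observed table; only the way cases are spread across the cells changes.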
Two-Variable Chi-Square Test of Independence
- Specifically, the chi-square test takes the difference between the observed and expected cell frequencies for each cell in the table. If the observed frequencies are equal to the expected frequencies (i.e., if the difference between them is zero), then the sample data are exactly what we would expect if the two variables were independent, and we have no evidence against the null hypothesis of independence.
Two-Variable Chi-Square Test of Independence
- If the difference between the observed and expected cell frequencies is zero in every cell, therefore, the chi-square statistic will also be zero.
- As the differences between the observed and expected cell frequencies increase, the value of the chi-square statistic increases, and our assumption of independence becomes more and more suspect.
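To see this behavior concretely, here is a minimal Python sketch of the obtained chi-square for a contingency table; the function name and both tables are hypothetical. The first table was chosen so that every observed cell equals its expected frequency; the second shares the same marginals but departs from independence.

```python
def chi_square_independence(table):
    """Obtained chi-square: sum over all cells of (fo - fe)^2 / fe."""
    row_marginals = [sum(row) for row in table]
    col_marginals = [sum(col) for col in zip(*table)]
    n = sum(row_marginals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_marginals[i] * col_marginals[j] / n  # expected under independence
            chi_square += (fo - fe) ** 2 / fe
    return chi_square

# Hypothetical tables with identical marginals (rows 50, 25; columns 45, 30)
independent = [[30, 20], [15, 10]]  # every cell equals its expected frequency
related = [[40, 10], [5, 20]]       # cells move away from the expected values
print(chi_square_independence(independent))  # 0.0
print(chi_square_independence(related))      # approximately 25.0
```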
Two-Variable Chi-Square Test of Independence
- What we have to determine, therefore, is how large a difference we must find between the observed and expected cell frequencies (that is, how large a chi-square we must see) before we are willing to reject the null hypothesis of independence.
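The slides leave the cutoff to the tables. A minimal sketch of the usual decision rule follows, assuming alpha = .05 and the standard degrees of freedom for a test of independence, df = (rows - 1)(columns - 1), which the slides do not state. The dictionary holds only the familiar alpha = .05 critical values for df = 1 to 4, and the function name `chi_square_decision` is hypothetical.

```python
# Familiar chi-square critical values at alpha = .05 for df = 1 to 4;
# larger df must be looked up in a full chi-square table.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def chi_square_decision(chi_square, n_rows, n_cols):
    """Return (df, critical value, reject H0?) for a test of independence at alpha = .05."""
    df = (n_rows - 1) * (n_cols - 1)
    critical = CRITICAL_05[df]  # raises KeyError if df > 4
    return df, critical, chi_square > critical

print(chi_square_decision(25.0, 2, 2))  # (1, 3.841, True)
print(chi_square_decision(2.5, 3, 2))   # (2, 5.991, False)
```

In a real analysis the critical value would come from the printed table or a statistics library rather than a hard-coded dictionary.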
Two-Variable Chi-Square Test of Independence
- Chi-square test of independence: see Tables 9-15 and 9-16.
Measures of Association
- Nominal-Level Variables
- Phi coefficient (φ) is appropriate when we have a 2 × 2 table.
- Values of phi near zero indicate a very weak relationship, while values nearing 1.0 indicate a very strong relationship.
Nominal-Level Variables
- Between 0 and .29: weak
- Between .30 and .59: moderate
- Between .60 and 1.00: strong
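The slides do not give a formula for phi. This sketch assumes the usual chi-square-based definition for a 2 × 2 table, $\phi = \sqrt{\chi^2 / n}$, and applies the weak/moderate/strong cutoffs listed above; the function names and the chi-square value are hypothetical.

```python
import math

def phi_coefficient(chi_square, n):
    """Phi for a 2 x 2 table (assumed definition): sqrt(chi-square / n)."""
    return math.sqrt(chi_square / n)

def describe_strength(value):
    """Label a magnitude with the 0-.29 / .30-.59 / .60-1.00 cutoffs."""
    magnitude = abs(value)
    if magnitude < 0.30:
        return "weak"
    if magnitude < 0.60:
        return "moderate"
    return "strong"

# Hypothetical 2 x 2 result: chi-square = 25.0 with n = 75 cases
phi = phi_coefficient(25.0, 75)
print(round(phi, 3), describe_strength(phi))  # 0.577 moderate
```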
Measures of Association
- Contingency coefficient (C)
- Its maximum possible value depends on k, the number of rows or columns in the table
- $C = \sqrt{\frac{\chi^2}{n + \chi^2}}$
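A minimal sketch of the contingency coefficient, assuming the obtained chi-square and n are already known; the function names and numbers are hypothetical. The second function gives the standard upper limit of C for a square k × k table, $\sqrt{(k - 1)/k}$, which is why C never quite reaches 1.0.

```python
import math

def contingency_coefficient(chi_square, n):
    """C = sqrt(chi-square / (n + chi-square))."""
    return math.sqrt(chi_square / (n + chi_square))

def max_contingency_coefficient(k):
    """Upper limit of C for a square k x k table: sqrt((k - 1) / k)."""
    return math.sqrt((k - 1) / k)

print(contingency_coefficient(25.0, 75))         # 0.5
print(round(max_contingency_coefficient(2), 3))  # 0.707
```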
Measures of Association
- Cramér's V
- Based on k, the smaller of the number of rows and the number of columns
- $V = \sqrt{\frac{\chi^2}{n(k - 1)}}$
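A minimal sketch of Cramér's V, with k taken as the smaller of the number of rows and columns; the function name and numbers are hypothetical. For a 2 × 2 table, k - 1 = 1, so V reduces to $\sqrt{\chi^2 / n}$.

```python
import math

def cramers_v(chi_square, n, n_rows, n_cols):
    """V = sqrt(chi-square / (n * (k - 1))), with k = min(rows, columns)."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi_square / (n * (k - 1)))

# Hypothetical 3 x 4 table: chi-square = 30.0, n = 75, so k = 3
print(round(cramers_v(30.0, 75, 3, 4), 3))  # 0.447
```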
Measures of Association
- Lambda (λ)
- A proportionate reduction in error (PRE) measure
- Varies between 0 and 1.0
- A value of 0 means we cannot reduce our errors in predicting the dependent variable from knowledge of the independent variable, while a value of 1.0 means that we can reduce all of our errors, or that knowledge of the independent variable will allow us to predict the value of the dependent variable with perfect accuracy.
Measures of Association
- $\lambda = \frac{(\text{errors using the mode of the DV}) - (\text{errors using the mode of the DV within categories of the IV})}{\text{errors using the mode of the DV}}$
Measures of Association
- $\lambda = \frac{\sum f_i - F_d}{n - F_d}$, where
- $f_i$ = the largest cell frequency in each category of the IV,
- $F_d$ = the largest marginal frequency of the DV, and
- $n$ = the total number of cases.
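Lambda can be sketched directly from its PRE definition. The layout is an assumption made for this example: each inner list holds the DV category counts within one IV category. The function name and frequencies are hypothetical. Subtracting the two error counts gives the sum of the largest cell in each IV category minus the largest DV marginal, so the result matches the computational formula.

```python
def lambda_pre(table):
    """Lambda: proportionate reduction in error predicting the DV from the IV.

    table[j] holds the DV category counts for IV category j
    (a layout chosen for this sketch).
    """
    n = sum(sum(counts) for counts in table)
    dv_marginals = [sum(col) for col in zip(*table)]
    # Errors when we always guess the overall mode of the DV
    errors_without_iv = n - max(dv_marginals)
    # Errors when we guess the DV mode within each IV category
    errors_with_iv = sum(sum(counts) - max(counts) for counts in table)
    if errors_without_iv == 0:
        raise ValueError("lambda is undefined when every case is in one DV category")
    return (errors_without_iv - errors_with_iv) / errors_without_iv

# Hypothetical data: 2 IV categories, 3 DV categories, n = 100
table = [[30, 10, 10],
         [10, 25, 15]]
print(lambda_pre(table))  # 0.25
```

Here guessing the overall DV mode misses 60 cases, while guessing the mode within each IV category misses 45, a 15/60 = 25% reduction in error.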
Measures of Association
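- Ordinal-Level Variables
- Goodman and Kruskal's gamma (γ)
- Gamma is a proportionate reduction in error (PRE) measure; its magnitude ranges from 0 to 1.0, and its sign (+ or -) gives the direction of the relationship.
- Between 0 and .29: weak
- Between .30 and .59: moderate
- Between .60 and 1.00: strong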
Measures of Association
- Yule's Q
- $Q = \frac{(f_a \times f_d) - (f_b \times f_c)}{(f_a \times f_d) + (f_b \times f_c)}$, where $f_a$, $f_b$, $f_c$, and $f_d$ are the frequencies in cells a, b, c, and d of a 2 × 2 table
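A minimal sketch of Yule's Q, assuming the usual 2 × 2 layout with cells a and b in the first row and c and d in the second; the function name and frequencies are hypothetical.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2 x 2 table laid out as [[a, b], [c, d]]."""
    ad, bc = a * d, b * c
    return (ad - bc) / (ad + bc)

# Hypothetical 2 x 2 table [[40, 10], [5, 20]]: ad = 800, bc = 50
print(round(yules_q(40, 10, 5, 20), 3))  # 0.882
```

When the two cross-products are equal (ad = bc), Q is 0; Q reaches +1.0 or -1.0 when one of the four cells is empty.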
Measures of Association
- $\gamma = \frac{CP - DP}{CP + DP}$, where
- $CP$ = the number of concordant pairs of observations, and
- $DP$ = the number of discordant pairs of observations.
- The magnitude of gamma ranges from 0 to 1.0.
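Counting concordant and discordant pairs by hand is tedious, so here is a minimal Python sketch. It assumes the rows and the columns are both ordered from low to high; the function name and frequencies are hypothetical. For every cell it looks only at cells in higher rows: those also in higher columns form concordant pairs, those in lower columns form discordant pairs, and pairs tied on either variable are skipped.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (CP - DP) / (CP + DP).

    Assumes rows and columns are both ordered from low to high.
    Pairs tied on either variable are not counted.
    """
    n_rows, n_cols = len(table), len(table[0])
    cp = dp = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare cell (i, j) only with cells in higher rows
            for r in range(i + 1, n_rows):
                for c in range(n_cols):
                    if c > j:    # higher on both variables: concordant
                        cp += table[i][j] * table[r][c]
                    elif c < j:  # higher on one, lower on the other: discordant
                        dp += table[i][j] * table[r][c]
    return (cp - dp) / (cp + dp)

# Hypothetical 3 x 3 ordinal table (CP = 2200, DP = 625)
ordinal_table = [[20, 10, 5],
                 [10, 20, 10],
                 [5, 10, 20]]
print(round(goodman_kruskal_gamma(ordinal_table), 3))  # 0.558
```

For a 2 × 2 table, gamma reduces to Yule's Q: with [[40, 10], [5, 20]] there are 40 × 20 = 800 concordant and 10 × 5 = 50 discordant pairs, giving (800 - 50)/(800 + 50).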