Title: Parametric statistics:
Chi-Square (χ²)
Parametric statistics
- all the inferential statistics covered so far
fall into this category
- requirement for interval or ratio data
Nonparametric statistics
- useful for nominal (or ordinal) data
- based on frequency counts
Chi-Square (χ²)
There are 2 inferential tests that are based on
frequency data
1. one-way χ² (goodness of fit test)
2. two-way χ² (test of independence)
- these 2 tests ask very different questions
(based on the same fundamental idea)
Chi-Square (χ²)
one-way χ² (goodness of fit test)
- in the simplest case, we can use this test to
look at a single nominal variable with 2 levels
- note that the variable is not referred to as an
IV here because we are not assigning subjects to
levels
Chi-Square (χ²)
one-way χ² (goodness of fit test)
- for example, we may be interested in the ratio
of males to females enrolled in various
university programs
- gender is a nominal variable
- the population (humans) is roughly 50% male and
50% female
Chi-Square (χ²)
one-way χ² (goodness of fit test)
- given what we know about the frequency of the 2
genders in the population, what gender
frequencies would we expect in:
- an engineering class
- a social work class
- a psychology class
- a music theory class
Chi-Square (χ²)
one-way χ² (goodness of fit test)
- as with previous tests, sampling error is
always a possibility
- i.e., this class may have more females than
males, while some other psychology class has more
males than females (we don't expect every class
to be exactly 50/50)
Chi-Square (χ²)
one-way χ² (goodness of fit test)
- however, if the gender frequencies we observe
are very different from what we expect, that will
suggest that gender is a factor in choosing
programs of study
- i.e., if this class is 90% female and 10% male,
it is unlikely to be the result of sampling
error
Chi-Square (χ²)
one-way χ² (goodness of fit test)
- it may be the case that gender is a factor in
choosing a discipline
- e.g., females may outnumber males in psychology
and males may outnumber females in chemistry
because something related to gender makes one
discipline or the other more appealing
Chi-Square (χ²)
one-way χ² (goodness of fit test)
setting up the hypotheses
- as always, the null hypothesis suggests
nothing happened
- in this case, it is the hypothesis that the
frequencies we observe (o) will be the same as
the frequencies we expect (e)
Chi-Square (χ²)
one-way χ² (goodness of fit test)
setting up the hypotheses
H0: e = o
H1: e ≠ o
- note that although the hypotheses are stated in
terms of = and ≠, this is always a one-tailed
test
Chi-Square (χ²)
one-way χ² (goodness of fit test)
setting up the hypotheses
In determining what the expected frequencies are,
we may have information about (e.g.) frequencies
in the general population
- if we don't have a specific reason to assume
otherwise, we expect an equal number of
observations in both/all categories
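The rule above can be sketched in a few lines of Python. This is only an illustration: the function name `expected_frequencies` and its parameters are assumptions made for this example, not something from the course text. It multiplies the total number of observations by each category's assumed proportion, and falls back to an equal split when no proportions are supplied.

```python
def expected_frequencies(n, k=2, proportions=None):
    """Expected count per category under H0.

    n: total number of observations
    k: number of categories (used only when no proportions are given)
    proportions: optional known population proportions, one per category
    """
    if proportions is None:
        # No specific reason to assume otherwise: equal split across categories.
        proportions = [1 / k] * k
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [n * p for p in proportions]


# 74 observations split equally over 2 categories.
print(expected_frequencies(74))  # [37.0, 37.0]
```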
Chi-Square (χ²)
one-way χ² (goodness of fit test)
the χ² sampling distribution
in this case the degrees of freedom are the
number of categories (k) minus 1
critical value of χ² depends on df (k - 1) and α
(.05)
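For the special case df = 1, the critical value can be checked without a table, because a χ² variable with 1 degree of freedom is the square of a standard normal variable. The sketch below relies on that identity and on Python's standard library. The function name `chi2_critical_df1` is an assumption made for this example, and the shortcut does not apply when df > 1.

```python
from statistics import NormalDist


def chi2_critical_df1(alpha=0.05):
    """Critical value of chi-square for df = 1 only.

    With df = 1, chi-square is the square of a standard normal z, so the
    upper-tail critical value is the square of the two-tailed z cutoff.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z ** 2


print(round(chi2_critical_df1(0.05), 2))  # 3.84
```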
Chi-Square (χ²)
one-way χ² (goodness of fit test)
The Formula
χ² = Σ((o - e)² / e)
The Data: 74 students registered in this class
           females   males
o          56        18
e          37        37
Chi-Square (χ²)
one-way χ² (goodness of fit test)
The Calculation
χ² = Σ((o - e)² / e)
   = ((56 - 37)² / 37) + ((18 - 37)² / 37)
   = (19² / 37) + ((-19)² / 37)
   = (361 / 37) + (361 / 37)
   = 9.757 + 9.757
   = 19.51
df = k - 1 = 2 - 1 = 1
χ²(1, N = 74) = 19.51
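The hand calculation above can be reproduced with a short Python sketch. The function name `chi_square` is an assumption made for this example. The observed and expected counts are the ones given on the slide.

```python
def chi_square(observed, expected):
    """Sum of (o - e)^2 / e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [56, 18]  # observed frequencies from the slide
expected = [37, 37]  # expected frequencies under an equal split of N = 74

statistic = chi_square(observed, expected)
df = len(observed) - 1  # number of categories minus 1

print(f"chi2({df}, N = {sum(observed)}) = {statistic:.2f}")
# chi2(1, N = 74) = 19.51
```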
Chi-Square (χ²)
one-way χ² (goodness of fit test)
- from Table 6 in the text we find that for df =
1 and α = .05, the critical value of χ² is 3.84
- our obtained value of χ² exceeds the critical
value, so we can reject H0
- we can conclude that there are significantly
more females (and/or significantly fewer males)
in this class than we would expect if gender had
no influence on course selection
- i.e., it seems that psychology attracts more
females than males for some reason
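The decision rule used above (reject H0 when the obtained χ² exceeds the critical value) can be written as a small lookup. This is a sketch under stated assumptions: the dictionary holds the usual tabled critical values for α = .05 at df = 1, 2 and 3, which are assumed to match Table 6 in the text (the slide confirms only the df = 1 entry). The names `CRITICAL_05` and `reject_null` are made up for this example.

```python
# Standard tabled critical values of chi-square at alpha = .05.
# Assumed to agree with Table 6 in the text; only df = 1 is quoted there.
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.81}


def reject_null(statistic, df):
    """Return True when the obtained chi-square exceeds the critical value."""
    return statistic > CRITICAL_05[df]


print(reject_null(19.51, 1))  # True
```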
Chi-Square (χ²)
one-way χ² (goodness of fit test)
- the same logic can be used to test the goodness
of fit where the variable has > 2 levels
- e.g., when recording the marital status of a
sample of people we might use 3 categories:
never married, currently married, or divorced
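Written out for k categories, the same formula looks like the following. The subscript i is notation added here for illustration (the slides write o and e without subscripts). The equal-split expected frequency applies only when there is no specific reason to expect otherwise.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i},
\qquad df = k - 1,
\qquad e_i = \frac{N}{k} \ \text{(equal split)}
```

With the 3 marital-status categories, k = 3 and so df = 2.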
Chi-Square (χ²)
two-way χ² (test of independence)
- this test is used to assess whether or not 2
variables are independent of one another
- e.g., when buying a car, does gender influence
the choice of manual vs automatic transmission or
are the 2 factors independent (gender has nothing
to do with choice of transmission type)?
Chi-Square (χ²)
two-way χ² (test of independence)
setting up the hypotheses
- there are no symbolic versions of these
hypotheses, so state them in words
H0: the 2 variables are independent
H1: the 2 variables are not independent
Chi-Square (χ²)
two-way χ² (test of independence)
step 1: set up a data table
Chi-Square (χ²)
two-way χ² (test of independence)
step 2: calculate the expected frequencies
- for each cell, e = (row total × column total) / N
               column 1   column 2   row total
row 1    o     35         45         80
         e     23.5       56.5
row 2    o     12         68         80
         e     23.5       56.5
column total   47         113        N = 160
- column 1 cells: e = ((47)(80)) / 160 = 3760 / 160 = 23.5
- column 2 cells: e = ((113)(80)) / 160 = 9040 / 160 = 56.5
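The expected-frequency rule for each cell can be sketched in Python as follows. The function name `expected_table` and the nested-list layout (one inner list per row) are assumptions made for this example. The counts are the observed frequencies from the slide.

```python
def expected_table(observed):
    """Expected frequency for each cell: (row total * column total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed frequencies, one inner list per row of the data table.
observed = [[35, 45],
            [12, 68]]

expected = expected_table(observed)
print(expected)  # [[23.5, 56.5], [23.5, 56.5]]
```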
Chi-Square (χ²)
two-way χ² (test of independence)
step 3: use the (same) formula
χ² = Σ((o - e)² / e)
   = ((35 - 23.5)² / 23.5) + ((12 - 23.5)² / 23.5)
     + ((45 - 56.5)² / 56.5) + ((68 - 56.5)² / 56.5)
   = (11.5² / 23.5) + ((-11.5)² / 23.5)
     + ((-11.5)² / 56.5) + (11.5² / 56.5)
   = (132.25 / 23.5) + (132.25 / 23.5)
     + (132.25 / 56.5) + (132.25 / 56.5)
   = 5.628 + 5.628 + 2.341 + 2.341
   = 15.94
df = (# of rows - 1)(# of columns - 1) = (2 - 1)(2 - 1) = (1)(1) = 1
χ²(1, N = 160) = 15.94
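Steps 2 and 3 can be combined into one short Python sketch that reproduces the result above. The function name `chi_square_independence` is an assumption made for this example, and the data are the observed frequencies from the slide.

```python
def chi_square_independence(observed):
    """Two-way chi-square statistic and df for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            statistic += (o - e) ** 2 / e

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


observed = [[35, 45],
            [12, 68]]

statistic, df = chi_square_independence(observed)
print(f"chi2({df}, N = 160) = {statistic:.2f}")
# chi2(1, N = 160) = 15.94
```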
Chi-Square (χ²)
two-way χ² (test of independence)
step 4: locate the critical value
χ²crit = 3.84
χ²(1, N = 160) = 15.94
- reject H0
- gender and choice of transmission are not
independent
- i.e., there is a significant relationship
between your gender and the type of transmission
you choose (females more likely to choose
automatic than males)
next week: review (final class)