Title: Chi Square - χ²
1 Chapter 14
2 Chi Square
- Chi Square (χ²) is a non-parametric statistic used to
test the null hypothesis.
- It is used for nominal data.
- It plays the same role for nominal data that the F test
played in single factor and factorial analysis.
3 Chi Square
- Nominal data puts each participant in a category.
Categories are best when mutually exclusive and
exhaustive. This means that each and every
participant fits in one and only one category.
- Chi Square looks at frequencies in the
categories.
4 Expected frequencies and the null hypothesis ...
- Chi Square compares the expected frequencies in
categories to the observed frequencies in
categories.
- Expected frequencies are the frequencies in each
cell predicted by the null hypothesis.
5 Expected frequencies and the null hypothesis ...
- The null hypothesis
  - H0: fo = fe
  - There is no difference between the observed
  frequency and the frequency predicted (expected)
  by the null.
- The experimental hypothesis
  - H1: fo ≠ fe
  - The observed frequency differs significantly from
  the frequency predicted (expected) by the null.
6 Calculating χ²
For each cell:
- Calculate the deviation of the observed from the
expected frequency (O - E).
- Square the deviation.
- Divide the squared deviation by the expected
value.
7 Calculating χ²
- Add them up: χ² = Σ (O - E)² / E.
- Then, look up χ² in the Chi Square table.
- df = k - 1 (one sample χ²)
- OR df = (Columns - 1)(Rows - 1)
(2 or more samples)
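The steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code from the course; the function names and the two-cell counts are made up for the example.

```python
def chi_square(observed, expected):
    """Return the sum over cells of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    total = 0.0
    for o, e in zip(observed, expected):
        deviation = o - e            # deviation of observed from expected
        total += deviation ** 2 / e  # squared deviation divided by expected
    return total


def df_one_sample(k):
    """Degrees of freedom for a one sample chi square with k categories."""
    return k - 1


def df_multi_sample(rows, columns):
    """Degrees of freedom for a table with the given rows and columns."""
    return (columns - 1) * (rows - 1)


# Hypothetical counts, chosen only to show the arithmetic.
observed = [12, 8]
expected = [10, 10]
stat = chi_square(observed, expected)
print(round(stat, 2), df_one_sample(len(observed)))
```

Each cell here contributes 4/10, so the statistic is 0.8 with 1 degree of freedom.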
8 Critical values of χ²
df     1      2      3      4      5      6      7      8
.05   3.84   5.99   7.82   9.49  11.07  12.59  14.07  15.51
.01   6.63   9.21  11.34  13.28  15.09  16.81  18.48  20.09

df     9     10     11     12     13     14     15     16
.05  16.92  18.31  19.68  21.03  22.36  23.68  25.00  26.30
.01  21.67  23.21  24.72  26.22  27.69  29.14  30.58  32.00

df    17     18     19     20     21     22     23     24
.05  27.59  28.87  30.14  31.41  32.67  33.92  35.17  36.42
.01  33.41  34.81  36.19  37.57  38.93  40.29  41.64  42.98

df    25     26     27     28     29     30
.05  37.65  38.89  40.11  41.34  42.56  43.77
.01  44.31  45.64  46.96  48.28  49.59  50.89
9 Critical values of χ²
In the table on slide 8, the df rows give the degrees of freedom.
10 Critical values of χ²
The .05 rows give the critical values for α = .05.
11 Critical values of χ²
The .01 rows give the critical values for α = .01.
12 Example
If there were 5 degrees of freedom, how big would
χ² have to be for significance at the .05 level?
13 Critical values of χ²
From the table on slide 8: with df = 5, χ² must reach
11.07 for significance at the .05 level.
14 Using the χ² table.
If there were 2 degrees of freedom, how big would
χ² have to be for significance at the .05 level?
Note: Unlike most other tables you have seen,
the critical values for Chi Square get larger as
df increases. This is because you are summing
over more cells, each of which usually
contributes to the total observed value of chi
square.
15 Critical values of χ²
From the table on slide 8: with df = 2, χ² must reach
5.99 for significance at the .05 level.
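Looking up a critical value is just indexing into the table by significance level and df. The sketch below hard-codes the standard critical values for df 1 to 8 only; the names `CRITICAL_VALUES`, `critical_value` and `is_significant` are illustrative, not part of the slides.

```python
# Standard chi square critical values for df = 1..8 (index 0 holds df = 1).
CRITICAL_VALUES = {
    0.05: [3.84, 5.99, 7.82, 9.49, 11.07, 12.59, 14.07, 15.51],
    0.01: [6.63, 9.21, 11.34, 13.28, 15.09, 16.81, 18.48, 20.09],
}


def critical_value(df, alpha=0.05):
    """Return the tabled critical value for the given df and alpha."""
    row = CRITICAL_VALUES[alpha]
    if not 1 <= df <= len(row):
        raise ValueError("df is outside the range covered by this small table")
    return row[df - 1]


def is_significant(chi_sq, df, alpha=0.05):
    """True when the obtained chi square reaches the critical value."""
    return chi_sq >= critical_value(df, alpha)


print(critical_value(5), critical_value(2))
```

Because each row of the table only increases from left to right, a χ² that is significant with few cells may not be significant with many.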
16 One sample example
Party: 75% male, 25% female.
There are 40 swimmers. Since 75% of the people at the
party are male, 75% of the swimmers should be male. So
the expected value for males is .750 x 40 = 30. For
women it is .250 x 40 = 10.

             Male    Female
Observed      20       20
Expected      30       10
O-E          -10       10
(O-E)²       100      100
(O-E)²/E    3.33    10.00

df = k - 1 = 2 - 1 = 1
17 χ² (1, n = 40) = 13.33
Critical values of χ²: see the table on slide 8.
Exceeds the critical value at α = .01 (6.63 for df = 1).
Reject the null hypothesis.
Gender does affect who goes swimming.
Women go swimming more than expected.
Men go swimming less than expected.
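As a check on the swimmer example, here is a minimal Python sketch that derives the expected frequencies from the party proportions and then sums the cell contributions. The variable names are illustrative; the numbers are the ones on the slides.

```python
n_swimmers = 40
null_proportions = {"male": 0.75, "female": 0.25}  # make-up of the party
observed = {"male": 20, "female": 20}              # who actually swam

# Expected frequency = proportion predicted by the null x n.
expected = {sex: p * n_swimmers for sex, p in null_proportions.items()}

# Chi square = sum over cells of (O - E)^2 / E.
chi_sq = sum(
    (observed[sex] - expected[sex]) ** 2 / expected[sex] for sex in observed
)
df = len(observed) - 1

print(expected, round(chi_sq, 2), df)
```

The male cell contributes 100/30 and the female cell 100/10, which together give the 13.33 reported above.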
18 2 sample example
Freshmen and sophomores who like horror movies.

                        Freshmen   Sophomores
Likes horror films         150         50
Dislikes horror films      100        200
19 There are 500 altogether. 200 (a proportion of
.400) like horror movies; 300 (.600) dislike
horror films. (Proportions appear in parentheses
in the margins.) Multiplying the proportion in
the "likes horror films" row by the number in the
Freshman column yields the expected
frequency for the first cell. The formula is
Expected Frequency = (Prop_row x n_col). (EF appears
in parentheses in each cell.)

                        Freshmen     Sophomores    Total
Likes horror films      150 (100)     50 (100)     200 (.400)
Dislikes horror films   100 (150)    200 (150)     300 (.600)
Total                   250          250           500
20 Computing χ²
             Fresh     Fresh       Soph      Soph
             Likes     Dislikes    Likes     Dislikes
Observed      150        100         50        200
Expected      100        150        100        150
O-E            50        -50        -50         50
(O-E)²       2500       2500       2500       2500
(O-E)²/E    25.00      16.67      25.00      16.67

df = (C-1)(R-1) = (2-1)(2-1) = 1
21 χ² (1, n = 500) = 83.33
Critical values of χ²: see the table on slide 8.
Exceeds the critical value at α = .01 (6.63 for df = 1).
Reject the null hypothesis.
Fresh/Soph dimension does affect liking for
horror movies.
Proportionally, more freshmen than sophomores
like horror movies.
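The horror-movie example can be reproduced with a short Python sketch that follows the slide's recipe: row proportion times column total for each cell. The dictionary layout and names are assumptions made for the illustration.

```python
observed = {
    "likes":    {"freshmen": 150, "sophomores": 50},
    "dislikes": {"freshmen": 100, "sophomores": 200},
}
columns = ["freshmen", "sophomores"]

n = sum(sum(row.values()) for row in observed.values())
col_totals = {c: sum(row[c] for row in observed.values()) for c in columns}

expected = {}
chi_sq = 0.0
for label, row in observed.items():
    row_prop = sum(row.values()) / n  # e.g. 200 / 500 = .400 for "likes"
    expected[label] = {c: row_prop * col_totals[c] for c in columns}
    for c in columns:
        e = expected[label][c]
        chi_sq += (row[c] - e) ** 2 / e

df = (len(columns) - 1) * (len(observed) - 1)
print(round(chi_sq, 2), df)
```

Every cell deviates from its expected frequency by 50, so the four contributions are 25, 25, 16.67 and 16.67.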
22 The only (slightly) hard part is computing
expected frequencies
- In the one sample case, multiply n by a hypothetical
proportion based on the null hypothesis that
frequencies will be random.
23 Simple Example - 100 teenagers listen to radio
stations
H1: Some stations are more popular with teenagers
than others.
H0: Radio stations do not differ in
popularity with teenagers.
Expected frequencies
are the frequencies predicted by the null
hypothesis. In this case, the problem is simple
because the null predicts an equal proportion of
teenagers will prefer each of the four radio
stations.
Is the observed significantly different from the
expected?
24 Radio station
             Station 1  Station 2  Station 3  Station 4
Observed        40         30         20         10
Expected        25         25         25         25
O-E             15          5         -5        -15
(O-E)²         225         25         25        225
(O-E)²/E      9.00       1.00       1.00       9.00

df = k - 1 = (4-1) = 3   χ²(3, n = 100) = 20.00, p < .01
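When the null predicts equal popularity, every expected frequency is simply n / k. The sketch below, with illustrative names, works through the radio-station numbers and compares the result with the tabled critical values for df = 3.

```python
observed = [40, 30, 20, 10]  # listeners for stations 1 to 4
n = sum(observed)
k = len(observed)

expected = [n / k] * k       # equal share for each station under the null
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1

# Tabled critical values for df = 3.
critical = {0.05: 7.82, 0.01: 11.34}
if chi_sq >= critical[0.01]:
    verdict = "p < .01"
elif chi_sq >= critical[0.05]:
    verdict = "p < .05"
else:
    verdict = "n.s."

print(chi_sq, df, verdict)
```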
25 Example - Admissions to Psychiatric Hospitals
Close to a once/year final
H1: More people are admitted to
psychiatric hospitals when it is near their final
exam.
H0: Time from final exam does not have
an effect on hospital admissions.
Category 1: Within 7 days of final. (11 admitted)
Category 2: Between 8 and 30 days. (24 admitted)
Category 3: Between 31 and 90 days. (69 admitted)
Category 4: More than 90 days. (96 admitted)
26 Psychiatric Admissions
- Expected frequency = expected proportion of days x n
- There are 365 days and 1 final and 200 patients
admitted each year.
- Proportion of each kind of day computed below
27 Expected Frequencies
To obtain expected frequencies with 200
admissions, multiply the proportion of days of each
type by n = 200. This time the proportions are not
equal.
28 Closeness to final exam
             Category 1  Category 2  Category 3  Category 4
Observed        11          24          69          96
Expected         8          26          66         100
O-E              3          -2           3          -4
(O-E)²           9           4           9          16
(O-E)²/E      1.12        0.15        0.14        0.16

df = k - 1 = (4-1) = 3   χ²(3, n = 200) = 1.57, n.s.
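With unequal proportions the recipe is the same; only the expected frequencies change. In the sketch below the proportions are an assumption: they are back-calculated from the Expected row above (8/200, 26/200, 66/200, 100/200), since the table of day proportions is not reproduced in this text.

```python
n = 200
observed = [11, 24, 69, 96]

# Assumed null proportions, back-calculated from the Expected row above.
null_proportions = [0.04, 0.13, 0.33, 0.50]

expected = [p * n for p in null_proportions]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

critical_05 = 7.82  # tabled critical value for df = 3 at the .05 level
significant = chi_sq >= critical_05

print(round(chi_sq, 3), df, significant)
```

The unrounded sum is about 1.575; the 1.57 above comes from adding the rounded cell values. Either way it is far below 7.82, so the result is not significant.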
29 The only (slightly) hard part is computing
expected frequencies
- In the multi-sample case, multiply the proportion in
each row by the number in each column to obtain the EF in
each cell.
30 A 3 x 4 Chi Square
Women, stress, and seating preferences
(perimeter vs. interior, front vs. back).

                               Front   Front   Back    Back
                               Perim   Inter   Perim   Inter   Total
Very Stressed Females            10      70       5      15     100
Moderately Stressed Females      15      50      10      25     100
Control Group Females            35      30      15      20     100
Total                            60     150      30      60     300
31 Expected frequencies
Women, stress, and perimeter versus interior
seating preferences.
Each row holds 100 of the 300 women (a proportion of .333),
so each expected frequency is .333 x the column total.
Column 1 (Front Perim): .333 x 60 = 20 in each row.
32 Column 2
Front Inter: .333 x 150 = 50 in each row.
33 Column 3
Back Perim: .333 x 30 = 10 in each row.
34 All the expected frequencies
Column 4 (Back Inter): .333 x 60 = 20 in each row.
(EF appears in parentheses in each cell.)

                               Front     Front     Back      Back
                               Perim     Inter     Perim     Inter     Total
Very Stressed Females          10 (20)   70 (50)    5 (10)   15 (20)    100
Moderately Stressed Females    15 (20)   50 (50)   10 (10)   25 (20)    100
Control Group Females          35 (20)   30 (50)   15 (10)   20 (20)    100
Total                          60       150        30        60         300
35 Very Stressed
             FrontP   FrontI   BackP   BackI
Observed       10       70       5      15
Expected       20       50      10      20
O-E           -10       20      -5      -5
(O-E)²        100      400      25      25
(O-E)²/E     5.00     8.00    2.50    1.25

Moderately Stressed
             FrontP   FrontI   BackP   BackI
Observed       15       50      10      25
Expected       20       50      10      20
O-E            -5        0       0       5
(O-E)²         25        0       0      25
(O-E)²/E     1.25     0.00    0.00    1.25

Control Group
             FrontP   FrontI   BackP   BackI
Observed       35       30      15      20
Expected       20       50      10      20
O-E            15      -20       5       0
(O-E)²        225      400      25       0
(O-E)²/E    11.25     8.00    2.50    0.00

df = (C-1)(R-1) = (4-1)(3-1) = 6
36 χ² (6, N = 300) = 41.00
Critical values of χ²: see the table on slide 8.
Exceeds the critical value at α = .01 (16.81 for df = 6).
Reject the null hypothesis.
There is a relationship between women's stress level
and seating position.
37 Very Stressed
             FrontP   FrontI   BackP   BackI
Observed       10       70       5      15
Expected       20       50      10      20
O-E           -10       20      -5      -5
(O-E)²        100      400      25      25
(O-E)²/E     5.00     8.00    2.50    1.25

Moderately Stressed
             FrontP   FrontI   BackP   BackI
Observed       15       50      10      25
Expected       20       50      10      20
O-E            -5        0       0       5
(O-E)²         25        0       0      25
(O-E)²/E     1.25     0.00    0.00    1.25

Control Group
             FrontP   FrontI   BackP   BackI
Observed       35       30      15      20
Expected       20       50      10      20
O-E            15      -20       5       0
(O-E)²        225      400      25       0
(O-E)²/E    11.25     8.00    2.50    0.00

Very stressed women avoid the perimeter and
prefer the front interior.
The control group prefers the perimeter and
avoids the front interior.
χ² = 41.00
df = (C-1)(R-1) = (4-1)(3-1) = 6
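For a larger table it helps to keep the per-cell contributions, since they show which cells drive the result. The following is a minimal sketch with illustrative function names. It computes each expected frequency as row total x column total / N, which is the same quantity as row proportion x column total.

```python
def expected_frequencies(table):
    """Expected count per cell: row total x column total / N."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    return [[sum(row) * ct / n for ct in col_totals] for row in table]


def cell_contributions(table):
    """(O - E)^2 / E for every cell, in the same layout as the table."""
    expected = expected_frequencies(table)
    return [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]


# Rows: very stressed, moderately stressed, control group.
# Columns: FrontP, FrontI, BackP, BackI.
observed = [
    [10, 70, 5, 15],
    [15, 50, 10, 25],
    [35, 30, 15, 20],
]

contrib = cell_contributions(observed)
chi_sq = sum(sum(row) for row in contrib)
df = (len(observed[0]) - 1) * (len(observed) - 1)
print(round(chi_sq, 2), df)
```

The largest single contribution is the control group's front-perimeter cell (11.25), followed by the two front-interior cells at 8.00 each.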
38 Summary: Different Ways of Computing the
Frequencies Predicted by the Null Hypothesis
- One sample
  - Expect subjects to be distributed equally in each
  cell. OR
  - Expect subjects to be distributed proportionally
  in each cell. OR
  - Expect subjects to be distributed in each cell
  based on prior knowledge, such as previous
  research.
- Multi-sample
  - Expect subjects in different conditions to be
  distributed similarly to each other. Find the
  proportion in each row and multiply by the number
  in each column to do so.
39 Conclusion - Chi Square
- Chi Square is a non-parametric statistic, used for
nominal data.
- It plays the same role for nominal data that the F test
played in single factor and factorial analysis.
- Chi Square compares the expected frequencies in
categories to the observed frequencies in
categories.
40 Conclusion - Chi Square
- The null hypothesis
  - H0: fo = fe
  - There is no difference between the observed
  frequency and the frequency predicted by the null
  hypothesis.
- The experimental hypothesis
  - H1: fo ≠ fe
  - The observed frequency differs significantly from
  the frequency expected by the null hypothesis.
41 The end. Hope you found the slides helpful! RK
42 Example - Vitamin C and Flu
Experimental Hypothesis: Vitamin C prevents
influenza.
Null Hypothesis: Vitamin C has no
effect on getting the flu.
30 subjects in each
experimental group.
Are the observed significantly different from the
expected?
43 How I computed expected frequencies
Multiply the proportion in each row times the
number in each column. Here the Vitamin C row has 30
research participants. Total N = 60. So
30/60 = .500 (half). Twenty-five got influenza. So
half of those 25 should come from the Vitamin C
group (25 x .500 = 12.5). Same for placebo.
Thirty-five did not get influenza, so 35 x .500 =
17.5 of each group should not have.
Are the observed significantly different from the
expected?
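The expected frequencies for this example depend only on the row and column totals, so they can be sketched without the observed cell counts (which are not given in this text). The dictionary keys below are illustrative labels.

```python
row_totals = {"vitamin C": 30, "placebo": 30}        # group sizes
col_totals = {"influenza": 25, "no influenza": 35}   # outcome totals
n = sum(row_totals.values())

# Expected frequency = proportion in the row x number in the column.
expected = {
    group: {outcome: (size / n) * count for outcome, count in col_totals.items()}
    for group, size in row_totals.items()
}

for group, cells in expected.items():
    print(group, cells)
```

Each group is half of the sample, so each is expected to account for 12.5 of the influenza cases and 17.5 of the non-cases.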