Title: Two-dimensional Chi-square
1 Two-dimensional Chi-square
- Sometimes we want to classify cases on two
dimensions at the same time. For example, we
might want to classify newly qualified physicians
on the basis of their choice of type of practice
and their sex.
- If we did this, we could ask whether there is any
relationship between the two: that is, are women
and men equally likely to choose each type?
2 Two-dimensional Chi-square
- If we classify a set of cases on two dimensions,
and the two dimensions are independent of each
other, then the proportions of cases in the
categories on one dimension should be the same in
all the categories on the other dimension.
- Thus, if choice of type of medical practice is
independent of sex, then the proportions of men
choosing the various types of practice should be the
same as the proportions of women.
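One way to write this condition in symbols is sketched below. The notation is an assumption made for illustration: $n_{ij}$ is the count in row $i$ and column $j$, $r_i$ and $c_j$ are the row and column totals, and $n$ is the total number of cases. If the proportions were exactly the same in every row, each within-row proportion would equal the overall column proportion:

```latex
\[
\frac{n_{ij}}{r_i} = \frac{c_j}{n} \quad \text{for every row } i \text{ and column } j
\]
```

In real samples the two sides will rarely match exactly; the chi-square test asks whether the mismatch is larger than chance alone would produce.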
3 Two-dimensional Chi-square
-           Specialty
- Sex       Rural GP   City GP   Specialist     Σ
- Male          5         20         15         40
- Female       20         80         60        160
- In this data set, there are four times as many
women in the sample as men. There are also four
times as many women in each specialty; thus,
choice of specialty appears to be independent of
sex.
4 Two-dimensional Chi-square
-           Specialty
- Sex       Rural GP   City GP   Specialist     Σ
- Male         16        100         44        160
- Female        4         25         11         40
- In this data set, there are four times as many
men as women, but again the proportions are
constant across specialties. Again, choice of
specialty appears to be independent of sex.
5 Two-dimensional Chi-square
-           Specialty
- Sex       Rural GP   City GP   Specialist     Σ
- Male         20         45         35        100
- Female        5         65         30        100
- In this data set, there are equal numbers of
women and men. But the proportions vary across
specialties; thus, choice of specialty appears
to be dependent on sex.
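The comparison of proportions in the three tables above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function name `row_proportions` and the table labels are made up for illustration.

```python
def row_proportions(table):
    """Return each row of counts divided by that row's total."""
    return [[count / sum(row) for count in row] for row in table]

# Observed counts from the three example tables
# (rows: Male, Female; columns: Rural GP, City GP, Specialist).
tables = {
    "first": [[5, 20, 15], [20, 80, 60]],
    "second": [[16, 100, 44], [4, 25, 11]],
    "third": [[20, 45, 35], [5, 65, 30]],
}

results = {}
for name, table in tables.items():
    male, female = row_proportions(table)
    # True when men and women have the same proportions in every specialty.
    results[name] = all(abs(m - f) < 1e-9 for m, f in zip(male, female))
    print(name, [round(p, 3) for p in male], [round(p, 3) for p in female], results[name])
```

Only the third table shows different proportions for men and women, which matches the reading given on the slides.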
6 Two-dimensional Chi-square
- The null hypothesis in the two-dimensional
chi-square test is that the two dimensions are
not related (that is, they are independent). To
test this hypothesis, we need to compute expected
values for each of the cells defined by the two
dimensions.
- If there were 25 rural GPs in our sample, and if
type of practice were independent of sex, then
half of the rural GPs should be men and half
women, because half of the sample is men and
half is women.
7 Two-dimensional Chi-square
- Our expected values reflect two proportions: the
proportion of the sample in each sex category and
the proportion in each practice category.
-           Specialty
- Sex       Rural GP   City GP   Specialist     Σ
- Male        12.5        55        32.5       100
- Female      12.5        55        32.5       100
- Σ            25        110         65        200
8 Two-dimensional Chi-square
- We'll step through the calculations:
-           Specialty
- Sex       Rural GP   City GP   Specialist     Σ
- Male        12.5        55        32.5       100
- Female      12.5        55        32.5       100
- Σ            25        110         65        200
9
-           Specialty
- Sex       Rural GP   City GP   Specialist     Σ
- Male        12.5        55        32.5       100
- Female      12.5        55        32.5       100
- Σ            25        110         65        200
- First, notice this number: the sum of all the
observations (200).
- Then note this number: the number of males (100).
10
-           Specialty
- Sex       Rural GP   City GP   Specialist     Σ
- Male        12.5        55        32.5       100
- Female      12.5        55        32.5       100
- Σ            25        110         65        200
- Then note this number: the number of rural GPs (25).
- This number (12.5, the expected count of male
rural GPs) is calculated as
$$\frac{100 \times 25}{200} = 12.5$$
11 Two-dimensional Chi-square
- Thus, expected values are computed as
$$\text{Expected value} = \frac{\text{Row total} \times \text{Column total}}{\text{Sum of observations}}$$
- If you can do that, you can do the 2-dimensional
chi-square.
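The rule above can be sketched in a few lines of Python. This is an illustrative sketch only; the function name `expected_values` is hypothetical, and the totals are the ones from the physicians table.

```python
def expected_values(row_totals, col_totals):
    """Expected count for each cell: row total * column total / sum of observations."""
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

row_totals = [100, 100]      # Male, Female
col_totals = [25, 110, 65]   # Rural GP, City GP, Specialist

expected = expected_values(row_totals, col_totals)
for row in expected:
    print(row)
```

Both rows come out as 12.5, 55 and 32.5, the values shown in the expected-value table.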
12 Two-dimensional Chi-square
- For the physicians example, we compute
$$\chi^2 = \frac{(20-12.5)^2}{12.5} + \frac{(5-12.5)^2}{12.5} + \frac{(45-55)^2}{55} + \frac{(65-55)^2}{55} + \frac{(35-32.5)^2}{32.5} + \frac{(30-32.5)^2}{32.5} = 13.0209$$
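The same sum can be reproduced with a short script. This is a sketch for checking the arithmetic, with a made-up function name `chi_square`; the observed and expected counts are copied from the physicians tables.

```python
observed = [[20, 45, 35], [5, 65, 30]]                 # Male, Female
expected = [[12.5, 55.0, 32.5], [12.5, 55.0, 32.5]]    # from the expected-value table

def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

chi2_obt = chi_square(observed, expected)
print(round(chi2_obt, 3))  # about 13.021
```

The two rural GP cells contribute 4.5 each, so most of the statistic comes from the first column.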
13 Two-dimensional Chi-square
- For the 2-D chi-square, degrees of freedom are
$$(r-1)(c-1)$$
- where r = number of rows and c = number of
columns. Here, r = 2, c = 3, so d.f. = 1 × 2 = 2.
- Thus, $\chi^2_{crit} = \chi^2(.05, 2) = 5.99147$.
Since $\chi^2_{obt} = 13.0209 > 5.99147$, our decision
is to reject the null hypothesis (that the two
dimensions are independent).
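The decision step can be sketched as follows. The helper names are hypothetical. The critical-value function relies on a property that holds only for exactly 2 degrees of freedom: the upper-tail probability of the chi-square distribution is exp(-x/2), so the critical value is -2 ln(α). For other degrees of freedom, a table or a statistics library would be needed instead.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for an r x c table: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def critical_value_df2(alpha):
    """Chi-square critical value for exactly 2 d.f. (solves exp(-x / 2) = alpha)."""
    return -2.0 * math.log(alpha)

df = degrees_of_freedom(2, 3)
chi2_crit = critical_value_df2(0.05)
chi2_obt = 13.0209  # obtained value from the physicians example

reject = chi2_obt > chi2_crit
print(df, round(chi2_crit, 3), reject)
```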
14 Formula for computing expected values
- More generally, the rule for working out expected
values in two-dimensional classifications is
$$\hat{E}(n_{ij}) = \frac{r_i c_j}{n}$$
- where $r_i$ is the total for row i, $c_j$ is the
total for column j, and n = total number of
observations (cases in the sample).
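The formula can be read as the sample size multiplied by the two marginal proportions. A short derivation, assuming independence so that the proportion of cases in a cell is the product of its row proportion and its column proportion:

```latex
\[
\hat{E}(n_{ij}) = n \cdot \frac{r_i}{n} \cdot \frac{c_j}{n} = \frac{r_i c_j}{n}
\]
```

For the male rural GP cell in the physicians example, this gives 200 × (100/200) × (25/200) = 12.5.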
15 Chi-square Example 1 (from last week)
- At a recent meeting of the Coin Flippers Society,
each member flipped three coins simultaneously
and the number of tails occurring was recorded.
- 1b. Subsequently, the number of tails each member
flipped was determined for coins of different values.
The data are shown on the next slide as the
number of members throwing different numbers of
tails with coins of each value.
16 Chi-square Example 1b
- Coin         Number of Tails
- Value      0      1      2      3
- .05       20     55     72     15
- .10       24     70     70     24
- .25       21     57     52     20
- Is there evidence that the number of tails is
affected by coin value? (α = .05)
17 Chi-square Example 1b
- H0: The two classifications are independent
- HA: The two classifications are dependent
- Test statistic:
$$\chi^2 = \sum \frac{[n_{ij} - \hat{E}(n_{ij})]^2}{\hat{E}(n_{ij})}$$
- Rejection region:
$$\chi^2_{obt} > \chi^2_{crit} = \chi^2(.05, 6) = 12.5916$$
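The critical value in the rejection region can be sanity-checked without a table. The sketch below uses a closed form that applies only when the degrees of freedom are even; the function name is hypothetical. It computes the probability that a chi-square variable exceeds x, which should come out close to .05 at the critical value.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with even df exceeds x), using the closed-form finite series."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Accumulate 1 + (x/2) + (x/2)^2 / 2! + ... up to df/2 terms in all.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

tail = chi2_upper_tail_even_df(12.5916, 6)
print(round(tail, 4))
```

An obtained statistic gives a tail probability above .05 exactly when it falls below the critical value, so the two ways of stating the decision rule agree.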
18 Chi-square Example 1b
- The first step is to compute the expected values
for each cell, using the formula
$$\hat{E}(n_{ij}) = \frac{r_i c_j}{n}$$
- For the top left cell, we get
$$\frac{(65)(162)}{500} = 21.06$$
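The totals 65, 162 and 500 used here are not listed on the data slide; they are the first column total, the first row total and the sum of all observations. A minimal sketch of how they can be tallied from the coin data (the variable names are illustrative):

```python
observed = [
    [20, 55, 72, 15],  # .05 coins: members throwing 0, 1, 2, 3 tails
    [24, 70, 70, 24],  # .10 coins
    [21, 57, 52, 20],  # .25 coins
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for the top left cell (.05 coins, 0 tails).
top_left = row_totals[0] * col_totals[0] / n
print(row_totals, col_totals, n, top_left)
```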
19 Chi-square Example 1b
- Using the formula for all the other cells gives
-            0        1        2        3
- .05     21.06    58.97    62.86    19.12
- .10     24.44    68.43    72.94    22.18
- .25     19.50    54.60    58.20    17.70
- We are now ready to compute $\chi^2$ obtained.
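20 Chi-square Example 1b
$$\chi^2_{obt} = \frac{(20-21.06)^2}{21.06} + \cdots + \frac{(20-17.7)^2}{17.7} \approx 4.03$$
- Decision: $\chi^2_{obt}$ is below $\chi^2_{crit} = 12.5916$,
so do not reject H0. There is no evidence
that the number of tails is affected by coin
value.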
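To see where the obtained value comes from, it can help to list each cell's contribution to the sum. The sketch below recomputes the expected values from the observed coin data rather than using the rounded table entries, so the total is roughly 4.03; the variable names are illustrative.

```python
observed = [
    [20, 55, 72, 15],  # .05 coins
    [24, 70, 70, 24],  # .10 coins
    [21, 57, 52, 20],  # .25 coins
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# (observed - expected)^2 / expected for every cell.
contributions = [
    [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]
chi2_obt = sum(sum(row) for row in contributions)
chi2_crit = 12.5916  # chi-square(.05, 6), as given in the rejection region

# Locate the cell that contributes most to the statistic.
largest = max(
    ((i, j) for i in range(len(observed)) for j in range(len(col_totals))),
    key=lambda ij: contributions[ij[0]][ij[1]],
)
print(round(chi2_obt, 2), chi2_obt > chi2_crit, largest)
```

The largest single contribution comes from the .05 coins with 2 tails, but even that cell adds only about 1.33.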
21 Chi-square Example 2b
- There is an old wives' tale that babies don't
tend to be born randomly during the day but tend
more to be born in the middle of the night,
specifically between the hours of 1 AM and 5 AM.
To investigate this, a researcher collects
birth-time data from a large maternity hospital.
The day was broken into 4 parts: Morning (5 AM to
1 PM), Mid-day (1 PM to 5 PM), Evening (5 PM to 1
AM), and Mid-night (1 AM to 5 AM).
22 Chi-square Example 2b
- The numbers of births at these times for the last
three months (January to March) are shown below:
- Morning   Mid-day   Evening   Mid-night
-   110        50       100        100
23 Chi-square Example 2b
- A question can certainly be raised as to whether
the pattern reported above is peculiar to births
in the winter months or reflects births at other
times of the year as well.
- The data obtained from the same hospital during
the hottest summer months last year are shown on
the next slide, along with the original data.
24 Chi-square Example 2b
-          Morn   Midd   Even   Mid-night     Σ
- Cold      110     50    100      100       360
- Hot        90     40     80       70       280
- Σ         200     90    180      170       640
- Are the two patterns different? (α = .05)
25 Chi-square Example 2b
- H0: The two classifications are independent
- HA: The two classifications are dependent
- Test statistic:
$$\chi^2 = \sum \frac{[n_{ij} - \hat{E}(n_{ij})]^2}{\hat{E}(n_{ij})}$$
- Rejection region:
$$\chi^2_{obt} > \chi^2_{crit} = \chi^2(.05, 3) = 7.81$$
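With 3 degrees of freedom, the upper-tail probability of the chi-square distribution can be written with the complementary error function, which gives a quick check on the critical value. This is a sketch that applies to 3 d.f. only; the function name is hypothetical.

```python
import math

def chi2_upper_tail_df3(x):
    """P(chi-square with 3 d.f. exceeds x), via the complementary error function."""
    return (
        math.erfc(math.sqrt(x / 2.0))
        + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    )

tail = chi2_upper_tail_df3(7.81)
print(round(tail, 3))
```

The result is close to .05, as expected for a critical value at α = .05.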
26 Chi-square Example 2b
- The first step is to compute the expected values
for each cell, using the formula
$$\hat{E}(n_{ij}) = \frac{r_i c_j}{n}$$
- For the top left cell, we get
$$\frac{(200)(360)}{640} = 112.5$$
27 Chi-square Example 2b
- Using the formula for the other cells we get
-           Morn      Midd      Even      Midn
- Cold     112.5     50.625    101.25    95.625
- Hot       87.5     39.375     78.75    74.375
28 Chi-square Example 2b
$$\chi^2_{obt} = \frac{(110-112.5)^2}{112.5} + \cdots + \frac{(70-74.375)^2}{74.375} = 0.6374$$
- Decision: $\chi^2_{obt}$ is below $\chi^2_{crit} = 7.81$,
so do not reject H0. There is no evidence
that the pattern of births is different in the
hot months compared to the cold months.
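The whole procedure (totals, expected values, statistic, degrees of freedom) can be gathered into one reusable function. The sketch below is one possible arrangement, with a made-up function name, applied to the birth data; the critical value is taken from the rejection region stated earlier rather than computed.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for row, r in zip(observed, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / n  # expected count under independence
            statistic += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

births = [
    [110, 50, 100, 100],  # cold months (January to March)
    [90, 40, 80, 70],     # hottest summer months
]
chi2_obt, df = chi_square_independence(births)
chi2_crit = 7.81  # chi-square(.05, 3)

print(round(chi2_obt, 4), df, chi2_obt > chi2_crit)
```

The same function works for any table of counts in this format, provided no row or column total is zero.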