Chi-Square Tests

Provided by: Lind51

1
Chi-Square Tests

Week 10

2
Objectives

On completion of this module you should be able
to:
perform and interpret a χ² test for the
difference between two or more proportions;
perform and interpret a χ² test of independence;
and
perform and interpret a χ² goodness of fit test.

3
χ² test for the difference between two proportions

This week we will compare two (or more)
proportions using a frequency of success
approach.
Note that we do not cover the Z test for
differences between two proportions in this
course (feel free to look at Section 9.3 of the
text if you are interested).
We will make use of a contingency table (a
cross-classification table).
This is best explained by an example.

4
Example 10-1

A sample of 400 ex-students was taken and the
students were asked "Did you enjoy your
university experience?"
The results are given in the table below.
Is there evidence of a significant difference
between the proportions of males and females who
enjoyed their university experience?
(Use the 0.05 level of significance).

5
Example 10-1
                                  Gender
Enjoyed university experience?    Male    Female    Total
Yes                               214     107       321
No                                66      13        79
Total                             280     120       400
6
Example 10-1

We have used similar tables in our module on
probability (Week 4). This is a 2 × 2 table.
The two variables (each with two outcomes) are
gender (male or female) and enjoyed university
experience (yes or no).
We will conduct a hypothesis test (using a
procedure similar to last week's) to test whether
there is a significant difference between the
proportions of males and females who enjoyed their
university experience.

7
Solution 10-1

Firstly we establish our hypotheses:
H0: p1 = p2
H1: p1 ≠ p2
We are testing that there is no difference
between the two population proportions, using
sample data to draw the conclusion.

8
Finding the rejection region

The test statistic, which we will calculate
shortly, approximately follows a chi-square
distribution with one degree of freedom.
We will reject the null hypothesis if the test
statistic is greater than the upper-tail critical
value from the chi-square distribution with one
degree of freedom.
Chi-square tests are only ever one-tailed: the
statistic is built from squared differences, so a
large difference between the proportions in
either direction produces a large value.

9
Finding the rejection region

The rejection rule is
Reject H0 if χ² is greater than the upper-tail
critical value, otherwise do not reject H0.
The critical value is found using Table E.4 in
the text based on the number of degrees of
freedom and the level of significance.

10
Solution 10-1

We are told α = 0.05 and know that for a 2 × 2
contingency table there is one degree of freedom.
Using Table E.4 we find the critical value is
3.841.
The rejection rule is
Reject H0 if χ² > 3.841, otherwise do not reject
H0.
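
The tabulated value 3.841 can be cross-checked without the table. The sketch below is only an illustration and assumes Python's standard library; it relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so the upper-tail critical value is the square of the two-tailed Z critical value. The function name chi2_critical_df1 is hypothetical and the shortcut applies to one degree of freedom only.

```python
from statistics import NormalDist


def chi2_critical_df1(alpha: float) -> float:
    """Upper-tail chi-square critical value for ONE degree of freedom only."""
    # With 1 df, chi-square = Z squared, so P(chi2 > c) = alpha
    # is the same statement as P(|Z| > sqrt(c)) = alpha.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


critical_value = chi2_critical_df1(0.05)
print(f"critical value = {critical_value:.3f}")
```

For larger tables this shortcut no longer holds and the printed table (or statistical software) is needed.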

11
The test statistic

The test statistic is given by
χ² = Σ (fo - fe)² / fe, summed over all cells of
the table,
where fo is the observed frequency (taken from
the table) and fe is the expected (theoretical)
frequency if the null hypothesis is true.

12
Computing the expected frequencies

The average proportion (of success) is given by
p̄ = (X1 + X2) / (n1 + n2) = X / n

                Column variable
Row variable    1          2          Total
Successes       X1         X2         X
Failures        n1 - X1    n2 - X2    n - X
Total           n1         n2         n

Note: if the column variable corresponds to
success/failure, the average proportion will be
calculated using values from the success column
rather than the success row as demonstrated here.

13
Example 10-1
                                  Gender
Enjoyed university experience?    Male    Female    Total
Yes                               214     107       321
No                                66      13        79
Total                             280     120       400
14
Computing the expected frequencies

To obtain the expected frequencies for the
success cells, the sample size (column total)
for each group is multiplied by the average
proportion, here 321 / 400 = 0.8025.

fo     fe
214    0.8025 × 280 = 224.7
107    0.8025 × 120 = 96.3
66
13
15
Computing the expected frequencies

To obtain the expected frequencies for the
failure cells, the sample size (column total)
for each group is multiplied by one minus the
average proportion.

fo     fe
214    224.7
107    96.3
66     (1 - 0.8025) × 280 = 55.3
13     (1 - 0.8025) × 120 = 23.7
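
The expected frequencies above can be reproduced in a few lines. This is a minimal sketch using the counts from Example 10-1; the variable names are illustrative only.

```python
# Observed successes ("Yes") and sample size (column total) for each group.
successes = {"Male": 214, "Female": 107}
sample_sizes = {"Male": 280, "Female": 120}

# Average proportion of successes across both groups: X / n.
p_bar = sum(successes.values()) / sum(sample_sizes.values())

# Expected counts if H0 is true: the same proportion applies to every group.
expected_yes = {group: p_bar * n for group, n in sample_sizes.items()}
expected_no = {group: (1 - p_bar) * n for group, n in sample_sizes.items()}

print(f"average proportion = {p_bar:.4f}")
for group in sample_sizes:
    print(f"{group}: expected Yes = {expected_yes[group]:.1f}, "
          f"expected No = {expected_no[group]:.1f}")
```

Note that the expected successes and failures in each column add back to that column's total, which is a quick check on the arithmetic.
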
16
Solution 10-1

Now we can calculate the test statistic (8.5995)
via a table as follows

fo     fe       fo - fe                 (fo - fe)²    (fo - fe)² / fe
214    224.7    214 - 224.7 = -10.7     114.49        114.49 / 224.7 = 0.5095
107    96.3     107 - 96.3 = 10.7       114.49        114.49 / 96.3 = 1.1889
66     55.3     10.7                    114.49        2.0703
13     23.7     -10.7                   114.49        4.8308
                                        Total         8.5995
17
Solution 10-1

We compare our test statistic to the critical
value.
Since 8.5995 > 3.841, we reject the null
hypothesis.
We conclude that there is sufficient evidence to
believe that there is a difference between the
proportions of males and females who enjoyed their
university experience.
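
The whole calculation for Example 10-1 can be sketched as follows. The function name chi_square_statistic is illustrative; the observed and expected values and the critical value are the ones given in the slides.

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all cells."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Cell order: Yes/Male, Yes/Female, No/Male, No/Female.
observed = [214, 107, 66, 13]
expected = [224.7, 96.3, 55.3, 23.7]
critical_value = 3.841  # Table E.4: alpha = 0.05, 1 degree of freedom

statistic = chi_square_statistic(observed, expected)
reject_h0 = statistic > critical_value
print(f"chi-square = {statistic:.4f}, reject H0: {reject_h0}")
```

Carried at full precision the statistic is about 8.5996; the 8.5995 in the working table is the sum of the four terms after each has been rounded to four decimal places. The decision is the same either way.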

18
Assumptions

Whenever we use a χ² test we assume that each
expected frequency is at least 5.
With tables larger than 2 × 2 (for example
when comparing more than two proportions), some
statisticians require only that each expected
frequency be at least 1.
Often cells are combined to meet these
requirements.
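
A quick way to check this assumption before running a test is sketched below. The function name meets_minimum is hypothetical, and the second list of cell values is made up purely to show a failing case.

```python
def meets_minimum(expected, minimum=5.0):
    """True when every expected frequency is at least `minimum`."""
    return all(fe >= minimum for fe in expected)


expected_2x2 = [224.7, 96.3, 55.3, 23.7]   # expected frequencies from Example 10-1
hypothetical = [12.0, 3.5, 8.2, 6.3]       # made-up cells; 3.5 is below 5

print(meets_minimum(expected_2x2))               # rule of 5 on the 2 x 2 table
print(meets_minimum(hypothetical))               # rule of 5 on the made-up cells
print(meets_minimum(hypothetical, minimum=1.0))  # the relaxed rule of 1
```

When the check fails, adjacent categories can be combined and the expected frequencies recomputed before testing.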

19
χ² test for the difference in more than two
proportions

Hypotheses
H0: p1 = p2 = ... = pc
H1: Not all pj are equal (j = 1, 2, ..., c)
Be careful with setting up H1. If only one
proportion is different from the others we want
to reject the null hypothesis.
Average proportion
p̄ = (X1 + X2 + ... + Xc) / (n1 + n2 + ... + nc) = X / n

20
χ² test for the difference in more than two
proportions

Degrees of freedom
df = (r - 1)(c - 1)
where r is the number of rows and c is the
number of columns in the contingency table.
For the 2 × 2 table we have (2 - 1)(2 - 1) = 1
degree of freedom (as we used earlier).
For a 2 × c contingency table we have
(2 - 1)(c - 1) = c - 1 degrees of freedom.

21
Example 10-2

A city is supported by three major IT companies.
There have been rumours of price-fixing and
collusion between the companies.
In order to investigate these accusations, a
consumer watchdog organisation conducted a survey
of 500 consumers of these IT services.
The results of this survey are summarised in the
table below.

22
Example 10-2

At the α = 0.05 level of significance, determine
whether there is evidence of a significant
difference between the consumer satisfaction of
the three IT companies.

                          IT Company
Satisfied with service    BMI    Unitses    Pear    Total
Yes                       115    99         108     322
No                        55     102        21      178
Total                     170    201        129     500
23
Solution 10-2

The hypotheses are
H0: p1 = p2 = p3
H1: Not all pj are equal
For a 2 × 3 contingency table there are
(2 - 1)(3 - 1) = 2 degrees of freedom.
Given α = 0.05, the critical value is 5.991.
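
The degrees of freedom and the critical value of 5.991 used in the decision rule can be checked with a short sketch. It assumes only the standard library and uses a shortcut that is valid for two degrees of freedom only: in that case the upper-tail probability is exp(-x / 2), so the critical value is -2 ln(α). Both function names are illustrative.

```python
import math


def degrees_of_freedom(rows: int, columns: int) -> int:
    """(r - 1)(c - 1) for an r x c contingency table."""
    return (rows - 1) * (columns - 1)


def chi2_critical_df2(alpha: float) -> float:
    """Upper-tail chi-square critical value for TWO degrees of freedom only."""
    # With 2 df, P(chi2 > x) = exp(-x / 2); solve exp(-x / 2) = alpha for x.
    return -2.0 * math.log(alpha)


df = degrees_of_freedom(2, 3)
critical_value = chi2_critical_df2(0.05)
print(f"df = {df}, critical value = {critical_value:.3f}")
```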

24
Solution 10-2

The decision rule is
Reject H0 if χ² > 5.991, otherwise do not
reject H0.
The average proportion (of successes) is
p̄ = 322 / 500 = 0.644.
We will use a table to calculate the test
statistic.

25
Solution 10-2
fo     fe         fo - fe    (fo - fe)²    (fo - fe)² / fe
115    109.480    5.52       30.4704       0.2783
99     129.444    -30.444    926.837136    7.1601
108    83.076     24.924     621.205776    7.4776
55     60.520     -5.52      30.4704       0.5035
102    71.556     30.444     926.837136    12.9526
21     45.924     -24.924    621.205776    13.5268
                             Total         41.8989
26
Solution 10-2

Note that all expected frequencies were greater
than five, so the chi-square test is appropriate.
Since 41.8989 > 5.991, we reject the null
hypothesis.
We can conclude that there is sufficient evidence
to believe that there is a difference in the
proportion of satisfied clients for the three IT
companies.
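
The same calculation extends to any number of groups. The sketch below handles a general 2 × c table and applies it to Example 10-2; the function name chi_square_2_by_c is illustrative, and the critical value is the tabulated one from the slides.

```python
def chi_square_2_by_c(successes, sample_sizes):
    """Average proportion and chi-square statistic for a 2 x c table."""
    p_bar = sum(successes) / sum(sample_sizes)
    statistic = 0.0
    for x, n in zip(successes, sample_sizes):
        fe_success = p_bar * n          # expected successes in this group
        fe_failure = (1 - p_bar) * n    # expected failures in this group
        statistic += (x - fe_success) ** 2 / fe_success
        statistic += ((n - x) - fe_failure) ** 2 / fe_failure
    return p_bar, statistic


# Example 10-2: satisfied customers and sample sizes for BMI, Unitses and Pear.
satisfied = [115, 99, 108]
surveyed = [170, 201, 129]
critical_value = 5.991  # alpha = 0.05, 2 degrees of freedom

p_bar, statistic = chi_square_2_by_c(satisfied, surveyed)
print(f"average proportion = {p_bar:.3f}, chi-square = {statistic:.4f}")
print("reject H0" if statistic > critical_value else "do not reject H0")
```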

27
χ² test of independence

Hypotheses
H0: the two categorical variables are
independent (i.e. there is no relationship
between them).
H1: the two categorical variables are dependent
(i.e. there is a relationship between them).
Computing the expected frequencies
fe = (row total × column total) / n

28
Example 10-3

A group of researchers is interested in
determining whether students who enrol in a
university degree straight from school perform
better than those who take a year off before
beginning university (sometimes called a gap
year).
The following information was gathered from a
sample of 400 students.

29
Example 10-3
                                                Enrolment group
Lowest grade received in first year of study    School leaver    Gap year    Total
HD                                              27               13          40
D                                               42               18          60
C                                               85               35          120
P                                               121              19          140
F                                               25               15          40
Total                                           300              100         400
30
Example 10-3

At the 0.01 level of significance, determine
whether there is evidence of a significant
relationship between the lowest grade a student
receives in their first year of study and whether
they have come directly from school or had a gap
year.
Interpret your result.

31
Solution 10-3

The hypotheses are
H0: there is no relationship between lowest
grade received in first year of study and
enrolment group.
H1: there is a relationship between lowest
grade received in first year of study and
enrolment group.
There are (r - 1)(c - 1) = (5 - 1)(2 - 1) = 4
degrees of freedom.

32
Solution 10-3

The critical value at a significance level of
0.01 is 13.277.
The rejection rule is
Reject H0 if χ² > 13.277, otherwise do not
reject H0.
Calculate the test statistic via a table as
follows (checking that all expected frequencies
are at least 5).
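
For readers without the table to hand, the critical value 13.277 can be recovered numerically. The sketch below is an illustration rather than the method used in the text: it assumes the standard closed form for the chi-square upper tail when the degrees of freedom are even, and finds the critical value by bisection. Both function names are hypothetical.

```python
import math


def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(chi-square > x) for a positive EVEN number of degrees of freedom.

    For df = 2m the upper tail equals exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def chi2_critical_even_df(alpha: float, df: int) -> float:
    """Value c with P(chi-square > c) = alpha, found by bisection."""
    low, high = 0.0, 1000.0
    for _ in range(200):
        mid = (low + high) / 2.0
        if chi2_upper_tail_even_df(mid, df) > alpha:
            low = mid    # tail area still larger than alpha: move right
        else:
            high = mid   # tail area at or below alpha: move left
    return (low + high) / 2.0


print(f"{chi2_critical_even_df(0.01, 4):.3f}")
```

Odd degrees of freedom need a different formula, so the function deliberately raises an error rather than returning a wrong answer.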

33
Solution 10-3
fo     fe     fo - fe    (fo - fe)²    (fo - fe)² / fe
27     30     -3         9             0.3
13     10     3          9             0.9
42     45     -3         9             0.2
18     15     3          9             0.6
...    ...    ...        ...           ...
                         Total         16.1968
34
Solution 10-3

Since 16.1968 > 13.277 we reject the null
hypothesis.
We conclude that there is sufficient evidence to
indicate that there is a relationship between
lowest grade received in first year of study and
enrolment group.
To interpret a result such as this, it can be
helpful to view the observed and expected cell
values together.

35
                                                Enrolment group
Lowest grade received in first year of study    School leaver    Gap year    Total
HD    Obs                                       27               13          40
      (Exp)                                     (30)             (10)
D     Obs                                       42               18          60
      (Exp)                                     (45)             (15)
C     Obs                                       85               35          120
      (Exp)                                     (90)             (30)
P     Obs                                       121              19          140
      (Exp)                                     (105)            (35)
F     Obs                                       25               15          40
      (Exp)                                     (30)             (10)
Total                                           300              100         400
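
The expected values in this table, and the test statistic for Example 10-3, can be reproduced with a short sketch. Each expected frequency is taken as row total × column total / n, which matches the bracketed values shown; the function names are illustrative.

```python
def expected_frequencies(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in column_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (fo - fe)^2 / fe over every cell of an r x c table."""
    expected = expected_frequencies(table)
    return sum(
        (fo - fe) ** 2 / fe
        for observed_row, expected_row in zip(table, expected)
        for fo, fe in zip(observed_row, expected_row)
    )


# Rows: HD, D, C, P, F. Columns: school leaver, gap year.
observed = [[27, 13], [42, 18], [85, 35], [121, 19], [25, 15]]

expected = expected_frequencies(observed)
statistic = chi_square_statistic(observed)
for grade, row in zip(["HD", "D", "C", "P", "F"], expected):
    print(grade, row)
print(f"chi-square = {statistic:.4f}")
```

Printing the expected rows next to the observed counts makes it easy to see which cells contribute most to the statistic.
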
36
Solution 10-3

We now look over this table for any unusual
differences between observed and expected values.
Some comments
School leavers seem to have more P grades than
expected and those who enrol after a gap year
have fewer than expected.
School leavers have slightly lower than expected
numbers of HD, D, C and F grades whilst gap year
students have slightly higher than expected
numbers in these same grades.

37
Solution 10-3

This is a mixed result and so it is difficult to
determine whether one group is doing better than
the other.
Perhaps lowest grade in the first year is not a
good measure of a student's overall performance
and so the experiment is flawed.
What would be a better way to measure
performance? GPA? A student's sense of
satisfaction with their performance? Or what?

38
χ² goodness of fit tests

Often we want to know whether some data that we
have can be described by a certain distribution.
We can do this in part by testing the assumptions
of the distribution (e.g. normal is symmetrical,
bell-shaped, reveals a straight line in a normal
probability plot etc).
We can test how close observed frequencies are to
frequencies that would be theoretically expected
(given that the data followed a particular
distribution) using a chi-square test.

39
χ² goodness of fit tests

The following steps are required
decide what distribution is believed to be
appropriate for the data set.
estimate parameters of the distribution (the
mean, the proportion etc).
determine the expected values for each category
if the data followed the distribution.
use the chi-square test to determine whether
there is a significant difference between actual
and expected values.

40
χ² goodness of fit tests

We will demonstrate the method for determining
goodness of fit for a Poisson distribution here.
Section 11.7 from the text (on CD) demonstrates
how to test goodness of fit for the normal
distribution, so ensure you are familiar and
confident with this as well.

41
Example 10-4

The number of people arriving at a particular
Automatic Teller Machine (ATM) per minute is
recorded during business hours for a 35 hour week
(35 hours × 60 minutes = 2100 minutes).
The following table summarises the results.
Does the distribution of people arriving at the
ATM follow a Poisson distribution?
Test at the 0.05 level of significance.

42
Example 10-4
Number of people arriving at ATM    Frequency
0                                   120
1                                   180
2                                   270
3                                   390
4                                   310
5                                   330
6                                   250
7                                   150
8                                   80
9                                   15
10                                  5
Total                               2100
43
Solution 10-4

The hypotheses are
H0: the number of people arriving at the ATM
per minute follows a Poisson distribution.
H1: the number of people arriving at the ATM
per minute does not follow a Poisson
distribution.
The (one) parameter of the Poisson distribution
is the mean (which we must estimate).

44
Solution 10-4

This is frequency data so we must use the
formula for the mean of a frequency distribution:
x̄ = Σ(x × f) / n = 8155 / 2100 ≈ 3.88
Since the tables in the text give the Poisson
distribution parameter, λ (the mean), to one
decimal place, we will use λ = 3.9.
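
The estimate of the mean can be checked directly from the frequency table in Example 10-4. This is a minimal sketch; the variable names are illustrative.

```python
# Arrivals per minute -> number of minutes in which that count was observed.
frequencies = {0: 120, 1: 180, 2: 270, 3: 390, 4: 310, 5: 330,
               6: 250, 7: 150, 8: 80, 9: 15, 10: 5}

n = sum(frequencies.values())                                  # minutes observed
total_arrivals = sum(x * f for x, f in frequencies.items())    # sum of x * f
sample_mean = total_arrivals / n

print(f"n = {n}, total arrivals = {total_arrivals}")
print(f"mean = {sample_mean:.4f}, to one decimal place = {sample_mean:.1f}")
```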

45
Number of people    Actual          P(X) for Poisson             Theoretical frequency
arriving at ATM     frequency fo    distribution with λ = 3.9    fe = nP(X)
0                   120             0.0202                       2100 × 0.0202 = 42.42
1                   180             0.0789                       165.69
2                   270             0.1539                       323.19
3                   390             0.2001                       420.21
4                   310             0.1951                       409.71
5                   330             0.1522                       319.62
6                   250             0.0989                       207.69
7                   150             0.0551                       115.71
8                   80              0.0269                       56.49
9                   15              0.0116                       24.36
10                  5               0.0045                       9.45
11 or more          0               0.0023                       4.83
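
The probabilities in this table come from printed Poisson tables, but they can also be computed from the Poisson formula P(X = x) = e^(-λ) λ^x / x!. The sketch below assumes λ = 3.9 and n = 2100 as in the slides, and rounds each probability to four decimal places to mimic a printed table; the function name poisson_pmf is illustrative.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 3.9   # rounded sample mean
n = 2100    # minutes observed

# Categories 0 to 10, rounded to four decimals like a printed table.
probabilities = [round(poisson_pmf(x, lam), 4) for x in range(11)]
# "11 or more" takes whatever probability is left over.
tail = round(1 - sum(poisson_pmf(x, lam) for x in range(11)), 4)
probabilities.append(tail)

expected = [n * p for p in probabilities]
labels = [*range(11), "11 or more"]
for label, p, fe in zip(labels, probabilities, expected):
    print(f"{label!s:>10}  {p:.4f}  {fe:8.2f}")
```

Using unrounded probabilities would change the expected frequencies slightly (for example, the first cell would be a little above 42.42), which is worth remembering when comparing software output with hand calculations.
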
46
Solution 10-4

Note that not all expected frequencies are
greater than 5, but the smallest (4.83) is close
and much greater than 1.
The degrees of freedom is given by k - p - 1
where k is the number of categories and p is the
number of parameters that were estimated.
Here k = 12 and p = 1 and so there are k - p - 1
= 12 - 1 - 1 = 10 degrees of freedom.

47
Solution 10-4

At the 0.05 level of significance, the critical
value is 18.307.
The rejection rule is
Reject H0 if χ² > 18.307, otherwise do not
reject H0.

48
Number of people
arriving at ATM    fo     fe        fo - fe    (fo - fe)²    (fo - fe)² / fe
0                  120    42.42     77.58      6018.6564     141.88252
1                  180    165.69    14.31      204.7761      1.23590
2                  270    323.19    -53.19     2829.1761     8.75391
3                  390    420.21    -30.21     912.6441      2.17188
4                  310    409.71    -99.71     9942.0841     24.26615
5                  330    319.62    10.38      107.7444      0.33710
6                  250    207.69    42.31      1790.1361     8.61927
7                  150    115.71    34.29      1175.8041     10.16165
8                  80     56.49     23.51      552.7201      9.78439
9                  15     24.36     -9.36      87.6096       3.59645
10                 5      9.45      -4.45      19.8025       2.09550
11 or more         0      4.83      -4.83      23.3289       4.83000
                                               Total         217.73472
49
Solution 10-4

Since 217.73472 > 18.307, we reject H0.
We conclude that the number of people arriving
per minute at this particular ATM does not follow
a Poisson distribution.
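
The goodness of fit calculation for Example 10-4 can be sketched in a few lines, using the observed and theoretical frequencies from the slides and the tabulated critical value. The variable names are illustrative.

```python
# Categories: 0, 1, ..., 10 arrivals per minute, then "11 or more".
observed = [120, 180, 270, 390, 310, 330, 250, 150, 80, 15, 5, 0]
expected = [42.42, 165.69, 323.19, 420.21, 409.71, 319.62,
            207.69, 115.71, 56.49, 24.36, 9.45, 4.83]

k = len(observed)   # number of categories
p = 1               # parameters estimated from the data (the Poisson mean)
df = k - p - 1

statistic = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
critical_value = 18.307  # alpha = 0.05, 10 degrees of freedom

print(f"df = {df}, chi-square = {statistic:.4f}")
print("reject H0" if statistic > critical_value else "do not reject H0")
```

Almost two thirds of the statistic comes from the first category alone: far more minutes had no arrivals than a Poisson distribution with this mean would predict.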

50
After the lecture each week

Review the lecture material
Complete all readings
Complete all of the recommended problems (listed
in SG) from the textbook
Complete at least some of the additional problems
Consider (briefly) the discussion points prior to
tutorials
