Contingency Tables - PowerPoint PPT Presentation

About This Presentation
Title:

Contingency Tables

Description:

Title: What is Statistics? Author: John Lawrence Last modified by: John Lawrence Created Date: 8/23/1998 12:37:36 PM Document presentation format – PowerPoint PPT presentation

Slides: 22
Provided by: JohnL311

Transcript and Presenter's Notes

Title: Contingency Tables


1
  • Contingency Tables
  • For
  • Tests of Independence

2
Multinomials Over Various Categories
  • Thus far the situation where there are multiple
    outcomes for the qualitative variable without
    regard to anything else has been discussed.
  • Now we discuss whether or not two qualitative
    variables are related, i.e., are they independent?

3
EXAMPLES
  • (1) Can it be concluded that cola preference and
    gender are dependent?
  • (2) Can it be concluded that cola preference and
    age are dependent?

4
RULE OF 5
  • χ² (chi-squared) is actually only an approximate
    distribution for the test statistic.
  • To be a valid approximation, ALL ei's should be ≥ 5.
  • If the rule of 5 is violated, combine some
    categories so that the condition is met.

5
COLA PREFERENCE VS. GENDER
  • The 1000 cola drinkers were further classified as
    to whether they were male or female.
  • COLA           MALE       FEMALE     ROW TOTAL
    Coke            240          170     r1 = 410
    Pepsi           200          150     r2 = 350
    RC               50           30     r3 =  80
    Shasta           35           15     r4 =  50
    Jolt             75           35     r5 = 110
    COLUMN TOTAL   c1 = 600   c2 = 400   n = 1000

6
HYPOTHESIS TEST: Can We Conclude Cola Preference
and Gender Are Dependent?
  • H0 (NO): Cola preference and gender are
    independent
  • HA (YES): Cola preference and gender are
    dependent
  • α = .05
  • Reject H0 if χ² > χ²(.05, DF)
  • The correct DF = (r-1)(c-1) = (5-1)(2-1) = (4)(1) = 4
  • where r = number of rows and c = number of columns
  • Reject H0 if χ² > χ²(.05, 4) = 9.48773

7
HOW DO WE GET THE eij's?
  • Let P(A) = Probability a respondent favors Coke
  • Let P(B) = Probability a respondent is a male
  • If H0 is true, the classifications are
    independent
  • Thus P(A and B) = P(A)P(B)
  • Best guess for P(A) ≈ 410/1000 = .41
  • Best guess for P(B) ≈ 600/1000 = .6
  • Thus P(A and B) ≈ (.41)(.6) = .246
  • Expected number (Coke and male) = e11
    = 1000(.246) = 246
  • This can be obtained directly as r1c1/n = (410)(600)/1000
    = 246
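
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the calculation above; the variable names are illustrative and not part of the original presentation.

```python
# Marginal totals taken from the cola vs. gender table.
n = 1000          # grand total
row_coke = 410    # r1: respondents who favor Coke
col_male = 600    # c1: male respondents

p_coke = row_coke / n        # best guess for P(A)
p_male = col_male / n        # best guess for P(B)
p_both = p_coke * p_male     # P(A and B) if H0 (independence) is true

e11_from_probs = n * p_both              # n * P(A) * P(B)
e11_shortcut = row_coke * col_male / n   # r1 * c1 / n

# The two routes agree up to floating-point rounding.
print(round(e11_from_probs, 6), e11_shortcut)
```

The shortcut form divides once at the end, so it avoids the small rounding error that multiplying two decimal fractions can introduce.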

8
CONTINGENCY TABLES
  • Contingency tables are a convenient way of
    expressing the results when there are two
    classifications
  • It is the equivalent of a multinomial table for
    two classifications
  • We put the eij's in parentheses under (or next
    to) the fij's in the table; then we calculate
    χ² = Σ (fij - eij)²/eij, summed over all cells

9
eij's for Cola vs. Gender
  • Coke/Male      e11 = (410)(600)/1000 = 246
  • Coke/Female    e12 = (410)(400)/1000 = 164
  • Pepsi/Male     e21 = (350)(600)/1000 = 210
  • Pepsi/Female   e22 = (350)(400)/1000 = 140
  • RC/Male        e31 = ( 80)(600)/1000 =  48
  • RC/Female      e32 = ( 80)(400)/1000 =  32
  • Shasta/Male    e41 = ( 50)(600)/1000 =  30
  • Shasta/Female  e42 = ( 50)(400)/1000 =  20
  • Jolt/Male      e51 = (110)(600)/1000 =  66
  • Jolt/Female    e52 = (110)(400)/1000 =  44
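
The ten calculations above all follow the same pattern, so they can be produced in one pass. The sketch below assumes the observed counts are stored as a list of rows; the function name expected_counts is a hypothetical helper, not something from the presentation.

```python
def expected_counts(observed):
    """Return the table of e_ij = (row total)(column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Observed counts (male, female) for each cola, from the survey table.
colas = ["Coke", "Pepsi", "RC", "Shasta", "Jolt"]
cola_by_gender = [[240, 170], [200, 150], [50, 30], [35, 15], [75, 35]]

for cola, row in zip(colas, expected_counts(cola_by_gender)):
    print(cola, row)
```

Because the results are kept as floats, nothing is rounded to whole numbers when the totals do not divide evenly.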

10
Notes on Calculating e's
  • The column totals may be set in advance or may be
    random based on the survey.
  • These eij's were all whole numbers -- if they are
    not, DO NOT ROUND TO WHOLE NUMBERS.
  • All these e's are ≥ 5, but suppose e52 were actually
    3.
  • We might combine the results from Shasta and Jolt
    colas.
  • This would reduce the number of rows and hence
    the degrees of freedom.
  • e52 is not less than 5 here, so we do not have to
    do this.
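
A rough sketch of the rule-of-5 check and of combining two categories follows. The helper names are hypothetical. The merge is run on the real survey data purely to show the mechanics and the drop in degrees of freedom, since no cell here actually violates the rule.

```python
def expected_counts(observed):
    """Return the table of e_ij = (row total)(column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def violates_rule_of_5(observed):
    """True if any expected count is below 5."""
    return any(e < 5 for row in expected_counts(observed) for e in row)

def merge_rows(observed, i, j):
    """Combine categories i and j by adding their counts cell by cell."""
    merged = [a + b for a, b in zip(observed[i], observed[j])]
    kept = [row for k, row in enumerate(observed) if k not in (i, j)]
    return kept + [merged]

def degrees_of_freedom(observed):
    return (len(observed) - 1) * (len(observed[0]) - 1)

# Rows: Coke, Pepsi, RC, Shasta, Jolt; columns: male, female.
cola_by_gender = [[240, 170], [200, 150], [50, 30], [35, 15], [75, 35]]

print(violates_rule_of_5(cola_by_gender))
combined = merge_rows(cola_by_gender, 3, 4)   # Shasta + Jolt as one row
print(degrees_of_freedom(cola_by_gender), degrees_of_freedom(combined))
```

Merging two rows always removes (c-1) degrees of freedom, so the critical value for the test must be looked up again after combining.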

11
CONTINGENCY TABLE FOR COLA vs. GENDER
  •           Men     Women    Total
    Coke      240      170      410
             (246)    (164)
    Pepsi     200      150      350
             (210)    (140)
    RC         50       30       80
             ( 48)    ( 32)
    Shasta     35       15       50
             ( 30)    ( 20)
    Jolt       75       35      110
             ( 66)    ( 44)
    Total     600      400     1000

12
χ² for Cola vs. Gender
  • χ² = (240-246)²/246 + (170-164)²/164
       + (200-210)²/210 + (150-140)²/140
       + (50-48)²/48 + (30-32)²/32
       + (35-30)²/30 + (15-20)²/20
       + (75-66)²/66 + (35-44)²/44
       = 6.92
  • χ² = 6.92 < χ²(.05, 4) = 9.48773
  • There is not enough evidence to conclude gender
    and cola preference are dependent.
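
The sum above can be reproduced in Python as a check. This is a minimal sketch with a hypothetical function name; the critical value 9.48773 is simply copied from the slide rather than computed.

```python
def chi_squared(observed):
    """Sum (f_ij - e_ij)^2 / e_ij over every cell of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, f in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count
            stat += (f - e) ** 2 / e
    return stat

# Rows: Coke, Pepsi, RC, Shasta, Jolt; columns: male, female.
cola_by_gender = [[240, 170], [200, 150], [50, 30], [35, 15], [75, 35]]

stat = chi_squared(cola_by_gender)
critical = 9.48773   # chi-squared(.05, 4), taken from the slide
print(round(stat, 2), "reject H0" if stat > critical else "do not reject H0")
```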

13
COLA PREFERENCE vs. AGE
  • Survey results
  •           <20    20-40   40-60    >60    TOTAL
    Coke      155     140      75      40      410
    Pepsi     155      95      75      25      350
    RC         30      20      15      15       80
    Shasta     20      15      10       5       50
    Jolt       40      30      25      15      110
    TOTAL     400     300     200     100     1000

14
HYPOTHESIS TEST
  • There are r = 5 rows and c = 4 columns
  • H0 (NO): Cola preference and age are independent
  • HA (YES): Cola preference and age are dependent
  • α = .05
  • Reject H0 if χ² > χ²(.05, DF)
  • DF = (r-1)(c-1) = (5-1)(4-1) = (4)(3) = 12
  • Reject H0 if χ² > χ²(.05, 12) = 21.0261

15
Sample eij's
  • e34 = (Row 3 Total)(Column 4 Total)/(Grand Total)
        = (80)(100)/1000 = 8
  • e41 = (Row 4 Total)(Column 1 Total)/(Grand Total)
        = (50)(400)/1000 = 20

16
CONTINGENCY TABLE FOR COLA vs. AGE
  •           <20    20-40   40-60    >60    Total
    Coke      155     140      75      40      410
             (164)   (123)    (82)    (41)
    Pepsi     155      95      75      25      350
             (140)   (105)    (70)    (35)
    RC         30      20      15      15       80
             ( 32)   ( 24)    (16)    ( 8)
    Shasta     20      15      10       5       50
             ( 20)   ( 15)    (10)    ( 5)
    Jolt       40      30      25      15      110
             ( 44)   ( 33)    (22)    (11)
    Total     400     300     200     100     1000

17
χ² for Cola vs. Age
  • χ² = (155-164)²/164 + (140-123)²/123
       + (75-82)²/82 + (40-41)²/41
       + ...
       + (40-44)²/44 + (30-33)²/33
       + (25-22)²/22 + (15-11)²/11
       = 18.72
  • χ² = 18.72 < χ²(.05, 12) = 21.0261
  • There is not enough evidence to conclude cola
    preference and age are dependent.
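
Since the slide writes out only the first and last rows of the sum, a short script can total all 20 cells and show how much each cola contributes. This is a sketch with illustrative names; the critical value 21.0261 is copied from the slide.

```python
colas = ["Coke", "Pepsi", "RC", "Shasta", "Jolt"]
# Columns: under 20, 20-40, 40-60, over 60.
cola_by_age = [
    [155, 140, 75, 40],
    [155, 95, 75, 25],
    [30, 20, 15, 15],
    [20, 15, 10, 5],
    [40, 30, 25, 15],
]

row_totals = [sum(row) for row in cola_by_age]
col_totals = [sum(col) for col in zip(*cola_by_age)]
n = sum(row_totals)

# Chi-squared contribution of each row: sum of (f - e)^2 / e across its cells.
contributions = {}
for cola, row, r in zip(colas, cola_by_age, row_totals):
    contributions[cola] = sum(
        (f - r * c / n) ** 2 / (r * c / n) for f, c in zip(row, col_totals)
    )

total = sum(contributions.values())
critical = 21.0261   # chi-squared(.05, 12), taken from the slide

for cola, part in contributions.items():
    print(cola, round(part, 3))
print(round(total, 2), "reject H0" if total > critical else "do not reject H0")
```

Breaking the statistic down by row makes it easy to see which categories depart most from what independence would predict.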

18
Excel
  • CHITEST gives the p-value for the test
  • CHITEST(Observed Values, Expected Values)
  • Must first calculate the expected values, eijs
  • See next slide for easy way to calculate these
    values.
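
For readers without Excel, the p-value can be approximated with the Python standard library. The sketch below is not Excel's implementation: the function names are hypothetical, and the tail-probability formula used is a closed form that holds only when the degrees of freedom are even, which is the case for both examples in this presentation (DF = 4 and DF = 12).

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-squared with df degrees of freedom > x), valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term, total = 1.0, 1.0          # i = 0 term of the series
    for i in range(1, df // 2):     # add (x/2)^i / i! for i = 1 .. df/2 - 1
        term *= half / i
        total += term
    return math.exp(-half) * total

def chitest(observed, expected):
    """p-value of the chi-squared test from observed and expected tables."""
    stat = sum(
        (f - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for f, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2_upper_tail_even_df(stat, df)

# Cola vs. gender: observed counts and the expected counts computed earlier.
observed = [[240, 170], [200, 150], [50, 30], [35, 15], [75, 35]]
expected = [[246, 164], [210, 140], [48, 32], [30, 20], [66, 44]]

p_value = chitest(observed, expected)
print(round(p_value, 3), "reject H0" if p_value < 0.05 else "do not reject H0")
```

Comparing the p-value with α = .05 leads to the same decision as comparing χ² with the critical value.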

19
(No Transcript)
20
(No Transcript)
21
Review
  • Contingency tables allow for comparisons to
    determine if two different categories are
    independent
  • Excel -- CHITEST is used to generate the p-values
    for the chi-squared test
  • Expected Values
    = (Row Total)(Column Total)/n
  • By hand -- total degrees of freedom = (r-1)(c-1)
  • and the χ² statistic is calculated by
    χ² = Σ (fij - eij)²/eij, summed over all cells