Categorical Data Analysis

2003 Pearson Prentice Hall. Slides: 83. Provided by: johnj178.

1
Chapter 13
  • Categorical Data Analysis

2
Learning Objectives
  • 1. Explain χ² Test for Proportions
  • 2. Explain χ² Test of Independence
  • 3. Solve Hypothesis Testing Problems
  • Two or More Population Proportions
  • Independence

3
Data Types
4
Qualitative Data
  • 1. Qualitative Random Variables Yield Responses
    That Classify
  • Example: Gender (Male, Female)
  • 2. Measurement Reflects Number in Category
  • 3. Examples
  • Do You Own Savings Bonds?
  • Do You Live On-Campus or Off-Campus?

5
Hypothesis Tests Qualitative Data
6
Chi-Square (χ²) Test for k Proportions
8
Chi-Square (χ²) Test for k Proportions
  • 1. Tests Hypothesis About Proportions Only
  • Example: p1 = .2, p2 = .3, p3 = .5
  • 2. One Variable With Several Levels
  • 3. Assumptions
  • Multinomial Experiment
  • Large Sample Size
  • All Expected Counts ≥ 5
  • 4. Uses One-Way Contingency Table

9
Multinomial Experiment
  • 1. n Identical Trials
  • 2. k Outcomes to Each Trial
  • 3. Constant Outcome Probability, p_i
  • 4. Independent Trials
  • 5. Random Variable is Count, n_i
  • 6. Example: Ask 100 People (n) Which of 3
    Candidates (k) They Will Vote For
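
A multinomial experiment like the candidate example can be simulated in a few lines. This is only an illustrative sketch: the function name and the candidate probabilities (.5, .3, .2) are assumptions made up for the demonstration, not values from the slides.

```python
import random
from collections import Counter

def run_multinomial_experiment(n, probs, rng):
    """Run n independent, identical trials with k = len(probs) outcomes.

    Returns the counts n_1, ..., n_k, which always sum to n.
    """
    outcomes = range(len(probs))
    # Each trial picks one outcome with the same constant probability p_i.
    draws = rng.choices(outcomes, weights=probs, k=n)
    tally = Counter(draws)
    return [tally[i] for i in outcomes]

# Hypothetical probabilities for 3 candidates (assumed, not from the slides).
rng = random.Random(13)
counts = run_multinomial_experiment(100, [0.5, 0.3, 0.2], rng)
print(counts, sum(counts))
```

Because every person picks exactly one candidate, the three counts are tied together: they must add up to n = 100.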

10
One-Way Contingency Table
  • 1. Shows Observations in k Independent Groups
    (Outcomes or Variable Levels)

[Table: number of responses for each of the k = 3 outcomes]
12
Generating in Stata
  . tab displaymode

  displaymode |      Freq.     Percent        Cum.
  ------------+-----------------------------------
       archiv |          1        0.00        0.00
         flat |      5,425        3.14        3.14
       nested |     28,625       16.59       19.73
       nocomm |        366        0.21       19.94
       thread |    138,164       80.06      100.00
  ------------+-----------------------------------
        Total |    172,581      100.00

13
Why not ANOVA for this?
  • Comparing multiple means, aren't we?
  • Yes, but
  • Outcomes are dependent
  • If higher count for outcome 1, lower for outcome 2

15
χ² Test for k Proportions Hypotheses & Statistic
  • 1. Hypotheses
  • H0: p1 = p1,0, p2 = p2,0, ..., pk = pk,0
    (p_i,0 is the hypothesized probability for outcome i)
  • Ha: Not all p_i equal their hypothesized values
  • 2. Test Statistic
  • χ² = Σ [n_i - E(n_i)]² / E(n_i), summed over the k
    outcomes, where n_i is the observed count and
    E(n_i) = n × p_i,0 is the expected count
  • 3. Degrees of Freedom: k - 1, where k is the
    number of outcomes

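
The statistic compares each observed count with its expected count under H0, squares the difference, and divides by the expected count. A minimal sketch of that calculation follows. The function name is an assumption chosen for illustration, and the counts are the ones from the performance-evaluation example later in the deck.

```python
def chi_square_statistic(observed, hypothesized_probs):
    """Return (chi2, df) for a one-way table of observed counts."""
    n = sum(observed)
    # Expected count under H0: E(n_i) = n * p_{i,0}
    expected = [n * p for p in hypothesized_probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # k - 1
    return chi2, df

# Counts from the performance-evaluation example (63, 45, 72 of 180),
# tested against equal proportions of 1/3 each.
chi2, df = chi_square_statistic([63, 45, 72], [1 / 3, 1 / 3, 1 / 3])
print(round(chi2, 2), df)
```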
19
χ² Test Basic Idea
  • 1. Compares Observed Count to Expected Count If
    Null Hypothesis Is True
  • 2. The Closer the Observed Count Is to the Expected
    Count, the More Consistent the Data Are With H0
  • Measured by Squared Difference Relative to
    Expected Count
  • Reject H0 for Large Values of χ²

20
Sampling Distribution for χ² Statistic
  • Run the experiment thousands of times
  • Each time draw a sample of size n and get counts
    for each of the k outcomes
  • Each time compute a single value, the χ²
    statistic
  • χ² distribution gives the frequency with which
    you'd get different values for that statistic
  • The χ² distribution is different for different
    degrees of freedom; it depends only on k
  • Actually, χ² distribution is only an
    approximation of the true distribution for the χ²
    statistic you'd get
  • Better approximation as n gets large
  • Then compute p-values or do hypothesis tests
  • Rule of thumb: only conduct tests based on this
    sampling distribution when expected count for
    each of the possible outcomes is > 5
  • Confidence intervals don't really make sense
    here, as there's no meaningful point estimate
    that we're trying to draw an interval around
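
The "run the experiment thousands of times" idea can be tried directly. The sketch below is an illustration under assumptions, not part of the original deck: the sample size, seed, and number of repetitions are arbitrary choices, and 5.991 is the standard tabled .05 critical value for 2 degrees of freedom. If the χ² approximation is good, roughly 5% of simulated statistics should land above that value when H0 is true.

```python
import random

def chi_square_statistic(counts, probs):
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

def simulate_statistics(n, probs, repetitions, rng):
    """Draw `repetitions` samples of size n under H0; return the chi2 values."""
    k = len(probs)
    stats = []
    for _ in range(repetitions):
        counts = [0] * k
        for outcome in rng.choices(range(k), weights=probs, k=n):
            counts[outcome] += 1
        stats.append(chi_square_statistic(counts, probs))
    return stats

rng = random.Random(2003)      # fixed seed, arbitrary choice
probs = [1 / 3, 1 / 3, 1 / 3]  # H0 for k = 3 outcomes
stats = simulate_statistics(n=180, probs=probs, repetitions=5000, rng=rng)

# Share of simulated statistics beyond the tabled critical value (df = 2).
share_above = sum(s > 5.991 for s in stats) / len(stats)
print(f"share of simulated statistics above 5.991: {share_above:.3f}")
```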

21
Finding Critical Value Example
What is the critical χ² value if k = 3, α = .05?
If n_i = E(n_i), χ² = 0. Do not reject H0
α = .05
df = k - 1 = 2
χ² Table (Portion): for df = 2 and upper-tail area .05, the critical value is 5.991
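
For this particular case the table lookup can be checked by hand. With df = 2 the upper-tail area of the χ² distribution is exp(-x/2), so the critical value is -2 ln(α). The sketch below relies on that closed form, which holds only for df = 2. The function name is an assumption for illustration.

```python
import math

def chi2_critical_df2(alpha):
    """Critical value for df = 2, where P(chi2 > x) = exp(-x / 2)."""
    return -2.0 * math.log(alpha)

critical = chi2_critical_df2(0.05)
print(round(critical, 3))  # 5.991
```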
33
Check Your Understanding
  • Will a higher value for χ² statistic yield a
    higher or lower p-value?
  • Will a higher value of χ² statistic make you more
    or less likely to reject null hypothesis?
  • If alpha is smaller, will the critical χ² value
    be smaller or larger?

34
χ² Test for k Proportions Example
  • As personnel director, you want to test the
    perception of fairness of three methods of
    performance evaluation. Of 180 employees, 63
    rated Method 1 as fair. 45 rated Method 2 as
    fair. 72 rated Method 3 as fair. At the .05
    level, is there a difference in perceptions?

35
χ² Test for k Proportions Solution
  • H0: p1 = p2 = p3 = 1/3
  • Ha: At least 1 is different
  • α = .05
  • n1 = 63, n2 = 45, n3 = 72
  • Critical Value(s): χ² = 5.991 (df = 2)
  • Test Statistic: χ² = 6.3
  • Decision: Reject at α = .05
  • Conclusion: There is evidence of a difference in
    proportions

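
The slide that showed the arithmetic did not survive in this transcript. Assuming the usual statistic (squared difference between observed and expected count, divided by expected count), the reported value of 6.3 can be recovered as follows:

```latex
\begin{align*}
E(n_i) &= n\,p_{i,0} = 180 \times \tfrac{1}{3} = 60 \\
\chi^2 &= \frac{(63-60)^2}{60} + \frac{(45-60)^2}{60} + \frac{(72-60)^2}{60}
        = \frac{9 + 225 + 144}{60} = 6.3
\end{align*}
```

Since 6.3 exceeds 5.991, the standard tabled critical value for df = 2 at the .05 level, H0 is rejected.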
44
Intuitions: Why doesn't critical value depend on n?
  • What happens to χ² statistic as n gets bigger?
  • More terms (cells) to add into the sum
  • Every term is positive
  • Distribution of numerator values is same for each
    additional cell
  • but denominator increases, too
  • It's a weighted sum, with total weight 1

45
Intuitions: Why does critical value change with k?
  • With more possible outcomes
  • Greater chance that one of outcomes will have an
    unusual count
  • Higher values of χ² are expected (more likely to
    get larger values)
  • Therefore critical value for χ² goes up
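
The growth of the critical value with k can be seen numerically. The sketch below is an illustration under a stated restriction: it handles even degrees of freedom only, because there the upper-tail area has a simple closed form (a finite Poisson-type sum). The critical value is found by bisection. Function names are assumptions for illustration.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi2 > x) for even df, using the closed-form finite sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * terms

def chi2_critical_even_df(alpha, df):
    """Find x whose upper-tail area equals alpha, by bisection."""
    low, high = 0.0, 200.0
    for _ in range(100):
        mid = (low + high) / 2.0
        if chi2_upper_tail_even_df(mid, df) > alpha:
            low = mid   # tail area still too large: move right
        else:
            high = mid
    return (low + high) / 2.0

for k in (3, 5, 7):     # number of outcomes; df = k - 1
    print(k, round(chi2_critical_even_df(0.05, k - 1), 3))
```

The printed values should agree with the usual table entries for df = 2, 4 and 6 (5.991, 9.488, 12.592), rising as k rises.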

46
Running tests in Stata
  • No built-in Stata code for this
  • Type findit csgof, then click through to install
    the csgof package
  • csgof: chi-square goodness of fit

47
Stata output
  . keep if displaymode!=1
  (1 observation deleted)

  . csgof displaymode, expperc(25, 25, 25, 25)

  displa~e   expperc     expfreq     obsfreq
  ------------------------------------------
  flat            25       43145       5,425
  nested          25       43145      28,625
  nocomm          25       43145         366
  thread          25       43145     138,164
  ------------------------------------------
  chisq(3) is 289541.81, p = 0

48
A very sensitive test
  . csgof displaymode, expperc(3, 16.8, .2, 80)

  displa~e   expperc     expfreq     obsfreq
  ------------------------------------------
  flat             3      5177.4       5,425
  nested        16.8    28993.44      28,625
  nocomm          .2      345.16         366
  thread          80      138064     138,164
  ------------------------------------------
  chisq(3) is 17.85, p = .0005

49
Finally a non-rejection
  . csgof displaymode, expperc(3.2, 16.6, .2, 80)

  displa~e   expperc     expfreq     obsfreq
  ------------------------------------------
  flat           3.2     5522.56       5,425
  nested        16.6    28648.28      28,625
  nocomm          .2      345.16         366
  thread          80      138064     138,164
  ------------------------------------------
  chisq(3) is 3.07, p = .3805
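
The last csgof result can be cross-checked outside Stata. The sketch below recomputes the statistic from the observed frequencies and expected percentages shown in the output. It is an independent illustration, not a description of how csgof works internally. The p-value uses a closed form for the upper tail that is valid only for df = 3.

```python
import math

def chi2_upper_tail_df3(x):
    """P(chi2 > x) for df = 3 (closed form using the error function)."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

def goodness_of_fit(observed, expected_percent):
    """Return (expected counts, chi2) for the given expected percentages."""
    n = sum(observed)
    expected = [n * pct / 100.0 for pct in expected_percent]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2

# Observed frequencies from the Stata output (archiv category dropped):
# flat, nested, nocomm, thread.
observed = [5425, 28625, 366, 138164]
expected, chi2 = goodness_of_fit(observed, [3.2, 16.6, 0.2, 80])
p_value = chi2_upper_tail_df3(chi2)
print(f"chisq(3) is {chi2:.2f}, p = {p_value:.4f}")
```

With more than 170,000 observations, even the small gap between 3% and 3.2% for one category is enough to move the test from rejection to non-rejection, which is what the "very sensitive test" slide illustrates.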

50
χ² Test of Independence
52
χ² Test of Independence
  • 1. Shows If a Relationship Exists Between 2
    Qualitative Variables
  • One Sample Is Drawn
  • Does Not Show Causality
  • 2. Assumptions
  • Multinomial Experiment
  • All Expected Counts ≥ 5
  • 3. Uses Two-Way Contingency Table

53
χ² Test of Independence Contingency Table
  • 1. Shows Observations From 1 Sample Jointly in
    2 Qualitative Variables

[Two-way table: levels of variable 1 by levels of variable 2]
55
χ² Test of Independence Hypotheses & Statistic
  • 1. Hypotheses
  • H0: Variables Are Independent
  • Ha: Variables Are Related (Dependent)
  • 2. Test Statistic
  • χ² = Σ [n_ij - E(n_ij)]² / E(n_ij), summed over all
    cells, where n_ij is the observed count and E(n_ij)
    is the expected count
  • Degrees of Freedom: (r - 1)(c - 1), where r is the
    number of rows and c is the number of columns

58
χ² Test of Independence Expected Counts
  • 1. Statistical Independence Means Joint
    Probability Equals Product of Marginal
    Probabilities
  • 2. Compute Marginal Probabilities & Multiply for
    Joint Probability
  • 3. Expected Count Is Sample Size Times Joint
    Probability

59
Expected Count Example
Marginal probability = 112/160
Marginal probability = 78/160
Joint probability = (112/160)(78/160)
Expected count = 160 × (112/160)(78/160) = 54.6
65
Expected Count Calculation
Expected count = (Row total × Column total) / Sample size
112 × 82 / 160 = 57.4
112 × 78 / 160 = 54.6
48 × 78 / 160 = 23.4
48 × 82 / 160 = 24.6
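
Under independence, each expected count depends only on the marginal totals. The sketch below uses the totals from the example (112 and 48 for one variable, 78 and 82 for the other, n = 160). Which variable forms the rows is an assumption, since the table itself is not in the transcript, and the function name is chosen for illustration.

```python
def expected_counts(row_totals, col_totals):
    """Expected count per cell: row total * column total / sample size."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must add to the same n")
    return [[r * c / n for c in col_totals] for r in row_totals]

# Marginal totals from the slide example; the row/column assignment is assumed.
table = expected_counts([112, 48], [78, 82])
for row in table:
    print([round(e, 1) for e in row])
```

The four expected counts add back to the sample size of 160, which is a quick check on the arithmetic.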
68
χ² Test of Independence Example
  • You're a marketing research analyst. You ask a
    random sample of 286 consumers if they purchase
    Diet Pepsi or Diet Coke. At the .05 level, is
    there evidence of a relationship?

69
χ² Test of Independence Solution
  • H0: No Relationship
  • Ha: Relationship
  • α = .05
  • df = (2 - 1)(2 - 1) = 1
  • Critical Value(s): χ² = 3.841
  • Expected counts, E(n_ij) ≥ 5 in all cells:
    116 × 132 / 286, 116 × 154 / 286,
    170 × 132 / 286, 170 × 154 / 286
  • Test Statistic: χ² = 54.29
  • Decision: Reject at α = .05
  • Conclusion: There is evidence of a relationship

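
The decision can be checked numerically from the reported statistic alone. With df = 1, χ² is the square of a standard normal variable, so its upper-tail area is erfc(√(x/2)). The sketch below uses that closed form. The value 3.841 is the standard tabled critical value for df = 1 at the .05 level, and the function name is an assumption for illustration.

```python
import math

def chi2_upper_tail_df1(x):
    """P(chi2 > x) for df = 1: chi2 is a squared standard normal."""
    return math.erfc(math.sqrt(x / 2.0))

chi2 = 54.29      # test statistic reported on the slide
critical = 3.841  # tabled value for df = 1, alpha = .05
p_value = chi2_upper_tail_df1(chi2)
print(chi2 > critical, p_value < 0.05)
```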
79
Siskel and Ebert (13.49)
  . tab siskel ebert

             |              Ebert
      Siskel |       Con        Mix        Pro |     Total
  -----------+---------------------------------+----------
         Con |        24          8         13 |        45
         Mix |         8         13         11 |        32
         Pro |        10          9         64 |        83
  -----------+---------------------------------+----------
       Total |        42         30         88 |       160

80
Siskel and Ebert
  . tab siskel ebert, expected chi2

             |              Ebert
      Siskel |       Con        Mix        Pro |     Total
  -----------+---------------------------------+----------
         Con |        24          8         13 |        45
             |      11.8        8.4       24.8 |      45.0
  -----------+---------------------------------+----------
         Mix |         8         13         11 |        32
             |       8.4        6.0       17.6 |      32.0
  -----------+---------------------------------+----------
         Pro |        10          9         64 |        83
             |      21.8       15.6       45.6 |      83.0
  -----------+---------------------------------+----------
       Total |        42         30         88 |       160
             |      42.0       30.0       88.0 |     160.0

  Pearson chi2(4) = 45.3569   Pr = 0.000
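
The Stata result can be reproduced step by step from the observed counts: compute the marginal totals, derive the expected counts, then sum the squared differences relative to the expected counts. The sketch below is an independent illustration, not Stata's implementation. The p-value uses a closed form that holds only for df = 4.

```python
import math

def chi_square_independence(table):
    """Return (chi2, df, expected) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

# Observed counts from the tab output: rows Siskel, columns Ebert (Con, Mix, Pro).
ratings = [[24, 8, 13], [8, 13, 11], [10, 9, 64]]
chi2, df, expected = chi_square_independence(ratings)

# For df = 4 the upper-tail area is exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)
print(f"Pearson chi2({df}) = {chi2:.4f}   Pr = {p_value:.3f}")
```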

81
Conclusion
  • 1. Explained χ² Test for Proportions
  • 2. Explained χ² Test of Independence
  • 3. Solved Hypothesis Testing Problems
  • Two or More Population Proportions
  • Independence

82
End of Chapter
Any blank slides that follow are blank
intentionally.