Cross Tabs and Chi-Squared

Provided by: JamesD171

1
Cross Tabs and Chi-Squared
  • Testing for a Relationship Between Nominal (or
    Ordinal) Variables

2
Cross Tabs and Chi-Squared
  • The test you choose depends on level of
    measurement:
  • Independent                        Dependent                          Test
  • Dichotomous                        Interval-ratio or dichotomous      Independent samples t-test
  • Nominal or dichotomous             Interval-ratio or dichotomous      ANOVA
  • Nominal (ordinal) or dichotomous   Nominal (ordinal) or dichotomous   Cross tabs

3
Cross Tabs and Chi-Squared
  • We are asking whether there is a relationship
    between two nominal (or ordinal) variables; this
    includes dichotomous variables.
  • One may use cross tabs for ordinal variables, but
    it is generally better to use more powerful
    statistical techniques if you can treat them as
    interval-ratio variables.

4
Cross Tabs and Chi-Squared
  • Cross tabs and Chi-Squared will tell you whether
    classification on one nominal variable is related
    to classification on a second nominal variable.
  • For example:
  • Are rural Americans more likely to vote
    Republican in presidential races than urban
    Americans?
  • Classification on region and party vote
  • Are white people more likely to drive SUVs than
    blacks or Latinos?
  • Classification on race and type of vehicle

5
Cross Tabs and Chi-Squared
  • The statistical focus will be on the number or
    count of people in a sample who are classified
    in patterned ways on two variables.
  • Or
  • The number or count of people classified in
    each category created when considering both
    variables at the same time, such as:
  • White, SUV    Black, SUV
  • White, car    Black, car
  • (Columns: race; rows: vehicle type)
6
Cross Tabs and Chi-Squared
  • Number in Each Joint Group?
  • Why?
  • Means and standard deviations are meaningless for
    nominal variables.
  • So we need statistics that allow us to work
    categorically.

7
Cross Tabs and Chi-Squared
  • The procedure starts with a cross
    classification of the cases in categories of
    each variable.
  • Example
  • Data on male and female support for keeping SJSU
    football from 650 students put into a matrix
  • Yes No Maybe Total
  • Female 185 200 65 450
  • Male 80 65 55 200
  • Total 265 265 120 650

8
Cross Tabs and Chi-Squared
  • In the example, you can see that the campus is
    divided on the issue. But is there an
    association between sex and attitudes?
  • Example
  • Data on male and female support for SJSU football
    from 650 students put into a matrix
  • Yes No Maybe Total
  • Female 185 200 65 450
  • Male 80 65 55 200
  • Total 265 265 120 650

9
Cross Tabs and Chi-Squared
  • But is there an association between sex and
    attitudes?
  • An easy way to get more information is to convert
    the frequencies (or counts in each cell) to
    percentages.
  • Data on male and female support for SJSU football
    from 650 students put into a matrix
  •          Yes         No          Maybe       Total
  • Female   185 (41%)   200 (44%)   65 (14%)    450 (99%)
  • Male     80 (40%)    65 (33%)    55 (28%)    200 (101%)
  • Total    265 (41%)   265 (41%)   120 (18%)   650 (100%)
  • Percentages do not add to 100 due to rounding.
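The conversion from counts to row percentages can be sketched in a few lines of Python. This is only an illustration: the helper name `row_percentages` and the dictionary layout are assumptions made for this example, not something taken from the slides.

```python
# Observed counts from the football example (rows: sex, columns: answer).
observed = {
    "Female": {"Yes": 185, "No": 200, "Maybe": 65},
    "Male": {"Yes": 80, "No": 65, "Maybe": 55},
}

def row_percentages(table):
    """Return each cell as a percentage of its own row total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result

percentages = row_percentages(observed)
for row, cells in percentages.items():
    # Print unrounded-to-one-decimal values; the slide rounds to whole percents.
    print(row, {col: round(pct, 1) for col, pct in cells.items()})
```

Rounding each value to a whole percent should reproduce the figures in the table, including row totals of 99 and 101.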

10
Cross Tabs and Chi-Squared
  • We can see that in the sample men are less likely
    to oppose football, but no more likely to say
    "yes" than women; men are more likely to say
    "maybe."
  • Data on male and female support for SJSU football
    from 650 students put into a matrix
  •          Yes         No          Maybe       Total
  • Female   185 (41%)   200 (44%)   65 (14%)    450 (99%)
  • Male     80 (40%)    65 (33%)    55 (28%)    200 (101%)
  • Total    265 (41%)   265 (41%)   120 (18%)   650 (100%)
  • Percentages do not add to 100 due to rounding.

11
Cross Tabs and Chi-Squared
  • Using percentages to describe relationships is
    valid statistical analysis. These are
    descriptive statistics! However, they are not
    inferential statistics.
  • What can we say about the population using this
    sample (inferential statistics)?
  • Thinking about random variations in who would be
    selected from random sample to random sample
  • Could we have gotten sample statistics like
    these from a population where there is no
    association between sex and attitudes about
    keeping football?
  • The Chi-Squared Test of Independence allows us to
    answer questions like those above.

12
Cross Tabs and Chi-Squared
  • The whole idea behind the Chi-Squared test of
    independence is to determine whether the patterns
    of frequencies (or counts) in your cross
    classification table could have occurred by
    chance, or whether they represent systematic
    assignment to particular cells.
  • For example, were women more likely to answer
    "no" than men, or could the deviation in responses
    by sex have occurred because of random sampling
    or chance alone?

13
Cross Tabs and Chi-Squared
  • A number called Chi-Squared, χ², tells us whether
    the numbers in each cross classification cell in
    our sample deviate from the kind of random
    fluctuations you would get if our two variables
    were not associated with each other (independent
    of each other).
  • Its formula:
  • fo = observed frequency in each cell; fe = expected
    frequency in each cell
  • The crux of χ² is that it gets larger as observed
    data deviate more from the data we would expect
    if our variables were unrelated.
  • From sample to sample, one would expect
    deviations from what is expected even when
    variables are unrelated. But when χ² gets really
    big, it grows beyond the numbers that random
    variation in samples would produce.
  • A big χ² will imply that there is a relationship
    between our two nominal variables.

χ² = Σ ((fo - fe)² / fe)
14
Cross Tabs and Chi-Squared
χ² = Σ ((fo - fe)² / fe)
  • Calculating χ² begins with the concept of a
    deviation of observed data from what would be
    expected if the variables were unrelated.
  • Deviation in χ² = Observed frequency - Expected
    frequency
  • Observed frequency is just the number of cases in
    each cell of the cross classification table for
    your sample. For example, 185 women said yes,
    they support football at SJSU. 185 is the
    observed frequency.
  • Expected frequency is the number of cases that
    would be in a cell of the cross classification
    table if people in each group of one variable
    were classified in the second variable's groups
    in the same ways.

15
Cross Tabs and Chi-Squared
χ² = Σ ((fo - fe)² / fe)
  • Data on male and female support for SJSU football
    from 650 students
  •          Yes   No    Maybe   Total
  • Female   ?     ?     ?       450 (69.2%)
  • Male     ?     ?     ?       200 (30.8%)
  • Total    265   265   120     650 (100%)
  • Expected frequency (if our variables were
    unrelated):
  • Females comprise 69.2% of the sample, so we'd
    expect 69.2% of the "Yes" answers to come from
    females, and 69.2% of the "No" and "Maybe" answers
    to come from females.
  • On the other hand, 30.8% of the "Yes," "No," and
    "Maybe" answers should come from males.
  • Therefore, to calculate the expected frequency for
    each cell, you do this:
  • fe = (cell's row total / table total) × cell's
    column total, or
  • fe = (cell's column total / table total) × cell's
    row total
  • The idea: 1. Find the percent of persons in one
    category on the first variable; then
  • 2. Expect to find that percent of those
    people in each of the other variable's
    categories.

16
Cross Tabs and Chi-Squared
χ² = Σ ((fo - fe)² / fe)
  • Data on male and female support for SJSU football
    from 650 students
  •          Yes           No            Maybe        Total
  • Female   fe1 = 183.5   fe2 = 183.5   fe3 = 83.1   450 (69.2%)
  • Male     fe4 = 81.5    fe5 = 81.5    fe6 = 36.9   200 (30.8%)
  • Total    265           265           120          650 (100%)
  • Now you know how to calculate the expected
    frequencies:
  • fe1 = (450/650) × 265 = 183.5    fe4 = (200/650) × 265 = 81.5
  • fe2 = (450/650) × 265 = 183.5    fe5 = (200/650) × 265 = 81.5
  • fe3 = (450/650) × 120 = 83.1     fe6 = (200/650) × 120 = 36.9
  • ...and the observed frequencies are obvious.
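The expected-frequency rule (row total / table total × column total) can be sketched in Python as follows. The function name `expected_frequencies` and the list-of-lists layout are assumptions made for illustration only.

```python
# Observed counts from the football example.
observed = [
    [185, 200, 65],  # Female: Yes, No, Maybe
    [80, 65, 55],    # Male:   Yes, No, Maybe
]

def expected_frequencies(table):
    """Expected count per cell if the two variables were unrelated."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # fe = (row total / table total) * column total
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
for row in expected:
    print([round(fe, 1) for fe in row])
```

Rounded to one decimal, these values should match fe1 through fe6 above.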

17
Cross Tabs and Chi-Squared
χ² = Σ ((fo - fe)² / fe)
  • Data on male and female support for SJSU football
    from 650 students
  •          Yes fo (fe)   No fo (fe)    Maybe fo (fe)   Total
  • Female   185 (183.5)   200 (183.5)   65 (83.1)       450 (69.2%)
  • Male     80 (81.5)     65 (81.5)     55 (36.9)       200 (30.8%)
  • Total    265           265           120             650 (100%)
  • You already know how to calculate the deviations
    too.
  • Dc = fo - fe
  • D1 = 185 - 183.5 = 1.5     D4 = 80 - 81.5 = -1.5
  • D2 = 200 - 183.5 = 16.5    D5 = 65 - 81.5 = -16.5
  • D3 = 65 - 83.1 = -18.1     D6 = 55 - 36.9 = 18.1

18
Cross Tabs and Chi-Squared
χ² = Σ ((fo - fe)² / fe)
  • Data on male and female support for SJSU football
    from 650 students
  •          Yes fo (fe)   No fo (fe)    Maybe fo (fe)   Total
  • Female   185 (183.5)   200 (183.5)   65 (83.1)       450 (69.2%)
  • Male     80 (81.5)     65 (81.5)     55 (36.9)       200 (30.8%)
  • Total    265           265           120             650 (100%)
  • Deviations
  • Dc = fo - fe
  • D1 = 185 - 183.5 = 1.5     D4 = 80 - 81.5 = -1.5
  • D2 = 200 - 183.5 = 16.5    D5 = 65 - 81.5 = -16.5
  • D3 = 65 - 83.1 = -18.1     D6 = 55 - 36.9 = 18.1
  • Now, we want to add up the deviations.
  • What would happen if we added these deviations
    together? (The positive and negative deviations
    would cancel, and the sum would be zero.)
  • To get rid of negative deviations, we square each
    one (like in computing variance and standard
    deviation).
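A quick way to check the question about adding the deviations is to compute them and sum them. The sketch below is a hedged illustration (variable names are invented here) and uses unrounded expected frequencies, so the individual deviations differ slightly from the rounded slide values.

```python
observed = [
    [185, 200, 65],  # Female: Yes, No, Maybe
    [80, 65, 55],    # Male:   Yes, No, Maybe
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Deviation for each cell: observed minus expected (fo - fe).
deviations = [
    [fo - row_totals[i] * col_totals[j] / n for j, fo in enumerate(row)]
    for i, row in enumerate(observed)
]

# Raw deviations cancel out; squared deviations do not.
sum_of_deviations = sum(d for row in deviations for d in row)
sum_of_squares = sum(d ** 2 for row in deviations for d in row)
print(sum_of_deviations, sum_of_squares)
```

The raw sum comes out at zero (up to floating-point error), which is why the deviations are squared before being combined.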

19
Cross Tabs and Chi-Squared
χ² = Σ ((fo - fe)² / fe)
  • Data on male and female support for SJSU football
    from 650 students
  •          Yes fo (fe)   No fo (fe)    Maybe fo (fe)   Total
  • Female   185 (183.5)   200 (183.5)   65 (83.1)       450 (69.2%)
  • Male     80 (81.5)     65 (81.5)     55 (36.9)       200 (30.8%)
  • Total    265           265           120             650 (100%)
  • Deviations
  • Dc = fo - fe
  • D1 = 185 - 183.5 = 1.5     D4 = 80 - 81.5 = -1.5
  • D2 = 200 - 183.5 = 16.5    D5 = 65 - 81.5 = -16.5
  • D3 = 65 - 83.1 = -18.1     D6 = 55 - 36.9 = 18.1
  • To get rid of negative deviations, we square each
    one (like for variance and standard deviation).
  • (D1)² = (1.5)² = 2.25        (D4)² = (-1.5)² = 2.25
  • (D2)² = (16.5)² = 272.25     (D5)² = (-16.5)² = 272.25
  • (D3)² = (-18.1)² = 327.61    (D6)² = (18.1)² = 327.61

20
Cross Tabs and Chi-Squared
χ² = Σ ((fo - fe)² / fe)
  • Just how large is each of these squared
    deviations?
  • What do these numbers really mean?
  • Squared deviations
  • (D1)² = (1.5)² = 2.25        (D4)² = (-1.5)² = 2.25
  • (D2)² = (16.5)² = 272.25     (D5)² = (-16.5)² = 272.25
  • (D3)² = (-18.1)² = 327.61    (D6)² = (18.1)² = 327.61

21
Cross Tabs and Chi-Squared
χ² = Σ ((fo - fe)² / fe)
  • The next step is to give the deviations a
    metric. The deviations are compared relative
    to what was expected. In other words, we
    divide by what was expected.
  • Squared deviations
  • (D1)² = (1.5)² = 2.25        (D4)² = (-1.5)² = 2.25
  • (D2)² = (16.5)² = 272.25     (D5)² = (-16.5)² = 272.25
  • (D3)² = (-18.1)² = 327.61    (D6)² = (18.1)² = 327.61
  • You've already calculated what was expected in
    each cell:
  • fe1 = (450/650) × 265 = 183.5    fe4 = (200/650) × 265 = 81.5
  • fe2 = (450/650) × 265 = 183.5    fe5 = (200/650) × 265 = 81.5
  • fe3 = (450/650) × 120 = 83.1     fe6 = (200/650) × 120 = 36.9

22
Cross Tabs and Chi-Squared
χ² = Σ ((fo - fe)² / fe)
  • Squared deviations
  • (D1)² = (1.5)² = 2.25        (D4)² = (-1.5)² = 2.25
  • (D2)² = (16.5)² = 272.25     (D5)² = (-16.5)² = 272.25
  • (D3)² = (-18.1)² = 327.61    (D6)² = (18.1)² = 327.61
  • Expected frequencies
  • fe1 = (450/650) × 265 = 183.5    fe4 = (200/650) × 265 = 81.5
  • fe2 = (450/650) × 265 = 183.5    fe5 = (200/650) × 265 = 81.5
  • fe3 = (450/650) × 120 = 83.1     fe6 = (200/650) × 120 = 36.9
  • Relative deviations-squared: small values indicate
    little deviation from what was expected, while
    larger values indicate much deviation from what
    was expected.
  • (D1)² / fe1 = 2.25 / 183.5 = 0.012      (D4)² / fe4 = 2.25 / 81.5 = 0.028
  • (D2)² / fe2 = 272.25 / 183.5 = 1.484    (D5)² / fe5 = 272.25 / 81.5 = 3.340
  • (D3)² / fe3 = 327.61 / 83.1 = 3.942     (D6)² / fe6 = 327.61 / 36.9 = 8.878

23
Cross Tabs and Chi-Squared
χ² = Σ ((fo - fe)² / fe)
  • Relative deviations-squared: small values indicate
    little deviation from what was expected, while
    larger values indicate much deviation from what
    was expected.
  • (D1)² / fe1 = 2.25 / 183.5 = 0.012      (D4)² / fe4 = 2.25 / 81.5 = 0.028
  • (D2)² / fe2 = 272.25 / 183.5 = 1.484    (D5)² / fe5 = 272.25 / 81.5 = 3.340
  • (D3)² / fe3 = 327.61 / 83.1 = 3.942     (D6)² / fe6 = 327.61 / 36.9 = 8.878
  • The next step will be to see what the total
    relative deviations-squared are.
  • Sum of relative deviations-squared = 0.012 + 1.484
    + 3.942 + 0.028 + 3.340 + 8.878 = 17.684
  • This number is also what we call Chi-Squared or
    χ².
  • So, of what good is knowing this number?
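The whole calculation (expected frequencies, deviations, squaring, dividing by expected, summing) can be sketched as one short Python function. The name `chi_squared` is an assumption for this example. Because the sketch keeps unrounded intermediate values, it should give roughly 17.67 rather than the 17.684 obtained above from rounded figures.

```python
observed = [
    [185, 200, 65],  # Female: Yes, No, Maybe
    [80, 65, 55],    # Male:   Yes, No, Maybe
]

def chi_squared(table):
    """Sum (fo - fe)^2 / fe over every cell of a cross tab."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected frequency
            total += (fo - fe) ** 2 / fe            # relative deviation-squared
    return total

chi2 = chi_squared(observed)
print(round(chi2, 2))
```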

χ² = Σ ((fo - fe)² / fe)
24
Cross Tabs and Chi-Squared
  • This value, χ², would form an identifiable shape
    in repeated sampling if the two variables were
    unrelated to each other: the chance variation
    that we should expect among samples.
  • That shape depends only on the number of rows and
    columns (or the nature of your variables). We
    technically refer to this as the degrees of
    freedom.
  • For χ², df = (rows - 1)(columns - 1)

25
Cross Tabs and Chi-Squared
  • For χ², df = (rows - 1)(columns - 1)
  • χ² distributions

[Figure: χ² distribution curves for df = 1, 5, 10, and 20]
FYI: This should remind you of the normal
distribution, except that it changes shape
depending on the nature of your variables.
26
Cross Tabs and Chi-Squared
Think of the Power!!!!
  • We can use the known properties of the χ²
    distribution to identify the probability that we
    would get our sample's χ² if our variables were
    not related to each other!
  • This is exciting!

27
Cross Tabs and Chi-Squared
  • If my χ² in a particular analysis were under the
    shaded area or beyond, what could we say about
    the population given our sample, using a null
    hypothesis that our variables are unrelated?

[Figure: χ² distribution with the upper 5% of χ² values shaded; "My Chi-squared" falls in the shaded tail]
28
Cross Tabs and Chi-Squared
  • Answer: We'd reject the null, saying that it is
    highly unlikely that we could get such a large
    chi-squared value from a population where the two
    variables are unrelated.

[Figure: χ² distribution with the upper 5% of χ² values shaded; "My Chi-squared" falls in the shaded tail]
29
Cross Tabs and Chi-Squared
  • So, what does the critical χ² value equal?

[Figure: χ² distribution with the upper 5% of χ² values shaded; "My Chi-squared" falls in the shaded tail]
30
Cross Tabs and Chi-Squared
  • That depends on the particular problem because
    the distribution changes depending on the number
    of rows and columns in your cross classification
    table.

[Figure: χ² distribution curves for df = 1, 5, 10, and 20, each with its own critical χ²]
31
Cross Tabs and Chi-Squared
  • According to Appendix D in Warner,
  • with α-level = .05, if df = 1, critical χ² = 3.84
  • df = 5, critical χ² = 11.07
  • df = 10, critical χ² = 18.31
  • df = 20, critical χ² = 31.41

[Figure: χ² distribution curves for df = 1, 5, 10, and 20]
32
Cross Tabs and Chi-Squared
  • In our football problem above, we had a
    chi-squared of 17.68 in a cross classification
    table with 2 rows and 3 columns.
  • Our chi-squared distribution for that table would
    have
  • df = (2 - 1)(3 - 1) = 2. According to
    Appendix D, with α-level = .05, critical
    Chi-Squared is 5.99.
  • Since 17.68 > 5.99, we reject the null.
  • We reject that our sample could have come from a
    population where sex was not related to attitudes
    toward football.

[Figure: χ² distribution for df = 2 with the upper 5% of χ² values shaded beyond the critical value 5.99; "My Chi-squared" at 17.68 lies in that tail]
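The decision for the football example can be sketched in Python without a lookup table, under one important assumption: for df = 2 specifically, the χ² upper-tail probability has the closed form exp(-χ²/2), so the critical value is -2 ln(α). This shortcut does not apply to other degrees of freedom, where a table such as Appendix D or a statistics library would be needed. All function names here are hypothetical.

```python
import math

def degrees_of_freedom(rows, columns):
    """df = (rows - 1)(columns - 1) for a cross tab."""
    return (rows - 1) * (columns - 1)

def critical_value_df2(alpha):
    """Critical chi-squared for df = 2 ONLY (closed form: -2 ln(alpha))."""
    return -2 * math.log(alpha)

def p_value_df2(chi2):
    """Upper-tail probability for df = 2 ONLY (closed form: exp(-chi2 / 2))."""
    return math.exp(-chi2 / 2)

df = degrees_of_freedom(2, 3)          # 2 rows (sex), 3 columns (answers)
critical = critical_value_df2(0.05)    # should agree with the tabled 5.99
p = p_value_df2(17.68)                 # chi-squared from the football example
reject_null = 17.68 > critical
print(df, round(critical, 2), reject_null)
```

The p-value computed this way is far below .05, which is consistent with rejecting the null.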
33
Cross Tabs and Chi-Squared
  • Now let's get formal.
  • 7 steps to the Chi-squared test of independence:
  • 1. Set α-level (e.g., .05)
  • 2. Find critical χ² (depends on df and α-level)
  • 3. State the null and alternative hypotheses:
  • Ho: The two nominal variables are independent
  • Ha: The two variables are dependent on each
    other
  • 4. Collect data
  • 5. Calculate χ²: χ² = Σ ((fo - fe)² / fe)
  • 6. Make decision about the null hypothesis
  • 7. Report the p-value

34
Cross Tabs and Chi-Squared
  • Afterwards, what have you found?
  • If Chi-Squared is not significant, your variables
    are unrelated.
  • If Chi-Squared is significant, your variables are
    related.
  • That's all!
  • Chi-Squared cannot tell you anything like the
    strength or direction of association. For purely
    nominal variables, there is no direction of
    association.
  • Chi-Squared is a large-sample test. If dealing
    with small samples, look up appropriate tests. (A
    condition of the test: no expected frequency
    lower than 5 in any cell.)
  • The larger the sample size, the easier it is for
    Chi-Squared to be significant.
  • For a 2 x 2 table, Chi-Squared gives the same
    result as the independent samples t-test for
    proportions and ANOVA.
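The point about sample size can be illustrated with a small Python sketch: if every count in the football table is doubled (same percentages, twice the sample), χ² doubles as well, while the critical value stays the same. The doubled table is a hypothetical data set invented for this illustration, and `chi_squared` is an assumed helper name.

```python
def chi_squared(table):
    """Sum (fo - fe)^2 / fe over every cell of a cross tab."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (fo - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, fo in enumerate(row)
    )

observed = [
    [185, 200, 65],  # Female: Yes, No, Maybe
    [80, 65, 55],    # Male:   Yes, No, Maybe
]

# Hypothetical sample: identical proportions, twice as many students.
doubled = [[2 * fo for fo in row] for row in observed]

ratio = chi_squared(doubled) / chi_squared(observed)
print(ratio)
```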

35
Cross Tabs and Chi-Squared
  • If you want to know how you depart from
    independence, you may:
  • Check percentages (conditional distributions) in
    your cross classification table.
  • Do a residual analysis.
  • The difference between observed and expected
    counts in a cell behaves like a significance test
    when divided by a standard error for the
    difference.
  • That s.e. = √[fe (1 - cell's row proportion)(1 -
    cell's column proportion)]
  • Z = (fo - fe) / s.e.

36
Cross Tabs and Chi-Squared
  • Residual Analysis
  • Let's do cell 5! s.e. = √[fe (1 - cell's row
    proportion)(1 - cell's column proportion)]
  • For cell 5: row proportion = 200/650 = .308,
    column proportion = 265/650 = .408
  • s.e. = √[81.5 (.692)(.592)] = 5.78
  • Z = (fo - fe) / s.e. = (65 - 81.5) / 5.78 = -2.85.
    Since 2.85 > 1.96, there is a significant
    difference in cell 5.
  • Data on male and female support for SJSU football
    from 650 students
  •          Yes   No    Maybe   Total
  • Female   185   200   65      450
  • Male     80    65    55      200
  • Total    265   265   120     650
  • fe1 = (450/650) × 265 = 183.5    fe4 = (200/650) × 265 = 81.5
  • fe2 = (450/650) × 265 = 183.5    fe5 = (200/650) × 265 = 81.5
  • fe3 = (450/650) × 120 = 83.1     fe6 = (200/650) × 120 = 36.9
  • Deviations
  • Dc = fo - fe
  • D1 = 185 - 183.5 = 1.5     D4 = 80 - 81.5 = -1.5
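The residual analysis for cell 5 extends naturally to every cell. The Python sketch below applies the same standard-error formula to the whole table; `adjusted_residuals` is a hypothetical helper name, and since it uses unrounded values, cell 5 should come out near -2.86 rather than the hand-rounded -2.85.

```python
import math

observed = [
    [185, 200, 65],  # Female: Yes, No, Maybe (cells 1-3)
    [80, 65, 55],    # Male:   Yes, No, Maybe (cells 4-6)
]

def adjusted_residuals(table):
    """Z = (fo - fe) / s.e. for each cell, with
    s.e. = sqrt(fe * (1 - row proportion) * (1 - column proportion))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        z_row = []
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            se = math.sqrt(fe * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            z_row.append((fo - fe) / se)
        result.append(z_row)
    return result

z = adjusted_residuals(observed)
cell5 = z[1][1]  # Male, "No"
for row in z:
    print([round(value, 2) for value in row])
```

Cells whose |Z| exceeds 1.96 are the ones departing noticeably from independence at the .05 level.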

37
Cross Tabs and Chi-Squared
  • Further topics you could explore
  • Strength of Association
  • Discussing outcomes in terms of difference of
    proportions
  • Reporting Odds Ratios (likelihood of a group
    giving one answer versus other answers or the
    group giving an answer relative to other groups
    giving that answer)
  • Yule's Q and Phi for 2x2 tables, ranging from
    -1 to 1, with 0 indicating no relationship and
    values near -1 or 1 a strong relationship
  • Strength and Direction of Association for
    Ordinal--not nominal--Variables
  • Gamma (an inferential statistic, so check for
    significance)
  • Ranges from -1 to 1
  • Valence indicates direction of relationship
  • Magnitude indicates strength of relationship
  • Chi-squared and Gamma can disagree when there is
    a nonrandom pattern that has no direction.
    Chi-squared will catch it; gamma won't.
  • Tau-c
  • Kendall's tau-b
  • Somers' d

38
Cross Tabs and Chi-Squared
  • Controlling for a third variable.
  • One can see the relationship between two
    variables for each level of a third variable.
  • E.g., Sex and Football by Lower or Upper
    Division.
  • Upper division    Yes   No   Maybe
  •   F
  •   M
  • Lower division    Yes   No   Maybe
  •   F
  •   M

39
Cross Tabs and Chi-Squared
  • Sex and Pornlaws

40
Cross Tabs and Chi-Squared
  • Sex and Pornlaw by Sex Education