Chapter_16 - PowerPoint PPT Presentation

Slides: 32
Provided by: iiMet

Transcript and Presenter's Notes

1
Chapter_16
  • Categorical data
  • Field (2005)

2
What are categorical variables?
  • We already know grouping (dummy) variables. They
    are variables that describe categories of
    entities.
  • Categorical data are not continuous scores but
    discrete values such as being male/female,
    pregnant/not pregnant, etc.
  • With categorical tests, we can test relationships
    between such categorical data.

3
Theory of analysing categorical data
  • In the simplest case, we compare two categorical
    variables.
  • However, we cannot use the mean, since we do not
    have a continuous measure. Instead we analyse
    frequencies, namely the number of things that
    fall into each combination of categories.

4
Example: Dancing cats
  • Research question: Can animals be trained to
    line-dance?
  • Experiment: 200 cats were trained to line-dance
    by rewarding them with food or affection.
  • 2 variables:
  • Training: reward through food or affection
  • Dance: could they dance or not?
  • After training, the number of cats that could or
    could not line-dance was counted and compared
    between the two training groups.

5
Example: Dancing cats
  • Frequencies of cats being able to dance after
    food or affectionate reward are tabulated in a
    contingency table.

'Contingency' means a fact that is not logically
necessarily true or false. Acts can be
contingent upon other acts, as here dancing or
not dancing is contingent upon food or affection.
We now want to know whether there is a
relationship between the two categorical
variables, namely if the number of dancing cats
relates to the kind of training they received.
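The contingency table itself did not survive in this transcript. A minimal sketch in Python, assuming the four cell counts quoted on the later slides (28, 10, 48 and 114), rebuilds it together with its totals. The variable names are illustrative only.

```python
# Observed counts as quoted on the later slides of this deck.
observed = {
    ("food", "danced"): 28,
    ("food", "did not dance"): 10,
    ("affection", "danced"): 48,
    ("affection", "did not dance"): 114,
}

trainings = ["food", "affection"]
outcomes = ["danced", "did not dance"]

# Marginal totals: one per training group and one per outcome.
training_totals = {t: sum(observed[(t, o)] for o in outcomes) for t in trainings}
outcome_totals = {o: sum(observed[(t, o)] for t in trainings) for o in outcomes}
n = sum(observed.values())

# Print the table with outcomes as rows and training as columns.
print(f"{'':15}{'food':>10}{'affection':>10}{'total':>10}")
for o in outcomes:
    cells = "".join(f"{observed[(t, o)]:>10}" for t in trainings)
    print(f"{o:15}{cells}{outcome_totals[o]:>10}")
totals = "".join(f"{training_totals[t]:>10}" for t in trainings)
print(f"{'total':15}{totals}{n:>10}")
```

The totals (38 and 162 for training, 76 and 124 for dancing, 200 overall) are the row and column totals used in the expected-value calculations.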
6
Pearson's Chi-square test (χ²)
  • The Chi-square test compares the empirically
    observed frequencies against the frequencies that
    are expected by chance alone.
  • The logic of the Chi-square test follows the same
    general logic of fitting a model to a set of
    data
  • deviation $= \sum (\text{observed} - \text{model})^2$
  • $\chi^2 = \sum \frac{(\text{observed}_{ij} - \text{model}_{ij})^2}{\text{model}_{ij}}$

'observed' = frequencies of the data
'model' = expected frequencies
7
Calculating expected values
  • How many cats do we expect to be there in each of
    the 4 cells?
  • $\text{model}_{ij} = E_{ij} = \frac{\text{Row Total}_i \times \text{Column Total}_j}{n}$
  • This equation takes into account the unequal
    number of cats that received each kind of reward
    and the unequal number of cats that could or
    could not dance.

E = expected
8
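Expected model values for all 4 cells
RT = row total, CT = column total
  • $\text{model}_{\text{Food, Yes}} = (RT_{\text{Yes}} \times CT_{\text{Food}})/n = (76 \times 38)/200 = 14.44$
  • $\text{model}_{\text{Food, No}} = (RT_{\text{No}} \times CT_{\text{Food}})/n = (124 \times 38)/200 = 23.56$
  • $\text{model}_{\text{Affect, Yes}} = (RT_{\text{Yes}} \times CT_{\text{Affect}})/n = (76 \times 162)/200 = 61.56$
  • $\text{model}_{\text{Affect, No}} = (RT_{\text{No}} \times CT_{\text{Affect}})/n = (124 \times 162)/200 = 100.44$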
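The four expected values can be reproduced with a few lines of Python. This is a minimal sketch that assumes only the row totals, column totals and n given on the slide; the dictionary names are illustrative.

```python
row_totals = {"yes": 76, "no": 124}          # danced / did not dance
col_totals = {"food": 38, "affection": 162}  # type of training
n = 200

# Expected count for each cell: row total * column total / n.
expected = {
    (training, dance): row_totals[dance] * col_totals[training] / n
    for training in col_totals
    for dance in row_totals
}

for cell, value in expected.items():
    print(cell, round(value, 2))
```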

9
Deriving the χ² statistic
Expected frequencies
Observed frequencies
  • We now have to subtract from each observed value
    its corresponding model value, square the result,
    divide it by the model value, and sum over all
    four cells:
  • $\chi^2 = \frac{(28-14.44)^2}{14.44} + \frac{(10-23.56)^2}{23.56} + \frac{(48-61.56)^2}{61.56} + \frac{(114-100.44)^2}{100.44}$
  • $= 12.73 + 7.80 + 2.99 + 1.83$
  • $= 25.35$
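As a check on the arithmetic, the statistic can be recomputed from the observed and expected counts. The sketch below assumes the cell order food/danced, food/did not dance, affection/danced, affection/did not dance; the names are illustrative.

```python
observed = [28, 10, 48, 114]
expected = [14.44, 23.56, 61.56, 100.44]

# Contribution of each cell: (observed - expected)^2 / expected.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

print([round(c, 2) for c in contributions])
print(round(chi_square, 2))
```

Summing the four rounded contributions gives 25.35, while the unrounded sum rounds to 25.36, which is the value quoted on the reporting slide.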

10
The Likelihood ratio
Expected frequencies
Observed frequencies
An alternative to Pearson's χ² test is the
likelihood ratio statistic, which is based on
maximum-likelihood theory. Its equation uses the
natural logarithm ln:
$L\chi^2 = 2 \sum \text{observed}_{ij} \ln\left(\frac{\text{observed}_{ij}}{\text{model}_{ij}}\right)$
$= 2[28 \times \ln(28/14.44) + 10 \times \ln(10/23.56) + 48 \times \ln(48/61.56) + 114 \times \ln(114/100.44)]$
$= 2[(28 \times 0.662) + (10 \times -0.857) + (48 \times -0.249) + (114 \times 0.127)]$
$= 2[18.54 - 8.57 - 11.94 + 14.44] = 24.94$
Lχ² is tested for significance in the same way
as χ².
Lχ² yields results similar to χ² for
large samples. Lχ² is preferred
over χ² for small samples.
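The likelihood ratio can be recomputed in the same way. This is a minimal sketch using Python's standard `math.log` (natural logarithm), with the same assumed cell order as in the expected-value slide.

```python
import math

observed = [28, 10, 48, 114]
expected = [14.44, 23.56, 61.56, 100.44]

# 2 * sum of observed * ln(observed / expected) over all cells.
likelihood_ratio = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))

print(round(likelihood_ratio, 2))
```

Without intermediate rounding the result is about 24.93; the 24.94 on the slide comes from summing terms that were rounded to two decimals first.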
11
Yates's continuity correction
  • For a 2x2 contingency table, Pearson's χ²
    statistic may overestimate the correct value and
    is hence prone to making a Type I error.
    Therefore, Yates's correction is applied. It
    takes the absolute value of the observed minus
    the model value and subtracts 0.5 from it before
    squaring.
  • $\chi^2 = \sum \frac{(|\text{observed}_{ij} - \text{model}_{ij}| - 0.5)^2}{\text{model}_{ij}}$
  • $= \frac{(13.56-0.5)^2}{14.44} + \frac{(13.56-0.5)^2}{23.56} + \frac{(13.56-0.5)^2}{61.56} + \frac{(13.56-0.5)^2}{100.44}$
  • $= 11.81 + 7.24 + 2.77 + 1.70$
  • $= 23.52$

Note: In most cases, Yates's correction will
not be necessary.
It is advised only for df = 1.
Note: The corrected value is somewhat
smaller than the uncorrected one (25.35).
The result is therefore slightly less
significant.
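A short sketch of the corrected statistic, assuming the same observed and expected counts as before, shows how the correction shrinks each cell's contribution:

```python
observed = [28, 10, 48, 114]
expected = [14.44, 23.56, 61.56, 100.44]

# Yates's correction: subtract 0.5 from the absolute deviation before squaring.
corrected_terms = [(abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected)]
yates_chi_square = sum(corrected_terms)

print([round(t, 2) for t in corrected_terms])
print(round(yates_chi_square, 2))
```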
12
Assumptions of the χ² test
  • What it does NOT assume
  • Continuous and normally distributed data
  • What it DOES assume
  • Each person, item, or entity may only contribute
    to a single cell in the contingency table. Hence,
    NO repeated testing is possible.
  • The expected frequencies should be > 5. Only in
    larger contingency tables may up to 20% of the
    expected frequencies be < 5.

13
Testing the χ² statistic for significance
  • The observed value χ² = 25.35 now has to be
    compared against a critical χ² value, which it
    has to exceed in order to be significant.
  • The df of the χ² value are (r-1)(c-1), where r
    is the number of rows (2) and c the number of
    columns (2).
  • df = (2-1)(2-1) = 1
  • Look up the critical χ² value for df = 1 in
    Appendix A4. It is 3.84 (for p = .05) and 6.63
    (for p = .01).
  • Since our empirically derived χ² value (25.35)
    exceeds the critical χ² value (3.84 or 6.63), we
    can conclude that the different training methods
    have an influence on cats' learning to line-dance.
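Instead of a table lookup, the p-value for df = 1 can be computed directly. The sketch below relies on the fact that a χ² variable with one degree of freedom is the square of a standard normal variable, so its upper-tail probability equals `erfc(sqrt(x / 2))`. The helper name is hypothetical and the shortcut does not apply to other df.

```python
import math

def chi_square_p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))

rows, columns = 2, 2
df = (rows - 1) * (columns - 1)

# Sum of the four cell contributions: 12.73 + 7.80 + 2.99 + 1.83.
chi_square = 25.35
p_value = chi_square_p_value_df1(chi_square)

print(df, p_value)
# The tabulated critical values correspond to p = .05 and p = .01.
print(round(chi_square_p_value_df1(3.84), 3), round(chi_square_p_value_df1(6.63), 3))
```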

14
Running a χ² test using SPSS (using the Cats.sav
and CatsWeight.sav data)
Entering the data can be done in two ways:
  • Entering raw data
  • Entering weight cases

15
Entering raw data: Cats.sav
  • Specify a dummy variable for both variables,
    e.g.
  • Training: 0 = food reward, 1 = affection
  • Dance: 0 = dances, 1 = does not dance
  • Each row represents one subject.
  • Example: a cat that was rewarded with food and
    did not learn to dance receives the coding 0 1.
  • This is the way Cats.sav is organized.

16
Entering weight cases: CatsWeight.sav
  • When we enter the weight cases, we add a third
    variable to 'Training' and 'dance', namely
    'frequent' with which we encode the frequency
    found in the respective cell-combination, e.g.,
    28 cats that received food as reward and danced.
  • Next, we have to tell SPSS that we are using
    weight cases.

Data → Weight Cases...
SPSS now weights each category
combination by the number in the
column 'frequent', i.e., it 'knows'
now that there are 28 cases of 0 0 (food, danced).
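The two layouts carry the same information. As a sketch outside SPSS, the Python below expands the weighted layout (one row per cell plus a frequency) into the raw layout (one row per cat) and counts it back. It assumes the coding Training 0 = food, 1 = affection and Dance 0 = dances, 1 = does not dance; the variable names are illustrative.

```python
from collections import Counter

# Weighted layout: (training, dance, frequent), one row per cell.
weighted_rows = [
    (0, 0, 28),
    (0, 1, 10),
    (1, 0, 48),
    (1, 1, 114),
]

# Raw layout: repeat each (training, dance) pair 'frequent' times, one row per cat.
raw_rows = [
    (training, dance)
    for training, dance, frequent in weighted_rows
    for _ in range(frequent)
]

# Counting the raw rows recovers the cell frequencies.
counts = Counter(raw_rows)
print(len(raw_rows), counts[(0, 0)])
```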
17
Running the analysis with cross-tabs
  • 'Crosstabs' is a command that analyses data that
    fall into categories.
  • Analyze → Descriptive Statistics → Crosstabs

Enter one variable into 'Row' and
the other into 'Column'.
Here, you may enter a 'layer' variable by which you
could split the rows of the table into further
categories.
18
Running the analysis with cross-tabs
  • You can run the χ² test either with the weighted
    data (CatsWeight.sav) or with the raw data
    (Cats.sav). The following specification in the
    dialog windows and the subsequent output are the
    same.

19
Statistics
There are various options for analyzing categorical
data.
See next slide.
Here you could request tests for ordinal data.
20
Statistical options for crosstabs
  • Chi-square: This is the basic Pearson chi-square
    test. It detects significant associations between
    two categorical variables but it does not tell
    you how strong the relation is.
  • Phi and Cramer's V measure the strength of
    association between two categorical variables.
    Phi is used for 2x2 contingency tables (2
    variables with 2 categories). Cramer's V is
    applied when one of the 2 variables has more
    than 2 categories. Cramer's V can also be used
    as a measure of the effect size.
  • Lambda: Goodman and Kruskal's λ measures how well
    membership of a category on one variable can be
    predicted from the other variable. A value of 1
    means that the variable predicts the other
    perfectly.
  • Kendall's statistics: for small data sets, good
    estimate of the population, also used for ordinal
    data (see 4.5.5)
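To make the idea behind λ concrete, here is a minimal sketch using its usual definition as a proportional reduction in prediction error, applied to the dancing-cats counts with dancing as the outcome to be predicted. The resulting value is computed here for illustration and is not taken from the slides or from SPSS output.

```python
# Observed counts, keyed by predictor category (training), then outcome (dance).
observed = {
    "food": {"danced": 28, "did not dance": 10},
    "affection": {"danced": 48, "did not dance": 114},
}

n = sum(sum(cells.values()) for cells in observed.values())

# Outcome totals across both training groups.
outcome_totals = {}
for cells in observed.values():
    for outcome, count in cells.items():
        outcome_totals[outcome] = outcome_totals.get(outcome, 0) + count

# Errors made when always guessing the most frequent outcome overall.
errors_without_predictor = n - max(outcome_totals.values())

# Errors made when guessing the most frequent outcome within each training group.
errors_with_predictor = sum(
    sum(cells.values()) - max(cells.values()) for cells in observed.values()
)

lambda_dance_given_training = (
    errors_without_predictor - errors_with_predictor
) / errors_without_predictor
print(round(lambda_dance_given_training, 3))
```

A value of 0 would mean that knowing the training type does not reduce prediction errors at all, and 1 would mean it removes them completely.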

21
Cells
Besides the 'observed' counts,
check the 'expected' counts. They
should not be smaller than 5 in a
2x2 contingency table.
Looking at the row and column percentages helps you
interpret the results.
22
Format
If you have 'Exact' instead of 'Format', you can
request an exact test here.
The rows are ordered in ascending order.
Finally, click OK.
23
Output of Crosstabs
In total, 38% of the cats (76) danced.
Of these, 36.8% (28) were trained with
food and 63.2% (48) with affection.
62% of the cats did NOT dance. 8.1%
(10) of them were trained with food
and 91.9% (114) with affection.
'Count' gives you the absolute numbers.
Below are the percentages.
'% within type of training' tells us that of
those trained with food, 73.7% danced
and of those trained with affection,
70.4% did not dance.
  • → When food was used, cats would dance, but when
    affection was used they would not.
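The percentages in the output can be recomputed from the four counts. The following sketch, with illustrative names, derives both the percentages within type of training and the percentages within each dance outcome.

```python
observed = {
    "food": {"danced": 28, "did not dance": 10},
    "affection": {"danced": 48, "did not dance": 114},
}
outcomes = ["danced", "did not dance"]

# Percent within type of training: each cell divided by its training total.
pct_within_training = {
    training: {o: 100 * cells[o] / sum(cells.values()) for o in outcomes}
    for training, cells in observed.items()
}

# Percent within outcome: each cell divided by its outcome total.
outcome_totals = {o: sum(cells[o] for cells in observed.values()) for o in outcomes}
pct_within_outcome = {
    o: {t: 100 * cells[o] / outcome_totals[o] for t, cells in observed.items()}
    for o in outcomes
}

print(round(pct_within_training["food"]["danced"], 1))
print(round(pct_within_outcome["danced"]["food"], 1))
```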

24
Checking the assumptions
  • All expected frequencies should be > 5.
  • In the row 'Expected counts' we see that the
    smallest count is 14.4 (cats that had food and
    danced).
  • Since this value is > 5, the assumption of the
    Chi-square test has been met.

25
Pearson Chi-square statistic
  • The Pearson chi-square statistic tests whether
    the two variables are independent. If the value
    is significant we reject this hypothesis and
    instead assume that they are related.

χ² is significant!
Yates's correction
Interpretation: The pattern of responses in the
two training conditions is different. Cats dance
for food but not for love: when they are trained
with food, 74% dance and 26% don't; when they are
trained with affection, only 30% dance but 70%
don't.
The assumption of the χ² test is fulfilled.
26
Additional statistics
Phi: valid only for 2x2 tables.
Cramer's V: when variables have more than
2 categories. Cramer's V is already an adequate
effect size. Contingency coefficient: ensures a
value between 0 and 1.
  • Those values modify the chi-square statistic and
    restrict its range to between 0 and 1 so that
    they can be interpreted like correlation
    coefficients.
  • The correlation between the 2 variables is medium
    (around .35) but highly significant.

27
Calculating effect sizes
  • A rough measure of the effect size is Cramer's V
    which ranges between 0 and 1. It is .356 in our
    example.
  • However, a more common measure of the effect size
    is the odds ratio. For a 2x2 contingency table
    its interpretation is straightforward.
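The value of .356 can be reproduced from the counts. This sketch assumes the usual definition of Cramer's V, the square root of χ² divided by n times (k - 1), where k is the smaller of the number of rows and columns.

```python
import math

# Rows: food, affection; columns: danced, did not dance.
observed = [[28, 10], [48, 114]]

n = sum(map(sum, observed))
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Pearson chi-square from observed and expected counts.
chi_square = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)

k = min(len(row_totals), len(col_totals))
cramers_v = math.sqrt(chi_square / (n * (k - 1)))
print(round(cramers_v, 3))
```

For a 2x2 table k - 1 equals 1, so V reduces to the square root of χ²/n, which is also the magnitude of phi.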

28
Calculating effect sizes - the odds ratio
  • $\text{odds}_{\text{dancing after food}} = \frac{\text{number of cats that had food and danced}}{\text{number of cats that had food but didn't dance}}$
  • $= 28/10 = 2.8$
  • $\text{odds}_{\text{dancing after affection}} = \frac{\text{number of cats that had affection and danced}}{\text{number of cats that had affection but didn't dance}}$
  • $= 48/114 = 0.421$

29
Calculating effect sizes - the odds ratio
  • Odds ratio $= \frac{\text{odds}_{\text{dancing after food}}}{\text{odds}_{\text{dancing after affection}}} = 2.8/0.421 = 6.65$
  • → If cats were trained with food, the odds that
    they would dance were 6.65 times higher than if
    they were trained with affection.
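The odds and the odds ratio follow directly from the four cell counts. A minimal sketch with illustrative variable names:

```python
# Cell counts from the contingency table.
food_danced, food_not = 28, 10
affection_danced, affection_not = 48, 114

# Odds of dancing within each training group.
odds_food = food_danced / food_not
odds_affection = affection_danced / affection_not

# Odds ratio: how many times higher the odds are after food than after affection.
odds_ratio = odds_food / odds_affection

print(round(odds_food, 3), round(odds_affection, 3), round(odds_ratio, 2))
```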

30
Reporting the χ² test (Field, 2005, p. 694)
  • There was a significant association between the
    type of training and whether or not cats would
    dance, χ²(1) = 25.36, p < .001. This seems to
    represent the fact that, based on the odds ratio,
    cats were 6.65 times more likely to dance if
    trained with food than if trained with affection.

31
  • If cats don't want to line-dance after affection,
    maybe they would break-dance?!
