Chapter_16 - PowerPoint PPT Presentation

Slides: 32

Provided by: iiMet

Transcript and Presenter's Notes

1
Chapter_16

Categorical data
Field (2005)

2
What are categorical variables?

We already know grouping (dummy) variables. They
are variables that describe categories of
entities.
Categorical data are not continuous scores but
discrete values such as being male/female,
pregnant/not pregnant, etc.
With categorical tests, we can test relationships
between such categorical data.

3
Theory of analysing categorical data

In the simplest case, we compare two categorical
variables.
However, we cannot use the mean, since we do not
have a continuous measure. Instead we analyse
frequencies, namely the number of things that
fall into each combination of categories.

4
Example: Dancing cats

Research question: Can animals
be trained to line-dance?
Experiment: 200 cats were trained to line-dance
by rewarding them with food or affection.
2 variables:
Training: reward through food or affection
Dance: could they dance or not?
After training, the number of cats that could or
could not line-dance was counted and compared
between the two training groups.

http://www.cyber-cats.com/images/cards/dancing-cats.jpg
5
Example: Dancing cats

Frequencies of cats being able to dance after
food or affectionate reward are tabulated in a
contingency table.

'Contingency' means a fact that is not logically
necessarily true or false. Acts can be
contingent upon other acts, as here dancing or
not dancing is contingent upon food or affection.
We now want to know whether there is a
relationship between the two categorical
variables, namely if the number of dancing cats
relates to the kind of training they received.
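The contingency table itself was an image on the slide. As a minimal sketch, it can be rebuilt from the four cell counts that appear in the later slides (28, 10, 48, 114). The dictionary layout and the variable names below are illustrative assumptions, not part of the original material.

```python
# Observed counts taken from the later slides: (training, danced) -> number of cats.
observed = {
    ("food", "yes"): 28,
    ("food", "no"): 10,
    ("affection", "yes"): 48,
    ("affection", "no"): 114,
}

trainings = ["food", "affection"]
outcomes = ["yes", "no"]

# Marginal totals: per dance outcome (rows), per training (columns), and overall.
row_totals = {d: sum(observed[(t, d)] for t in trainings) for d in outcomes}
col_totals = {t: sum(observed[(t, d)] for d in outcomes) for t in trainings}
n = sum(observed.values())

# Print the table with its margins.
print(f"{'dance':<8}{'food':>8}{'affection':>12}{'total':>8}")
for d in outcomes:
    print(f"{d:<8}{observed[('food', d)]:>8}{observed[('affection', d)]:>12}{row_totals[d]:>8}")
print(f"{'total':<8}{col_totals['food']:>8}{col_totals['affection']:>12}{n:>8}")
```

The row and column totals computed here are the ones used for the expected frequencies.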
6
Pearson's Chi-square test (χ²)

The Chi-square test compares the empirically
observed frequencies against the frequencies that
are expected by chance alone.
The logic of the Chi-square test follows the same
general logic of fitting a model to a set of
data
$$\text{deviation} = \sum (\text{observed} - \text{model})^2$$
$$\chi^2 = \sum \frac{(\text{observed}_{ij} - \text{model}_{ij})^2}{\text{model}_{ij}}$$

'observed' = frequencies of data
'model' = expected frequencies
7
Calculating expected values

How many cats do we expect to be there in each of
the 4 cells?
$$\text{model}_{ij} = E_{ij} = \frac{\text{row total}_i \times \text{column total}_j}{n}$$
This equation takes into account the unequal
number of cats that received each reward and the
unequal number of cats that could or could not dance.

E = expected
8
Expected model values for all 4 cells
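RT = row total, CT = column total

$$\text{model}_{\text{Food, Yes}} = \frac{RT_{\text{Yes}} \times CT_{\text{Food}}}{n} = \frac{76 \times 38}{200} = 14.44$$
$$\text{model}_{\text{Food, No}} = \frac{RT_{\text{No}} \times CT_{\text{Food}}}{n} = \frac{124 \times 38}{200} = 23.56$$
$$\text{model}_{\text{Affect, Yes}} = \frac{RT_{\text{Yes}} \times CT_{\text{Affect}}}{n} = \frac{76 \times 162}{200} = 61.56$$
$$\text{model}_{\text{Affect, No}} = \frac{RT_{\text{No}} \times CT_{\text{Affect}}}{n} = \frac{124 \times 162}{200} = 100.44$$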
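The four hand calculations follow one rule, so they can be sketched as a single expression. This is an illustrative sketch only; the dictionary keys are assumed names, and the totals are the ones given on the slide.

```python
# Marginal totals from the slide.
row_totals = {"yes": 76, "no": 124}          # danced / did not dance
col_totals = {"food": 38, "affection": 162}  # type of training
n = 200

# Expected count for each cell: (row total * column total) / n.
expected = {
    (training, danced): row_totals[danced] * col_totals[training] / n
    for training in col_totals
    for danced in row_totals
}

for cell, value in expected.items():
    print(cell, round(value, 2))
```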

9
Deriving the χ² statistic
Expected frequencies
Observed frequencies

We now have to subtract from each observed value
its corresponding model value, square the difference,
divide it by the model value, and sum the results:
$$\chi^2 = \frac{(28-14.44)^2}{14.44} + \frac{(10-23.56)^2}{23.56} + \frac{(48-61.56)^2}{61.56} + \frac{(114-100.44)^2}{100.44}$$
$$= 12.73 + 7.80 + 2.99 + 1.83$$
$$= 25.35$$
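As a quick check of the arithmetic, the sum can be sketched in a few lines. The function name `pearson_chi_square` is an illustrative assumption. Computed without rounding the individual terms, the statistic comes out at about 25.36; summing the four terms after rounding each to two decimals (12.73 + 7.80 + 2.99 + 1.83) gives 25.35.

```python
def pearson_chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Cell order: food/yes, food/no, affection/yes, affection/no.
observed = [28, 10, 48, 114]
expected = [14.44, 23.56, 61.56, 100.44]

chi2 = pearson_chi_square(observed, expected)
print(round(chi2, 2))
```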

10
The Likelihood ratio
Expected frequencies
Observed frequencies
An alternative to Pearson's χ² test is the
likelihood ratio statistic, which is based on
maximum-likelihood theory. Its equation uses the
natural logarithm ln:
$$L\chi^2 = 2 \sum \text{observed}_{ij} \ln\left(\frac{\text{observed}_{ij}}{\text{model}_{ij}}\right)$$
$$= 2\left[28 \ln\frac{28}{14.44} + 10 \ln\frac{10}{23.56} + 48 \ln\frac{48}{61.56} + 114 \ln\frac{114}{100.44}\right]$$
$$= 2\left[(28 \times 0.662) + (10 \times -0.857) + (48 \times -0.249) + (114 \times 0.127)\right]$$
$$= 2\left[18.54 - 8.57 - 11.94 + 14.44\right] = 24.94$$
Lχ² is tested for significance in the same way
as χ².
Lχ² yields results similar to χ² for
large samples. Lχ² is preferred
over χ² for small samples.
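A minimal sketch of the same calculation, assuming all observed counts are greater than zero (ln 0 is undefined). The function name is illustrative. With unrounded logarithms the result is about 24.93; the slide's 24.94 comes from rounding the intermediate products.

```python
import math

def likelihood_ratio(observed, expected):
    """Likelihood ratio statistic: 2 * sum(observed * ln(observed / expected))."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))

# Cell order: food/yes, food/no, affection/yes, affection/no.
observed = [28, 10, 48, 114]
expected = [14.44, 23.56, 61.56, 100.44]

l_chi2 = likelihood_ratio(observed, expected)
print(round(l_chi2, 2))
```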
11
Yates's continuity correction

For a 2x2 contingency table, Pearson's χ²
statistic may overestimate the correct value
and is hence prone to a Type I error.
Therefore, Yates's correction is applied. It
takes the absolute value of the observed minus
the model value and subtracts 0.5 from it before
squaring.
$$\chi^2 = \sum \frac{(|\text{observed}_{ij} - \text{model}_{ij}| - 0.5)^2}{\text{model}_{ij}}$$
$$= \frac{(13.56-0.5)^2}{14.44} + \frac{(13.56-0.5)^2}{23.56} + \frac{(13.56-0.5)^2}{61.56} + \frac{(13.56-0.5)^2}{100.44}$$
$$= 11.81 + 7.24 + 2.77 + 1.70$$
$$= 23.52$$

Note: In most cases, a Yates's correction will
not be necessary.
It is advised only for df = 1.
Note: The corrected value is somewhat
smaller than the uncorrected one (25.35).
The result is therefore slightly less
significant.
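The corrected statistic differs from the uncorrected one only in the numerator, as this short sketch shows. The function name is an illustrative assumption; the counts are those from the slides.

```python
def yates_chi_square(observed, expected):
    """Chi-square with continuity correction: sum of (|o - e| - 0.5)^2 / e."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

# Cell order: food/yes, food/no, affection/yes, affection/no.
observed = [28, 10, 48, 114]
expected = [14.44, 23.56, 61.56, 100.44]

chi2_yates = yates_chi_square(observed, expected)
print(round(chi2_yates, 2))
```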
12
Assumptions of the χ² test

What it does NOT assume
Continuous and normally distributed data

What it DOES assume
Each person, item, or entity may only contribute
to a single cell in the contingency table. Hence,
NO repeated testing is possible.
The expected frequencies should be greater than 5.
Only in larger contingency tables may up to 20% of
the expected frequencies be below 5.

13
Testing the χ² statistic for significance

The observed value χ² = 25.35 now has to be
compared against a critical χ² value, which it has
to exceed in order to be significant.
The df of the χ² value are (r-1)(c-1), where r
is the number of rows (2) and c the number of
columns (2).
df = (2-1)(2-1) = 1
Look up the critical χ² value for df = 1 in
Appendix A4. It is 3.84 (for p < .05) and 6.63
(for p < .01).
Since our empirically derived χ² value (25.35)
exceeds the critical χ² value (3.84 or 6.63), we
can conclude that the different training methods
have an influence on cats' learning to line-dance.
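Instead of a table lookup, the tail probability can be sketched directly for the special case df = 1. There, a χ² variable is the square of a standard normal variable, so the upper-tail probability equals erfc(sqrt(x / 2)). The helper name `chi2_sf_df1` is an assumption made for this illustration, and it is valid for df = 1 only; other df would need a statistics library.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability P(chi-square > x) for 1 degree of freedom only."""
    return math.erfc(math.sqrt(x / 2))

rows, cols = 2, 2
df = (rows - 1) * (cols - 1)

# The critical values quoted on the slide correspond to tail areas of about .05 and .01.
print(df, round(chi2_sf_df1(3.84), 3), round(chi2_sf_df1(6.63), 3))

# Tail probability of the observed statistic (unrounded terms give about 25.36).
p_value = chi2_sf_df1(25.36)
print(p_value < 0.001)
```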

14
Running a χ² test using SPSS (using the Cats.sav
and CatsWeight.sav data). Entering the data can
be done in two ways:

Entering raw data

Entering weight cases

15
Entering raw data: Cats.sav

Specify a dummy variable for both variables,
e.g.
Training: 0 = food reward, 1 = affection
Dance: 0 = dances, 1 = does not dance
Each row represents one subject.
Example: a cat that was rewarded with food and did
not learn to dance receives the coding 0 1.
This is the way Cats.sav is organized.

16
Entering weight cases: CatsWeight.sav

When we enter the weight cases, we add a third
variable to 'Training' and 'Dance', namely
'frequent', with which we encode the frequency
found in the respective cell combination, e.g.,
28 cats that received food as reward and danced.
Next, we have to tell SPSS that we are using
weight cases.

Data → Weight cases...
SPSS now weights each category
combination by the number in the
column 'frequent', i.e., it 'knows'
now that there are 28 cases of 0 0.
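The relationship between the two data layouts can be sketched outside SPSS. The tuples below are an illustrative stand-in for the two files, not their actual contents; the coding follows the slides (Training: 0 = food, 1 = affection; Dance: 0 = dances, 1 = does not dance).

```python
from collections import Counter

# Weighted layout: one row per cell, (training, dance, frequent).
weighted = [
    (0, 0, 28),   # food, danced
    (0, 1, 10),   # food, did not dance
    (1, 0, 48),   # affection, danced
    (1, 1, 114),  # affection, did not dance
]

# Raw layout: one row per cat, obtained by repeating each cell 'frequent' times.
raw = [(training, dance) for training, dance, freq in weighted for _ in range(freq)]

# Counting the raw rows recovers the weighted table.
counts = Counter(raw)
print(len(raw), counts[(0, 0)], counts[(1, 1)])
```

Both layouts carry the same information, which is why the analysis gives the same output for either file.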
17
Running the analysis with cross-tabs

'Crosstabs' is a command that analyses data that
fall into categories.
Analyze → Descriptive Statistics → Crosstabs

Enter one variable into 'Row' and
the other into 'Column'.
Here, you may enter a 'layer' variable by which
you could split the rows of the table into further
categories.
18
Running the analysis with cross-tabs

You can run the χ² test either with the weighted
data (CatsWeight.sav) or with the raw data
(Cats.sav). The following specification in the
dialog windows and the subsequent output are the
same.

19
Statistics
There are various options for analyzing
categorical data.
See next slide.
Here you could request tests for ordinal data.
20
Statistical options for crosstabs

Chi-square: This is the basic Pearson chi-square
test. It detects significant associations between
two categorical variables, but it does not tell
you how strong the relation is.
Phi and Cramer's V measure the strength of
association between two categorical variables.
Phi is used for 2x2 contingency tables (2
variables with 2 categories each). Cramer's V is
applied when one of the 2 variables has more than 2
categories. Cramer's V can also be used as a
measure of the effect size.
Lambda: Goodman and Kruskal's λ measures how well
one variable can predict membership of another
variable in a category. A value of 1 means that
the variable predicts the other perfectly.
Kendall's statistics: for small data sets, good
estimate of the population, also used for ordinal
data (see 4.5.5).

21
Cells
Besides the 'observed' counts,
check the 'expected' counts. They
should not be smaller than 5 in a
2x2 contingency table.
Looking at the row and column percentages helps
you interpret
the results.
22
Format
If you have 'Exact' instead of 'Format', you can
request
an exact test here.
The rows are ordered in ascending order.
Finally, click OK.
23
Output of Crosstabs
In total, 38% of the cats (76) danced.
Of these, 36.8% (28) were trained with
food and 63.2% (48) with affection.
62% of the cats (124) did NOT dance. 8.1%
(10) of them were trained with food
and 91.9% (114) with affection.
'Count' gives you the absolute numbers.
Below are the percentages.
'% within type of training' tells us that of
those trained with food, 73.7% danced
and of those trained with affection,
70.4% did not dance.

→ When food was used cats would dance, but when
affection was used they would not.
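The percentages in the output follow directly from the four counts. The sketch below recomputes them; the helper `pct` and the variable names are illustrative assumptions.

```python
food_yes, food_no = 28, 10
aff_yes, aff_no = 48, 114

def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

# Share of each training type among the cats that danced / did not dance.
within_danced = (pct(food_yes, food_yes + aff_yes), pct(aff_yes, food_yes + aff_yes))
within_not_danced = (pct(food_no, food_no + aff_no), pct(aff_no, food_no + aff_no))

# Percentages within type of training.
food_danced = pct(food_yes, food_yes + food_no)
affection_not_danced = pct(aff_no, aff_yes + aff_no)

print(within_danced, within_not_danced, food_danced, affection_not_danced)
```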

http://www.cyber-cats.com/images/cards/dancingcat.jpg
24
Checking the assumptions

All expected frequencies should be greater than 5.
In the row 'Expected counts' we see that the
smallest count is 14.4 (cats that had food and
danced).
Since this value is greater than 5, the assumption
of the Chi-square test has been met.
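The same check can be sketched in code. The threshold of 5 is the rule of thumb stated in the slides; the dictionary layout is an illustrative assumption.

```python
# Expected counts from the earlier slide: (training, danced) -> expected frequency.
expected = {
    ("food", "yes"): 14.44,
    ("food", "no"): 23.56,
    ("affection", "yes"): 61.56,
    ("affection", "no"): 100.44,
}

# Which cell has the smallest expected count, and what share of cells falls below 5?
smallest_cell = min(expected, key=expected.get)
share_below_5 = sum(1 for e in expected.values() if e < 5) / len(expected)

print(smallest_cell, expected[smallest_cell], share_below_5)
```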

25
Pearson Chi-square statistic

The Pearson chi-square statistic tests whether
the two variables are independent. If the value
is significant we reject this hypothesis and
instead assume that they are related.

χ² is significant!
Yates' correction
Interpretation: The pattern of responses in the
two training conditions is different. Cats dance
for food (74% dance, 26% do not) but not for love
(when they are trained with affection, only 30%
dance but 70% don't).
The assumption of the χ² test is fulfilled.
26
Additional statistics
Phi: valid only for 2x2 tables.
Cramer's V: when variables have more than
2 categories. Cramer's V is already an adequate
effect size.
Contingency coefficient: ensures a
value between 0 and 1.

Those values modify the chi-square statistic and
restrict the range of the test statistic from 0-1
so that they can be interpreted like correlation
coefficients.
The correlation between the 2 variables is medium
(around .35) but highly significant.
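The value of about .35 can be reproduced from the χ² statistic. The sketch below assumes the standard definition V = sqrt(χ² / (n(k - 1))), where k is the smaller of the number of rows and columns; for a 2x2 table this reduces to sqrt(χ² / n), the magnitude of phi. The function name is illustrative.

```python
import math

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V from a chi-square statistic, sample size, and table dimensions."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (k - 1)))

v = cramers_v(25.36, 200, 2, 2)
print(round(v, 3))
```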

27
Calculating effect sizes

A rough measure of the effect size is Cramer's V
which ranges between 0 and 1. It is .356 in our
example.
However, a more common measure of the effect size
is the odds ratio. For a 2x2 contingency table
its interpretation is straightforward.

28
Calculating effect sizes - the odds ratio

$$\text{odds}_{\text{dancing after food}} = \frac{\text{number of cats that had food and danced}}{\text{number of cats that had food but didn't dance}} = \frac{28}{10} = 2.8$$
$$\text{odds}_{\text{dancing after affection}} = \frac{\text{number of cats that had affection and danced}}{\text{number of cats that had affection but didn't dance}} = \frac{48}{114} = 0.421$$

29
Calculating effect sizes - the odds ratio

$$\text{odds ratio} = \frac{\text{odds}_{\text{dancing after food}}}{\text{odds}_{\text{dancing after affection}}} = \frac{2.8}{0.421} = 6.65$$
→ If cats were trained with food, the odds of
their dancing were 6.65 times higher than if
they were trained with affection.
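The odds and their ratio can be sketched in a few lines from the four cell counts. The variable names are illustrative assumptions.

```python
# Cell counts from the contingency table.
food_yes, food_no = 28, 10
aff_yes, aff_no = 48, 114

# Odds of dancing within each training group.
odds_food = food_yes / food_no
odds_affection = aff_yes / aff_no

# Odds ratio: how many times larger the odds of dancing are after food.
odds_ratio = odds_food / odds_affection

print(round(odds_food, 3), round(odds_affection, 3), round(odds_ratio, 2))
```

An odds ratio of 1 would mean that the odds of dancing are the same under both kinds of training.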

30
Reporting the χ² test (Field, 2005, p. 694)

There was a significant association between the
type of training and whether or not cats would
dance, χ²(1) = 25.36, p < .001. This seems to
represent the fact that, based on the odds ratio,
cats were 6.65 times more likely to dance if
trained with food than if trained with affection.