to χ² and beyond - PowerPoint PPT Presentation

Provided by: DanWr7

1
to χ² and beyond

Distribution
Lots of tests produce statistics that are
compared with the χ² distribution
Goodness of fit: to compare single variables with
a distribution
χ² goodness of fit
Kolmogorov-Smirnov
Shapiro-Wilk
Tests of association between two variables

2
Ratbert was correct on 58 of the 100 trials. Is this
different from chance?
3
Pearson χ² statistic: a measure of deviation
4

Is this high enough to reject the model of no
ESP?
Need to know the residual degrees of freedom:
the number of non-redundant pieces of information
minus the number of pieces of information used in
the model.
Two pieces of information. The model uses one (n,
used to calculate expected values), leaving one
for the residual.
The χ² table uses both the deviation value and the
degrees of freedom.
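
A minimal sketch of the Pearson χ² calculation for Ratbert, assuming "chance" means a 50% hit rate (the slides do not state the chance level, so that value is an assumption):

```r
# Ratbert's results: 58 correct and 42 incorrect out of 100 trials.
observed <- c(correct = 58, incorrect = 42)

# ASSUMPTION: the no-ESP model predicts 50% correct.
expected <- sum(observed) * c(0.5, 0.5)

# Pearson chi-square: sum over cells of (O - E)^2 / E.
pearson_x2 <- sum((observed - expected)^2 / expected)

# Two cells, one piece of information used by the model (n), so 1 residual df.
df_resid <- length(observed) - 1

# Upper-tail probability and the .05 critical value from the chi-square distribution.
p_value <- pchisq(pearson_x2, df = df_resid, lower.tail = FALSE)
critical_05 <- qchisq(0.95, df = df_resid)

print(c(X2 = pearson_x2, df = df_resid, p = p_value, critical = critical_05))
```

Under that assumption the statistic falls below the .05 critical value, so the no-ESP model is not rejected.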

5
χ² distribution
6
A Second Equation

The likelihood ratio χ² statistic (Lχ²)
Plugging in the numbers for this example yields
a value that is also non-significant.
Lχ² can be partitioned like the sums of squares
(SS) in ANOVA and regression models.
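
The slide's equation did not survive the transcript. The sketch below uses the standard likelihood ratio form, 2 Σ O ln(O/E), on Ratbert's counts, again assuming a 50% chance rate (an assumption, not stated in the slides):

```r
# Observed counts and expected counts under the assumed 50% chance model.
observed <- c(correct = 58, incorrect = 42)
expected <- c(50, 50)

# Likelihood ratio chi-square: 2 * sum of O * ln(O / E).
lr_x2 <- 2 * sum(observed * log(observed / expected))

# Same single residual degree of freedom as the Pearson statistic.
lr_p_value <- pchisq(lr_x2, df = 1, lower.tail = FALSE)

print(c(L2 = lr_x2, p = lr_p_value))
```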

7
(No Transcript)
8
http://glass.ed.asu.edu/stats/analysis/
9
The Confidence Interval (Wald)
10

Wilson's (1927) approach is thought of as best.
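
A sketch comparing the Wald and Wilson intervals for a single proportion. It reuses Ratbert's 58 out of 100 as the example, which is an assumption since the slide's own numbers are not in the transcript. The Wilson interval is computed by hand and then compared with what `prop.test()` returns without continuity correction:

```r
x <- 58   # successes (assumed example: Ratbert's correct answers)
n <- 100  # trials
p_hat <- x / n
z <- qnorm(0.975)  # two-sided 95% interval

# Wald interval: estimate plus or minus z standard errors.
wald <- p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)

# Wilson score interval, written out from its closed form.
centre <- (p_hat + z^2 / (2 * n)) / (1 + z^2 / n)
half_width <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
wilson <- centre + c(-1, 1) * half_width

# prop.test() without continuity correction reports the same score interval.
wilson_check <- as.numeric(prop.test(x, n, correct = FALSE)$conf.int)

print(rbind(wald = wald, wilson = wilson, prop.test = wilson_check))
```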

11
Men Winning More Awards

A few good men
Johnson, Carothers & Deary (Nov, PPS) "Role of
the X Chromosome"
psychologicalscience.org/journals/pps/4_6_inpress/Johnson_final.pdf
Wendy Northcutt's (2000) approach to evolution
Survival of the fittest only means that the best
genes remain in the pool.
The pool may still contain bad genes which just don't
get passed on to the next gene pool (of course,
one genetically damaged plant, if propagated, could
destroy a field, or one prolific warmonger).
131 men won Darwin Awards, but only 24 women.
The odds are 131/24 = 5.46!

12
Repeat Odds

131 men won a Darwin award
24 women won one
Proportion of men winning is 131/155 = 0.85
Odds are 131/24 = 5.46.
Odds are important for some statistics.
In a second, as the odds ratio.
Next term, as part of regression when the
response variable is binary.
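
The proportion and the odds describe the same split in two ways. A short sketch with the Darwin Award counts from the slide:

```r
men <- 131
women <- 24

# Proportion: men out of all winners.
prop_men <- men / (men + women)

# Odds: men relative to women.
odds_men <- men / women

# The two are linked by odds = p / (1 - p).
odds_from_prop <- prop_men / (1 - prop_men)

print(round(c(proportion = prop_men, odds = odds_men), 3))
```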

13
Two Binary Variables: Own Race Bias

A Black confederate approached Black and White
participants in South Africa. A few minutes
later an RA approached the participant and asked
them to identify the confederate from a lineup.
17 of the 25 Blacks (68%) correct, odds = 2.125
8 of the 25 Whites (32%) correct, odds = 0.471
The odds ratio (OR) is 2.125/0.471 = 4.52.
The odds ratio is a common measure of
association, like the correlation, but for 2x2
tables.
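
The odds and the odds ratio above can be reproduced directly from the four cell counts. The variable names are illustrative only:

```r
# Own-race bias counts from the slide.
black_correct <- 17; black_wrong <- 8
white_correct <- 8;  white_wrong <- 17

# Odds of a correct identification within each group.
odds_black <- black_correct / black_wrong
odds_white <- white_correct / white_wrong

# Odds ratio: ratio of the two odds, equal to the cross-product ratio.
odds_ratio <- odds_black / odds_white
cross_product <- (black_correct * white_wrong) / (black_wrong * white_correct)

print(round(c(odds_black = odds_black, odds_white = odds_white, OR = odds_ratio), 3))
```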

14
(No Transcript)
15

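# One-time install of the sdtalt package, then load it
install.packages("sdtalt")
library(sdtalt)
# The four cell counts of the own-race bias table (17, 8, 8, 17)
sdt(17,8,8,17)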

16
(No Transcript)
17
Calculating Asymptotic CI (called the Wald CIs)

1. Take the ln of the observed OR. Here,
ln(4.516) = 1.507.
2. Calculate the standard error of the log odds
ratio. Here, se(ln OR) = 0.606.
3. Calculate the 95% confidence interval of ln OR:
lb = ln OR - 1.96 se(ln OR) = 1.507 - 1.96(0.606)
= 0.319
ub = ln OR + 1.96 se(ln OR) = 1.507 + 1.96(0.606)
= 2.695
4. Back-transform these into odds ratios:
exp(0.319) = e^0.319 = 1.376 and exp(2.695) =
e^2.695 = 14.80
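
The four steps can be sketched in a few lines. The standard error formula did not survive in the transcript; the code assumes the usual sqrt(1/a + 1/b + 1/c + 1/d), which gives the 0.606 used above:

```r
# Cell counts of the own-race bias table.
n11 <- 17; n12 <- 8   # Black participants: correct, incorrect
n21 <- 8;  n22 <- 17  # White participants: correct, incorrect

# Step 1: log of the observed odds ratio.
log_or <- log((n11 * n22) / (n12 * n21))

# Step 2: standard error of ln(OR), assumed to be the usual asymptotic formula.
se_log_or <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)

# Step 3: 95% interval on the log scale.
z <- qnorm(0.975)
ci_log <- log_or + c(-1, 1) * z * se_log_or

# Step 4: back-transform to the odds ratio scale.
ci_or <- exp(ci_log)

print(round(c(lnOR = log_or, se = se_log_or, lower = ci_or[1], upper = ci_or[2]), 3))
```

Because the code keeps full precision and uses qnorm(0.975) rather than 1.96, the upper limit differs slightly from the hand calculation.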

18
Or write a function or find one:
https://home.comcast.net/lthompson221/RCode.txt
19
Differences due to rounding (these estimates are
more accurate)
20
Calculating χ²

E11 = RT1 × CT1 / n
12.5 = (25)(25) / 50
Eij = RTi × CTj / n
ln(Eij) = ln(RTi) + ln(CTj) - ln(n)
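
A sketch of the expected-count formula on the own-race bias table, in both the multiplicative and the log-linear form:

```r
# Observed 2x2 table (rows: participant group, columns: answer).
observed <- matrix(c(17, 8,
                      8, 17),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(participant = c("Black", "White"),
                                   answer = c("correct", "incorrect")))
n <- sum(observed)

# Eij = RTi * CTj / n for every cell at once.
expected <- outer(rowSums(observed), colSums(observed)) / n

# The same model on the log scale: ln Eij = ln RTi + ln CTj - ln n.
log_expected <- outer(log(rowSums(observed)), log(colSums(observed)), "+") - log(n)

print(expected)
```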

21
χ²(1) = 6.48, p = .011
Pearson originally got the df wrong: rc - 1
rather than (r-1)(c-1). Fisher corrected him.
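
The reported value can be checked with `chisq.test()`, switching off the continuity correction that R applies to 2x2 tables by default:

```r
# Own-race bias table (rows: Black, White; columns: correct, incorrect).
observed <- matrix(c(17, 8,
                      8, 17),
                   nrow = 2, byrow = TRUE)

# correct = FALSE gives the plain Pearson statistic rather than Yates' version.
result <- chisq.test(observed, correct = FALSE)

x2 <- unname(result$statistic)
df_resid <- unname(result$parameter)   # (r - 1)(c - 1)
p_value <- result$p.value

print(round(c(X2 = x2, df = df_resid, p = p_value), 3))
```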
22
Graphing Chi-Square residuals

When the table is larger than 2x2, the Chi-Square
value does not tell you where the effect is.
Two approaches
Residual statistics
Correspondence analysis

23
Two variables, multiple values (based on a UK
national survey)
24
Degrees of Freedom

16 non-redundant pieces of information
Equal numbers in all cells (one number)
Accounting for column totals (3 additional)
Accounting for column and row total (3
additional)
7 df used in the model
Leaving 9 df for the residuals. The χ² test is
of the residuals, so use this for looking up.

Lχ² = 30.56
degrees of freedom = (r-1)(c-1) = 9, or calculate
from 16 - 1 - 3 - 3 = 9
Critical χ² values for df = 9 are 16.92 and 21.67
for α = .05 and .01 respectively.
So, an association has been detected.
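
The degrees of freedom, the critical values, and the conclusion can be sketched as follows. The p-value line is an addition for illustration; the slide reports only the critical values:

```r
# 4x4 table: 16 cells, minus 1 (grand total), 3 (columns), 3 (rows).
cells <- 16
df_resid <- cells - 1 - 3 - 3

# Same answer from the (r - 1)(c - 1) shortcut.
df_shortcut <- (4 - 1) * (4 - 1)

# Critical values at alpha = .05 and .01.
critical <- qchisq(c(0.95, 0.99), df = df_resid)

# Likelihood ratio chi-square reported on the slide.
lr_x2 <- 30.56
p_value <- pchisq(lr_x2, df = df_resid, lower.tail = FALSE)

print(round(critical, 2))
print(lr_x2 > critical)   # exceeds both critical values
```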

26
(No Transcript)
27
(No Transcript)
28
Pearson residuals: (O - E)/sqrt(E)
29
Association

Find odds ratio for each 2x2 comparison.
Useful if both variables are ordinal.

Odds ratios shown on the slide: 1.13, 1.61, 0.91, 2.53, 0.30, 3.07, 0.39, 1.92, 0.51
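
A sketch of how the odds ratio for each adjacent 2x2 comparison (a "local" odds ratio) could be computed. The survey table is not in the transcript, so the 4x4 counts below are hypothetical and `local_odds_ratios` is an illustrative helper, not a function from any package:

```r
# Local odds ratio for each block of adjacent rows (i, i+1) and columns (j, j+1).
local_odds_ratios <- function(tab) {
  n_row <- nrow(tab)
  n_col <- ncol(tab)
  out <- matrix(NA_real_, nrow = n_row - 1, ncol = n_col - 1)
  for (i in seq_len(n_row - 1)) {
    for (j in seq_len(n_col - 1)) {
      out[i, j] <- (tab[i, j] * tab[i + 1, j + 1]) /
                   (tab[i, j + 1] * tab[i + 1, j])
    }
  }
  out
}

# HYPOTHETICAL counts, for illustration only.
tab <- matrix(c(20, 10,  5,  2,
                12, 18,  9,  4,
                 6, 11, 16,  8,
                 3,  5, 10, 19),
              nrow = 4, byrow = TRUE)

lor <- local_odds_ratios(tab)
print(round(lor, 2))
```

A 4x4 table yields a 3x3 grid of local odds ratios, which matches the nine values listed on the slide.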
30
How big is the association?

Can look at odds ratios of each 2x2 comparison.
Cramér's V (and V²)
SPSS gives lots of effect sizes. Cohen uses
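
A minimal sketch of Cramér's V, using the standard definition sqrt(χ² / (n × min(r-1, c-1))) and the own-race bias table as the example (the slide's own worked numbers are not in the transcript):

```r
# Own-race bias table.
observed <- matrix(c(17, 8,
                      8, 17),
                   nrow = 2, byrow = TRUE)

# Pearson chi-square without continuity correction.
x2 <- unname(chisq.test(observed, correct = FALSE)$statistic)

n <- sum(observed)
smaller_dim <- min(nrow(observed) - 1, ncol(observed) - 1)

# Cramer's V and its square.
cramers_v <- sqrt(x2 / (n * smaller_dim))
cramers_v2 <- cramers_v^2

print(round(c(V = cramers_v, V2 = cramers_v2), 3))
```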

31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
Where is the association? O-E Residuals
35
Biggest residuals for married, but it is also the
largest row total.
36
Standardized or Pearson's Residuals

Square root of each cell's contribution to the
overall χ²
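
A sketch of raw and Pearson residuals on the own-race bias table, showing that the squared Pearson residuals add up to the overall χ². Note that R's `chisq.test()` calls (O - E)/sqrt(E) `residuals`, and reserves `stdres` for a further adjusted version:

```r
observed <- matrix(c(17, 8,
                      8, 17),
                   nrow = 2, byrow = TRUE)
result <- chisq.test(observed, correct = FALSE)

# Raw residuals: observed minus expected.
raw_resid <- observed - result$expected

# Pearson residuals by hand, and as returned by chisq.test().
pearson_resid <- raw_resid / sqrt(result$expected)
pearson_from_r <- result$residuals

# Each squared Pearson residual is that cell's contribution to chi-square.
x2_from_resid <- sum(pearson_resid^2)

print(round(pearson_resid, 3))
```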

37
Now widowed has the largest, but it has a small
row total.
38
Correspondence Analysis (briefly and without math)

Partitions the residual χ² left over after the
no-association model.
If there are r rows and c columns, the program
can use up to either (r-1) or (c-1) dimensions,
whichever is smaller.
Important: think about the size of the association.
Analyze/Data Reduction/Correspondence Analysis in
SPSS,
or with corresp in the MASS package for R (lots of
others, too)
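
A sketch of `corresp()` from MASS. The survey table from the slides is not available here, so the example uses `caith`, a 4x5 hair and eye colour table that ships with MASS; that substitution is an assumption made purely for illustration. It also checks the partitioning idea: n times the sum of squared canonical correlations recovers the Pearson χ².

```r
library(MASS)

# caith: 4 eye colours (rows) by 5 hair colours (columns).
tab <- as.matrix(caith)

# Maximum number of dimensions: the smaller of (r - 1) and (c - 1).
max_dims <- min(nrow(tab) - 1, ncol(tab) - 1)

# Fit with every available dimension.
ca <- corresp(tab, nf = max_dims)

# Chi-square attributed to each dimension, and the overall Pearson value.
x2_by_dim <- sum(tab) * ca$cor^2
x2_total <- unname(chisq.test(tab)$statistic)

print(round(x2_by_dim, 1))
# biplot(corresp(tab, nf = 2))   # uncomment to draw rows and columns together
```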

39
(No Transcript)
40
(No Transcript)
41
(No Transcript)
42
χ² = 30 (is social class ordinal?)
43
(No Transcript)
44
(No Transcript)
45
Lots of output; the minimum is shown here.
15 for the 1st dimension, 9 for the 2nd, and 1.5 for the 3rd
46
Does for class also.
47
Schuman & Scott (1989): Events by Age
48
Three Variables

There are techniques designed for multiple
variables.
Can use CA.
Compute new variable
Compute newvar = age + 6 * gender
age:        1  2  3  4  5  6
gender 1:   7  8  9 10 11 12
gender 2:  13 14 15 16 17 18
CA comes to mind with newvar
Biplot and drawing lines
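
The recoding can be sketched in R, assuming age is coded 1 to 6 and gender 1 to 2 as the table of values suggests. Each age-by-gender combination gets its own code, so the combined variable can be crossed with a third variable in a two-way correspondence analysis:

```r
# Every combination of the assumed codes: age 1..6, gender 1..2.
combos <- expand.grid(age = 1:6, gender = 1:2)

# One distinct value per age-by-gender cell.
combos$newvar <- combos$age + 6 * combos$gender

# Lay the codes out as a gender-by-age grid to check the recode.
check <- xtabs(newvar ~ gender + age, data = combos)
print(check)
```

Printing the grid is a quick way to "make sure it works" before running the analysis on real data.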

49
Let me repeat that

Compute new variable
Compute newvar = age + 6 * gender
age:        1  2  3  4  5  6
gender 1:   7  8  9 10 11 12
gender 2:  13 14 15 16 17 18

Make sure it works
50
(No Transcript)
51
(No Transcript)
52
Heath et al. on social class and voting
53
Summary

Finding a significant Chi-square does not tell
you where the effect lies.
Graph different residuals and look at
correspondence analysis

54
Square Tables (if time allows)

The rows and columns have the same values.
Inter-rater reliability
Two similar measures (two tests, two trials)
Before-After studies
Matched participants (e.g., fathers' and sons'
political party preference or social class)
Can have more than 2 variables, but complexity
increases.

55
Special Models of Interest

Equi-Probable (every cell equally likely)
Independence (like what we've been doing)
Quasi-independence
Symmetry
Quasi-symmetry

56
Square Tables: Inter-rater reliability

Suppose a researcher was interested in the
reliability of exam markers.
Suppose there are first and second exam markers
of a large number of exercises.
Questions: Are they reliable? Is one harsher
than the other? Are non-agreements random?

57
Contingency Table
Marker 1
58
Values shown on the slide: 0.067, 6.25, 2.40, 1.00, 0.40, 2.50, 16.67, 1.00, 0.12
59
Equi-Probable w/ Diagonal: Eij = n/16
Marker 1
16 cells. 1 used, 15 left. χ²(15) = 1187
60
Equi-Probable w/o Diagonal: Eij = n/12
Marker 1
12 cells. 1 used, 11 left. χ²(11) = 660
Can also fit the diagonal, using those 4 df.
61
Independence w/o Diagonal: Eij = RTi × CTj / n
Marker 1
12 cells. 7 used, 5 left. χ²(5) = 58, p < .001
62
Symmetry w/o Diagonal: Eij = Eji
Marker 1
12 cells. 6 used, 6 left. χ²(6) = 149
63
Quasi-Symmetry w/o Diagonal: Eij = Eji, but
taking into account the marginals
Marker 1
12 cells. 9 used, 3 left. χ²(3) = 9.68 (p = .02)
64
Summary of Models

Equi-probable with all data: χ²(15) = 1187
Equi-probable w/o diagonal: χ²(11) = 660
Add marginals (independence): χ²(5) = 58
Symmetry: χ²(6) = 149
Quasi-symmetry: χ²(3) = 10
Still significant, but small enough to accept.
Shows the marginals differ. Marker 1 is tougher.
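
One way to fit this sequence of models is as Poisson log-linear models with `glm()`. The sketch below is not the slides' own analysis: the marker table is missing from the transcript, so the counts are hypothetical and only the degrees of freedom are meant to match the summary.

```r
# HYPOTHETICAL 4x4 grade counts (rows: Marker 1, columns: Marker 2).
counts <- c(30, 12,  5,  2,
             8, 40, 15,  4,
             3, 10, 35, 12,
             1,  4,  9, 25)
cells <- expand.grid(m2 = 1:4, m1 = 1:4)   # m2 varies fastest (row-major fill)
cells$count <- counts

# Drop the diagonal (agreements), leaving 12 cells.
off <- subset(cells, m1 != m2)
off$row <- factor(off$m1)
off$col <- factor(off$m2)
# One label per unordered pair, shared by cell (i, j) and cell (j, i).
off$pair <- factor(paste(pmin(off$m1, off$m2), pmax(off$m1, off$m2)))

fit <- function(formula) glm(formula, family = poisson, data = off)
models <- list(
  equiprobable       = fit(count ~ 1),
  quasi_independence = fit(count ~ row + col),
  symmetry           = fit(count ~ pair),
  quasi_symmetry     = fit(count ~ pair + row)
)

# Residual df, likelihood ratio chi-square, and Pearson chi-square per model.
model_summary <- data.frame(
  df = sapply(models, df.residual),
  L2 = sapply(models, deviance),
  X2 = sapply(models, function(m) sum(residuals(m, type = "pearson")^2))
)
print(round(model_summary, 2))
```

The quasi-symmetry model is written as `pair + row` because, once the symmetric pair terms are in, adding column effects as well would be redundant.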

65
Quasi-Symmetry fits pretty well: residuals and
standardized residuals
Marker 1
12 cells. 9 used, 3 left. χ²(3) = 9.68 (p =
.02). Marker 2 gives As to many that Marker 1
thinks are poor.
66
Taking into account the ordinality

Assume independence as the baseline.
Include additional parameters to account for the
association.
Linear by linear
RC model (Correspondence Analysis)

67
Independence: Eij = RTi × CTj / n
Marker 1
16 cells. 7 used, 9 left. χ²(9) = 605
68
Linear by Linear Term

Eij = RTi × CTj / n
ln Eij = ln RTi + ln CTj - ln n
It is significant, but with 9 df we have not located
where the association is.
If scores are interval, then including a term
marker1 × marker2 tests for linear association.
In SPSS, put the interaction in as a covariate and in
the model (with the two main effects).
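
A rough R counterpart of the SPSS setup, assuming integer scores 1 to 4 for each marker's grades. The counts are hypothetical because the marker table is not in the transcript; the point of the sketch is the extra one-df term.

```r
# HYPOTHETICAL 4x4 grade counts (rows: Marker 1, columns: Marker 2).
counts <- c(30, 12,  5,  2,
             8, 40, 15,  4,
             3, 10, 35, 12,
             1,  4,  9, 25)
cells <- expand.grid(m2 = 1:4, m1 = 1:4)   # m2 varies fastest (row-major fill)
cells$count <- counts
cells$row <- factor(cells$m1)
cells$col <- factor(cells$m2)

# ASSUMPTION: equally spaced scores 1..4; their product is the covariate.
cells$score <- cells$m1 * cells$m2

# Independence: the two main effects only.
indep <- glm(count ~ row + col, family = poisson, data = cells)

# Linear by linear: main effects plus the single product term.
linlin <- glm(count ~ row + col + score, family = poisson, data = cells)

# Drop in likelihood ratio chi-square for the one extra parameter.
change <- deviance(indep) - deviance(linlin)
p_change <- pchisq(change, df = 1, lower.tail = FALSE)

print(c(df_indep = df.residual(indep), df_linlin = df.residual(linlin)))
print(c(change = change, p = p_change))
```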

69
Linear by Linear Model
Marker 1
16 cells. 8 used, 8 left. χ²(8) = 127
70
Association Models

Sometimes called uniform association model as all
the local odds ratios are the same.
Assume interval Row and Column values
R models relax this for rows
C models relax this for columns
R + C models for both
RC(M) models and correspondence analysis
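
The "uniform association" label can be checked numerically: under a linear-by-linear model with unit-spaced scores, every local odds ratio of the fitted table equals exp(β). A sketch with hypothetical counts (the marker table is not in the transcript):

```r
# HYPOTHETICAL 4x4 grade counts (rows: Marker 1, columns: Marker 2).
counts <- c(30, 12,  5,  2,
             8, 40, 15,  4,
             3, 10, 35, 12,
             1,  4,  9, 25)
cells <- expand.grid(m2 = 1:4, m1 = 1:4)   # m2 varies fastest (row-major fill)
cells$count <- counts
cells$score <- cells$m1 * cells$m2          # assumed unit-spaced scores

linlin <- glm(count ~ factor(m1) + factor(m2) + score,
              family = poisson, data = cells)

# Fitted counts arranged as a Marker 1 by Marker 2 table.
fitted_tab <- matrix(fitted(linlin), nrow = 4, byrow = TRUE)

# Local odds ratios of the fitted table for all adjacent 2x2 blocks.
local_or <- (fitted_tab[-4, -4] * fitted_tab[-1, -1]) /
            (fitted_tab[-4, -1] * fitted_tab[-1, -4])

# The single association parameter on the odds ratio scale.
uniform_or <- exp(unname(coef(linlin)["score"]))

print(round(local_or, 4))
print(round(uniform_or, 4))
```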

71
Journal