to χ² and beyond - PowerPoint PPT Presentation

Slides: 72
Provided by: DanWr7

Transcript and Presenter's Notes


1
to χ² and beyond
  • Distribution
  • Lots of tests produce statistics that are
    compared with the χ² distribution
  • Goodness of fit to compare single variables with
    a distribution
  • χ² goodness of fit
  • Kolmogorov-Smirnov
  • Shapiro-Wilk
  • Tests of association between two variables

2
Ratbert was correct on 58 of the 100 trials. Is this
different from chance?
3
Pearson χ² statistic: a measure of deviation, χ² = Σ (O - E)² / E
4
  • Is this high enough to reject the model of no
    ESP?
  • Need to know the residual degrees of freedom.
  • Number of non-redundant pieces of information
    minus the number of pieces of information used in
    the model.
  • Two pieces of information. The model uses one (n,
    used to calculate the expected values), leaving one
    for the residual.
  • The χ² table uses both the deviation value and the
    degrees of freedom.
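As a rough check on the Ratbert example, the sketch below computes the Pearson χ² statistic for 58 hits and 42 misses and converts it to a p-value. It is written in Python with the standard library only. The 50% chance rate and the variable names are assumptions made for illustration, and the `erfc` shortcut for the p-value holds only when df = 1.

```python
import math

observed = [58, 42]                 # hits and misses out of 100 trials
n = sum(observed)
expected = [n * 0.5, n * 0.5]       # assumed chance model: 50% correct

# Pearson chi-square: sum of (O - E)^2 / E over the cells
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Residual df: 2 cells minus 1 (n is used to get the expected values)
df = len(observed) - 1

# For df = 1 only, the upper-tail probability equals erfc(sqrt(chi2 / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2({df}) = {chi2:.2f}, p = {p_value:.3f}")
```

Under these assumptions the statistic is 2.56, which is below the .05 critical value of 3.84 for 1 df.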

5
χ² distribution
6
A Second Equation
  • The likelihood ratio χ² statistic
  • Plugging in the numbers for this example yields
  • Also non-significant
  • Lχ² can be partitioned like the sums of squares
    (SS) in ANOVA and regression models.
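The slide's equation did not survive transcription, so the following is only a sketch using the standard form of the likelihood ratio statistic, Lχ² = 2 Σ O ln(O/E), applied to the 58/42 example. The 50/50 expected counts are an assumption carried over from the chance model, and the p-value shortcut again applies to df = 1 only.

```python
import math

observed = [58, 42]
expected = [50.0, 50.0]             # assumed chance model (50% correct)

# Likelihood ratio statistic: 2 * sum of O * ln(O / E)
g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))

p_value = math.erfc(math.sqrt(g2 / 2))   # valid for df = 1 only

print(f"L-chi2(1) = {g2:.2f}, p = {p_value:.3f}")
```

The value comes out very close to the Pearson statistic for the same data, which is typical when the observed and expected counts are not far apart.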

8
http://glass.ed.asu.edu/stats/analysis/
9
The Confidence Interval (Wald)
10
  • The Wilson (1927) approach is thought of as the best.
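To make the comparison concrete, here is a minimal sketch of the Wald and Wilson intervals for a single proportion. It assumes the interval of interest is for the 58-of-100 proportion from the earlier example and uses z = 1.96 for 95% coverage. The function names are illustrative, not from the slides.

```python
import math

def wald_ci(successes, n, z=1.96):
    """Wald interval: p +/- z * sqrt(p(1-p)/n)."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

print("Wald:  ", wald_ci(58, 100))
print("Wilson:", wilson_ci(58, 100))
```

The Wilson interval is pulled slightly toward 0.5 and never runs outside [0, 1], which is one reason it tends to behave better than the Wald interval for small n or extreme proportions.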

11
Men Winning More Awards
  • A few good men
  • Johnson, Carothers & Deary (Nov, PPS), "Role of
    the X Chromosome"
  • psychologicalscience.org/journals/pps/4_6_inpress/Johnson_final.pdf
  • Wendy Northcutt's (2000) approach to evolution:
    survival of the fittest only means that the best
    genes remain in the pool.
  • The pool may still contain bad genes which just don't
    get passed on to the next gene pool (of course,
    one genetically damaged plant, if propagated, could
    destroy a field, as could one prolific warmonger).
  • 131 men won Darwin Awards, but only 24 women.
  • The odds are 131/24 = 5.46!

12
Repeat Odds
  • 131 men won a Darwin Award
  • 24 women won one
  • The proportion of men among the winners is 131/155 = 0.85
  • The odds are 131/24 = 5.46.
  • Odds are important for some statistics:
  • In a second, as the odds ratio.
  • Next term, as part of regression when the
    response variable is binary.
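The arithmetic on this slide can be sketched in a few lines of Python. The variable names are illustrative. The embedded check shows that odds and proportions are two encodings of the same information.

```python
men, women = 131, 24                # Darwin Award winners, from the slide

proportion_men = men / (men + women)    # share of all winners who are men
odds_men = men / women                  # men per woman among winners

# Odds and proportions carry the same information:
# odds = p / (1 - p) and p = odds / (1 + odds)
assert abs(odds_men - proportion_men / (1 - proportion_men)) < 1e-9

print(f"proportion = {proportion_men:.2f}, odds = {odds_men:.2f}")
```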

13
Two Binary Variables: Own-Race Bias
  • A Black confederate approached Black and White
    participants in South Africa. A few minutes
    later an RA approached the participant and asked
    them to identify the confederate from a lineup.
  • 17 of the 25 Blacks (68%) correct, odds = 17/8 = 2.125
  • 8 of the 25 Whites (32%) correct, odds = 8/17 = 0.471
  • The odds ratio (OR) is 2.125/0.471 = 4.52.
  • The odds ratio is a common measure of
    association, like the correlation, but for 2x2
    tables.
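A short sketch of the odds and odds ratio for the lineup data, using the counts given on the slide. The variable names are illustrative.

```python
# Counts from the slide: correct / incorrect identifications per group
black_correct, black_wrong = 17, 8
white_correct, white_wrong = 8, 17

odds_black = black_correct / black_wrong     # 17/8
odds_white = white_correct / white_wrong     # 8/17
odds_ratio = odds_black / odds_white

print(f"odds: {odds_black:.3f} vs {odds_white:.3f}, OR = {odds_ratio:.2f}")
```

An odds ratio of 1 would mean no association. Values above 1 here mean that Black participants had higher odds of a correct identification.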

15
  • install.packages("sdtalt")
  • library(sdtalt)
  • sdt(17,8,8,17)

17
Calculating Asymptotic CIs (called Wald CIs)
  • 1. Take the ln of the observed OR. Here,
    ln(4.516) = 1.507.
  • 2. Calculate the standard error of the log odds
    ratio: se(ln OR) = sqrt(1/17 + 1/8 + 1/8 + 1/17) = 0.606
  • 3. Calculate the 95% confidence interval of ln OR
  • lb = ln OR - 1.96 se(ln OR) = 1.507 - 1.96(0.606)
    = 0.319
  • ub = ln OR + 1.96 se(ln OR) = 1.507 + 1.96(0.606)
    = 2.695
  • 4. Back-transform these into odds ratios
  • exp(0.319) = e^0.319 = 1.376 and exp(2.695)
    = e^2.695 ≈ 14.8
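The four steps above can be sketched as a small Python function. This is an illustrative helper, not the function the slides point to. It uses the usual large-sample standard error sqrt(1/a + 1/b + 1/c + 1/d) for the log odds ratio.

```python
import math

def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) for the 2x2 table [[a, b], [c, d]]."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of ln(OR)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), lower, upper

# Own-race bias table: rows = Black, White; columns = correct, incorrect
or_hat, lower, upper = odds_ratio_wald_ci(17, 8, 8, 17)
print(f"OR = {or_hat:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Because the function carries unrounded intermediate values, its bounds differ slightly in the second decimal from a hand calculation that rounds at each step.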

18
Or write a function or find one: https://home.comcast.net/lthompson221/RCode.txt
19
Differences due to rounding (these estimates are
more accurate)
20
Calculating χ²
  • E_11 = RT_1 × CT_1 / n
  • 12.5 = (25)(25)/50
  • E_ij = RT_i × CT_j / n
  • ln(E_ij) = ln(RT_i) + ln(CT_j) - ln(n)

21
χ²(1) = 6.48, p = .011
Pearson originally got the df wrong: rc - 1
rather than (r-1)(c-1). Fisher corrected him.
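The reported χ²(1) = 6.48 can be reproduced from the lineup counts with the expected-value formula E_ij = RT_i × CT_j / n. The sketch below is plain Python. The table layout is an assumption about how the counts were arranged, and the `erfc` shortcut for the p-value applies only because df = 1.

```python
import math

table = [[17, 8],    # Black participants: correct, incorrect
         [8, 17]]    # White participants: correct, incorrect

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count under no association: E_ij = RT_i * CT_j / n
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))
df = (len(row_totals) - 1) * (len(col_totals) - 1)

p_value = math.erfc(math.sqrt(chi2 / 2))   # shortcut valid for df = 1 only

print(f"chi2({df}) = {chi2:.2f}, p = {p_value:.3f}")
```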
22
Graphing Chi-Square residuals
  • When the table is larger than 2x2, the Chi-Square
    value does not tell you where the effect is.
  • Two approaches:
  • Residual statistics
  • Correspondence analysis

23
Two variables, multiple values (based on a UK
national survey)
24
Degrees of Freedom
  • 16 non-redundant pieces of information
  • Equal numbers in all cells (one number)
  • Accounting for column totals (3 additional)
  • Accounting for column and row totals (3
    additional)
  • 7 df used in the model
  • Leaving 9 df for the residuals. The χ² test is
    of the residuals, so use this df when looking up
    the critical value.

25
  • Lχ² = 30.56
  • degrees of freedom = (r-1)(c-1) = 9, or calculate
    from 16 - 1 - 3 - 3 = 9
  • Critical χ² values for df = 9 are 16.92 and 21.67
    for α equal to .05 and .01, respectively.
  • So, an association has been detected.
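For readers without a χ² table to hand, the following sketch computes upper-tail probabilities for integer df using only Python's standard library. It relies on closed forms for the regularized incomplete gamma function. The helper name `chi2_sf` is made up for illustration, and the 4x4 table size is inferred from the 16 cells mentioned above.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    t = x / 2
    if df % 2 == 0:
        # Even df: exp(-t) * sum_{k=0}^{df/2 - 1} t^k / k!
        return math.exp(-t) * sum(t ** k / math.factorial(k)
                                  for k in range(df // 2))
    # Odd df: start from the df = 1 result and add one term per extra 2 df
    tail = math.erfc(math.sqrt(t))
    for k in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-t) * t ** (k - 0.5) / math.gamma(k + 0.5)
    return tail

rows, cols = 4, 4
df = (rows - 1) * (cols - 1)                             # (r-1)(c-1)
assert df == rows * cols - 1 - (cols - 1) - (rows - 1)   # 16 - 1 - 3 - 3

print(f"df = {df}, p for 30.56 = {chi2_sf(30.56, df):.5f}")
print(f"tail area beyond 16.92 = {chi2_sf(16.92, df):.3f}")
print(f"tail area beyond 21.67 = {chi2_sf(21.67, df):.3f}")
```

The two critical values should give tail areas of about .05 and .01, which is a quick way to confirm the function agrees with the table.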

28
Pearson residuals: (O - E)/sqrt(E)
29
Association
  • Find odds ratio for each 2x2 comparison.
  • Useful if both variables are ordinal.

Odds ratio values shown in the slide's table (layout not recoverable): 1.13, 1.61, 0.91, 2.53, 0.30, 3.07, 0.39, 1.92, 0.51
30
How big is the association?
  • Can look at odds ratios of each 2x2 comparison.
  • Cramér's V (and V²)
  • SPSS gives lots of effect sizes. Cohen uses
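As a small illustration of Cramér's V, the sketch below applies the usual definition, V = sqrt(χ² / (n × min(r-1, c-1))), to the 2x2 own-race bias result from earlier (χ² = 6.48, n = 50), since the counts for the larger survey table are not in the transcript. The function name is illustrative.

```python
import math

def cramers_v(chi2, n, rows, cols):
    """Cramer's V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    return math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))

# Own-race bias example: chi2(1) = 6.48 from n = 50 participants, 2x2 table
v = cramers_v(6.48, 50, 2, 2)
print(f"V = {v:.2f}, V^2 = {v ** 2:.3f}")
```

For a 2x2 table V has the same magnitude as the phi coefficient, so it can be read much like a correlation.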

34
Where is the association? O-E Residuals
35
Biggest residuals for married, but it is also the
largest row total.
36
Standardized or Pearson's Residuals
  • Square root of each cell's contribution to the
    overall χ²
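The link between Pearson residuals and the overall χ² can be checked numerically. Since the survey table itself is not in the transcript, this sketch reuses the 2x2 lineup counts from earlier as stand-in data.

```python
import math

table = [[17, 8], [8, 17]]          # own-race bias counts from the earlier slide
row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
n = sum(row_totals)

residuals = []
for i, row in enumerate(table):
    out = []
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        out.append((observed - expected) / math.sqrt(expected))
    residuals.append(out)

# Squaring the residuals and summing recovers the Pearson chi-square
chi2 = sum(r ** 2 for row in residuals for r in row)
print(residuals, round(chi2, 2))
```

Dividing by sqrt(E) is what stops cells in rows with large totals from dominating, which is the problem with raw O - E residuals.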

37
Now widowed has the largest, but it has a small
row total.
38
Correspondence Analysis (briefly and without math)
  • Partitions the residual χ² left over after the
    no-association model
  • If there are r rows and c columns, the program
    can use up to either (r-1) or (c-1) dimensions,
    whichever is smaller.
  • Important: think about the size of the association
  • Analyze/Data Reduction/Correspondence Analysis in
    SPSS
  • With corresp in the MASS package for R (lots of
    others, too)

42
χ² = 30 (is social class ordinal?)
45
Lots of output; the minimum is shown here
15 for the 1st dimension, 9 for the 2nd, and 1.5 for the 3rd
46
Does for class also.
47
Schuman & Scott (1989): Events by Age
48
Three Variables
  • There are techniques designed for multiple
    variables.
  • Can use CA.
  • Compute new variable
  • Compute newvar = age + 6 * gender
  • age:        1   2   3   4   5   6
    gender 1:   7   8   9  10  11  12
    gender 2:  13  14  15  16  17  18
  • CA comes to mind with newvar
  • Biplot and drawing lines

49
Let me repeat that
  • Compute new variable
  • Compute newvar = age + 6 * gender
  • age:        1   2   3   4   5   6
    gender 1:   7   8   9  10  11  12
    gender 2:  13  14  15  16  17  18

Make sure it works
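One way to make sure the recoding works is to confirm that every age-by-gender combination gets its own code. The sketch below assumes six age groups coded 1 to 6 and gender coded 1 and 2. These codings are inferred from the values on the slide rather than stated there.

```python
ages = range(1, 7)        # six age groups coded 1..6
genders = (1, 2)          # assumed coding: two genders coded 1 and 2

# newvar = age + 6 * gender gives every age-by-gender cell its own code
newvar = {(age, g): age + 6 * g for g in genders for age in ages}

codes = sorted(newvar.values())
assert len(set(codes)) == len(newvar)     # no two cells share a code
print(codes)                              # 7 through 18
```

The multiplier must be at least the number of age categories. A smaller multiplier would make some codes collide.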
52
Heath et al. on social class and voting
53
Summary
  • Finding a significant Chi-square does not tell
    you where the effect lies.
  • Graph different residuals and look at
    correspondence analysis

54
Square Tables (if time allows)
  • The rows and columns have the same values.
  • Inter-rater reliability
  • Two similar measures (two tests, two trials)
  • Before-After studies
  • Matched participants (e.g., fathers' and sons'
    political party preference or social class)
  • Can have more than 2 variables, but complexity
    increases.

55
Special Models of Interest
  • Equi-Probable (every cell equally likely)
  • Independence (like what we've been doing)
  • Quasi-independence
  • Symmetry
  • Quasi-symmetry

56
Square Tables Inter-rater reliability
  • Suppose a researcher was interested in the
    reliability of exam markers.
  • Suppose there are first and second exam markers
    of a large number of exercises.
  • Questions: Are they reliable? Is one harsher
    than the other? Are non-agreements random?

57
Contingency Table
Marker 1
58
Values shown in the slide's table (layout not recoverable): 0.067, 6.25, 2.40, 1.00, 0.40, 2.50, 16.67, 1.00, 0.12
59
Equi-Probable w/ Diagonal: E_ij = n/16
16 cells. 1 used, 15 left. X²(15) = 1187
60
Equi-Probable w/o Diagonal: E_ij = n/12
12 cells. 1 used, 11 left. X²(11) = 660
Can also fit the diagonal, using those 4 df.
61
Independence w/o Diagonal: E_ij = RT_i × CT_j / n
12 cells. 7 used, 5 left. X²(5) = 58, p < .001
62
Symmetry w/o Diagonal: E_ij = E_ji
12 cells. 6 used, 6 left. X²(6) = 149
63
Quasi-Symmetry w/o Diagonal: E_ij = E_ji, but
taking into account the marginals
12 cells. 9 used, 3 left. X²(3) = 9.68 (p = .02)
64
Summary of Models
  • Equi-probable with all data: X²(15) = 1187
  • Equi-probable w/o diagonal: X²(11) = 660
  • Add marginals (independence): X²(5) = 58
  • Symmetry: X²(6) = 149
  • Quasi-symmetry: X²(3) = 10
  • Still significant, but small enough to accept
  • Shows the marginals are different. Marker 1 is tougher.
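The degrees-of-freedom bookkeeping behind this summary can be laid out as a small table in code. The cell counts, parameters used, and statistics are those reported on the preceding slides. The dictionary layout and the X²/df column are illustrative additions.

```python
# (cells analysed, parameters used, reported statistic), as listed on the slides
models = {
    "equi-probable, all cells":    (16, 1, 1187),
    "equi-probable, no diagonal":  (12, 1, 660),
    "independence, no diagonal":   (12, 7, 58),
    "symmetry, no diagonal":       (12, 6, 149),
    "quasi-symmetry, no diagonal": (12, 9, 9.68),
}

for name, (cells, used, x2) in models.items():
    residual_df = cells - used
    # A chi-square variable has mean df, so X2/df near 1 suggests a decent fit
    print(f"{name:30s} df = {residual_df:2d}  X2 = {x2:7.2f}  "
          f"X2/df = {x2 / residual_df:6.2f}")
```

Read this way, quasi-symmetry is the only model whose statistic is within a small multiple of its df, which matches the slide's conclusion that it is small enough to accept.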

65
Quasi-Symmetry fits pretty well: residuals and
standardized residuals
12 cells. 9 used, 3 left. X²(3) = 9.68 (p = .02).
Marker 2 gives As to many that Marker 1
thinks are poor.
66
Taking into account the ordinality
  • Assume independence as the baseline.
  • Include additional parameters to account for the
    association.
  • Linear by linear
  • RC model (Correspondence Analysis)

67
Independence: E_ij = RT_i × CT_j / n
16 cells. 7 used, 9 left. X²(9) = 605
68
Linear by Linear Term
  • E_ij = RT_i × CT_j / n
  • ln E_ij = ln RT_i + ln CT_j - ln n
  • It is significant, but with 9 df we have not located
    where the association is.
  • If scores are interval, then including a term
    marker1 × marker2 tests for linear association.
  • In SPSS, put the interaction in as a covariate and in
    the model (with the two main effects).

69
Linear by Linear Model
16 cells. 8 used, 8 left. X²(8) = 127
70
Association Models
  • Sometimes called uniform association model as all
    the local odds ratios are the same.
  • Assume interval Row and Column values
  • R models relax this for rows
  • C models relax this for columns
  • R C models for both
  • RC(M) models and correspondence analysis
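The "uniform association" label can be illustrated numerically. The sketch below builds fitted counts from a linear-by-linear log-linear form, ln m_ij = row_i + col_j + β u_i v_j, with equally spaced scores, and checks that every local odds ratio equals exp(β). All parameter values are hypothetical, chosen only for illustration. Nothing here is estimated from the markers' data.

```python
import math

beta = 0.5                                  # hypothetical association parameter
row_effects = [0.0, 0.3, -0.2, 0.1]         # hypothetical row terms
col_effects = [0.0, -0.4, 0.2, 0.5]         # hypothetical column terms
scores = [1, 2, 3, 4]                       # equally spaced (interval) scores

# ln m_ij = row_i + col_j + beta * u_i * v_j
fitted = [[math.exp(r + c + beta * u * v)
           for c, v in zip(col_effects, scores)]
          for r, u in zip(row_effects, scores)]

# Local odds ratio for each pair of adjacent rows and adjacent columns
local_ors = [fitted[i][j] * fitted[i + 1][j + 1]
             / (fitted[i][j + 1] * fitted[i + 1][j])
             for i in range(3) for j in range(3)]

# Every local odds ratio equals exp(beta), whatever the row and column terms
assert all(abs(o - math.exp(beta)) < 1e-9 for o in local_ors)
print([round(o, 3) for o in local_ors])
```

The row and column terms cancel in each local odds ratio, leaving only β times the product of the score differences, which is 1 for unit-spaced scores. Relaxing the equal spacing for rows, columns, or both is what the R, C, and R+C models do.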

71
Journal
  • Be prepared to share your thoughts about your
    presentation at the next lecture.
  • First Steps 10.2, 10.3