Title: Model Selections and Comparisons
1. Model Selections and Comparisons
(Categorical Data Analysis, Ch 9.2)
Yumi Kubo, Alvin Hsieh
2. Survey Data
- Conducted in 1992 by the Wright State University School of Medicine and United Health Services in Dayton, Ohio
- 2276 students in their last year of high school (nonurban area)
- We add more dimensions to the data from 8.2.4
- Variables: Alcohol (A), Cigarette (C), Marijuana (M)
- Added variables: Gender (G), Race (R)
3. Association Graphs (Definitions)
- association graph: a set of vertices, where each vertex is a variable
- edge: a conditional association between 2 variables
- path: a sequence of edges leading from one variable to another
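These definitions can be made concrete with a small adjacency matrix. The sketch below is illustrative only: it assumes the edge set AC, AM, CM, AG, AR, GM, GR (taken from the model name used later in the R program), and the helper `has_path` is a hypothetical name introduced here, not part of any package.

```r
# Vertices: one per variable in the survey.
vars <- c("A", "C", "M", "G", "R")

# Edges: assumed conditional associations (AC, AM, CM, AG, AR, GM, GR).
edges <- rbind(c("A", "C"), c("A", "M"), c("C", "M"), c("A", "G"),
               c("A", "R"), c("G", "M"), c("G", "R"))

# Symmetric adjacency matrix: TRUE where two variables share an edge.
adj <- matrix(FALSE, length(vars), length(vars), dimnames = list(vars, vars))
adj[edges] <- TRUE
adj[edges[, 2:1]] <- TRUE

# Hypothetical helper: is there a path (sequence of edges) from one
# variable to another? Uses a breadth-first search over the graph.
has_path <- function(adj, from, to) {
  visited <- from
  frontier <- from
  while (length(frontier) > 0) {
    neighbours <- unique(unlist(lapply(frontier,
                                       function(v) colnames(adj)[adj[v, ]])))
    frontier <- setdiff(neighbours, visited)
    visited <- c(visited, frontier)
  }
  to %in% visited
}

adj["C", "R"]             # is there a direct edge between C and R?
has_path(adj, "C", "R")   # is there a path between C and R?
```

In this assumed graph C and R are not joined by an edge, yet a path between them exists through A.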
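4. Association Graphs (Saturated)
[Figure: association graph for the saturated model on the vertices A, C, M, G, and R, with labels pointing out a variable (vertex), a path, and a conditional association (edge).]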
5. Association Graphs (Reduced)
[Figure: association graph for a reduced model on the vertices M, G, R, A, and C.]
6. Data Set
Marijuana use by race, gender, alcohol use, and cigarette use:

                          Race: White             Race: Other
                      Female        Male       Female        Male
Alcohol  Cigarette   yes    no   yes    no   yes    no   yes    no
yes      yes         405   268   453   228    23    23    30    19
yes      no           13   218    28   201     2    19     1    18
no       yes           1    17     1    17     0     1     1     8
no       no            1   117     1   133     0    12     0    17
7. SAS Program
Too large to place here; see survey.sas
8. R Program
Original code (modified below): http://math.cl.uh.edu/thompsonla/RCode.txt

survey <- data.frame(expand.grid(cigarette = c("Yes", "No"),
                                 alcohol   = c("Yes", "No"),
                                 marijuana = c("Yes", "No"),
                                 gender    = c("female", "male"),
                                 race      = c("white", "other")),
                     count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17,
                               133, 23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

library(MASS)   # provides stepAIC()

# mutual independence + GR
fit.GR <- glm(count ~ . + gender*race, data = survey, family = poisson)
# homogeneous association
fit.homog.assoc <- glm(count ~ .^2, data = survey, family = poisson)
# all three-factor terms
fit.3fact <- glm(count ~ .^3, data = survey, family = poisson)

# backward selection from the homogeneous association model, always keeping GR
summary(res <- stepAIC(fit.homog.assoc,
                       scope = list(lower = ~ cigarette + alcohol + marijuana + gender*race),
                       direction = "backward"))

fit.AC.AM.CM.AG.AR.GM.GR.MR <- res
fit.AC.AM.CM.AG.AR.GM.GR <- update(fit.AC.AM.CM.AG.AR.GM.GR.MR, ~ . - marijuana:race)
fit.AC.AM.CM.AG.AR.GR <- update(fit.AC.AM.CM.AG.AR.GM.GR, ~ . - marijuana:gender)
9. R Program (P-values)
1-pchisq((15.8-15.3),1)
1-pchisq((16.7-15.8),1)
1-pchisq((19.9-16.7),1)
1-pchisq((28.8-19.9),1)
1-pchisq((40.3-28.8),1)
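The same five P-values can be computed in one vectorized call. This is a minimal sketch that reuses the G² values appearing in the expressions above; the variable names are illustrative, and `lower.tail = FALSE` is simply another way of writing `1 - pchisq(...)`.

```r
# G^2 values of the successive models, in the order used on this slide.
g2 <- c(15.3, 15.8, 16.7, 19.9, 28.8, 40.3)

# Change in G^2 between consecutive models; each step drops one term (1 df).
delta <- diff(g2)

# Upper-tail chi-squared probability for each change.
p <- pchisq(delta, df = 1, lower.tail = FALSE)

print(data.frame(change.G2 = delta, p.value = round(p, 4)))
```

Each P-value is then compared with the chosen alpha level to decide whether the removed term can be left out.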
10. Model Selection
- Select an alpha level (0.05 by default)
- Look at the P-values of the model
- Use (in R) 1-pchisq(G2, df) to compute them
- Stop selecting once a P-value reaches the alpha level chosen in the first step
- Model 1: G + R + A + C + M + GR
- Model 2: G + R + A + C + M + GR + (all pairs)
11. Model Selection (Continued)
- Model 3: G + R + A + C + M + GR + (all pairs) + (all 3 factors)
- Model 4g: lowest change in G², taking out CR
- Model 5: lowest change in G², taking out CG
- Model 6: lowest change in G², taking out MR
- Model 7: lowest change in G², taking out GM
- Consider A + C + M + AC + AM + CM
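The selection steps above can be sketched as a loop. This is a hedged illustration, not the procedure used in the R program slide, which relies on AIC through `stepAIC`. Here each step uses `drop1` with a likelihood-ratio test, removes the two-factor term with the smallest change in G² (the largest P-value), and stops once that P-value falls below alpha. Always keeping the GR term, and the names `alpha`, `removed`, and `candidates`, are assumptions made for the example.

```r
# Cell counts, in the same order as in the R program slide.
survey <- data.frame(
  expand.grid(cigarette = c("Yes", "No"), alcohol = c("Yes", "No"),
              marijuana = c("Yes", "No"), gender = c("female", "male"),
              race = c("white", "other")),
  count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17,
            133, 23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

alpha <- 0.05                                             # chosen alpha level
fit <- glm(count ~ .^2, data = survey, family = poisson)  # all pairs
removed <- character(0)                                   # terms taken out

repeat {
  labels <- attr(terms(fit), "term.labels")
  # Two-factor terms that may be dropped; GR is always kept (assumption).
  candidates <- setdiff(grep(":", labels, value = TRUE, fixed = TRUE),
                        "gender:race")
  if (length(candidates) == 0) break

  # Likelihood-ratio test for dropping each candidate; row 1 is "<none>".
  tab <- drop1(fit, scope = candidates, test = "Chisq")[-1, , drop = FALSE]
  p <- tab[[ncol(tab)]]            # last column holds the P-values
  weakest <- which.max(p)          # term with the smallest change in G^2
  if (p[weakest] < alpha) break    # every remaining term is significant

  removed <- c(removed, rownames(tab)[weakest])
  fit <- update(fit, as.formula(paste("~ . -", rownames(tab)[weakest])))
}

print(removed)       # terms removed, in order
print(formula(fit))  # model left at the end of the loop
```

Because every variable is binary, each candidate term has 1 degree of freedom, so the largest P-value always corresponds to the lowest change in G².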
12. Goodness-of-Fit Tests (Table 9.2)
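The G² statistics and degrees of freedom that such a table reports can be recomputed from fitted models. The sketch below assumes the cell counts and the three baseline models from the R program slide; the summary table `gof` and its column names are introduced here for illustration only.

```r
# Cell counts, in the same order as in the R program slide.
survey <- data.frame(
  expand.grid(cigarette = c("Yes", "No"), alcohol = c("Yes", "No"),
              marijuana = c("Yes", "No"), gender = c("female", "male"),
              race = c("white", "other")),
  count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17,
            133, 23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

fits <- list(
  "mutual independence + GR" = glm(count ~ . + gender*race,
                                   data = survey, family = poisson),
  "homogeneous association"  = glm(count ~ .^2,
                                   data = survey, family = poisson),
  "all three-factor terms"   = glm(count ~ .^3,
                                   data = survey, family = poisson))

# For a Poisson loglinear model the residual deviance is the G^2 statistic.
gof <- data.frame(G2 = sapply(fits, deviance),
                  df = sapply(fits, df.residual))
gof$p.value <- pchisq(gof$G2, gof$df, lower.tail = FALSE)

print(round(gof, 4))
```

A large P-value here means the model is not rejected as a description of the observed table.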
13. Thank You!
Any questions?