Title: Analysis of polytomous data
1Analysis of polytomous data
- Multinomial regression
- For unordered categories
- Proportional odds model
- For ordered categories
- Nested dichotomies
- For ordered or unordered categories
- Survey question Kermits presence at the
building ceremony sends an inappropriate message
about the Bren Schools professionalism and/or
priorities - Strongly agree
- Agree
- Disagree
- Strongly disagree
- No opinion
- Independent variables specialization, age,
gender, prior professional experience
2Multinomial Regression
- First level of dependent variable is taken as
baseline - Subset data to those cases where response is in
first or second level - Do logistic regression where first level is
failure and second level is success - Repeat with 3rd vs. 1st, 4th vs. 1st, etc.
- Look at statistics of individual coefficients
using Wald statistic - Under null hypothesis that parameter equals zero,
follows a Z distribution - Use AIC to compare to model with no terms
3Re-fitting to get Hessian Call multinom(formula
Specialization Age) Coefficients
(Intercept) Age
Conservation Planning 3.916738
-0.1072449 Corporate Environmental Management
9.238688 -0.3248103 Environmental Economics
Policy 4.563684 -0.1316597 Pollution
Prevention Remediation 4.605002 -0.1511108
Water Resources Management 8.864514
-0.3171930 Std. Errors
(Intercept) Age Conservation
Planning 4.641040 0.1663445
Corporate Environmental Management 6.615423
0.2516384 Environmental Economics Policy
4.806615 0.1735275 Pollution Prevention
Remediation 5.532483 0.2027036 Water
Resources Management 6.922089
0.2640389 Value/SE (Wald statistics)
(Intercept) Age
Conservation Planning 0.8439354
-0.6447156 Corporate Environmental Management
1.3965379 -1.2907819 Environmental Economics
Policy 0.9494589 -0.7587250 Pollution
Prevention Remediation 0.8323572 -0.7454764
Water Resources Management 1.2806124
-1.2013115 Residual Deviance 119.2607 AIC
139.2607
4Call multinom(formula Specialization
1) Coefficients
(Intercept) Water Resources Management
0.9808370 Coastal Marine Resources
Management 0.6931537 Conservation Planning
0.9808370 Corporate Environmental
Management 0.5108322 Environmental
Economics Policy 0.5108322 Std.
Errors
(Intercept) Water Resources Management
0.6770047 Coastal Marine Resources
Management 0.7071083 Conservation Planning
0.6770047 Corporate Environmental
Management 0.7302982 Environmental
Economics Policy 0.7302982 Value/SE
(Wald statistics)
(Intercept) Water Resources Management
1.4487891 Coastal Marine Resources
Management 0.9802653 Conservation Planning
1.4487891 Corporate Environmental
Management 0.6994844 Environmental
Economics Policy 0.6994844 Residual
Deviance 122.0507 AIC 132.0507
5(No Transcript)
6(No Transcript)
7Re-fitting to get Hessian Call multinom(formula
Specialization Age Gender PriorProf
Bren) Coefficients
(Intercept) Age GenderM
PriorProfY BrenEnvironmental school Conservation
Planning 3.009498 -0.1394663
7.633283 1.4662000 0.32303000
Corporate Environmental Management 22.666759
-1.0256491 10.070115 6.4613818
-3.34306122 Environmental Economics Policy
5.670768 -0.2649546 7.220450 2.8906228
-0.09993187 Pollution Prevention
Remediation -12.646108 0.1832633 -6.566742
-0.9361216 9.36869338 Water
Resources Management 2.964847
-0.8953367 8.163449 12.2268411
9.03672971 Std. Errors
(Intercept) Age GenderM
PriorProfY BrenEnvironmental school Conservation
Planning 6.401083 0.2300486
32.8824455 2.176277 2.124468
Corporate Environmental Management 14.541446
0.6292002 32.9088852 3.285172
2.517003 Environmental Economics Policy
6.647332 0.2503504 32.8871401 2.298420
2.132097 Pollution Prevention
Remediation 32.895006 0.3883187 0.4797589
2.498103 30.517606 Water
Resources Management 25.241538
0.4775911 32.8986768 53.793387
29.502362 Value/SE (Wald statistics)
(Intercept) Age
GenderM PriorProfY BrenEnvironmental school
Conservation Planning 0.4701545
-0.6062470 0.2321385 0.6737194
0.15205221 Corporate Environmental Management
1.5587693 -1.6300838 0.3059999 1.9668320
-1.32819098 Environmental Economics
Policy 0.8530893 -1.0583348 0.2195524
1.2576567 -0.04687023 Pollution
Prevention Remediation -0.3844385 0.4719405
-13.6875873 -0.3747329 0.30699307
Water Resources Management 0.1174590
-1.8746931 0.2481391 0.2272926
0.30630529 Residual Deviance 89.0063 AIC
139.0063
8Proportional odds model
- Assume there is a latent variable (unobserved
continuous variable) underlying the ordered
categorical variable - Want to model how the cutpoints vary as a
function of the independent variables - Assumes that slopes of these cutpoints are all
the same - Means there is a linear relationship between the
latent variable and the independent variables
9Re-fitting to get Hessian Call polr(formula
Score.pre Age) Coefficients Value
Std. Error t value Age -0.04679158 0.09721606
-0.4813153 Intercepts Value Std. Error t
value 12 -2.4502 2.6405 -0.9279 23 -1.5489
2.6092 -0.5936 34 0.4870 2.5851
0.1884 Residual Deviance 67.75379 AIC
75.75379
Re-fitting to get Hessian Call polr(formula
Score.pre 1) No coefficients Intercepts
Value Std. Error t value 12 -1.2040 0.4655
-2.5866 23 -0.3102 0.3970 -0.7814 34
1.7047 0.5436 3.1363 Residual Deviance
67.98148 AIC 73.98148
10(No Transcript)
11(No Transcript)
12Re-fitting to get Hessian Call polr(formula
Score.pre Specialization Age Gender
PriorProf Bren) Coefficients
Value
Std. Error t value Specialization Conservation
Planning 3.0449246 1.7092932
1.7813939 Specialization Corporate Environmental
Management 2.6608163 2.0231669
1.3151739 Specialization Environmental Economics
Policy 3.4724096 1.8397514
1.8874341 Specialization Pollution Prevention
Remediation -0.3215878 1.4945305
-0.2151765 Specialization Water Resources
Management 7.5349977 2.5981190
2.9001742 Age
0.1805309 0.1677753
1.0760282 GenderM
-2.8622107 1.3037019
-2.1954487 PriorProfY
-2.0544311 1.4089561
-1.4581229 BrenEnvironmental school
1.9015938 1.3972099
1.3609936 Intercepts Value Std. Error t
value 12 4.3084 5.0014 0.8614 23 5.7559
5.0569 1.1382 34 9.8326 5.3863
1.8255 Residual Deviance 45.75757 AIC
69.75757
13Re-fitting to get Hessian Call polr(formula
Score.pre Specialization Gender PriorProf
Bren) Coefficients
Value Std. Error
t value Specialization Conservation Planning
2.276888 1.574324
1.4462643 Specialization Corporate Environmental
Management 1.483673 1.708301
0.8685081 Specialization Environmental Economics
Policy 2.600238 1.661237
1.5652421 Specialization Pollution Prevention
Remediation -0.544792 1.488118
-0.3660947 Specialization Water Resources
Management 6.451727 2.315680
2.7861049 GenderM
-2.566114 1.233009
-2.0811805 PriorProfY
-1.038788 1.035239
-1.0034280 BrenEnvironmental school
1.075591 1.138760
0.9445280 Intercepts Value Std. Error t
value 12 -0.8579 1.4127 -0.6073 23 0.5354
1.4036 0.3815 34 4.4920 1.8610
2.4138 Residual Deviance 46.93623 AIC
68.93623
14Re-fitting to get Hessian Call polr(formula
Score.pre Specialization Gender) Coefficients
Value Std. Error t
value Specialization Conservation Planning
2.336084258 1.513956
1.543032691 Specialization Corporate
Environmental Management 1.172428456 1.668954
0.702492769 Specialization Environmental
Economics Policy 2.580342108 1.606870
1.605819103 Specialization Pollution Prevention
Remediation 0.004307446 1.353015
0.003183591 Specialization Water Resources
Management 6.372624430 2.206948
2.887527885 GenderM
-2.235615124 1.173919
-1.904403282 Intercepts Value Std. Error
t value 12 -0.6384 1.0808 -0.5907 23
0.6634 1.0800 0.6143 34 4.5121 1.6072
2.8074 Residual Deviance 48.57746 AIC
66.57746
15Nested dichotomies model
- Divide the categories into two groups
- Perform logistic regression to determine how
independent variables affect probability of being
in one group or the other - Within each group, further subdivide into two
subgroups - Perform logistic regression on these subgroups,
conditional on being in the group - Continue until all categories covered
- Combine conditional probabilities to get absolute
probabilities - Sum AIC of all the individual regressions to get
total AIC for the model - The order in which you do the subdivision can
affect your result!
16Call glm(formula agree Age, family
binomial) Coefficients Estimate
Std. Error z value Pr(gtz) (Intercept) -3.8565
3.7268 -1.035 0.301 Age 0.1356
0.1422 0.954 0.340 Null deviance
35.426 on 25 degrees of freedom Residual
deviance 34.353 on 24 degrees of freedom AIC
38.353
Call glm(formula Sagree Age, family
binomial) Coefficients Estimate
Std. Error z value Pr(gtz) (Intercept) 4.7595
4.9107 0.969 0.332 Age -0.1712
0.1848 -0.927 0.354 Null deviance 15.158
on 10 degrees of freedom Residual deviance
14.000 on 9 degrees of freedom AIC 18.000
Call glm(formula disagree Age, family
binomial) Coefficients Estimate
Std. Error z value Pr(gtz) (Intercept) 2.55392
8.21121 0.311 0.756 Age -0.06013
0.31873 -0.189 0.850 Null deviance 17.397
on 14 degrees of freedom Residual deviance
17.362 on 13 degrees of freedom AIC 21.362
17(No Transcript)
18(No Transcript)
19Call glm(formula agree Specialization Age
Gender PriorProf Bren, family
binomial) Deviance Residuals Min
1Q Median 3Q Max
-1.956e00 -5.205e-01 -5.436e-05 6.536e-01
2.233e00 Coefficients
Estimate Std. Error
z value Pr(gtz) (Intercept)
2.94668 6.92124 0.426
0.6703 Specialization Conservation Planning
-3.61717 2.25435 -1.605
0.1086 Specialization Corporate Environmental
Management -3.66064 2.44932 -1.495
0.1350 Specialization Environmental Economics
Policy -3.04151 2.17319 -1.400 0.1616
Specialization Pollution Prevention
Remediation 0.70340 1.91225 0.368
0.7130 Specialization Water Resources
Management -21.54587 2710.70871 -0.008
0.9937 Age
-0.08057 0.24775 -0.325
0.7450 GenderM
3.43660 1.62201 2.119 0.0341
PriorProfY
2.04312 1.74841 1.169 0.2426
BrenEnvironmental school
-1.76422 1.76192 -1.001 0.3167
--- Signif. codes 0 ' 0.001 ' 0.01 '
0.05 .' 0.1 ' 1 (Dispersion parameter for
binomial family taken to be 1) Null
deviance 35.426 on 25 degrees of
freedom Residual deviance 19.985 on 16 degrees
of freedom AIC 39.985
20Call glm(formula Sagree Specialization Age
Gender PriorProf Bren, family
binomial) Deviance Residuals 1
2 6 7 11
15 16 21 -6.699e-01
1.091e-04 1.091e-04 -6.699e-01 -1.359e00
-7.976e-05 1.805e00 -6.699e-01 22
23 25 1.404e-05 6.699e-01
6.699e-01 Coefficients (1 not defined because
of singularities)
Estimate Std. Error z value
Pr(gtz) (Intercept)
13.2879 10.4988 1.266
0.206 Specialization Conservation Planning
-20.0313 10754.0132 -0.002
0.999 Specialization Corporate Environmental
Management 18.4739 5535.1514 0.003
0.997 Specialization Environmental Economics
Policy -0.9305 2.8418 -0.327
0.743 Specialization Pollution Prevention
Remediation 0.8675 3.2276 0.269
0.788 Age
-0.4574 0.3605 -1.269
0.205 GenderM
NA NA NA
NA PriorProfY
3.1702 3.9370 0.805
0.421 BrenEnvironmental school
-4.5581 4.7576 -0.958
0.338 (Dispersion parameter for binomial family
taken to be 1) Null deviance 15.158 on 10
degrees of freedom Residual deviance 7.351 on
3 degrees of freedom AIC 23.351
21Call glm(formula disagree Specialization
Age Gender PriorProf Bren, family
binomial) Deviance Residuals 3
4 5 8 9
10 12 13 2.107e-08
4.838e-05 4.838e-05 2.107e-08 -9.769e-05
4.838e-05 4.838e-05 1.573e00 14
17 18 19 20
24 26 -6.792e-01 -1.290e00
8.921e-01 7.350e-01 9.769e-05 4.838e-05
-1.184e00 Coefficients
Estimate Std.
Error z value Pr(gtz) (Intercept)
12.5541 25426.6986
4.94e-04 1.000 Specialization Conservation
Planning 18.0183 25426.6921
0.001 0.999 Specialization Corporate
Environmental Management 18.7210 26567.5168
0.001 0.999 Specialization Environmental
Economics Policy -19.6021 23862.4339
-0.001 0.999 Specialization Pollution
Prevention Remediation 0.7027 29728.4674
2.36e-05 1.000 Specialization Water Resources
Management -21.2126 23862.4340
-0.001 0.999 Age
-0.4548 0.8449 -0.538
0.590 GenderM
-18.7210 8780.7132 -0.002
0.998 PriorProfY
1.1166 15970.2772 6.99e-05
1.000 BrenEnvironmental school
18.4732 18225.0010 0.001
0.999 (Dispersion parameter for binomial family
taken to be 1) Null deviance 17.3975 on 14
degrees of freedom Residual deviance 7.3378
on 5 degrees of freedom AIC 27.338
22gt Anova(mod.agree1) Anova Table (Type II
tests) Response agree LR Chisq
Df Pr(gtChisq) Specialization 11.3488 5
0.04489 Age 0.1007 1 0.75097
Gender 5.7131 1 0.01684
PriorProf 1.5172 1 0.21805 Bren
1.0740 1 0.30003
23R code, part 1
- Load some necessary libraries
- library(nnet)
- library(MASS)
- Read in the data, and look at the names of the
variables - ESM lt- read.csv("KermitAttitudes.csv")
- attach(ESM)
- names(ESM)
- Run multinomial regression of Specialization on
Age - esm.mult lt- multinom(Specialization Age)
- summary(esm.mult, corF, WaldT)
- Now run the no effect model for comparison
- esm.mult0 lt- multinom(Specialization 1)
- summary(esm.mult0, corF, WaldT)
24R code, part 2
- Use predict() to generate the probability
curves for each specialization as a function of
age - xx lt- min(Age)max(Age)
- p.fit lt- predict(esm.mult, data.frame(Agexx),
type'probs') - Plot the probability curves. The legend is
placed with the mouse - matplot(xx, p.fit, type'l', xlab"Age",
ylab"Probability", lwd3) - legend(locator(1), lty16, col16, lwd3,
legendlevels(Specialization)) - Create the stacked probability curves and plot
them. The labels are produced with the mouse
(nothing appears until you have clicked all 6
times) - p.cum lt- p.fit
- p.cum,2 lt- p.cum,1 p.cum,2
- p.cum,3 lt- p.cum,2 p.cum,3
- p.cum,4 lt- p.cum,3 p.cum,4
- p.cum,5 lt- p.cum,4 p.cum,5
- p.cum,6 lt- p.cum,5 p.cum,6
- matplot(xx, p.cum, type'l', lwd3, col1, lty1,
xlab"Age", ylab"Probability") - text(locator(6), levels(Specialization))
25R code, part 3
- Fit a more complex multinomial model
- esm.mult1 lt- multinom(Specialization Age
Gender PriorProf Bren) - summary(esm.mult1, corF, WaldT)
- Before we fit models to the opinion scores, we
need to drop the no opinion answer (category 5) - vec lt- Score.pre lt 5
- detach(ESM)
- ESM2 lt- ESMvec,
- attach(ESM2)
- The response variable has to be a factor
- Score.pre lt- as.factor(Score.pre)
- Run the proportional odds model
- esm.polr lt- polr(Score.pre Age)
- summary(esm.polr)
26R code, part 4
- Use predict() to generate the probability
curves for each specialization as a function of
age - xx lt- min(Age)max(Age)
- p.fit lt- predict(esm.polr, data.frame(Agexx),
type'probs') - Plot the probability curves. The legend is
placed with the mouse - matplot(xx, p.fit, type'l', xlab"Age",
ylab"Probability", lwd3) - opinions lt- c("Strongly agree", "Agree",
"Disagree", "Strongly Disagree") - legend(locator(1),lty14,col14,lwd3,legendopi
nions) - Create the stacked probability curves and plot
them. The labels are produced with the mouse
(nothing appears until you have clicked all 6
times) - p.cum lt- p.fit
- p.cum,2 lt- p.cum,1 p.cum,2
- p.cum,3 lt- p.cum,2 p.cum,3
- p.cum,4 lt- p.cum,3 p.cum,4
- matplot(xx, p.cum, type'l', lwd3, col1, lty1,
xlab"Age", ylab"Probability", ylimc(0,1)) - text(locator(4), opinions)
27R code, part 5
- Fit some more complex proportional odds models
- esm.polr1 lt- polr(Score.pre Specialization
Age Gender PriorProf Bren) - summary(esm.polr1)
- esm.polr2 lt- polr(Score.pre Specialization
Gender PriorProf Bren) - summary(esm.polr2)
- esm.polr3 lt- polr(Score.pre Specialization
Gender) - summary(esm.polr3)
- Create the new variables for the dichotomous
analysis - agree lt- recode(Score.pre, "'1' 'yes' '2'
'yes' '3' 'no' '4' 'no'") - Sagree lt- recode(Score.pre, "'1' 'yes' '2'
'no' '3' NA '4' NA") - disagree lt- recode(Score.pre, "'1' NA '2'
NA '3' 'yes' '4' 'no'")
28R code, part 6
- Fit the set of dichotomous models with Age as
the independent variable - mod.agree lt- glm(agree Age, familybinomial)
- summary(mod.agree)
- mod.Sagree lt- glm(Sagree Age, familybinomial)
- summary(mod.Sagree)
- mod.disagree lt- glm(disagree Age,
familybinomial) - summary(mod.disagree)
- Use predict to generate the conditional
probabilities - p.agree lt- predict(mod.agree, data.frame(Agexx),
type"response") - p.Sagree lt- predict(mod.Sagree,
data.frame(Agexx), type"response") - p.disagree lt- predict(mod.disagree,
data.frame(Agexx), type"response") - Combine the conditional probabilities to get
the absolute probabilities - p.SA lt- p.agree p.Sagree
- p.A lt- p.agree (1 - p.Sagree)
- p.D lt- (1 - p.agree) p.disagree
- p.SD lt- (1 - p.agree) (1 - p.disagree)
29R code, part 7
- Plot the probability curves. The legend is
placed with the mouse - plot(xx, p.SA, type'l', lwd3, ylimc(0,1),
xlab"Age", ylab"Probability") - lines(xx, p.A, lty2, lwd3, col2)
- lines(xx, p.D, lty3, lwd3, col3)
- lines(xx, p.SD, lty4, lwd3, col4)
- legend(locator(1),lty14,col14,lwd3,legendopi
nions) - Plot the stacked probability curves
- plot(xx, p.SA, type'l', lwd3, ylimc(0,1),
xlab"Age", ylab"Probability") - lines(xx, p.SA p.A, lwd3)
- lines(xx, p.SA p.A p.D, lwd3)
- text(locator(4), opinions)
- Fit a more complex model
- mod.agree1 lt- glm(agree Specialization Age
Gender PriorProf Bren, familybinomial) - summary(mod.agree1)
- mod.Sagree1 lt- glm(Sagree Specialization Age
Gender PriorProf Bren, familybinomial) - summary(mod.Sagree1)
- mod.disagree1 lt- glm(disagree Specialization
Age Gender PriorProf Bren, familybinomial)