8.1 Logit Models for Nominal Responses
8.1.2 Alligator Food Choice Example
data gator;
  input length choice $ @@;
  cards;
1.24 I 1.30 I 1.30 I 1.32 F 1.32 F 1.40 F 1.42 I 1.42 F
1.45 I 1.45 O 1.47 I 1.47 F 1.50 I 1.52 I 1.55 I 1.60 I
1.63 I 1.65 O 1.65 I 1.65 F 1.65 F 1.68 F 1.70 I 1.73 O
1.78 I 1.78 I 1.78 O 1.80 I 1.80 F 1.85 F 1.88 I 1.93 I
1.98 I 2.03 F 2.03 F 2.16 F 2.26 F 2.31 F 2.31 F 2.36 F
2.36 F 2.39 F 2.41 F 2.44 F 2.46 F 2.56 O 2.67 F 2.72 I
2.79 F 2.84 F 3.25 O 3.28 O 3.33 F 3.56 F 3.58 F 3.66 F
3.68 O 3.71 F 3.89 F
;
run;
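This example reproduces the parameter estimates in Table 8.2 and the plot in Figure 8.1. Proc logistic in SAS 8.2 handles the generalized logit model directly. The option link=glogit specifies a generalized logit model, and reference="O" makes "Other" the baseline category. The options scale=none and aggregate in the model statement request the deviance and Pearson goodness-of-fit statistics, with subpopulations defined by the distinct values of length; the global test for the effect of length is printed by default. To produce Figure 8.1 we need the predicted probabilities, which the output statement writes to the data set prob (predprobs=I creates the variables ip_o, ip_i and ip_f). Figure 8.1 is then drawn with proc gplot.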
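For reference, the model requested by link=glogit with "Other" as the baseline can be written as below. The symbols (pi for the category probabilities, alpha and beta for the intercepts and slopes, x for length) are notation introduced here for illustration and do not appear in the SAS output.

```latex
\log\frac{\pi_j(x)}{\pi_O(x)} = \alpha_j + \beta_j x, \qquad j \in \{I, F\}

\pi_j(x) = \frac{\exp(\alpha_j + \beta_j x)}
                {1 + \exp(\alpha_I + \beta_I x) + \exp(\alpha_F + \beta_F x)},
\qquad
\pi_O(x) = \frac{1}
                {1 + \exp(\alpha_I + \beta_I x) + \exp(\alpha_F + \beta_F x)}
```

The three probabilities sum to one for every length, which is why the curves in a plot of ip_o, ip_i and ip_f against length always stack to 1.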
proc logistic data=gator descending;
  model choice (REFERENCE="O") = length / link=glogit scale=none aggregate;
  output out = prob PREDPROBS=I;   /* writes ip_o, ip_i, ip_f */
run;

/* axes, legend and line style for Figure 8.1 */
axis1 label=(a = 90 "Predicted Probability") order = (0 to 1 by .2) minor=none;
axis2 label=("Length of Alligator") order = (1 to 4 by 1) minor = none;
legend1 label=none value=(h=2 font=swiss 'Other' 'Invertebrates' 'Fish')
        position=(top right inside) mode=share cborder=black;
symbol i = join w=2;

proc gplot data = prob;
  plot (ip_o ip_i ip_f)*length / overlay vaxis=axis1 haxis=axis2 legend=legend1;
run;
quit;
The LOGISTIC Procedure

Model Information
Data Set                     WORK.GATOR
Response Variable            choice
Number of Response Levels    3
Number of Observations       59
Model                        generalized logit
Optimization Technique       Fisher's scoring

Response Profile
Ordered                 Total
  Value    choice   Frequency
      1    O                8
      2    I               20
      3    F               31

Logits modeled use choice='O' as the reference category.

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Deviance and Pearson Goodness-of-Fit Statistics
Criterion    DF      Value    Value/DF    Pr > ChiSq
Deviance     86    75.1140      0.8734        0.7929
Pearson      86    80.1879      0.9324        0.6563

Number of unique profiles: 45

Model Fit Statistics
                            Intercept
             Intercept         and
Criterion      Only        Covariates
AIC           119.142         106.341
SC            123.297         114.651
-2 Log L      115.142          98.341

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       16.8006     2        0.0002
Score                  12.5702     2        0.0019
Wald                    8.9360     2        0.0115

Type III Analysis of Effects
                  Wald
Effect    DF    Chi-Square    Pr > ChiSq
length     2        8.9360        0.0115

Analysis of Maximum Likelihood Estimates
                                       Standard          Wald
Parameter    choice    DF    Estimate     Error    Chi-Square    Pr > ChiSq
Intercept    I          1      5.6974    1.7938       10.0881        0.0015
Intercept    F          1      1.6177    1.3073        1.5314        0.2159
length       I          1     -2.4654    0.8997        7.5101        0.0061
length       F          1     -0.1101    0.5171        0.0453        0.8314

Odds Ratio Estimates
                       Point          95% Wald
Effect    choice    Estimate    Confidence Limits
length    I            0.085      0.015     0.496
length    F            0.896      0.325     2.468
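As a hand check on the output above, the estimates can be plugged into the generalized logit formulas. This is only a sketch: the choice of a 2-meter alligator is an arbitrary illustration, x denotes length, and the results are rounded.

```latex
\log\frac{\hat\pi_I}{\hat\pi_O} = 5.6974 - 2.4654\,x, \qquad
\log\frac{\hat\pi_F}{\hat\pi_O} = 1.6177 - 0.1101\,x

% at x = 2: the linear predictors are 0.7666 and 1.3975
\hat\pi_O(2) = \frac{1}{1 + e^{0.7666} + e^{1.3975}} \approx \frac{1}{7.197} \approx 0.14, \quad
\hat\pi_I(2) \approx \frac{2.152}{7.197} \approx 0.30, \quad
\hat\pi_F(2) \approx \frac{4.045}{7.197} \approx 0.56

% odds ratio for length (invertebrates vs. other) and the likelihood-ratio statistic
e^{-2.4654} \approx 0.085, \qquad 115.142 - 98.341 = 16.801
```

The last line matches the printed odds ratio estimate for choice I and the likelihood ratio chi-square (16.8006, up to rounding of the -2 Log L values).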
Notice that the same parameter estimates can also be obtained by using proc catmod. We show the code here.
proc catmod data=gator;
  response logits;
  direct length;          /* treat length as a quantitative covariate */
  model choice = length;
run;
quit;
8.1.4 Belief in Afterlife Example
data afterlife;
  input race gender belief count;
  datalines;
1 1 1 371
1 1 2 49
1 1 3 74
1 0 1 250
1 0 2 45
1 0 3 71
0 1 1 64
0 1 2 9
0 1 3 15
0 0 1 25
0 0 2 5
0 0 3 13
;
run;
This example reproduces Table 8.4, Table 8.5 and Table 8.3. The predicted counts in Table 8.3 are computed from the predicted probabilities, so the probabilities are generated first and Table 8.3 comes last.
proc logistic data = afterlife descending;
  weight count;
  model belief (reference="3") = race gender / link=glogit scale = none aggregate;
  output out = prob PREDPROBS=I;
run;

The LOGISTIC Procedure

Type III Analysis of Effects
                  Wald
Effect    DF    Chi-Square    Pr > ChiSq
race       2        2.0824        0.3530
gender     2        7.2074        0.0272

Analysis of Maximum Likelihood Estimates
                                       Standard          Wald
Parameter    belief    DF    Estimate     Error    Chi-Square    Pr > ChiSq
Intercept    2          1     -0.7582    0.3614        4.4031        0.0359
Intercept    1          1      0.8828    0.2426       13.2390        0.0003
race         2          1      0.2712    0.3541        0.5863        0.4438
race         1          1      0.3420    0.2370        2.0814        0.1491
gender       2          1      0.1051    0.2465        0.1817        0.6699
gender       1          1      0.4186    0.1713        5.9737        0.0145
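The estimates above define two fitted logit equations with belief = 3 as the baseline. The sketch below writes them out and evaluates them for the cell race = 1, gender = 1; the pi notation is introduced here for illustration, and the numbers are rounded.

```latex
\log\frac{\hat\pi_1}{\hat\pi_3} = 0.8828 + 0.3420\,\text{race} + 0.4186\,\text{gender}, \qquad
\log\frac{\hat\pi_2}{\hat\pi_3} = -0.7582 + 0.2712\,\text{race} + 0.1051\,\text{gender}

% race = 1, gender = 1: linear predictors 1.6434 and -0.3819
\hat\pi_1 = \frac{e^{1.6434}}{1 + e^{1.6434} + e^{-0.3819}} \approx \frac{5.173}{6.855} \approx 0.755, \quad
\hat\pi_2 \approx \frac{0.683}{6.855} \approx 0.100, \quad
\hat\pi_3 \approx \frac{1}{6.855} \approx 0.146

% estimated odds ratio for gender, belief 1 vs. belief 3
e^{0.4186} \approx 1.52
```

To two decimals these agree with the values 0.75, 0.10 and 0.15 that proc freq lists for this cell.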
proc freq data = prob;
  format ip_1-ip_3 f4.2;
  weight count;
  tables race*gender*ip_1*ip_2*ip_3 / list nocum nopercent out=test;
run;
The FREQ Procedure

race    gender    IP_1    IP_2    IP_3    Frequency
---------------------------------------------------
   0         0    0.62    0.12    0.26           43
   0         1    0.71    0.10    0.19           88
   1         0    0.68    0.12    0.20          366
   1         1    0.75    0.10    0.15          494
data table8_3;
  set test;
  array p(3) ip_1-ip_3;
  array pre_count(3);
  /* predicted count = cell total * predicted probability */
  do i = 1 to 3;
    pre_count(i) = count*p(i);
  end;
  drop ip_1-ip_3 i percent;
run;

proc print data = table8_3 noobs;
run;
                            pre_       pre_       pre_
race    gender    COUNT    count1     count2     count3
   0         0       43    26.752     5.1837    11.0648
   0         1       88    62.244     8.7615    16.9401
   1         0      366   248.245    44.1218    72.9396
   1         1      494   372.751    49.1838    72.064
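The data step computes each predicted count as the cell total times the predicted probability. A short worked check for the race = 1, gender = 1 row follows; the symbols n and mu are notation introduced here, and the probability 0.7546 is a hand-computed, rounded value rather than a figure printed in the output.

```latex
\hat\mu_{rgj} = n_{rg}\,\hat\pi_j(r, g)

% race = 1, gender = 1, belief = 1
\hat\mu = 494 \times 0.7546 \approx 372.8 \quad (\text{printed: } 372.751;\ \text{observed: } 371)

% the predicted counts in a row add back to the row total
372.751 + 49.1838 + 72.064 \approx 494
```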
8.2 Cumulative Logit Models for Ordinal Responses
8.2.2 Political Ideology Example
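This example reproduces Table 8.6 and the parameter estimates of the cumulative logit model. The option link=clogit requests cumulative logits, and the predicted counts are computed from the predicted probabilities in the same way as in the previous example.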
data ideology;
  input party ideology count @@;
  cards;
1 1 80   1 2 81   1 3 171   1 4 41   1 5 55
0 1 30   0 2 46   0 3 148   0 4 84   0 5 99
;
run;

proc logistic data = ideology order=data descending;
  class party / param = ref;
  freq count;
  model ideology = party / link=clogit scale=none;
  output out = prob PREDPROBS=I;
run;

proc freq data = prob noprint;
  weight count;
  tables party*ip_1*ip_2*ip_3*ip_4*ip_5 / list nocum nopercent out=test;
run;

data table8_6;
  set test;
  array p(5) ip_1-ip_5;
  array pcount(5);
  /* predicted count = party total * predicted probability */
  do i = 1 to 5;
    pcount(i) = count*p(i);
  end;
  drop ip_1-ip_5 i percent;
run;

proc print data = table8_6 noobs;
run;
Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       58.6451     1        <.0001
Score                  57.2448     1        <.0001
Wald                   57.0182     1        <.0001

Type III Analysis of Effects
                  Wald
Effect    DF    Chi-Square    Pr > ChiSq
party      1       57.0182        <.0001

Analysis of Maximum Likelihood Estimates
                                  Standard          Wald
Parameter       DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept 5      1     -2.0440      0.1188      295.9293        <.0001
Intercept 4      1     -1.2116      0.1031      138.0265        <.0001
Intercept 3      1      0.5000      0.0943       28.1405        <.0001
Intercept 2      1      1.4945      0.1134      173.6781        <.0001
party     0      1      0.9745      0.1291       57.0182        <.0001

Odds Ratio Estimates
                   Point          95% Wald
Effect          Estimate    Confidence Limits
party 0 vs 1       2.650      2.058     3.412

party    COUNT    pcount1    pcount2    pcount3    pcount4    pcount5
    0      407    31.7714    44.0346    151.708    75.5005    103.985
    1      428    78.4308    83.1523    168.226    49.1170     49.074
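The output above can be read as a proportional odds model in which, because of the descending option, the cumulative logits accumulate from the highest ideology code downward and party = 1 is the reference level. The sketch below states that reading and checks it against the predicted counts; the notation (Y for ideology, alpha and beta for the parameters) is introduced here for illustration, and values are rounded.

```latex
\operatorname{logit} P(Y \ge j \mid \text{party}) = \alpha_j + \beta\,\mathbf{1}[\text{party} = 0],
\qquad j = 5, 4, 3, 2

% category 5, party = 1 and party = 0
\hat P(Y = 5 \mid \text{party} = 1) = \frac{1}{1 + e^{2.0440}} \approx 0.1147,
\qquad 428 \times 0.1147 \approx 49.1

\hat P(Y = 5 \mid \text{party} = 0) = \frac{1}{1 + e^{2.0440 - 0.9745}} \approx 0.2555,
\qquad 407 \times 0.2555 \approx 104.0

% interior categories come from differences of cumulative probabilities
\hat P(Y = 4 \mid \text{party} = 1) = \frac{1}{1 + e^{1.2116}} - \frac{1}{1 + e^{2.0440}}
\approx 0.2294 - 0.1147 = 0.1147, \qquad 428 \times 0.1147 \approx 49.1

% common cumulative odds ratio
e^{0.9745} \approx 2.650
```

These agree with pcount5 and pcount4 in the printed table, and the last line matches the odds ratio estimate for party 0 vs 1.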
8.2.3 Invariance to Choice of Response Categories
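This section refits the political ideology model of Section 8.2.2 after collapsing the five ideology categories into three (1 and 2 become 1, 3 becomes 2, 4 and 5 become 3). The estimated party effect from the collapsed data (1.0059, standard error 0.1322) is close to the five-category estimate (0.9745, standard error 0.1291), which illustrates the invariance of the cumulative logit model to the choice of response categories.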
data ideology1;
  set ideology;
  /* collapse five ideology levels into three */
  if ideology = 1 or ideology = 2 then ideo = 1;
  else if ideology = 4 or ideology = 5 then ideo = 3;
  else ideo = 2;
run;

proc logistic data = ideology1 order=data descending;
  class party / param = ref;
  freq count;
  model ideo = party / link=clogit scale=none;
run;
The LOGISTIC Procedure

Model Fit Statistics
                            Intercept
             Intercept         and
Criterion      Only        Covariates
AIC          1826.542        1768.834
SC           1835.997        1783.016
-2 Log L     1822.542        1762.834

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       59.7085     1        <.0001
Score                  58.5204     1        <.0001
Wald                   57.9280     1        <.0001

Type III Analysis of Effects
                  Wald
Effect    DF    Chi-Square    Pr > ChiSq
party      1       57.9280        <.0001

Analysis of Maximum Likelihood Estimates
                                  Standard          Wald
Parameter       DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept 3      1     -1.2195      0.1041      137.1879        <.0001
Intercept 2      1      0.4931      0.0951       26.8774        <.0001
party     0      1      1.0059      0.1322       57.9280        <.0001
8.3 Paired-Category Logits for Ordinal Responses
8.3.2 Political Ideology Example Revisited
Proc catmod is the SAS procedure to use for adjacent-categories logit models. Here is the syntax for a general adjacent-categories logit model. The response statement with the keyword alogits specifies that the response functions are adjacent-categories logits.
proc catmod data = ideology;
  weight count;
  response alogits;
  model ideology = party;
run;
quit;
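The syntax for the simpler adjacent-categories model (8.3.2) on page 216 is slightly different. Here is a simple way of fitting it. The _RESPONSE_ keyword in the model statement gives each adjacent-categories logit of ideology its own intercept, while party enters with a single common effect. The variable party is coded with proc catmod's default effect (deviation from the mean) coding, in which the two levels are coded 1 and -1.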
proc catmod data = ideology;
  weight count;
  response alogits;
  model ideology = _response_ party;
run;
quit;
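The two proc catmod calls above differ in whether the party effect is allowed to change from one logit to the next. As a sketch, with pi for the ideology probabilities, x for the coded party variable, and alpha and beta for the parameters (all notation introduced here, not taken from the output), the two models are:

```latex
% general model: a separate party effect for each adjacent pair
\log\frac{\pi_{j+1}(x)}{\pi_j(x)} = \alpha_j + \beta_j x, \qquad j = 1, \dots, 4

% simpler model: one common party effect
\log\frac{\pi_{j+1}(x)}{\pi_j(x)} = \alpha_j + \beta x, \qquad j = 1, \dots, 4
```

With two parties and four logits the general model uses eight parameters for eight response functions, so it is saturated, while the simpler model uses five.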
If we want to dummy code the variable party, we can specify the design matrix directly, as in the following example. The first four columns are the intercepts of the four adjacent-categories logits and the fifth column is the party indicator. Notice that the sign of the parameter estimate for party is the opposite of the one in the book on page 217, because party is coded here in the opposite way from the book.
proc catmod data = ideology;
  weight count;
  population party;
  response alogits;
  model ideology = (1 0 0 0 0,
                    0 1 0 0 0,
                    0 0 1 0 0,
                    0 0 0 1 0,
                    1 0 0 0 1,
                    0 1 0 0 1,
                    0 0 1 0 1,
                    0 0 0 1 1)
        (1='Group2/1', 2='Group3/2', 3='Group4/3', 4='Group5/4', 5='party');
run;
quit;
The CATMOD Procedure

Data Summary
Response           ideology    Response Levels      5
Weight Variable    count       Populations          2
Data Set           IDEOLOGY    Total Frequency    835
Frequency Missing  0           Observations        10

Population Profiles
Sample    party    Sample Size
------------------------------
     1    0                407
     2    1                428

Response Profiles
Response    ideology
--------------------
       1    1
       2    2
       3    3
       4    4
       5    5

Response Functions and Design Matrix
          Function     Response        Design Matrix
Sample     Number      Function     1    2    3    4    5
---------------------------------------------------------
     1          1       0.42744     1    0    0    0    0
                2       1.16857     0    1    0    0    0
                3      -0.56640     0    0    1    0    0
                4       0.16430     0    0    0    1    0
     2          1       0.01242     1    0    0    0    1
                2       0.74721     0    1    0    0    1
                3      -1.42809     0    0    1    0    1
                4       0.29376     0    0    0    1    1

Analysis of Variance
Source      DF    Chi-Square    Pr > ChiSq
------------------------------------------
Group2/1     1          9.82        0.0017
Group3/2     1        109.13        <.0001
Group4/3     1         43.44        <.0001
Group5/4     1          8.32        0.0039
party        1         52.63        <.0001
Residual     3          5.38        0.1459

Analysis of Weighted Least Squares Estimates
                                  Standard      Chi-
Effect    Parameter    Estimate      Error    Square    Pr > ChiSq
------------------------------------------------------------------
Model             1      0.4368     0.1394      9.82        0.0017
                  2      1.1710     0.1121    109.13        <.0001
                  3     -0.7161     0.1087     43.44        <.0001
                  4      0.3534     0.1225      8.32        0.0039
                  5     -0.4318     0.0595     52.63        <.0001
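The weighted least squares estimates above can be assembled into fitted equations. The sketch below assumes, as the design matrix indicates, that the fifth column equals 1 for party = 1 and 0 for party = 0; the observed counts used in the first line (30 and 46 for party = 0) come from the ideology data step, and the remaining notation is introduced here for illustration.

```latex
% the first response function for sample 1 is the observed adjacent-categories logit
\log\frac{46}{30} \approx 0.4274

% fitted model: four intercepts and a common party effect
\log\frac{\hat\pi_{j+1}}{\hat\pi_j} = \hat\alpha_j - 0.4318\,\mathbf{1}[\text{party} = 1],
\qquad (\hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\alpha_4) = (0.4368,\ 1.1710,\ -0.7161,\ 0.3534)

% odds ratios implied by the common effect
e^{-0.4318} \approx 0.649 \ \text{(adjacent categories)}, \qquad
e^{4 \times (-0.4318)} = e^{-1.7272} \approx 0.178 \ \text{(category 5 vs. category 1)}

% residual degrees of freedom
2 \times 4 - 5 = 3
```

The residual chi-square of 5.38 on these 3 degrees of freedom (p = 0.1459) is the goodness-of-fit test of the common party effect against the saturated model.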
8.3.4 Continuation-Ratio Logits
We use proc catmod in this section and define the response functions ourselves in the response statement. Because the data contain an empty cell (concentration 62.5, response 2), proc catmod cannot estimate the parameters successfully from the raw counts. The addcell option in the model statement handles this by adding a small constant (here 0.0005) to the frequency of each cell.
data toxicity;
  input con r count;
  cards;
0 1 15
0 2 1
0 3 281
62.5 1 17
62.5 2 0
62.5 3 225
125 1 22
125 2 7
125 3 283
250 1 38
250 2 59
250 3 202
500 1 144
500 2 132
500 3 9
;
run;

proc catmod data = toxicity;
  weight count;
  direct con;
  response 0 1 -1, 1 -.5 -.5 log;
  model r = con / addcell=.0005;
run;
quit;

The CATMOD Procedure

Analysis of Weighted Least Squares Estimates
              Function                 Standard      Chi-
Parameter       Number    Estimate        Error    Square    Pr > ChiSq
-----------------------------------------------------------------------
Intercept            1     -4.4392       0.3101    204.99        <.0001
                     2     -1.4280       0.1904     56.26        <.0001
con                  1      0.0124      0.00103    144.60        <.0001
                     2     0.00455     0.000499     83.22        <.0001
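It may help to spell out what the response statement above computes. The sketch below assumes that proc catmod applies the operations in a response statement from right to left, so that log is taken first and the two rows of coefficients are applied to the log probabilities; the pi and f notation is introduced here for illustration.

```latex
% response functions as specified in the response statement
f_1 = \log\pi_2 - \log\pi_3 = \log\frac{\pi_2}{\pi_3}, \qquad
f_2 = \log\pi_1 - \tfrac{1}{2}\log\pi_2 - \tfrac{1}{2}\log\pi_3 = \log\frac{\pi_1}{\sqrt{\pi_2\,\pi_3}}

% fitted equations from the printed estimates
\hat f_1 = -4.4392 + 0.0124\,\text{con}, \qquad
\hat f_2 = -1.4280 + 0.00455\,\text{con}

% continuation-ratio logits in their usual form, for comparison
\log\frac{\pi_1}{\pi_2 + \pi_3}, \qquad \log\frac{\pi_2}{\pi_3}
```

Under that assumption the first function is the second continuation-ratio logit, while the second function is not the first one, since it involves the geometric mean of the last two probabilities rather than their sum. Obtaining the usual form would seem to require a response statement that forms the sum before taking logs; the source does not show such a statement, so the estimates printed above should be read as estimates for the functions as specified.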