Section 15.1 Models for Dichotomous Data
Figure 15.1. using data file chile.
data chilenew; /*jittered dataset*/
set chile;
if (intvote = 2) then voting = 0;
else voting = 1;
vjitter = voting + 0.05*ranuni(0);
if(intvote <3); /*1 for and 2 against*/
output;
run;
proc logistsic data=chilenew descending noprint; /*descending option is needed*/
model voting=statquo;
output out=chile1 predicted=lpred; /*output data file for further analysis*/
run;
proc loess data=chile1;
model voting=statquo;
ods output PredAtVertices=chile2;
run;
proc sort data=chile2;
by statquo;
proc sort data=chile1;
by statquo;
run;
data chile3;
merge chile1 chile2;
by statquo;
run;
goptions gsfname=outfiles dev=gif373;
symbol1 c=black i=rls v=circle h=0.5;
symbol2 c=red i=join v=none;
symbol3 c=blue i=join v=none h=0.5 line=2;
title 'Figure 15.1';
filename outfiles 'https://stats.idre.ucla.edu/wp-content/uploads/2016/02/chp15Fig1.gif';
axis1 order = (-2 to 2 by 1);
axis2 order = (-0.1 to 1.1 by 0.6) label=(r=0 a=90);;
proc gplot data=chile3;
format statquo f2.0;
plot vjitter*statquo=1 lpred*statquo=2 Pred*statquo=3 / haxis=axis1
vaxis=axis2 overlay vminor=0 hminor=0;
label statquo='Support for the Status Quo';
label vjitter='Voting Intention';
run;
quit;
Table 15.1 and Table 15.2 on page 452 using data file womenlf. In order to use proc logistic correctly, dummy variables are created to present the status of working, the presence of children and the region. The deviance in the table is -2*Log L which is presented in the output. The (G0)2 in Table 15.2 is the difference of -2*Log L between the two models being compared. The p-value is then obtained using chi-square distribution with the corresponding degree of freedom. On the other hand, proc logistic uses Wald-test (shown below) instead of (G0)2.
data wcoded; set womenlf; if (work='not_work') then wstatus=0; else wstatus =1; if (child ='present') then kid = 1; else kid =0; if (region ='Ontario') then re1= 1; else re1 =0; if (region ='Prairie') then re2=1; else re2=0; if (region ='Atlantic') then re3=1; else re3=0; if (region = 'Quebec') then re4=1; else re4=0; ik=husbinc*kid; run; proc logistic data=wcoded;/*model 1*/ model wstatus = husbinc kid re1-re4 ik; test2: test ik;/* model contrast:2-1*/ test5: test re1, re2, re3, re4;/*model contrast : 3-1*/ run; proc logistic data=wcoded; /*model 2*/ model wstatus = husbinc kid re1-re4; test3: test husbinc;/*model contrast: 5-2*/ test4: test kid; /*model contrast: 4-2 */ run; proc logistic data=wcoded; /*model 3*/ model wstatus = husbinc kid ik; run; proc logistic data=wcoded; /*model 4*/ model wstatus = husbinc re1-re4; run; proc logistic data=wcoded;/*model 5*/ model wstatus = kid re1-re4; run;
The LOGISTIC Procedure ---Model 1---
(Output omitted here.)
Model Fit Statistics For Model 1
Intercept
Intercept and
Criterion Only Covariates
AIC 358.151 332.542
SC 361.723 361.119
-2 Log L 356.151 316.542
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 39.6094 7 <.0001
Score 38.6566 7 <.0001
Wald 33.8338 7 <.0001
Linear Hypotheses Testing Results
Wald
Label Chi-Square DF Pr > ChiSq
TEST2 0.7547 1 0.3850 --comparing model 2 with model 1
TEST5 2.5477 4 0.6361 --comparing model 3 with model 1
The LOGISTIC Procedure ---Model 2---
(Output omitted here.)
Model Fit Statistics For Model 2
Intercept
Intercept and
Criterion Only Covariates
AIC 358.151 331.301
SC 361.723 356.306
-2 Log L 356.151 317.301
Linear Hypotheses Testing Results
Wald
Label Chi-Square DF Pr > ChiSq
TEST3 4.8567 1 0.0275 --comparing model 5 and model 2
TEST4 28.2452 1 <.0001 --comparing model 4 and model 2
The LOGISTIC Procedure ---Model 3---
(Output omitted here.)
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
AIC 358.151 327.124
SC 361.723 341.413
-2 Log L 356.151 319.124
The LOGISTIC Procedure ---Model 4---
(Output omitted here.)
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
AIC 358.151 359.849
SC 361.723 381.282
-2 Log L 356.151 347.849
The LOGISTIC Procedure ---Model 5---
(Output omitted here.)
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
AIC 358.151 334.427
SC 361.723 355.860
-2 Log L 356.151 322.427
Figure 15.4 on page 453.
proc logistic data= wcoded descending noprint;
model wstatus = kid husbinc /rsquare;
output out=women predicted=pred;
run;
proc reg data=women noprint;
model wstatus = husbinc kid;
output out = women1 predicted=lpr;
run;
data women2; /*to create a dataset for gplot */
set women1;
if (kid=1) then do l1=lpr; p1=pred; end;
else do l2=lpr; p2=pred; end;
output;
run;
proc sort data=women2;
by husbinc;
run;
data labels; /*for the text in the graph*/
length function style $ 8 text $17;
retain function 'label' xsys ysys '2' when 'a' color 'black';
function='label'; x=10; y=0.15; size=1; color='black'; text='Children Present'; output;
function='label'; x=35; y=0.75; size=1; color='black'; text='Children Absent'; output;
run;
title 'Figure 15.4';
symbol1 c=black i=join v=none h=1 line=2;
symbol2 c=blue i=join v=none h=1 line=1;
axis1 order =(0 to 50 by 10);
axis2 order=(0 to 1 by .25) label=(r=0 a=90);
proc gplot data=women2;
plot l1*husbinc=1 l2*husbinc=1 p1*husbinc=2 p2*husbinc=2
/ annotate=labels haxis=axis1 vaxis=axis2 hminor=0 vminor=0 overlay;
label husbinc='Husband''s Income';
label l1='P(Working)';
run;
quit;
Figure 15.5 on page 459 using the same data file womenlf as above.
data wcoded1; set womenlf; if (work='not_work') then do w=2; wstatus=1; end; else if (work='fulltime') then do w=0; wstatus=0; end; else do w=1; wstatus =0; end; if (child ='present') then kid = 1; else kid =0; if (region ='Ontario') then re1= 1; else re1 =0; if (region ='Prairie') then re2=1; else re2=0; if (region ='Atlantic') then re3=1; else re3=0; if (region = 'Quebec') then re4=1; else re4=0; ik=husbinc*kid; run; proc logistic data=wcoded1 descending noprint; model wstatus =husbinc kid ; output out=women predicted=pr; run; data women1; /*to create partial residuals*/ set women; rh = (pr-wstatus)/(pr*(1-pr)) -0.0423*husbinc; run; proc loess data=women1;/*to create data for lowess smooth of the plot*/ model rh=husbinc /iterations=2 direct smooth=0.5; /*set iterations to be 2 for outliers*/ ods output OutputStatistics=women2; run; proc sort data=women1; by husbinc; run; proc sort data=women2; by husbinc; run; data women3; merge women1 women2; by husbinc; run; title 'Figure 15.5'; symbol1 color=black i=rls v=star h=0.8 line=2; symbol2 color=blue i=jone v=none h=1 line =1; axis1 order=(0 to 50 by 10); axis2 order=(-6 to 6 by 6) label=(r=0 a=90); proc gplot data=women3; format husbinc f4.0; plot rh*husbinc=1 Pred*husbinc=2/ overlay haxis=axis1 vaxis=axis2 vminor=0 hminor=0; label husbinc='Husband''s Income ($1000s)'; label rh='Partial Residual [Working]'; run; quit;
Figure 15.6 and Figure 15.7 (a) and (b). We use SAS ODS to output the influence statistics to a data file. Index plots of Dfbeta are basically the plots presented in Figure (a) and (b).
proc logistic data=wcoded; model wstatus=husbinc kid/influence ; ods output Influence=winf; run; data winf1; set winf; g=ResDev/sqrt(1-HatDiag); /*creating studentized residaul*/ run; proc means data=winf; /*get coordinate for the reference lines*/ var HatDiag; run; axis1 order =(0 to 0.06 by 0.01); axis2 order =(-2 to 4 by 2) label=(r=0 a=90); symbol c=black i=none v=star h=0.8; title 'Figure 15.6'; proc gplot data=winf1; format HatDiag f4.2 ; plot g*HatDiag =1 / href =(.02282 .03423) lhref=2 vref=(-2 0 2) lvref=2 haxis=axis1 vaxis=axis2 hminor=0 vminor=0; label HatDiag='Hat-value'; label g='Studentized Residual'; run; quit;
axis1 order =(0 to 300 by 50); axis2 order =(-0.3 to 0.5 by 0.1) label=(r=0 a=90); symbol c=black i=none v=star h=0.8; title 'Figure 15.7 (a)'; proc gplot data=winf1; format CaseNum f4.0 husbincDfbeta f5.2; plot husbincDfbeta*CaseNum /haxis=axis1 vaxis=axis2 vref=(0) lvref=2 hminor=0 vminor=0; run; quit;
axis1 order =(0 to 300 by 50); axis2 order =(-0.1 to 0.2 by 0.05) label=(r=0 a=90); symbol c=black i=none v=star h=0.8; title 'Figure 15.7 (b)'; proc gplot data=winf; format kidDfbeta f4.2; plot kidDfbeta*CaseNum /haxis=axis1 vaxis=axis2 vref=(0) lvref=2 hminor=0 vminor=0;; run; quit;
proc logistic data=wcoded descending; /*rerunning proc logistic without observation 76 and 77*/ model wstatus=husbinc kid ; where obs ne 76 and obs ne 77 ; run; quit;
The LOGISTIC Procedure
(Some output omitted here.)
Analysis of Maximum Likelihood Estimates
Standard
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 -1.6090 0.4052 15.7638 <.0001
husbinc 1 0.0603 0.0212 8.1089 0.0044
kid 1 1.6476 0.2978 30.6071 <.0001
Odds Ratio Estimates
Point 95% Wald
Effect Estimate Confidence Limits
husbinc 1.062 1.019 1.107
kid 5.194 2.898 9.312
Section 15.2 Models for Polytomous Data
Calculation on page 468 and Figure 15.8 (a) and (b) on data file womenlf. We still use the same code scheme from last section.
proc catmod data=wcoded; direct husbinc kid;/*to specify that they are continuous variables*/ model w = husbinc kid / oneway noresponse;/*Can tell baseline from oneway table*/ response out=catout; run; data catplot; set catout; array p(3); array q(3); if _type_ eq 'PROB' and kid=0 then p(_number_)=_pred_; if _type_ eq 'PROB' and kid=1 then q(_number_)=_pred_; keep p1 p2 p3 q1 q2 q3 husbinc; output; run; data lab1; length function style $ 8 text $17; retain function 'label' xsys ysys '2' when 'a' color 'black'; function='label'; x=10; y=0.78; size=1; color='black'; text='Children Present'; output; function='label'; x=35; y=0.72; size=1; color='red'; text='Not Working'; output; function='label'; x=35; y=.27; size=1; color='blue'; text='Part-Time'; output; function='label'; x=10; y=0.25; size=1; color='black'; text='Full-Time'; output; run; title 'Figure 15.8 (a)'; axis1 order=(0 to 50 by 10); axis2 order=(0 to 0.8 by .2) label =(r=0 a =90); symbol1 c=black v=none line=4 h=1 i=join; symbol2 c=blue v=none line=2 h=1 i = join; symbol3 c=red v=none line=1 h=1 i=join; proc gplot data=catplot; plot q1*husbinc=1 q2*husbinc=2 q3*husbinc=3 /annotate=lab1 overlay hminor=0 vminor=0 haxis=axis1 vaxis=axis2 cframe=snow; label husbinc='Husband''s Income ($1000s)'; label q1='Fitted Probability'; run; quit;
data lab2; length function style $ 8 text $17; retain function 'label' xsys ysys '2' when 'a' color 'black'; function='label'; x=10; y=0.95; size=1; color='black'; text='Children Absent'; output; function='label'; x=35; y=0.75; size=1; color='red'; text='Not Working'; output; function='label'; x=20; y=.18; size=1; color='blue'; text='Part-Time'; output; function='label'; x=10; y=0.60; size=1; color='black'; text='Full-Time'; output; run; title 'Figure 15.8 (b)'; axis1 order=(0 to 40 by 10); axis2 order=(0 to 1 by .25) label =(r=0 a =90); symbol1 c=black v=none line=4 h=1 i=join; symbol2 c=blue v=none line=2 h=1 i = join; symbol3 c=red v=none line=1 h=1 i=join; proc gplot data=catplot; plot p1*husbinc=1 p2*husbinc=2 p3*husbinc=3 /annotate=lab2 haxis=axis1 vaxis=axis2 overlay hminor=0 vminor=0 cframe=snow; label husbinc='Husband''s Income ($1000s)'; label p1='Fitted Probability'; run; quit;
The CATMOD Procedure
Response w Response Levels 3
Weight Variable None Populations 46
Data Set WCODED Total Frequency 263
Frequency Missing 0 Observations 263
One-Way Frequencies
Variable Value Frequency
-----------------------------
w 0 66
1 42
2 155 <---This is the value for baseline.
(More output omitted here.)
Maximum Likelihood Analysis
Sub -2 Log Convergence
Iteration Iteration Likelihood Criterion
---------------------------------------------------
0 0 577.87006 1.0000
1 0 430.05981 0.2558
2 0 423.04966 0.0163
3 0 422.88205 0.000396
4 0 422.88193 2.9496E-7
5 0 422.88193 4.418E-13
Maximum Likelihood Analysis
Parameter Estimates
Iteration 1 2 3 4 5 6
---------------------------------------------------------------------------------------
0 0 0 0 0 0 0
1 1.6451 -0.5022 -0.0616 -0.0152 -2.5035 -0.8029
2 1.8666 -1.3618 -0.0900 0.007921 -2.4978 -0.0311
3 1.9801 -1.4285 -0.0971 0.006855 -2.5570 0.0185
4 1.9828 -1.4323 -0.0972 0.006892 -2.5586 0.0215
5 1.9828 -1.4323 -0.0972 0.006892 -2.5586 0.0215
Maximum likelihood computations converged.
Maximum Likelihood Analysis of Variance
Source DF Chi-Square Pr > ChiSq
--------------------------------------------------
Intercept 2 29.35 <.0001
husbinc 2 12.82 0.0016
kid 2 53.98 <.0001
Likelihood Ratio 86 138.67 0.0003
Analysis of Maximum Likelihood Estimates
Standard Chi-
Effect Parameter Estimate Error Square Pr > ChiSq
-----------------------------------------------------------------------
Intercept 1 1.9828 0.4842 16.77 <.0001
2 -1.4323 0.5925 5.84 0.0156
husbinc 3 -0.0972 0.0281 11.98 0.0005
4 0.00689 0.0235 0.09 0.7689
kid 5 -2.5586 0.3622 49.90 <.0001
6 0.0215 0.4690 0.00 0.9635
Calculation on nested dichotomies in subsection 15.2.2. Notice that in SAS proc logistic the option rsq on the model statement gives generalized R-sqaure, which is not the same as in the book. The second proc logistic below is for the purpose of computing (G0)2 at the bottom of the page using -2*Log L.
proc logistic data=wcoded; /* [15.27]*/ model wstatus =husbinc kid/rsq; test husbinc; /*returns Wald test*/ run; proc logistic data=wcoded; model wstatus = kid /rsq; run;
The LOGISTIC Procedure ---[15.27]
(Most of the output is omitted here.)
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
AIC 358.151 325.733
SC 361.723 336.449
-2 Log L 356.151 319.733 <--Needed for likelihood-ratio test
R-Square 0.1293 Max-rescaled R-Square 0.1743 <--generalized R-square
Analysis of Maximum Likelihood Estimates
Standard
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 1.3358 0.3838 12.1164 0.0005
husbinc 1 -0.0423 0.0198 4.5750 0.0324
kid 1 -1.5756 0.2923 29.0650 <.0001
Linear Hypotheses Testing Results <-- Wald test on the effect of income
Wald
Label Chi-Square DF Pr > ChiSq
Test 1 4.5750 1 0.0324
The LOGISTIC Procedure --for the likelihood-ratio test on the effect on income
(Most of the output is omitted here.)
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
AIC 358.151 328.559
SC 361.723 335.703
-2 Log L 356.151 324.559 <--Needed for likelihood-ratio test
R-Square 0.1132 Max-rescaled R-Square 0.1526
data wcode1; /*full-time vs. part-time*/ set womenlf; if (work='fulltime') then wstatus=1; else if (work = 'parttime') then wstatus=0; if (child ='present') then kid = 1; else kid =0; run; proc logistic data=wcode1 descending;/*formula on p. 473: part-time vs. full-time*/ model wstatus= husbinc kid /rsq;
The LOGISTIC Procedure --full-time vs. part-time
(Most of the output omitted here.)
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
AIC 146.342 110.495
SC 149.024 118.541
-2 Log L 144.342 104.495
R-Square 0.3085 Max-rescaled R-Square 0.4185
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 39.8468 2 <.0001
Score 35.1502 2 <.0001
Wald 25.5820 2 <.0001
Analysis of Maximum Likelihood Estimates
Standard
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 3.4778 0.7671 20.5536 <.0001
husbinc 1 -0.1073 0.0392 7.5062 0.0061
kid 1 -2.6514 0.5411 24.0134 <.0001
data wcode2; set womenlf; if (work='not_work') then w=2; else if (work='fulltime') then w=0; else w=1; if (child ='present') then kid = 1; else kid =0; ik=husbinc*kid; run; proc logistic data=wcode2; model w=husbinc kid / rsq ; run;
The LOGISTIC Procedure
(Most of the output is omitted here.)
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
AIC 504.493 449.663
SC 511.637 463.952
-2 Log L 500.493 441.663
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 58.8296 2 <.0001
Score 54.6338 2 <.0001
Wald 53.9013 2 <.0001
Analysis of Maximum Likelihood Estimates
Standard
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 0.9409 0.3619 6.7579 0.0093
Intercept2 1 1.8520 0.3773 24.0994 <.0001
husbinc 1 -0.0539 0.0194 7.6913 0.0055
kid 1 -1.9719 0.2804 49.4637 <.0001
Section 15.3 Discrete Independent Variables and Contingency Tables
The analysis in this section is based on Table 15.3, which is based on data from an example from The American Voter (Campbell et al. 1960). The first dataset is created based on Table 15.3 for Figure 15.13.
data logvote;
input logv_one logv_close intensity;
label intensity ='0 for weak, 1 for medium and 2 for strong';
datalines;
.847 .9 0
.904 1.318 1
.981 2.084 2
;
run;
title 'Figure 15.13';
filename outfiles 'https://stats.idre.ucla.edu/wp-content/uploads/2016/02/chp15Fig13.gif';
axis1 label =(r=0 a=90) order=(0.8 to 2.2 by .7);
axis2 value=('Weak' 'Medium' 'Strong');
symbol1 c=black v=star i=join line=1 ;
symbol2 c=yellow v=circle i=join line=2;
legend1 label=none
shape=symbol(4,2)
cframe=white
position=(top center inside)
value=('One-sided' 'Close')
mode=share;
proc gplot data=logvote;
plot logv_one*intensity=1 logv_close*intensity=2 /
overlay vaxis=axis1 haxis=axis2 vminor=0 hminor=0 lengend=lengend1 cframe=steel;
label intensity='Intensity of Preference';
label logv_one = 'Turnout: log(voted/did not vote)';
run;
quit;
In order to do the logistic regression, we create another dataset based on Table 15.3 again using the turnout frequencies as weight. For Table 15.4 and Table 15.5 we run proc logistic for each model and obtain (G0)2 by taking the difference of -2*Log L from each pair models being compared. Table 15.5 is produced in the last data step and proc print using SAS function probchi.
data election; input perclose inten1 inten2 voted wv; label perclose ='0 for one-sided 1 for close' intens1 ='baseline: weak' voted = '0 for not 1 for yes'; clspref1=perclose*inten1; clspref2=perclose*inten2; cards; 0 0 0 1 91 0 0 0 0 39 0 1 0 1 121 0 1 0 0 49 0 0 1 1 64 0 0 1 0 24 1 0 0 1 214 1 0 0 0 87 1 1 0 1 284 1 1 0 0 76 1 0 1 1 201 1 0 1 0 25 ; run; proc logistic data=election; /*model 1*/ model voted = perclose inten1 inten2 clspref1 clspref2; weight wv; test: test clspref1=0, clspref2=0; run; proc logistic data=election; /*model 2*/ model voted = perclose inten1 inten2; weight wv; run; proc logistic data=election; /*model 3*/ model voted = perclose clspref1 clspref2; weight wv; run; proc logistic data=election; /*model 4*/ model voted = inten1 inten2 clspref1 clspref2; weight wv; run; proc logistic data=election; /*model 5*/ model voted = perclose; weight wv; run; proc logistic data=election; /*model 6*/ model voted = inten1 inten2; weight wv; run;
(Most of the output is omitted here.)
The LOGISTIC Procedure --Model 1--
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
-2 Log L 1391.266 1356.434
The LOGISTIC Procedure --Model 2--
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
-2 Log L 1391.266 1363.553
The LOGISTIC Procedure --Model 3--
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
-2 Log L 1391.266 1356.625
The LOGISTIC Procedure --Model 4--
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
-2 Log L 1391.266 1356.487
The LOGISTIC Procedure --Model 5--
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
-2 Log L 1391.266 1382.658
The LOGISTIC Procedure --Model 6--
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
-2 Log L 1391.266 1371.838
data logL; input models $ df g1 g2 @g p; g=g1-g2; p=1-probchi(g1-g2,df); drop g1 g2; cards; 6-2 1 1371.838 1363.552 4-1 1 1356.487 1356.434 5-2 2 1382.658 1363.552 3-1 2 1356.625 1356.434 2-1 2 1363.552 1356.434 ; run; proc print data=logL; run;
Obs models df g p 1 6-2 1 8.286 0.00400 2 4-1 1 0.053 0.81792 3 5-2 2 19.106 0.00007 4 3-1 2 0.191 0.90892 5 2-1 2 7.118 0.02847









