Table 7.2 and Figure 7.3 on page 119.
data table7_2; input dose number killed; cards; 1.6907 59 6 1.7242 60 13 1.7552 62 18 1.7842 56 28 1.8113 63 52 1.8369 59 53 1.861 62 61 1.8839 60 60 ; run; data table7_2a; set table7_2; proportion = killed/number; one = 1; run; goptions reset = all; axis1 order = (1.6 to 1.9 by .1) minor = none; axis2 order = (0 to 1 by .2) minor = none label=('Proportion killed'); symbol v = dot h = 1; proc gplot data = table7_2a; plot proportion*dose /haxis=axis1 vaxis=axis2; run; quit;
Table 7.3 on page 121 based on the algorithm described in the text.
proc iml; use table7_2a; read all; x = one || dose; b0 = j(2, 1, 0); y = killed; nobs = nrow(y); n = number; fitted = j(nobs, 6, 0); /*for fitted values*/ betas = j(3, 6); /*for parameter estimates*/ j = i(2); /*intermediate information matrix*/ u = j(2, 1, 0); /*intermediate vector u*/ count = 1; do while(count<=6); pi = exp(x*b0)/(1+exp(x*b0)); ll = t(y)*(x*b0) - t(n)*log(one + exp(x*b0)) ; j[1,1] = sum(n#pi#(1-pi)); j[1,2] = sum(n#dose#pi#(1-pi)); j[2,1] = j[1,2]; j[2,2] = sum(n#dose#dose#pi#(1-pi)); u[1,] = sum(y - n#pi); u[2,] = sum(dose#(y-n#pi)); b = b0 + inv(j)*u; betas[1, count] = b0[1]; betas[2, count] = b0[2]; betas[3, count] = ll; b0 = b; fitted[, count] = n#pi; count = count + 1; end; rname = {'beta1' 'beta2' 'll'}; cname = {'initial' 'first' 'second' 'third' 'fourth' 'fifth'}; print 'Table 7.3 Fitting a linear logistic model'; print '-----------------------------------------------------------'; print betas[rowname=rname colname=cname label='' format = 8.4] ; print '-----------------------------------------------------------'; print 'Observations Fitted values'; rn = {y1 y2 y3 y4 y5 y6 y7 y8}; print y[rowname=rn label=''] fitted[label='' format=10.3]; jinv = inv(j); D = 2*sum(y#log(y/fitted[,6])+(y=0)) + 2*sum((n-y)#log((n-y)/(n-fitted[,6]) + (n-y=0))) ; print jinv[format=6.3 label='V(b)'] D[format=5.3 label='Deviance']; quit;
Table 7.3 Fitting a linear logistic model ----------------------------------------------------------- initial first second third fourth fifth beta1 0.0000 -37.8564 -53.8532 -59.9652 -60.7078 -60.7175 beta2 0.0000 21.3374 30.3835 33.8442 34.2648 34.2703 ll -333.404 -200.010 -187.274 -186.247 -186.235 -186.235 ----------------------------------------------------------- Observations Fitted values Y1 6 29.500 8.505 4.543 3.562 3.459 3.457 Y2 13 30.000 15.366 11.254 9.987 9.844 9.842 Y3 18 31.000 24.808 23.058 22.513 22.452 22.451 Y4 28 28.000 30.983 32.947 33.790 33.896 33.898 Y5 52 31.500 43.362 48.197 49.893 50.093 50.096 Y6 53 29.500 46.741 51.705 53.132 53.289 53.291 Y7 61 31.000 53.595 58.061 59.112 59.221 59.222 Y8 60 30.000 54.734 58.036 58.679 58.742 58.743 V(b) Deviance 26.840 -15.08 11.23 -15.08 8.481
Table 7.4 on page 122. SAS proc probit allows different distributions, such as probit and extreme value distribution. We use both proc logistic and proc probit for the three models in the table. Then we output the predicted probabilities from each model. In a proc sql step, we created expected count for each observation based on each of the three models. In the proc logistic, we used the option scale = none aggregate to the deviance and in proc probit we used the option lackfit to get the deviance.
proc logistic data = table7_2; model killed/number = dose /scale=none aggregate; output out = logit p = p_logit; run;
Deviance and Pearson Goodness-of-Fit Statistics Criterion DF Value Value/DF Pr > ChiSq Deviance 6 11.2322 1.8720 0.0815 Pearson 6 10.0253 1.6709 0.1236
proc probit data = table7_2 ; model killed/number = dose /lackfit; output out = probit p=p_probit; run;
Goodness-of-Fit Tests Statistic Value DF Pr > ChiSq Pearson Chi-Square 9.5134 6 0.1467 L.R. Chi-Square 10.1198 6 0.1197
proc probit data = table7_2 ; model killed/number = dose /d = extreme lackfit; output out = extreme p = p_extreme; run;
Goodness-of-Fit Tests Statistic Value DF Pr > ChiSq Pearson Chi-Square 3.2947 6 0.7711 L.R. Chi-Square 3.4464 6 0.7511
proc sql; create table table7_4 as select logit.killed as killed, logit.number as number, logit.dose as dose, logit.p_logit*logit.number as n_logit, probit.p_probit*probit.number as n_probit, extreme.p_extreme*extreme.number as n_extreme from logit, probit, extreme where logit.killed = probit.killed and logit.number = probit.number and probit.killed = extreme.killed and probit.number = extreme.number; quit; proc print data = table7_4; var killed number dose n_logit n_probit n_extreme; run;
Obs killed number dose n_logit n_probit n_extreme 1 6 59 1.6907 3.4583 3.3578 5.5894 2 13 60 1.7242 9.8428 10.7216 11.2807 3 18 62 1.7552 22.4518 23.4819 20.9542 4 28 56 1.7842 33.8967 33.8155 30.3694 5 52 63 1.8113 50.0942 49.6156 47.7764 6 53 59 1.8369 53.2896 53.3189 54.1427 7 61 62 1.8610 59.2213 59.6647 61.1133 8 60 60 1.8839 58.7425 59.2280 59.9472
Table 7.5 on page 123.
data table7_5; input y n storage centrifuge; cards; 55 102 1 40 52 99 1 150 57 108 1 350 55 76 2 40 50 81 2 150 50 90 2 350 ; run;
Figure 7.4 on page 124.
data table7_5a; set table7_5; proportion = y/n; logc = log(centrifuge); run; goptions reset =all; axis1 order = (.5 to .8 by .1) minor = none label=('proportion'); axis2 label = ('Log(centrifuging force)') minor = none; symbol1 v = P font=marker c = blue ; symbol2 v = dot c=black; proc gplot data = table7_5a; plot proportion*logc = storage /vaxis = axis1 haxis = axis2; run; quit;
Table 7.6 and 7.7 on page 125.
Model 1:
proc logistic data = table7_5a; class storage(ref='1') /param = ref; model y/n = logc storage logc*storage /scale=none aggregate; output out = m1 p = pm1; run;
Deviance and Pearson Goodness-of-Fit Statistics Criterion DF Value Value/DF Pr > ChiSq Deviance 2 0.0277 0.0139 0.9862 Pearson 2 0.0277 0.0139 0.9862 Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept 1 0.2339 0.6284 0.1385 0.7097 logc 1 -0.0227 0.1268 0.0321 0.8577 storage 2 1 1.9759 0.9980 3.9201 0.0477 logc*storage 2 1 -0.3184 0.1989 2.5635 0.1094
Model 2:
proc logistic data = table7_5a; class storage(ref='1') /param = ref; model y/n = logc storage /scale=none aggregate; output out = m2 p = pm2; run;
Deviance and Pearson Goodness-of-Fit Statistics Criterion DF Value Value/DF Pr > ChiSq Deviance 3 2.6188 0.8729 0.4542 Pearson 3 2.5980 0.8660 0.4578 Analysis of Maximum Likelihood Estimates< Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept 1 0.8767 0.4870 3.2406 0.0718 logc 1 -0.1546 0.0970 2.5386 0.1111 storage 2 1 0.4068 0.1746 5.4278 0.0198
Model 3:
proc logistic data = table7_5a; model y/n = logc /scale=none aggregate; output out = m3 p = pm3; run;
Deviance and Pearson Goodness-of-Fit Statistics Criterion DF Value Value/DF Pr > ChiSq Deviance 1 0.0099 0.0099 0.9207 Pearson 1 0.0099 0.0099 0.9206 Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept 1 1.0213 0.4813 4.5025 0.0338 logc 1 -0.1478 0.0965 2.3474 0.1255
proc sql; create table table7_7 as select m1.*, m1.pm1*m1.n as n_m1, m2.pm2*m2.n as n_m2, m3.pm3*m3.n as n_m3 from m1, m2, m3 where m1.y = m2.y and m1.n = m2.n and m1.storage = m2.storage and m1.logc = m2.logc and m1.y = m3.y and m1.n = m3.n and m1.logc = m3.logc; quit; proc print data = table7_7; var storage y n_m1 n_m2 n_m3; run;
Obs storage y n_m1 n_m2 n_m3 1 1 55 54.8182 58.7539 62.9125 2 1 52 52.4654 52.0253 56.3979 3 1 57 56.7164 53.2208 58.1836 4 2 55 54.8257 51.0056 46.8759 5 2 50 50.4279 50.5895 46.1437 6 2 50 49.7389 53.4045 48.4863
Table 7.8 and logistic regression result on page 129.
data table7_8; input x s @@; datalines; 9 1 13 1 6 1 8 1 10 1 4 1 14 1 8 1 11 1 7 1 9 1 7 1 5 1 14 1 13 0 16 0 10 0 12 0 11 0 14 0 15 0 18 0 7 0 16 0 9 0 9 0 11 0 13 0 15 0 13 0 10 0 11 0 6 0 17 0 14 0 19 0 9 0 11 0 14 0 10 0 16 0 10 0 16 0 14 0 13 0 13 0 9 0 15 0 10 0 11 0 12 0 4 0 14 0 20 0 ; run; proc logistic data = table7_8; model s(event='1') = x /scale = none aggregate rsq; run;
Deviance and Pearson Goodness-of-Fit Statistics Criterion DF Value Value/DF Pr > ChiSq Deviance 15 9.4190 0.6279 0.8546 Pearson 15 8.0830 0.5389 0.9204 Number of unique profiles: 17 Model Fit Statistics Intercept Intercept and Criterion Only Covariates AIC 63.806 55.017 SC 65.795 58.995 -2 Log L 61.806 51.017 R-Square 0.1811 Max-rescaled R-Square 0.2657 Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 10.7889 1 0.0010 Score 9.7954 1 0.0017 Wald 8.0570 1 0.0045 Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept 1 2.4040 1.1918 4.0687 0.0437 x 1 -0.3235 0.1140 8.0570 0.0045
Figure 7.5 on page 130. We first created the grouped data set based on the covariate patterns of variable x. Then we refit the data to the logistic model and created predicted probabilities.
proc sql; create table grp7_8 as select distinct x as x, sum(s) as y, count(s) as total, sum(s)/count(s) as proportion from table7_8 group by x; quit; proc logistic data = grp7_8 noprint; model y/total = x /scale = none aggregate rsq; output out = m p=p; run; goptions reset = all; symbol1 v = dot c=black; symbol2 v = P font=marker c=blue; axis1 order = (0 to 1 by .5) minor = none label=('Proportion'); axis2 order = (0 to 20 by 5) minor = none label=('WAIS score'); proc gplot data = m; plot p*x proportion*x /overlay vaxis = axis1 haxis=axis2; run; quit;
Table 7.9 on page 130.
proc logistic data = grp7_8 noprint; model y/total = x ; output out = m1 p=p reschi = reschi resdev = d ; run; proc print data = m1; var x y total p reschi d; run;
Obs x y total p reschi d 1 4 1 2 0.75211 -0.82574 -0.76598 2 5 1 1 0.68706 0.67490 0.86642 3 6 1 2 0.61369 -0.33022 -0.32585 4 7 2 3 0.53478 0.45799 0.46370 5 8 2 2 0.45408 1.55065 1.77706 6 9 2 6 0.37573 -0.21441 -0.21619 7 10 1 6 0.30338 -0.72844 -0.77068 8 11 1 6 0.23962 -0.41862 -0.43591 9 12 0 2 0.18568 -0.67531 -0.90643 10 13 1 6 0.14163 0.17592 0.17190 11 14 2 7 0.10665 1.53479 1.30564 12 15 0 3 0.07952 -0.50908 -0.70509 13 16 0 4 0.05883 -0.50004 -0.69647 14 17 0 1 0.04327 -0.21268 -0.29745 15 18 0 1 0.03169 -0.18091 -0.25379 16 19 0 1 0.02313 -0.15389 -0.21636 17 20 0 1 0.01685 -0.13091 -0.18434