How can I get “adjusted” predicted values from a logistic model in SAS?

After running a logistic model with multiple predictors or an interaction, you may wish to be able to see predicted values with confidence intervals for different combinations of predictors. This can easily be done with the adjust command in Stata (see Stata FAQ: How do I use adjust in probit or logit?). This page will demonstrate how to achieve this in SAS by combining the outmodel and inmodel options in proc logistic with a few data steps. We will be using the hsb2.sas7bdat dataset.

data hsb2; set indata.hsb2; 
  honcomp = (write >=60);
run;

Example 1: Main effect with a continuous covariate

First let’s consider a logistic regression with two continuous covariates. In the regression output below, we see that both read and math are significant. We have also included outmodel in our regression so that we can save the regression parameters and apply them to another dataset. Let us focus on visualizing the effect of read when math is held at its mean.

proc logistic data = hsb2 outmodel=pout;
	model honcomp = read math;
run;

...< some output omitted >... 

The LOGISTIC Procedure

             Analysis of Maximum Likelihood Estimates
                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1     11.9251      1.7168       48.2483        <.0001
read          1     -0.0709      0.0259        7.4891        0.0062
math          1     -0.1266      0.0305       17.1745        <.0001

We can use proc sql to generate a macro variable that is equal to the mean of math and then use a do-loop over a range of possible read values to create a dataset toscore that contains combinations of predictor variable values for which we are interested in predicted probabilities.

proc sql;
  select mean(math) into :mmath
  from hsb2;
quit;

data toscore;
  do read = 20 to 70;
   math = &mmath;
   output;
  end;
run;

proc print data = toscore (obs = 10); 
run;

Obs    read     math
  1     20     52.645
  2     21     52.645
  3     22     52.645
  4     23     52.645
  5     24     52.645
  6     25     52.645
  7     26     52.645
  8     27     52.645
  9     28     52.645
 10     29     52.645

We can again run proc logistic, this time referring back to the regression parameters we saved in the first run with the inmodel option. We indicate the dataset to which the model should be applied (toscore) and the values (predicted probability with score, confidence intervals with clm) we wish to include in the generated dataset (pred). Lastly, we graph the predicted probabilities and the confidence intervals that we generate.

proc logistic inmodel=pout;
  score clm data = toscore out=pred ;
run;

proc print data = pred (obs = 10);
run;

Obs read  math  I_honcomp   P_0      P_1    LCL_0   UCL_0    LCL_1   UCL_1
  1  20  52.645     0     0.97905 0.020954 0.88499 0.99649 0.003512 0.11501
  2  21  52.645     0     0.97754 0.022459 0.88275 0.99604 0.003958 0.11725
  3  22  52.645     0     0.97593 0.024069 0.88047 0.99554 0.004460 0.11953
  4  23  52.645     0     0.97421 0.025792 0.87813 0.99497 0.005025 0.12187
  5  24  52.645     0     0.97236 0.027635 0.87574 0.99434 0.005660 0.12426
  6  25  52.645     0     0.97039 0.029606 0.87329 0.99363 0.006374 0.12671
  7  26  52.645     0     0.96829 0.031713 0.87079 0.99282 0.007177 0.12921
  8  27  52.645     0     0.96604 0.033964 0.86823 0.99192 0.008078 0.13177
  9  28  52.645     0     0.96363 0.036369 0.86560 0.99091 0.009091 0.13440
 10  29  52.645     0     0.96106 0.038937 0.86291 0.98977 0.010226 0.13709

proc sgplot data=pred;
  title "Predicted probability of honcomp on read, math at mean";
  series x=read y=P_1;
  series x=read y=LCL_1;
  series x=read y=UCL_1;
  keylegend / across = 1 location=inside position=topleft;
run;
quit;

Example 2: Categorical-by-continuous interaction with a continuous covariate

Next we consider a logistic regression where we have a significant categorical-by-continuous interaction that we wish to better understand. Our outcome is again honcomp and our model suggests that the relationship between socst and honcomp differs by level of female. We can hold our other covariate read at its mean and look at the predicted probability of honcomp on socst by female.

The overall process is the same as in Example 1. We are simply adding another do-loop when we generate toscore2, our dataset of predictor value combinations, so that we cycle through values of socst for both female = 0 and female = 1. In the plot, we leave out confidence intervals for a clearer picture (though they could easily be added).

proc logistic data = hsb2 outmodel=pout2;
	model honcomp = female|socst read;
run;
...< some output omitted >... 

The LOGISTIC Procedure

               Analysis of Maximum Likelihood Estimates
                                  Standard          Wald
Parameter       DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept        1     14.7664      2.9554       24.9648        <.0001
female           1     -7.1249      3.1138        5.2357        0.0221
socst            1     -0.1084      0.0470        5.3286        0.0210
female*socst     1      0.1050      0.0526        3.9861        0.0459
read             1     -0.1255      0.0277       20.4930        <.0001


proc sql;
  select mean(read) into :mread
  from hsb2;
quit;

data toscore2;
  do female = 0 to 1;
    do socst = 20 to 70;
     read = &mread;
     output;
    end;
  end;
run;

proc logistic inmodel=pout2;
  score clm data = toscore2 out=pred2 ;
run;

proc sgplot data=pred2;
  title "Predicted probability of honcomp on socst by female, read at mean";
  series x=socst y=P_1 / group = female; 
run;
quit;

Example 3: Continuous by continuous interaction with a continuous covariate

Lastly we consider a logistic regression where we have a significant continuous-by-continuous interaction and an additional covariate. Our outcome is again honcomp and our continuous variables math and socst interact significantly. We choose one of these variables (socst) to be represented continuously and we can look at its effect on the predicted probabilities of honcomp for high, mid, and low values of the other interaction variable (math). The remaining covariate (science) is held at its mean.

The overall process is the same as in Examples 1 and 2. In our toscore3 dataset, we enter our three levels of math and create an additional variable mathcat that is primarily for graphing purposes.

proc logistic data = hsb2 outmodel=pout3;
	model honcomp = math|socst science;
run;

...< some output omitted >... 

The LOGISTIC Procedure

              Analysis of Maximum Likelihood Estimates
                                Standard          Wald
Parameter     DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept      1     -3.1784      7.6480        0.1727        0.6777
math           1      0.1675      0.1449        1.3365        0.2476
socst          1      0.2523      0.1451        3.0221        0.0821
math*socst     1    -0.00539     0.00266        4.0948        0.0430
science        1     -0.0443      0.0287        2.3780        0.1231



proc sql;
  select stderr(math) into :sdmath
  from hsb2;
  select mean(science) into :mscience
  from hsb2;
quit;

data toscore3;
  do socst = 20 to 70;
    science = &mscience;
    * low math;
    math = &mmath - &sdmath;
	mathcat = "low ";
    output;
    * mean math;
    math = &mmath;
	mathcat = "mid ";
    output;
    * high math;
    math = &mmath + &sdmath;
	mathcat = "high";
    output;
  end;
run;

proc logistic inmodel=pout3;
  score clm data = toscore3 out=pred3;
run;

proc sgplot data=pred3;
  title "Predicted probability of honcomp on socst at 3 levels of math, science at mean";
  series x=socst y=P_1 / group = mathcat; 
run;
quit;