This FAQ page shows how a number of simple linear and nonlinear models can be coded using SAS proc nlmixed. By "simple" we mean that all of the models are fixed effects only, with no random effects. Every model shown can also be estimated with a dedicated SAS procedure; for example, the binary logistic model can be estimated using proc logistic or proc genmod, and in fact it is much easier to run these models with the dedicated procedures. The purpose of this page is to allow you to become acquainted with these simpler nlmixed models, to see how they work and how they are parameterized. With that knowledge you will have a good foundation for building more complex nonlinear mixed models.
All of the models use the same dataset, hsbdemo.sas7bdat, and the same two predictor variables, read and female. The models use different response variables, chosen so that the type of response is appropriate for each model. For each model we give only partial output, primarily the estimates, standard errors, Wald tests and p-values, so that you can compare the results with those from the corresponding dedicated SAS procedures.
A short note about starting values. You supply starting values with the parms statement. Some models are very sensitive to starting values and will not converge unless given good ones. Other models are very tolerant and will work properly with all starting values set to zero. There are even models that do not require a parms statement at all; in that case nlmixed supplies default starting values automatically.
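The parms statement will also accept a grid of candidate values; nlmixed evaluates the objective function at each combination in the grid and begins the optimization from the best one. The sketch below is a minimal illustration using the OLS model from the next section; the particular grid values are arbitrary and only meant to show the syntax.

proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  /* grids of candidate starting values; the optimization starts from the best combination */
  parms b0=0 to 40 by 10, b1=0, b2=0, s2e=10 to 100 by 30;
  xb = b0 + b1*read + b2*female;
  model write ~ normal(xb, s2e);
run;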
Ordinary least squares regression
We begin with an ordinary least squares regression predicting write from read and female. The setup for nlmixed is very straightforward: xb is the linear predictor, with b0 the intercept and b1 and b2 the regression coefficients for read and female. There is one additional parameter, s2e, which captures the residual variability and is analogous to the mean square error in a traditional OLS regression (the maximum likelihood estimate divides the residual sum of squares by n rather than by n minus the number of coefficients, so it will be slightly smaller than the OLS mean square error). By using nlmixed we obtain an iterative maximum likelihood solution for the model.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  xb = b0 + b1*read + b2*female;
  model write ~ normal(xb, s2e);
run;

/* partial output */

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower      Upper    Gradient
b0            20.2284            2.6933   200       7.51      <.0001     0.05    14.9174    25.5393    -1.76E-6
b1             0.5659           0.04901   200      11.55      <.0001     0.05     0.4692     0.6625    -0.00011
b2             5.4869            1.0066   200       5.45      <.0001     0.05     3.5019     7.4719    -1.04E-6
s2e           50.1128            5.0113   200      10.00      <.0001     0.05    40.2310    59.9945     1.12E-8
Thus, our estimated model is yhat = 20.2284 + .5659*read + 5.4869*female.
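Since one purpose of this page is to let you check nlmixed against the dedicated SAS procedures, a quick way to verify these coefficients is an ordinary proc reg run. The sketch below uses the same quoted dataset path as the nlmixed examples; the intercept and slopes should match b0, b1 and b2, while the mean square error reported by proc reg will be slightly larger than s2e for the reason noted above.

proc reg data='D:\data\hsbdemo.sas7bdat';
  model write = read female;
run;
quit;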
Binary logistic regression
Our second model is a binary logistic regression predicting honors from read and female. The variable honors indicates whether the student has been selected to participate in the honors English program. We will estimate this model in two different ways: first, by computing the expected probability and using the binary distribution, as shown below.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b0=5 b1=0 b2=0;
  xb = b0 + b1*read + b2*female;
  prob = exp(xb)/(1+exp(xb));
  model honors ~ binary(prob);
run;

/* partial output */

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha       Lower      Upper    Gradient
b0            -9.6034            1.4264   200      -6.73      <.0001     0.05    -12.4162    -6.7906    -2.71E-9
b1             0.1444           0.02333   200       6.19      <.0001     0.05     0.09835     0.1904    -1.16E-6
b2             1.1209            0.4081   200       2.75      0.0066     0.05      0.3162     1.9257    4.077E-8
The second approach is to compute the likelihood from the probability, take the log of the likelihood, and then model the response with the general log-likelihood specification, general(ll).
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b0=0 b1=0 b2=0;
  xb = b0 + b1*read + b2*female;
  prob = exp(xb)/(1+exp(xb));
  liklhd = (prob**honors)*((1-prob)**(1-honors));
  ll = log(liklhd);
  model honors ~ general(ll);
run;

[partial output]

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha       Lower      Upper    Gradient
b0            -9.6034            1.4264   200      -6.73      <.0001     0.05    -12.4162    -6.7906    2.854E-7
b1             0.1444           0.02333   200       6.19      <.0001     0.05     0.09835     0.1904    0.000016
b2             1.1209            0.4081   200       2.75      0.0066     0.05      0.3162     1.9257    -8.69E-8
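Either version of the logit model can be checked against proc logistic. In the minimal sketch below, the event='1' response option asks proc logistic to model the probability that honors equals 1, which matches the parameterization used above (honors is coded 0/1).

proc logistic data='D:\data\hsbdemo.sas7bdat';
  model honors(event='1') = read female;
run;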
Probit regression
For the probit regression model we will use the same response variable, honors, as in the binary logit model above. The expected probability is now computed with probnorm, the standard normal cumulative distribution function, rather than with the logistic function.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b0=0 b1=0 b2=0;
  xb = b0 + b1*read + b2*female;
  prob = probnorm(xb);
  if honors=0 then liklhd = 1 - prob;
  else liklhd = prob;
  ll = log(liklhd);
  model honors ~ general(ll);
run;

[partial output]

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower      Upper    Gradient
b0            -5.6720            0.7798   200      -7.27      <.0001     0.05    -7.2098    -4.1343     2.66E-6
b1            0.08560           0.01301   200       6.58      <.0001     0.05    0.05996     0.1113    0.000128
b2             0.6340            0.2301   200       2.76      0.0064     0.05     0.1803     1.0877    1.713E-6
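As a check, the same probit model can be fit with proc logistic simply by switching the link function (a minimal sketch):

proc logistic data='D:\data\hsbdemo.sas7bdat';
  model honors(event='1') = read female / link=probit;
run;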
Ordered logistic regression
We will use ses as the response variable for the ordered logistic regression. The variable ses is ordered with values 1, 2 and 3. The model given below estimates a proportional odds ordered logistic model. Some statistical software calls the parameters a1 and a2 cut points or thresholds, while other packages will parameterize the model with intercepts.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b1=0 b2=0 a1=1 a2=1;
  xb = b1*read + b2*female;
  if ses=1 then liklhd = 1/(1+exp(-a1+xb));
  if ses=2 then liklhd = 1/(1+exp(-a2+xb)) - 1/(1+exp(-a1+xb));
  if ses=3 then liklhd = 1 - 1/(1+exp(-a2+xb));
  ll = log(liklhd);
  model ses ~ general(ll);
run;

[partial output]

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower      Upper    Gradient
b1            0.05714           0.01386   200       4.12      <.0001     0.05    0.02980    0.08448    -0.00637
b2            -0.4195            0.2715   200      -1.55      0.1239     0.05    -0.9549     0.1159    -0.00004
a1             1.4774            0.7343   200       2.01      0.0456     0.05    0.02942     2.9254    0.000131
a2             3.7306            0.7791   200       4.79      <.0001     0.05     2.1942     5.2669    -4.58E-
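For comparison, proc logistic fits a proportional odds model by default when the response has more than two ordered levels (a minimal sketch below). Note that proc logistic writes the cumulative logits as an intercept plus the linear predictor, whereas the model above uses the threshold-minus-xb form, so the slope estimates from proc logistic should be the negatives of b1 and b2, while the intercepts should correspond to a1 and a2.

proc logistic data='D:\data\hsbdemo.sas7bdat';
  model ses = read female;
run;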
Generalized ordered logistic regression
The generalized ordered logit model estimates separate intercepts and coefficients for each equation and therefore does not impose the proportional odds assumption. In this example we use the same variables as in the ordered logistic regression above; however, we have to estimate several more parameters than for that model.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b01=-1 b02=-2 b11=0 b21=0 b12=0 b22=0;
  xb1 = b01 + b11*read + b12*female;
  xb2 = b02 + b21*read + b22*female;
  if ses=1 then liklhd = 1/(1+exp(xb1));
  if ses=2 then liklhd = 1/(1+exp(xb2)) - 1/(1+exp(xb1));
  if ses=3 then liklhd = 1 - 1/(1+exp(xb2));
  ll = log(liklhd);
  model ses ~ general(ll);
  estimate 'b11-b12' b01-b12;
  estimate 'b21-b22' b21-b22;
run;

[partial output]

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower       Upper    Gradient
b01           -1.1132            0.9807   200      -1.14      0.2577     0.05    -3.0470      0.8206    0.000022
b02           -3.9737            0.9294   200      -4.28      <.0001     0.05    -5.8064     -2.1410    0.000032
b11           0.05335           0.01887   200       2.83      0.0052     0.05    0.01614     0.09057    0.000953
b21           0.05963           0.01636   200       3.64      0.0003     0.05    0.02736     0.09189    0.001598
b12           -0.7005            0.3591   200      -1.95      0.0525     0.05    -1.4085    0.007610    0.000026
b22           -0.2037            0.3232   200      -0.63      0.5292     0.05    -0.8410      0.4336     0.00001

                              Additional Estimates

Label        Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower      Upper
b11-b12       -0.4127            1.1275   200      -0.37      0.7147     0.05    -2.6361     1.8106
b21-b22        0.2633            0.3229   200       0.82      0.4158     0.05    -0.3735      0.900
Traditional statistical procedures will often organize the output by equation. The example output below shows one way the output may be displayed.
Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|

equation 1
b01           -1.1132            0.9807   200      -1.14      0.2577
b11           0.05335           0.01887   200       2.83      0.0052
b12           -0.7005            0.3591   200      -1.95      0.0525

equation 2
b02           -3.9737            0.9294   200      -4.28      <.0001
b21           0.05963           0.01636   200       3.64      0.0003
b22           -0.2037            0.3232   200      -0.63      0.5292
Multinomial logistic regression
For the multinomial logistic regression, prog (program type) is the response variable. If there are k levels of the response variable there will be k-1 equations in the multinomial logistic regression model. Since there are three levels of prog there will be two equations in our model: the first equation is for prog=1 and the second for prog=3, so prog=2 is our reference or base category. Each equation has an intercept and coefficients for read and female. The two intercepts are b01 and b03, the coefficients for read are b11 and b31, and the coefficients for female are b12 and b32.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b01=0 b03=0 b11=0 b31=0 b12=0 b32=0;
  xb1 = b01 + b11*read + b12*female;
  xb3 = b03 + b31*read + b32*female;
  expxb1 = exp(xb1);
  expxb2 = 1;
  expxb3 = exp(xb3);
  den = expxb1 + expxb2 + expxb3;
  if prog=1 then liklhd = expxb1/den;
  if prog=2 then liklhd = expxb2/den;
  if prog=3 then liklhd = expxb3/den;
  ll = log(liklhd);
  model prog ~ general(ll);
run;

[partial output]

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower       Upper    Gradient
b01            3.0193            1.0963   200       2.75      0.0064     0.05     0.8575      5.1810    3.695E-6
b03            5.3382            1.1564   200       4.62      <.0001     0.05     3.0579      7.6185    -5.42E-6
b11          -0.07121           0.02021   200      -3.52      0.0005     0.05    -0.1111    -0.03137    0.000099
b31           -0.1173           0.02244   200      -5.23      <.0001     0.05    -0.1615    -0.07304    -0.00024
b12           -0.1835            0.3727   200      -0.49      0.6230     0.05    -0.9184      0.5514     5.62E-6
b32           -0.1938            0.3801   200      -0.51      0.6106     0.05    -0.9433      0.5556    -6.56E-7
As with the generalized ordered logistic regression above, the output for a multinomial logistic regression from a dedicated procedure is often organized by group, similar to what is shown below.
Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|

prog=1
b01            3.0193            1.0963   200       2.75      0.0064
b11          -0.07121           0.02021   200      -3.52      0.0005
b12           -0.1835            0.3727   200      -0.49      0.6230

prog=3
b03            5.3382            1.1564   200       4.62      <.0001
b31           -0.1173           0.02244   200      -5.23      <.0001
b32           -0.1938            0.3801   200      -0.51      0.6106

prog=2 is the base category
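A dedicated way to fit this model is proc logistic with the generalized logit link. In the sketch below, ref='2' (assuming the formatted values of prog are simply 1, 2 and 3) makes prog=2 the base category, matching the setup above.

proc logistic data='D:\data\hsbdemo.sas7bdat';
  model prog(ref='2') = read female / link=glogit;
run;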
Poisson regression
Our next example is a count model using a Poisson distribution. The response variable for this model is awards, which is a count of the number of awards received by a student during high school. As with the binary logistic model above, there are two ways you can parameterize this model; the first method uses the built-in poisson distribution.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b0=0 b1=0 b2=0;
  xb = b0 + b1*read + b2*female;
  mu = exp(xb);
  model awards ~ poisson(mu);
run;

[partial output]

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower      Upper    Gradient
b0            -3.0922            0.3325   200      -9.30      <.0001     0.05    -3.7479    -2.4364    -0.00012
b1            0.06009          0.005404   200      11.12      <.0001     0.05    0.04944    0.07075    -0.01074
b2             0.4690            0.1142   200       4.11      <.0001     0.05     0.2438     0.6941    -0.00015
The second method for parameterizing this model is to compute the log-likelihood directly and use the general log-likelihood specification. The two approaches yield the same results.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b0=0 b1=0 b2=0;
  xb = b0 + b1*read + b2*female;
  mu = exp(xb);
  ll = awards*log(mu) - mu - lgamma(awards+1);
  model awards ~ general(ll);
run;

[partial output]

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower      Upper    Gradient
b0            -3.0922            0.3325   200      -9.30      <.0001     0.05    -3.7479    -2.4364    -0.00012
b1            0.06009          0.005404   200      11.12      <.0001     0.05    0.04944    0.07075    -0.01074
b2             0.4690            0.1142   200       4.11      <.0001     0.05     0.2438     0.6941    -0.00015
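Either version of the Poisson model can be compared with proc genmod (a minimal sketch):

proc genmod data='D:\data\hsbdemo.sas7bdat';
  model awards = read female / dist=poisson link=log;
run;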
Negative binomial regression
Our final example is a negative binomial regression. We will use the same response variable, awards, as in the Poisson example. One way of conceptualizing a negative binomial model is to think of it as a Poisson model with overdispersion, that is, excess variance. In a true Poisson model the mean and the variance are equal, whereas a negative binomial variable has a variance that is greater than the mean; with the parameterization used below the variance is mu + alpha*mu**2. The parameterization of this model is similar to that of the Poisson with one additional parameter, alpha, which measures the degree of overdispersion.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b0=0 b1=0 b2=0 alpha=.1;
  xb = b0 + b1*read + b2*female;
  mu = exp(xb);
  m = 1/alpha;
  ll = lgamma(awards+m) - lgamma(awards+1) - lgamma(m)
       + awards*log(alpha*mu) - (awards+m)*log(1+alpha*mu);
  model awards ~ general(ll);
run;

[partial output]

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower      Upper    Gradient
b0            -3.3266            0.4111   200      -8.09      <.0001     0.05    -4.1372    -2.5160    0.000113
b1            0.06392          0.006809   200       9.39      <.0001     0.05    0.05050    0.07735    0.010318
b2             0.5065            0.1354   200       3.74      0.0002     0.05     0.2396     0.7735    0.000062
alpha          0.1828           0.08455   200       2.16      0.0318     0.05    0.01606     0.3495    -0.00016
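For comparison, proc genmod also fits a negative binomial model; its dispersion parameter corresponds to alpha in the parameterization above (a minimal sketch):

proc genmod data='D:\data\hsbdemo.sas7bdat';
  model awards = read female / dist=negbin link=log;
run;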