This FAQ page shows how a number of simple linear and nonlinear models can be coded using SAS proc nlmixed. By "simple" we mean that all of the models are fixed effects only, with no random effects. Every model shown can also be estimated with a dedicated SAS procedure; for example, the binary logistic model can be estimated using proc logistic or proc genmod, and in fact it is much easier to run these models with the dedicated procedures. The purpose of this page is to allow you to become acquainted with these simpler nlmixed models, to see how they work and how they are parameterized. With that knowledge you will have a good foundation for building more complex nonlinear mixed models.
All of the models use the same dataset, hsbdemo.sas7bdat, and the same two predictor variables, read and female. The models use different response variables, chosen so that the type of response is appropriate for each model. For each model we give only partial output, primarily the estimates, standard errors, Wald tests and p-values, so that you can compare the results with those from the corresponding dedicated SAS procedures.
A short note about starting values. You supply starting values with the parms statement. Some models are very sensitive to starting values and will not converge unless given good ones. Other models are very tolerant and will work properly with all starting values set to zero. There are even models that do not require a parms statement at all; in that case nlmixed supplies default starting values automatically.
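The parms statement will also accept a grid of candidate values; nlmixed evaluates the objective function at each combination in the grid and begins the optimization from the best one. The sketch below is a minimal illustration using the OLS model from the next section; the particular grid values are arbitrary and only meant to show the syntax.

proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  /* grids of candidate starting values; the optimization starts from the best combination */
  parms b0=0 to 40 by 10, b1=0, b2=0, s2e=10 to 100 by 30;
  xb = b0 + b1*read + b2*female;
  model write ~ normal(xb, s2e);
run;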
Ordinary least squares regression
We begin with an ordinary least squares regression predicting write from read and female. The setup for nlmixed is very straightforward: xb is the linear predictor, with b0 the intercept and b1 and b2 the regression coefficients for read and female. There is one additional parameter, s2e, which captures the residual variability and is analogous to the mean square error in a traditional OLS regression (the maximum likelihood estimate divides the residual sum of squares by n rather than by n minus the number of coefficients, so it will be slightly smaller than the OLS mean square error). By using nlmixed we obtain an iterative maximum likelihood solution for the model.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  xb = b0 + b1*read + b2*female;
  model write ~ normal(xb, s2e);
run;

/* partial output */

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower      Upper    Gradient
b0            20.2284            2.6933   200       7.51      <.0001     0.05    14.9174    25.5393    -1.76E-6
b1             0.5659           0.04901   200      11.55      <.0001     0.05     0.4692     0.6625    -0.00011
b2             5.4869            1.0066   200       5.45      <.0001     0.05     3.5019     7.4719    -1.04E-6
s2e           50.1128            5.0113   200      10.00      <.0001     0.05    40.2310    59.9945     1.12E-8
Thus, our estimated model is yhat = 20.2284 + .5659*read + 5.4869*female.
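Since one purpose of this page is to let you check nlmixed against the dedicated SAS procedures, a quick way to verify these coefficients is an ordinary proc reg run. The sketch below uses the same quoted dataset path as the nlmixed examples; the intercept and slopes should match b0, b1 and b2, while the mean square error reported by proc reg will be slightly larger than s2e for the reason noted above.

proc reg data='D:\data\hsbdemo.sas7bdat';
  model write = read female;
run;
quit;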
Binary logistic regression
Our second model is a binary logistic regression predicting honors from read and female. The variable honors indicates whether the student has been selected to participate in the honors English program. We will estimate this model in two different ways: first, by computing the expected probability and using the binary distribution, as shown below.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b0=5 b1=0 b2=0;
  xb = b0 + b1*read + b2*female;
  prob = exp(xb)/(1+exp(xb));
  model honors ~ binary(prob);
run;

/* partial output */

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha       Lower      Upper    Gradient
b0            -9.6034            1.4264   200      -6.73      <.0001     0.05    -12.4162    -6.7906    -2.71E-9
b1             0.1444           0.02333   200       6.19      <.0001     0.05     0.09835     0.1904    -1.16E-6
b2             1.1209            0.4081   200       2.75      0.0066     0.05      0.3162     1.9257    4.077E-8
The second approach is to compute the likelihood from the probability, take the log of the likelihood, and then model the response with the general log-likelihood specification, general(ll).
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b0=0 b1=0 b2=0;
  xb = b0 + b1*read + b2*female;
  prob = exp(xb)/(1+exp(xb));
  liklhd = (prob**honors)*((1-prob)**(1-honors));
  ll = log(liklhd);
  model honors ~ general(ll);
run;

[partial output]

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha       Lower      Upper    Gradient
b0            -9.6034            1.4264   200      -6.73      <.0001     0.05    -12.4162    -6.7906    2.854E-7
b1             0.1444           0.02333   200       6.19      <.0001     0.05     0.09835     0.1904    0.000016
b2             1.1209            0.4081   200       2.75      0.0066     0.05      0.3162     1.9257    -8.69E-8
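Either version of the logit model can be checked against proc logistic. In the minimal sketch below, the event='1' response option asks proc logistic to model the probability that honors equals 1, which matches the parameterization used above (honors is coded 0/1).

proc logistic data='D:\data\hsbdemo.sas7bdat';
  model honors(event='1') = read female;
run;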
Probit regression
For the probit regression model we will use the same response variable, honors, as in the binary logit model above. The expected probability is now computed with probnorm, the standard normal cumulative distribution function, rather than with the logistic function.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b0=0 b1=0 b2=0;
  xb = b0 + b1*read + b2*female;
  prob = probnorm(xb);
  if honors=0 then liklhd = 1 - prob;
  else liklhd = prob;
  ll = log(liklhd);
  model honors ~ general(ll);
run;

[partial output]

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower      Upper    Gradient
b0            -5.6720            0.7798   200      -7.27      <.0001     0.05    -7.2098    -4.1343     2.66E-6
b1            0.08560           0.01301   200       6.58      <.0001     0.05    0.05996     0.1113    0.000128
b2             0.6340            0.2301   200       2.76      0.0064     0.05     0.1803     1.0877    1.713E-6
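As a check, the same probit model can be fit with proc logistic simply by switching the link function (a minimal sketch):

proc logistic data='D:\data\hsbdemo.sas7bdat';
  model honors(event='1') = read female / link=probit;
run;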
Ordered logistic regression
We will use ses as the response variable for the ordered logistic regression. The variable ses is ordered with values 1, 2 and 3. The model given below estimates a proportional odds ordered logistic model. Some statistical software calls the parameters a1 and a2 cut points or thresholds, while other packages will parameterize the model with intercepts.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b1=0 b2=0 a1=1 a2=1;
  xb = b1*read + b2*female;
  if ses=1 then liklhd = 1/(1+exp(-a1+xb));
  if ses=2 then liklhd = 1/(1+exp(-a2+xb)) - 1/(1+exp(-a1+xb));
  if ses=3 then liklhd = 1 - 1/(1+exp(-a2+xb));
  ll = log(liklhd);
  model ses ~ general(ll);
run;

[partial output]

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower      Upper    Gradient
b1            0.05714           0.01386   200       4.12      <.0001     0.05    0.02980    0.08448    -0.00637
b2            -0.4195            0.2715   200      -1.55      0.1239     0.05    -0.9549     0.1159    -0.00004
a1             1.4774            0.7343   200       2.01      0.0456     0.05    0.02942     2.9254    0.000131
a2             3.7306            0.7791   200       4.79      <.0001     0.05     2.1942     5.2669    -4.58E-
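For comparison, proc logistic fits a proportional odds model by default when the response has more than two ordered levels (a minimal sketch below). Note that proc logistic writes the cumulative logits as an intercept plus the linear predictor, whereas the model above uses the threshold-minus-xb form, so the slope estimates from proc logistic should be the negatives of b1 and b2, while the intercepts should correspond to a1 and a2.

proc logistic data='D:\data\hsbdemo.sas7bdat';
  model ses = read female;
run;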
Generalized ordered logistic regression
The generalized ordered logit model estimates separate intercepts and coefficients for each equation and therefore does not impose the proportional odds assumption. In this example we use the same variables as in the ordered logistic regression above; however, we have to estimate several more parameters than for that model.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b01=-1 b02=-2 b11=0 b21=0 b12=0 b22=0;
  xb1 = b01 + b11*read + b12*female;
  xb2 = b02 + b21*read + b22*female;
  if ses=1 then liklhd = 1/(1+exp(xb1));
  if ses=2 then liklhd = 1/(1+exp(xb2)) - 1/(1+exp(xb1));
  if ses=3 then liklhd = 1 - 1/(1+exp(xb2));
  ll = log(liklhd);
  model ses ~ general(ll);
  estimate 'b11-b12' b01-b12;
  estimate 'b21-b22' b21-b22;
run;

[partial output]

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower       Upper    Gradient
b01           -1.1132            0.9807   200      -1.14      0.2577     0.05    -3.0470      0.8206    0.000022
b02           -3.9737            0.9294   200      -4.28      <.0001     0.05    -5.8064     -2.1410    0.000032
b11           0.05335           0.01887   200       2.83      0.0052     0.05    0.01614     0.09057    0.000953
b21           0.05963           0.01636   200       3.64      0.0003     0.05    0.02736     0.09189    0.001598
b12           -0.7005            0.3591   200      -1.95      0.0525     0.05    -1.4085    0.007610    0.000026
b22           -0.2037            0.3232   200      -0.63      0.5292     0.05    -0.8410      0.4336     0.00001

                              Additional Estimates

Label        Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower      Upper
b11-b12       -0.4127            1.1275   200      -0.37      0.7147     0.05    -2.6361     1.8106
b21-b22        0.2633            0.3229   200       0.82      0.4158     0.05    -0.3735      0.900
Traditional statistical procedures will often organize the output by equation. The example output below shows one way the output may be displayed.
Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|

equation 1
b01           -1.1132            0.9807   200      -1.14      0.2577
b11           0.05335           0.01887   200       2.83      0.0052
b12           -0.7005            0.3591   200      -1.95      0.0525

equation 2
b02           -3.9737            0.9294   200      -4.28      <.0001
b21           0.05963           0.01636   200       3.64      0.0003
b22           -0.2037            0.3232   200      -0.63      0.5292
Multinomial logistic regression
For the multinomial logistic regression, prog (program type) is the response variable. If there are k levels of the response variable there will be k-1 equations in the multinomial logistic regression model. Since there are three levels of prog there will be two equations in our model: the first equation is for prog=1 and the second for prog=3, so prog=2 is our reference or base category. Each equation has an intercept and coefficients for read and female. The two intercepts are b01 and b03, the coefficients for read are b11 and b31, and the coefficients for female are b12 and b32.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b01=0 b03=0 b11=0 b31=0 b12=0 b32=0;
  xb1 = b01 + b11*read + b12*female;
  xb3 = b03 + b31*read + b32*female;
  expxb1 = exp(xb1);
  expxb2 = 1;
  expxb3 = exp(xb3);
  den = expxb1 + expxb2 + expxb3;
  if prog=1 then liklhd = expxb1/den;
  if prog=2 then liklhd = expxb2/den;
  if prog=3 then liklhd = expxb3/den;
  ll = log(liklhd);
  model prog ~ general(ll);
run;

[partial output]

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower       Upper    Gradient
b01            3.0193            1.0963   200       2.75      0.0064     0.05     0.8575      5.1810    3.695E-6
b03            5.3382            1.1564   200       4.62      <.0001     0.05     3.0579      7.6185    -5.42E-6
b11          -0.07121           0.02021   200      -3.52      0.0005     0.05    -0.1111    -0.03137    0.000099
b31           -0.1173           0.02244   200      -5.23      <.0001     0.05    -0.1615    -0.07304    -0.00024
b12           -0.1835            0.3727   200      -0.49      0.6230     0.05    -0.9184      0.5514     5.62E-6
b32           -0.1938            0.3801   200      -0.51      0.6106     0.05    -0.9433      0.5556    -6.56E-7
As with the generalized ordered logistic regression above, the output for a multinomial logistic regression from a dedicated procedure is often organized by group, similar to what is shown below.
Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|

prog=1
b01            3.0193            1.0963   200       2.75      0.0064
b11          -0.07121           0.02021   200      -3.52      0.0005
b12           -0.1835            0.3727   200      -0.49      0.6230

prog=3
b03            5.3382            1.1564   200       4.62      <.0001
b31           -0.1173           0.02244   200      -5.23      <.0001
b32           -0.1938            0.3801   200      -0.51      0.6106

prog=2 is the base category
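A dedicated way to fit this model is proc logistic with the generalized logit link. In the sketch below, ref='2' (assuming the formatted values of prog are simply 1, 2 and 3) makes prog=2 the base category, matching the setup above.

proc logistic data='D:\data\hsbdemo.sas7bdat';
  model prog(ref='2') = read female / link=glogit;
run;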
Poisson regression
Our next example is a count model using a Poisson distribution. The response variable for this model is awards, which is a count of the number of awards received by a student during high school. As with the binary logistic model above, there are two ways you can parameterize this model; the first method uses the built-in poisson distribution.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b0=0 b1=0 b2=0;
  xb = b0 + b1*read + b2*female;
  mu = exp(xb);
  model awards ~ poisson(mu);
run;

[partial output]

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower      Upper    Gradient
b0            -3.0922            0.3325   200      -9.30      <.0001     0.05    -3.7479    -2.4364    -0.00012
b1            0.06009          0.005404   200      11.12      <.0001     0.05    0.04944    0.07075    -0.01074
b2             0.4690            0.1142   200       4.11      <.0001     0.05     0.2438     0.6941    -0.00015
The second method for parameterizing this model is to compute the log-likelihood directly and use the general log-likelihood specification. The two approaches yield the same results.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b0=0 b1=0 b2=0;
  xb = b0 + b1*read + b2*female;
  mu = exp(xb);
  ll = awards*log(mu) - mu - lgamma(awards+1);
  model awards ~ general(ll);
run;

[partial output]

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower      Upper    Gradient
b0            -3.0922            0.3325   200      -9.30      <.0001     0.05    -3.7479    -2.4364    -0.00012
b1            0.06009          0.005404   200      11.12      <.0001     0.05    0.04944    0.07075    -0.01074
b2             0.4690            0.1142   200       4.11      <.0001     0.05     0.2438     0.6941    -0.00015
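Either version of the Poisson model can be compared with proc genmod (a minimal sketch):

proc genmod data='D:\data\hsbdemo.sas7bdat';
  model awards = read female / dist=poisson link=log;
run;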
Negative binomial regression
Our final example is a negative binomial regression. We will use the same response variable, awards, as in the Poisson example. One way of conceptualizing a negative binomial model is to think of it as a Poisson model with overdispersion, that is, excess variance. In a true Poisson model the mean and the variance are equal, whereas a negative binomial variable has a variance that is greater than the mean; with the parameterization used below the variance is mu + alpha*mu**2. The parameterization of this model is similar to that of the Poisson with one additional parameter, alpha, which measures the degree of overdispersion.
proc nlmixed data='D:\data\hsbdemo.sas7bdat';
  parms b0=0 b1=0 b2=0 alpha=.1;
  xb = b0 + b1*read + b2*female;
  mu = exp(xb);
  m = 1/alpha;
  ll = lgamma(awards+m) - lgamma(awards+1) - lgamma(m)
       + awards*log(alpha*mu) - (awards+m)*log(1+alpha*mu);
  model awards ~ general(ll);
run;

[partial output]

                              Parameter Estimates

Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|    Alpha      Lower      Upper    Gradient
b0            -3.3266            0.4111   200      -8.09      <.0001     0.05    -4.1372    -2.5160    0.000113
b1            0.06392          0.006809   200       9.39      <.0001     0.05    0.05050    0.07735    0.010318
b2             0.5065            0.1354   200       3.74      0.0002     0.05     0.2396     0.7735    0.000062
alpha          0.1828           0.08455   200       2.16      0.0318     0.05    0.01606     0.3495    -0.00016
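For comparison, proc genmod also fits a negative binomial model; its dispersion parameter corresponds to alpha in the parameterization above (a minimal sketch):

proc genmod data='D:\data\hsbdemo.sas7bdat';
  model awards = read female / dist=negbin link=log;
run;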