How do I run a random effect zero-inflated poisson model using nlmixed?

For this example, we are using the Hospital, Doctor, Patient dataset. The SAS program containing all the code for this page may be downloaded here.

Poisson model

Let’s start by just considering a poisson regression model. Unlike the familiar Gaussian distribution which has two parameters $\mathcal{N}(\mu, \sigma^{2})$, the Poisson distribution is described by a single parameter, $lambda$ that is both the mean and variance. That is, $\lambda = E(x)$ and $\lambda = Var(x) = E(x^{2}) – E(x)^{2}$. This can be a strong assumption. The standard Poisson model is (see section 4.3 in Categorical Data Analysis, 2nd ed. by Alan Agresti): \begin{equation} ln[E(y_{i})] = ln(\mu_{i}) = \alpha + \beta x_{i} \end{equation} The log link is the canonical link function for the Poisson distribution, and the expected value of the response is modeled. Because the poisson distribution only has one parameter, there is no residual variance (although there are residuals). We can put the model back in the original units as: \begin{equation} \lambda_{i} = \mu_{i} = e^{\alpha + \beta x_{i}} \end{equation} and we assume the response is distributed as Poisson \begin{equation} y_{i} \sim Pois(\lambda_{i}) \end{equation} The likelihood function for the Poisson model is given by: \begin{equation} \mathcal{l} = e^{y_{i}ln(\mu_{i}) – \mu_{i} – ln(y_{i}!)} \end{equation} so the log likelihood is simply: \begin{equation} \mathcal{ll} = y_{i}ln(\mu_{i}) – \mu_{i} – ln(y_{i}!) \end{equation} finally note that the natural logarithm of a n factorial is equivalent the the natural logarithm of the gamma function n plus 1, that is: \begin{equation} ln(y_{i}@!) = lgamma(y_{i} + 1) \end{equation} Now let’s look at how we can run a Poisson model in SAS. Because proc nlmixed can be twitchy about start values, we will run the model first in proc genmod and then in proc nlmixed. We would not technically need to use proc nlmixed at all at this stage; however, because we do need to use it later, we will use it now to show how each piece is built up to make the final product clearer.

Here is the code and output for the poisson model from proc genmod.


/*read data in from internet*/
filename tmp url 'http://stats.idre.ucla.edu/stat/data/hdp.csv';

data hdp;
infile tmp dlm=',' firstobs=2;
input tumorsize co2 pain wound mobility ntumors nmorphine 
  remission lungcapacity Age Married $ FamilyHx $ SmokingHx $
  Sex $ CancerStage $ LengthofStay WBC RBC BMI IL6 CRP DID $
  Experience School $ Lawsuits HID $ Medicaid;
run;

/*First because we are using proc nlmixed*/
/*we will manually dummy code some variables*/
data hdp;
  set hdp;
  if SmokingHx = "former" then SmokingHx2 = 1;
    else SmokingHx2 = 0;
  if SmokingHx = "never" then SmokingHx3 = 1;
    else SmokingHx3 = 0;
  if CancerStage = "II" then CancerStage2 = 1;
    else CancerStage2 = 0;
  if CancerStage = "III" then CancerStage3 = 1;
    else CancerStage3 = 0;
  if CancerStage = "IV" then CancerStage4 = 1;
    else CancerStage4 = 0;
run;

/*Basic poisson regression model*/

/*using genmod to get start values for nlmixed*/
proc genmod data = hdp;
  model nmorphine = Age LengthofStay BMI 
    CancerStage2 CancerStage3 CancerStage4 / dist=poisson;
run;

Model Information
Data Set	WORK.HDP
Distribution	Poisson
Link Function	Log
Dependent Variable	nmorphine

Number of Observations Read	8525
Number of Observations Used	8525

Criteria For Assessing Goodness Of Fit
Criterion	DF	Value	Value/DF
Deviance	8518	16612.7231	1.9503
Scaled Deviance	8518	16612.7231	1.9503
Pearson Chi-Square	8518	14318.9872	1.6810
Scaled Pearson X2	8518	14318.9872	1.6810
Log Likelihood		9145.9956
Full Log Likelihood		-20069.9436
AIC (smaller is better)		40153.8871
AICC (smaller is better)		40153.9003
BIC (smaller is better)		40203.2425

Algorithm converged.

Analysis Of Maximum Likelihood Parameter Estimates
Parameter	DF	Estimate	Standard Error	Wald 95% Confidence Limits		Wald Chi-Square	Pr > ChiSq
Intercept	1	0.8387	0.0567	0.7276	0.9498	219.01	<.0001
Age	1	-0.0011	0.0011	-0.0032	0.0010	1.07	0.2999
LengthofStay	1	0.0025	0.0064	-0.0101	0.0150	0.15	0.7014
BMI	1	0.0128	0.0008	0.0112	0.0144	247.08	<.0001
CancerStage2	1	0.0963	0.0149	0.0671	0.1256	41.78	<.0001
CancerStage3	1	0.2094	0.0180	0.1742	0.2446	135.92	<.0001
CancerStage4	1	0.3059	0.0226	0.2615	0.3503	182.44	<.0001
Scale	0	1.0000	0.0000	1.0000	1.0000

Note:

The scale parameter was held fixed.

Now we do the same thing, just a basic poisson model, but using proc nlmixed.


/*using nlmixed*/
proc nlmixed data = hdp;
  parms b0=.839 b1=-.001 b2=.003 b3=.013 b4=.096 b5=.209 b6=.306;
  mu = exp(b0 + b1*Age + b2*LengthofStay + b3*BMI + 
    b4*CancerStage2 + b5*CancerStage3 + b6*CancerStage4);
  ll = nmorphine*log(mu) - mu - lgamma(nmorphine + 1);
  model nmorphine ~ general(ll);
run;

Specifications
Data Set	WORK.HDP
Dependent Variable	nmorphine
Distribution for Dependent Variable	General
Optimization Technique	Dual Quasi-Newton
Integration Method	None

Dimensions
Observations Used	8525
Observations Not Used	0
Total Observations	8525
Parameters	7

Parameters
b0	b1	b2	b3	b4	b5	b6	NegLogLike
0.839	-0.001	0.003	0.013	0.096	0.209	0.306	20073.0939

Iteration History
Iter	Calls	NegLogLike	Diff	MaxGrad	Slope
1	7	20069.9939	3.100006	2621.46	-7006027
2	11	20069.9887	0.00526	2525.368	-177.838
3	14	20069.9845	0.004112	2515.065	-2.19468
4	15	20069.9819	0.002598	2251.426	-0.11918
5	16	20069.9687	0.013271	1410.25	-0.28196
6	18	20069.949	0.019634	197.8158	-1.25373
7	20	20069.9441	0.004915	117.4308	-0.13077
8	22	20069.9436	0.000485	1.448836	-0.00119
9	24	20069.9436	0.000061	11.84395	-0.00029
10	26	20069.9436	6.098E-6	0.899909	-0.00001

NOTE: GCONV convergence criterion satisfied.

Fit Statistics
-2 Log Likelihood	40140
AIC (smaller is better)	40154
AICC (smaller is better)	40154
BIC (smaller is better)	40203

Parameter Estimates
Parameter	Estimate	Standard Error	DF	t Value	Pr > \|t\|	Alpha	Lower	Upper	Gradient
b0	0.8387	0.05667	8525	14.80	<.0001	0.05	0.7276	0.9498	0.016664
b1	-0.00110	0.001064	8525	-1.04	0.3000	0.05	-0.00319	0.000983	0.899909
b2	0.002456	0.006406	8525	0.38	0.7014	0.05	-0.01010	0.01501	0.099973
b3	0.01281	0.000815	8525	15.72	<.0001	0.05	0.01121	0.01440	0.548677
b4	0.09634	0.01491	8525	6.46	<.0001	0.05	0.06712	0.1256	0.015835
b5	0.2094	0.01796	8525	11.66	<.0001	0.05	0.1742	0.2446	0.01362
b6	0.3059	0.02265	8525	13.51	<.0001	0.05	0.2615	0.3503	-0.00755

Zero-inflated poisson model

In a zero-inflated poisson model, the we observe more zero values than would be expected under a poisson model. The assumption is that these excess zeros are caused by an alternative stochastic process. One way to address this is to explicitly model the probability of being a zero or not, and then modify the estimation of the poisson model based on this. Thus zero-inflated poisson models typically have two portions. The first is a logistic model for being a zero or not, the second is a poisson model for the count. For the probability, we create a linear predictor (call it Xb): \begin{equation} Xb_{i} = alpha + beta x_{i} \end{equation} of course we can have more than one predictor and they need not be the same predictors as used for the poisson model. Then we can use the inverse logit function to get the (conditional) probability: \begin{equation} P(y_{i} = 1 | x_{i}) = p_{i} = \frac{1}{1 + e^{-Xb_{i}}} \end{equation} we can use this probability and create a new log likelihood function for the zero-inflated poisson model, that is: \begin{equation} y_{i} = \left\{ \begin{array}{rl} ln(p_{i} + (1 – p_{i})e^{(-\mu_{i})}) &\mbox{if $y_{i} = 0$} \\ y_{i}ln(\mu_{i}) + ln(1 – p_{i}) – \mu_{i} – ln(y_{i}!) &\mbox{if $y_{i} > 0$} \end{array} \right. \end{equation} The log likelihood function changes depending on whether the observed value is zero or not. Also, note that when $y_{i} = 0$, $y_{i}ln(\mu_{i})$ and $ln(y_{i}!)$ are zero and so drop out (although we could have written them if we wanted to). Again, we will run the model twice, first using proc genmod to get start values and then using proc nlmixed to show how the likelihood function is coded.


/*Complication 1: zero-inflated poisson regression model*/	

/*using genmod to get start values for nlmixed*/
proc genmod data = hdp;
  model nmorphine = Age LengthofStay BMI
    CancerStage2 CancerStage3 CancerStage4 / dist=zip;
  zeromodel Age SmokingHx2 SmokingHx3 
    CancerStage2 CancerStage3 CancerStage4 /link = logit;
run;

Model Information
Data Set	WORK.HDP
Distribution	Zero Inflated Poisson
Link Function	Log
Dependent Variable	nmorphine

Number of Observations Read	8525
Number of Observations Used	8525

Criteria For Assessing Goodness Of Fit
Criterion	DF	Value	Value/DF
Deviance		37957.2436
Scaled Deviance		37957.2436
Pearson Chi-Square	8511	10079.3779	1.1843
Scaled Pearson X2	8511	10079.3779	1.1843
Log Likelihood		10237.3174
Full Log Likelihood		-18978.6218
AIC (smaller is better)		37985.2436
AICC (smaller is better)		37985.2929
BIC (smaller is better)		38083.9542

Algorithm converged.

Analysis Of Maximum Likelihood Parameter Estimates
Parameter	DF	Estimate	Standard Error	Wald 95% Confidence Limits		Wald Chi-Square	Pr > ChiSq
Intercept	1	1.0569	0.0584	0.9425	1.1713	327.70	<.0001
Age	1	-0.0020	0.0011	-0.0042	0.0001	3.33	0.0678
LengthofStay	1	0.0094	0.0066	-0.0035	0.0223	2.06	0.1515
BMI	1	0.0122	0.0008	0.0106	0.0138	214.47	<.0001
CancerStage2	1	0.0239	0.0155	-0.0064	0.0543	2.40	0.1216
CancerStage3	1	0.0530	0.0186	0.0165	0.0895	8.11	0.0044
CancerStage4	1	0.1322	0.0233	0.0865	0.1779	32.11	<.0001
Scale	0	1.0000	0.0000	1.0000	1.0000

Note:

The scale parameter was held fixed.

Analysis Of Maximum Likelihood Zero Inflation Parameter Estimates
Parameter	DF	Estimate	Standard Error	Wald 95% Confidence Limits		Wald Chi-Square	Pr > ChiSq
Intercept	1	-1.6766	0.3400	-2.3430	-1.0103	24.32	<.0001
Age	1	-0.0116	0.0070	-0.0252	0.0021	2.76	0.0968
SmokingHx2	1	0.7479	0.1330	0.4872	1.0086	31.62	<.0001
SmokingHx3	1	1.2588	0.1233	1.0172	1.5003	104.30	<.0001
CancerStage2	1	-0.8077	0.0898	-0.9838	-0.6316	80.82	<.0001
CancerStage3	1	-2.1410	0.1767	-2.4873	-1.7947	146.84	<.0001
CancerStage4	1	-2.7530	0.3195	-3.3793	-2.1268	74.23	<.0001

Now we do the same thing, a zero-inflated poisson model in proc nlmixed.


/*using nlmixed*/
proc nlmixed data = hdp;
  parms b0=1.057 b1=-.002 b2=.009 b3=.012 b4=.024 b5=.053 b6=.132
        a0=-1.677 a1=-.012 a2=.748 a3=1.259 a4=-.808 a5=-2.141 a6=-2.753;
  logit0 = a0 + a1*Age + a2*SmokingHx2 + a3*SmokingHx3 +
    a4*CancerStage2 + a5*CancerStage3 + a6*CancerStage4;
  prob0 = 1 / (1 + exp(-logit0));
  mu = exp(b0 + b1*Age + b2*LengthofStay + b3*BMI + 
    b4*CancerStage2 + b5*CancerStage3 + b6*CancerStage4);
  if nmorphine = 0 then
    ll = log(prob0 + (1 - prob0)*exp(-mu));
  else
    ll = nmorphine*log(mu) + log(1 - prob0) - mu - lgamma(nmorphine + 1);
  model nmorphine ~ general(ll);
run;

Specifications
Data Set	WORK.HDP
Dependent Variable	nmorphine
Distribution for Dependent Variable	General
Optimization Technique	Dual Quasi-Newton
Integration Method	None

Dimensions
Observations Used	8525
Observations Not Used	0
Total Observations	8525
Parameters	14

Parameters
b0	b1	b2	b3	b4	b5	b6	a0	a1	a2	a3	a4	a5	a6	NegLogLike
1.057	-0.002	0.009	0.012	0.024	0.053	0.132	-1.677	-0.012	0.748	1.259	-0.808	-2.141	-2.753	18979.7392

Iteration History
Iter	Calls	NegLogLike	Diff	MaxGrad	Slope
1	7	18978.8018	0.937379	716.6314	-1941621
2	9	18978.6258	0.176023	428.8037	-5.62909
3	11	18978.6252	0.00057	336.3406	-0.00646
4	13	18978.623	0.002206	138.3274	-0.00782
5	15	18978.6218	0.001139	0.500645	-0.00082
6	17	18978.6218	0.000034	7.872104	-4.26E-6

NOTE: GCONV convergence criterion satisfied.

Fit Statistics
-2 Log Likelihood	37957
AIC (smaller is better)	37985
AICC (smaller is better)	37985
BIC (smaller is better)	38084

Parameter Estimates
Parameter	Estimate	Standard Error	DF	t Value	Pr > \|t\|	Alpha	Lower	Upper	Gradient
b0	1.0570	0.05838	8525	18.10	<.0001	0.05	0.9425	1.1714	0.189924
b1	-0.00200	0.001097	8525	-1.83	0.0678	0.05	-0.00416	0.000146	7.872104
b2	0.009433	0.006577	8525	1.43	0.1515	0.05	-0.00346	0.02232	0.83501
b3	0.01221	0.000834	8525	14.64	<.0001	0.05	0.01057	0.01384	5.376756
b4	0.02389	0.01547	8525	1.54	0.1226	0.05	-0.00644	0.05422	-0.20567
b5	0.05300	0.01862	8525	2.85	0.0044	0.05	0.01650	0.08950	0.134519
b6	0.1322	0.02333	8525	5.66	<.0001	0.05	0.08642	0.1779	-0.05091
a0	-1.6770	0.3400	8525	-4.93	<.0001	0.05	-2.3435	-1.0105	0.050035
a1	-0.01155	0.006961	8525	-1.66	0.0970	0.05	-0.02520	0.002092	2.63271
a2	0.7480	0.1330	8525	5.62	<.0001	0.05	0.4873	1.0087	0.003577
a3	1.2590	0.1233	8525	10.21	<.0001	0.05	1.0174	1.5006	0.049444
a4	-0.8080	0.08984	8525	-8.99	<.0001	0.05	-0.9841	-0.6319	-0.01239
a5	-2.1410	0.1766	8525	-12.12	<.0001	0.05	-2.4873	-1.7947	0.011087
a6	-2.7530	0.3194	8525	-8.62	<.0001	0.05	-3.3792	-2.1268	0.004292

Mixed effects poisson models

If the data are clustered, we may want to run a mixed effects poisson model. The model is similar to the original poisson model: \begin{equation} ln(E(y_{ij} | u_{j})) = \alpha + \beta x_{ij} + u_{j} \end{equation} where \begin{equation} u_{j} \sim \mathcal{N}(0, \sigma^{2}) \end{equation} that is, conditional on the random effect for group j ($mu_{j}$ we assume the regular poisson model for individual i, and the random effects are assumed to follow a normal distribution with mean 0 (the overall is captured by the intercept, $alpha$) and some estimated variance. This can be done in SAS using proc glimmix. We will run the model twice, first using proc glimmix to get start values and then using proc nlmixed to show how the model is setup. For proc glimmix we use the Laplace approximation, which is faster, though (in some cases) slightly less accurate the Gauss Hermite approximation. This is because we are just using proc glimmix for start values and this is not our final model anyway.


/*Complication 2: poisson regression model with random effect*/	

/*Start by sorting data*/
proc sort data = hdp;
  by DID;
run;

/*using glimmix to get start values for nlmixed*/
proc glimmix data = hdp noclprint method=laplace;
  class DID;
  model nmorphine = Age LengthofStay BMI 
    CancerStage2 CancerStage3 CancerStage4 / solution dist=poisson;
  random intercept / subject = DID;
run;

Model Information
Data Set	WORK.HDP
Response Variable	nmorphine
Response Distribution	Poisson
Link Function	Log
Variance Function	Default
Variance Matrix Blocked By	DID
Estimation Technique	Maximum Likelihood
Likelihood Approximation	Laplace
Degrees of Freedom Method	Containment

Number of Observations Read	8525
Number of Observations Used	8525

Dimensions
G-side Cov. Parameters	1
Columns in X	7
Columns in Z per Subject	1
Subjects (Blocks in V)	407
Max Obs per Subject	40

Optimization Information
Optimization Technique	Dual Quasi-Newton
Parameters in Optimization	8
Lower Boundaries	1
Upper Boundaries	0
Fixed Effects	Not Profiled
Starting From	GLM estimates

Iteration History
Iteration	Restarts	Evaluations	Objective Function	Change	Max Gradient
0	0	4	35858.352322	.	21925.52
1	0	7	35840.656816	17.69550667	1001.813
2	0	5	35840.143453	0.51336315	1645.478
3	0	4	35807.448912	32.69454071	3081.581
4	0	4	35803.761002	3.68791037	3518.03
5	0	3	35803.167408	0.59359346	3485.332
6	0	2	35802.576103	0.59130558	3744.727
7	0	3	35801.673997	0.90210556	3945.63
8	0	3	35801.150741	0.52325577	3753.626
9	0	4	35799.632676	1.51806529	350.1526
10	0	3	35799.530694	0.10198144	179.2901
11	0	3	35799.525701	0.00499353	18.13321
12	0	3	35799.525594	0.00010655	0.531298

Convergence criterion (GCONV=1E-8) satisfied.

Fit Statistics
-2 Log Likelihood	35799.53
AIC (smaller is better)	35815.53
AICC (smaller is better)	35815.54
BIC (smaller is better)	35847.60
CAIC (smaller is better)	35855.60
HQIC (smaller is better)	35828.22

Fit Statistics for Conditional Distribution
-2 log L(nmorphine \| r. effects)	34280.09
Pearson Chi-Square	9479.07
Pearson Chi-Square / DF	1.11

Covariance Parameter Estimates
Cov Parm	Subject	Estimate	Standard Error
Intercept	DID	0.2719	0.02289

Solutions for Fixed Effects
Effect	Estimate	Standard Error	DF	t Value	Pr > \|t\|
Intercept	0.7465	0.06380	406	11.70	<.0001
Age	-0.00178	0.001088	8112	-1.64	0.1020
LengthofStay	0.005626	0.006556	8112	0.86	0.3909
BMI	0.01304	0.000840	8112	15.52	<.0001
CancerStage2	0.07924	0.01524	8112	5.20	<.0001
CancerStage3	0.2049	0.01839	8112	11.14	<.0001
CancerStage4	0.3166	0.02327	8112	13.61	<.0001

Type III Tests of Fixed Effects
Effect	Num DF	Den DF	F Value	Pr > F
Age	1	8112	2.67	0.1020
LengthofStay	1	8112	0.74	0.3909
BMI	1	8112	240.85	<.0001
CancerStage2	1	8112	27.05	<.0001
CancerStage3	1	8112	124.08	<.0001
CancerStage4	1	8112	185.11	<.0001

Now we do the same thing, a mixed effects poisson model in proc nlmixed.


/*using nlmixed*/
proc nlmixed data = hdp;
  parms b0=.7465 b1=-.0018 b2=.0056 b3=.013 b4=.0792 b5=.2049 b6=.3166
    sigma2=.27;
  mu = exp(b0 + u + b1*Age + b2*LengthofStay + b3*BMI + 
    b4*CancerStage2 + b5*CancerStage3 + b6*CancerStage4);
  ll = nmorphine*log(mu) - mu - lgamma(nmorphine + 1);
  model nmorphine ~ general(ll);
  random u ~ normal(0, sigma2) subject=DID;
run;

Specifications
Data Set	WORK.HDP
Dependent Variable	nmorphine
Distribution for Dependent Variable	General
Random Effects	u
Distribution for Random Effects	Normal
Subject Variable	DID
Optimization Technique	Dual Quasi-Newton
Integration Method	Adaptive Gaussian Quadrature

Dimensions
Observations Used	8525
Observations Not Used	0
Total Observations	8525
Subjects	407
Max Obs Per Subject	40
Parameters	8
Quadrature Points	1

Parameters
b0	b1	b2	b3	b4	b5	b6	sigma2	NegLogLike
0.7465	-0.0018	0.0056	0.013	0.0792	0.2049	0.3166	0.27	17904.0597

Iteration History
Iter	Calls	NegLogLike	Diff	MaxGrad	Slope
1	6	17904.0551	0.004516	94.80357	-1099.33
2	10	17904.0187	0.036449	233.9555	-499.001
3	14	17899.9517	4.067009	203.8813	-3053.86
4	18	17899.9395	0.012235	197.404	-6.62972
5	20	17899.9364	0.00303	189.5926	-0.84223
6	22	17899.923	0.013452	164.0844	-4.23162
7	24	17899.8277	0.09526	54.09265	-4.39735
8	26	17899.7657	0.062047	24.42641	-3.77556
9	29	17899.764	0.0017	22.32058	-0.02513
10	31	17899.7628	0.001166	3.493531	-0.00261
11	33	17899.7628	1.584E-6	0.552605	-3.63E-6

NOTE: GCONV convergence criterion satisfied.

Fit Statistics
-2 Log Likelihood	35800
AIC (smaller is better)	35816
AICC (smaller is better)	35816
BIC (smaller is better)	35848

Parameter Estimates
Parameter	Estimate	Standard Error	DF	t Value	Pr > \|t\|	Alpha	Lower	Upper	Gradient
b0	0.7465	0.06380	406	11.70	<.0001	0.05	0.6211	0.8719	-0.00806
b1	-0.00178	0.001088	406	-1.64	0.1027	0.05	-0.00392	0.000359	-0.55261
b2	0.005626	0.006556	406	0.86	0.3913	0.05	-0.00726	0.01851	-0.05281
b3	0.01304	0.000840	406	15.52	<.0001	0.05	0.01139	0.01469	-0.35407
b4	0.07924	0.01524	406	5.20	<.0001	0.05	0.04929	0.1092	-0.0033
b5	0.2049	0.01839	406	11.14	<.0001	0.05	0.1687	0.2410	-0.00666
b6	0.3166	0.02327	406	13.61	<.0001	0.05	0.2708	0.3623	-0.00113
sigma2	0.2719	0.02289	406	11.88	<.0001	0.05	0.2269	0.3169	0.002184

Putting it together: mixed effects zero-inflated poisson model

There is no way to run a mixed effects model in proc genmod and there is no way to include zero-inflation in proc glimmix so in SAS, this model (to our knowlege) can currently only be fit using proc nlmixed. Because this model cannot be fit with another SAS procedure, there is no direct way to get starting values for all the parameters. What we did was for the fixed effects, use the estimates from the zero-inflated poisson model run in proc genmod and for the random effects, use the estimates from proc glimmix. Combined, we at least have some (hopefully reasonable) start values for all the parameters. The start values do not need to be perfect, but usually the better they are, the faster the model will converge. Also, in some cases inadequate start values may result in the optimizer searching in a region of the parameter space where it cannot progress. In this case, there are two main options. First you can try tuning your start values. Alternately, you can try a different algorithm that may be able to progress out of the region the other algorithm got “stuck in”. Essentially, change where you are looking (different start values) or change how you are looking (different algorithm). Note that for variance components (in this case just sigma2), it is often better to overestimate than to underestimate them.

Unfortunately, using the coefficients from proc genmod and proc glmmix as start values will not guarantee that the model in proc nlmixed will converge. We see that using our start values as they are will cause SAS to return an error that the model has not converged:


/*Putting it together: zero-inflated poisson regression model with random effect*/

/*to our knowledge this is only possible in SAS using nlmixed*/
proc nlmixed data = hdp method=gauss qpoints=25;
  parms b0=1.0569 b1=-0.0020 b2=0.0094 b3=0.0122 b4=0.0239 b5=0.0530 b6=0.1322
        a0=-1.6766 a1=-0.0116 a2=0.7479 a3=1.2588 a4=-0.8077 a5=-2.1410 a6=-2.7530 	
        sigma2=0.2414;
  logit0 = a0 + a1*Age + a2*SmokingHx2 + a3*SmokingHx3 +
    a4*CancerStage2 + a5*CancerStage3 + a6*CancerStage4;
  prob0 = 1 / (1 + exp(-logit0));
  mu = exp(b0 + u + b1*Age + b2*LengthofStay + b3*BMI + 
    b4*CancerStage2 + b5*CancerStage3 + b6*CancerStage4);
  if nmorphine = 0 then
    ll = log(prob0 + (1 - prob0)*exp(-mu));
  else
	ll = nmorphine*log(mu) + log(1 - prob0) - mu - lgamma(nmorphine + 1);
  model nmorphine ~ general(ll);
  random u ~ normal(0, sigma2) subject=DID;
run;

ERROR: Optimization cannot be completed.

It is difficult to tell what has gone wrong with this error. At times, using the itdetails option on the proc nlmixed statement can help diagnose the error, as you can see how the parameters change at each iteration and look for problematic parameters. The detailed iteration history provides no help in this case. We recommend varying the starting value for the random effect first, as we saw that the proc nlmixed model converges fine if the random effect is excluded (see “Zero-inflated poisson model” above). Indeed, changing the start value for sigma2 from 0.2414 to 0.3 will allow the model to converge:

proc nlmixed data = hdp method=gauss qpoints=25;
  parms b0=1.0569 b1=-0.0020 b2=0.0094 	 b3=0.0122 b4=0.0239 b5=0.0530 b6=0.1322
        a0=-1.6766 a1=-0.0116 a2=0.7479 a3=1.2588 a4=-0.8077 a5=-2.1410 a6=-2.7530 	
        sigma2=0.3;
  logit0 = a0 + a1*Age + a2*SmokingHx2 + a3*SmokingHx3 +
    a4*CancerStage2 + a5*CancerStage3 + a6*CancerStage4;
  prob0 = 1 / (1 + exp(-logit0));
  mu = exp(b0 + u + b1*Age + b2*LengthofStay + b3*BMI + 
    b4*CancerStage2 + b5*CancerStage3 + b6*CancerStage4);
  if nmorphine = 0 then
    ll = log(prob0 + (1 - prob0)*exp(-mu));
  else
	ll = nmorphine*log(mu) + log(1 - prob0) - mu - lgamma(nmorphine + 1);
  model nmorphine ~ general(ll);
  random u ~ normal(0, sigma2) subject=DID;
run;

Specifications
Data Set	WORK.HDP
Dependent Variable	nmorphine
Distribution for Dependent Variable	General
Random Effects	u
Distribution for Random Effects	Normal
Subject Variable	DID
Optimization Technique	Dual Quasi-Newton
Integration Method	Adaptive Gaussian Quadrature

Dimensions
Observations Used	8525
Observations Not Used	0
Total Observations	8525
Subjects	407
Max Obs per Subject	40
Parameters	15
Quadrature Points	25

Parameters
b0	b1	b2	b3	b4	b5	b6	a0	a1	a2	a3	a4	a5	a6	sigma2	NegLogLike
1.0569	-0.002	0.0094	0.0122	0.0239	0.053	0.1322	-1.6766	-0.0116	0.7479	1.2588	-0.8077	-2.141	-2.753	0.3	17911.4503

NOTE: GCONV convergence criterion satisfied.

Fit Statistics
-2 Log Likelihood	35436
AIC (smaller is better)	35466
AICC (smaller is better)	35466
BIC (smaller is better)	35526

Parameter Estimates
Parameter	Estimate	Standard Error	DF	t Value	Pr > \|t\|	Alpha	Lower	Upper	Gradient
b0	0.8364	0.06448	406	12.97	<.0001	0.05	0.7096	0.9631	-0.0028
b1	-0.00242	0.001111	406	-2.18	0.0300	0.05	-0.00460	-0.00024	-0.21477
b2	0.01052	0.006667	406	1.58	0.1154	0.05	-0.00259	0.02362	-0.0123
b3	0.01286	0.000854	406	15.06	<.0001	0.05	0.01118	0.01454	-0.1621
b4	0.04828	0.01596	406	3.03	0.0026	0.05	0.01691	0.07966	0.000687
b5	0.1398	0.01899	406	7.36	<.0001	0.05	0.1025	0.1771	0.01237
b6	0.2496	0.02412	406	10.35	<.0001	0.05	0.2022	0.2971	-0.00403
a0	-4.4730	2.5151	406	-1.78	0.0761	0.05	-9.4173	0.4713	-0.01752
a1	-0.03124	0.01319	406	-2.37	0.0183	0.05	-0.05718	-0.00531	-1.01969
a2	3.5560	2.4335	406	1.46	0.1447	0.05	-1.2279	8.3399	0.001648
a3	4.5860	2.4347	406	1.88	0.0603	0.05	-0.2003	9.3722	-0.01543
a4	-1.3089	0.1686	406	-7.77	<.0001	0.05	-1.6403	-0.9776	-0.01202
a5	-7.3938	9.3167	406	-0.79	0.4279	0.05	-25.7089	10.9212	0.010246
a6	-4.9867	3.0612	406	-1.63	0.1041	0.05	-11.0045	1.0312	-0.01818
sigma2	0.2414	0.02097	406	11.51	<.0001	0.05	0.2002	0.2826	-0.022

Extending the random effects

Previously we fit a mixed effects zero-inflated poisson model. Now we are going to fit the same model, but include a random slope and covariance between the intercept and slope. Again our procedure for selecting start values was to use the prior proc nlmixed model estimates, and run a model in proc glimmix with a random intercept and random slope, and use the variance estimates from proc glimmix for the variance components. Once the model converged, we updated the start values in the code to be close to the final estimates to speed future runs of the code.


/*Extending the random effects 1*/
proc nlmixed data = hdp method=gauss qpoints=25;
  parms b0=0.8364 b1=-0.00242 b2=0.01052 b3=0.01286 b4=0.04828 b5=0.1398 b6=0.2496
        a0=-4.4730 a1=-0.03124 a2=3.5560 a3=4.5860 a4=-1.3089 a5=-7.3938  a6=-4.9867
        sigma2_u0=0.2414 sigma_cov = -0.00044 sigma2_u1=0.000016;
  bounds sigma2_u0 sigma2_u1 >= 0;
  logit0 = a0 +  a1*Age + a2*SmokingHx2 + a3*SmokingHx3 +
    a4*CancerStage2 + a5*CancerStage3 + a6*CancerStage4;
  prob0 = 1 / (1 + exp(-logit0));
  mu = exp(b0 + u0 + b1*Age + b2*LengthofStay + b3*BMI + u1*BMI + 
    b4*CancerStage2 + b5*CancerStage3 + b6*CancerStage4);
  if nmorphine = 0 then
    ll = log(prob0 + (1 - prob0)*exp(-mu));
  else
	ll = nmorphine*log(mu) + log(1 - prob0) - mu - lgamma(nmorphine + 1);
  model nmorphine ~ general(ll);
  random u0 u1 ~ normal([0, 0], [sigma2_u0, sigma_cov, sigma2_u1]) subject=DID;
run;

Specifications
Data Set	WORK.HDP
Dependent Variable	nmorphine
Distribution for Dependent Variable	General
Random Effects	u0 u1
Distribution for Random Effects	Normal
Subject Variable	DID
Optimization Technique	Dual Quasi-Newton
Integration Method	Adaptive Gaussian Quadrature

Dimensions
Observations Used	8525
Observations Not Used	0
Total Observations	8525
Subjects	407
Max Obs per Subject	40
Parameters	17
Quadrature Points	25

Parameters
b0	b1	b2	b3	b4	b5	b6	a0	a1	a2	a3	a4	a5	a6	sigma2_u0	sigma_cov	sigma2_u1	NegLogLike
0.8364	-0.00242	0.01052	0.01286	0.04828	0.1398	0.2496	-4.473	-0.03124	3.556	4.586	-1.3089	-7.3938	-4.9867	0.2414	-0.00044	0.000016	17717.9039

Iteration History
Iter	Calls	NegLogLike	Diff	MaxGrad	Slope
1	18	17717.841	0.062974	17792.83	-1.433E7
2	22	17717.7906	0.050349	15982.54	-0.0633
3	26	17717.6647	0.125895	17730.7	-0.05606
4	28	17717.6051	0.059575	7275.173	-0.08553
5	30	17717.5948	0.010335	835.4337	-0.02134
6	32	17717.5904	0.004445	4405.393	-0.00074
7	36	17717.5792	0.011183	568.8423	-0.00594
8	39	17717.5785	0.000716	489.4676	-0.00052
9	41	17717.5773	0.001161	495.6351	-0.00055
10	44	17717.5772	0.000145	49.15044	-0.00028
11	46	17717.5771	0.000037	487.2186	-6.78E-6

NOTE: GCONV convergence criterion satisfied.

Fit Statistics
-2 Log Likelihood	35435
AIC (smaller is better)	35469
AICC (smaller is better)	35469
BIC (smaller is better)	35537

Parameter Estimates
Parameter	Estimate	Standard Error	DF	t Value	Pr > \|t\|	Alpha	Lower	Upper	Gradient
b0	0.8364	0.06615	405	12.64	<.0001	0.05	0.7064	0.9664	-1.89084
b1	-0.00233	0.001113	405	-2.10	0.0365	0.05	-0.00452	-0.00015	2.592628
b2	0.01054	0.006677	405	1.58	0.1154	0.05	-0.00259	0.02366	-6.028
b3	0.01265	0.001008	405	12.55	<.0001	0.05	0.01067	0.01463	-1.25884
b4	0.04828	0.01598	405	3.02	0.0027	0.05	0.01686	0.07970	-0.37257
b5	0.1398	0.01902	405	7.35	<.0001	0.05	0.1024	0.1772	-1.16558
b6	0.2496	0.02413	405	10.34	<.0001	0.05	0.2022	0.2970	1.184339
a0	-4.4730	2.4783	405	-1.80	0.0718	0.05	-9.3450	0.3990	0.329585
a1	-0.03127	0.01320	405	-2.37	0.0183	0.05	-0.05722	-0.00531	15.43766
a2	3.5560	2.3961	405	1.48	0.1386	0.05	-1.1543	8.2663	0.129269
a3	4.5860	2.3971	405	1.91	0.0564	0.05	-0.1263	9.2983	0.197408
a4	-1.3089	0.1687	405	-7.76	<.0001	0.05	-1.6406	-0.9772	0.142722
a5	-7.3938	8.8262	405	-0.84	0.4027	0.05	-24.7448	9.9572	0.011592
a6	-4.9867	3.0072	405	-1.66	0.0980	0.05	-10.8983	0.9249	-0.01304
sigma2_u0	0.2414	0.04520	405	5.34	<.0001	0.05	0.1525	0.3303	5.315591
sigma_cov	-0.00016	0.000803	405	-0.20	0.8413	0.05	-0.00174	0.001417	-20.4002
sigma2_u1	0.000014	0.000018	405	0.75	0.4513	0.05	-0.00002	0.000049	-487.219

We can also fit a model that allows a random intercept for the zero-inflation portion. We assume that the two random intercepts (for the zero-inflation and poisson portion) are uncorrelated. The start value for sigma2_u1 of 1 was chosen arbitrarily, but results in model convergence.


/*Extending the random effects 2*/
proc nlmixed data = hdp method=gauss qpoints=25 maxfunc = 3000 maxiter = 1000;
  parms b0=0.8463 b1=-0.00245 b2=0.01118 b3=0.01246 b4=0.04719 b5=0.1427 b6=0.2496
        a0=-4.5314 a1=-0.03164 a2=3.6122 a3=4.6438 a4=-1.3091 a5=-3.4101 a6=-4.8750
        sigma2_u0=0.2300 sigma2_u1=1;
  bounds sigma2_u0 sigma2_u1 >= 0;
  logit0 = a0 + u1 + a1*Age + a2*SmokingHx2 + a3*SmokingHx3 +
    a4*CancerStage2 + a5*CancerStage3 + a6*CancerStage4;
  prob0 = 1 / (1 + exp(-logit0));
  mu = exp(b0 + u0 + b1*Age + b2*LengthofStay + b3*BMI + 
    b4*CancerStage2 + b5*CancerStage3 + b6*CancerStage4);
  if nmorphine = 0 then
    ll = log(prob0 + (1 - prob0)*exp(-mu));
  else
	ll = nmorphine*log(mu) + log(1 - prob0) - mu - lgamma(nmorphine + 1);
  model nmorphine ~ general(ll);
  random u0 u1 ~ normal([0, 0], [sigma2_u0, 0, sigma2_u1]) subject=DID;
run;

Specifications
Data Set	WORK.HDP
Dependent Variable	nmorphine
Distribution for Dependent Variable	General
Random Effects	u0 u1
Distribution for Random Effects	Normal
Subject Variable	DID
Optimization Technique	Dual Quasi-Newton
Integration Method	Adaptive Gaussian Quadrature

Dimensions
Observations Used	8525
Observations Not Used	0
Total Observations	8525
Subjects	407
Max Obs Per Subject	40
Parameters	16
Quadrature Points	25

Parameters
b0	b1	b2	b3	b4	b5	b6	a0	a1	a2	a3	a4	a5	a6	sigma2_u0	sigma2_u1	NegLogLike
0.8463	-0.00245	0.01118	0.01246	0.04719	0.1427	0.2496	-4.5314	-0.03164	3.6122	4.6438	-1.3091	-3.4101	-4.875	0.23	1	17563.6292

NOTE: GCONV convergence criterion satisfied.

Fit Statistics
-2 Log Likelihood	34688
AIC (smaller is better)	34720
AICC (smaller is better)	34720
BIC (smaller is better)	34784

Parameter Estimates
Parameter	Estimate	Standard Error	DF	t Value	Pr > \|t\|	Alpha	Lower	Upper	Gradient
b0	0.9392	0.06140	405	15.30	<.0001	0.05	0.8185	1.0599	-0.59831
b1	-0.00257	0.001108	405	-2.32	0.0209	0.05	-0.00475	-0.00039	-7.43015
b2	0.01190	0.006661	405	1.79	0.0747	0.05	-0.00119	0.02500	-1.05967
b3	0.01289	0.000850	405	15.16	<.0001	0.05	0.01122	0.01456	-4.09098
b4	0.03497	0.01573	405	2.22	0.0268	0.05	0.004044	0.06590	0.350269
b5	0.1014	0.01885	405	5.38	<.0001	0.05	0.06431	0.1384	0.550236
b6	0.1994	0.02380	405	8.38	<.0001	0.05	0.1526	0.2462	0.457775
a0	-4.3705	0.7769	405	-5.63	<.0001	0.05	-5.8977	-2.8432	-0.07537
a1	-0.04393	0.01339	405	-3.28	0.0011	0.05	-0.07025	-0.01761	3.481771
a2	1.8553	0.2836	405	6.54	<.0001	0.05	1.2979	2.4128	0.17952
a3	3.1294	0.3000	405	10.43	<.0001	0.05	2.5396	3.7192	4.862539
a4	-1.5995	0.1878	405	-8.52	<.0001	0.05	-1.9686	-1.2303	6.171022
a5	-4.3558	0.3505	405	-12.43	<.0001	0.05	-5.0447	-3.6668	4.562017
a6	-5.1794	0.5905	405	-8.77	<.0001	0.05	-6.3402	-4.0187	4.326418
sigma2_u0	0.1014	0.009353	405	10.84	<.0001	0.05	0.08298	0.1198	1.482904
sigma2_u1	17.8147	4.1006	405	4.34	<.0001	0.05	9.7537	25.8758	0.903648

Comments

When other SAS procedures will fit a model, they are often easier to use and faster. However, for uncommon or new models that are not implemented in SAS, using proc nlmixed allows a great deal of flexibility in model specification.

References

1. Voronca, D.C. et al.(2014). Analysis of Zero Inflated Longitudinal Data using Proc Nlmixed. Conference Proceedings: SouthEast SAS Users Group. 2. Min, Y. and Agresti, A.(2005). Random effect models for repeated measures of zero-inflated count data. Statistical Modeling; 5:1-19. 3. Runze, A.B. and Zucker, R.A.(2012). Statistical models for longitudinal zero-inflated count data with application to the substance abuse fields. Stat Med; 31(29):4074-4086.