For this example, we use a simulated dataset; the code below reads the data directly from our website. Ystar is the original variable, and Ycensr is a right-censored version of Ystar, censored at 16. Real examples where this could be useful are tests that have a maximum score (i.e., the measure cannot distinguish anyone above a certain threshold, although there is presumably still some variability above it). In this example, groups were sampled at random, and then individuals were sampled from within groups, so we want group random effects. If the true distribution of individuals' responses is Gaussian, but the nature of the instrument means that everyone who is above 16 is censored at 16, then:
\begin{equation}
y_{i} = \left\{ \begin{array}{rl}
y^{*}_{i} &\mbox{if $y^{*}_{i} < y_{u}$} \\
y_{u} &\mbox{if $y^{*}_{i} \ge y_{u}$}
\end{array} \right.
\end{equation}
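To make the definition concrete, here is a minimal hypothetical sketch of how such a right-censored copy could be constructed in a data step (the dataset used on this page already contains Ycensr, and the data step that reads Simu appears below, so this is illustration only):

/* Hypothetical sketch: right-censor a latent variable at 16.
   Assumes the Simu dataset has been read in (see the data step below). */
data censor_demo;
  set Simu;
  Ycensr_demo = min(Ystar, 16);   /* values at or above 16 are recorded as 16 */
run;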
That is, our observed values are the true latent values below the upper bound and are the upper bound otherwise.
This is the Tobit model or a censored regression model. This model can be fit with proc qlim
or proc lifereg
when there are only fixed effects. We consider the case where the model
also includes random effects. That is:
\begin{equation}
\mathbf{y} = \boldsymbol{X\beta} + \mathbf{Zb} + \boldsymbol{\varepsilon}
\end{equation}
Here \(\mathbf{y}\) is an \(n \times 1\) column vector, the outcome variable; \(\mathbf{X}\) is an \(n \times p\) matrix of the \(p\) predictor variables; \(\boldsymbol{\beta}\) is a \(p \times 1\) column vector of the fixed-effects regression coefficients (the “betas”); \(\mathbf{Z}\) is the \(n \times q\) design matrix for the \(q\) random effects (the random counterpart to the fixed \(\mathbf{X}\)); \(\mathbf{b}\) is a \(q \times 1\) vector of the random effects (the random counterpart to the fixed \(\boldsymbol{\beta}\)); and \(\boldsymbol{\varepsilon}\) is an \(n \times 1\) column vector of the residuals, the part of \(\mathbf{y}\) that is not explained by the model, \(\boldsymbol{X\beta} + \mathbf{Zb}\).
For example, if you have \(i = 35\) observations in each of \(j = 110\) groups, then \(n = ij = 3850\). \(n\) and \(p\) can be quite large without being overly burdensome computationally; however, \(q\), the number of random effects, can quickly make estimation computationally difficult. To recap:
\begin{equation}
\overbrace{\mathbf{y}}^{\mbox{n x 1}} \quad = \quad
\overbrace{\underbrace{\mathbf{X}}_{\mbox{n x p}} \quad \underbrace{\boldsymbol{\beta}}_{\mbox{p x 1}}}^{\mbox{n x 1}} \quad + \quad
\overbrace{\underbrace{\mathbf{Z}}_{\mbox{n x q}} \quad \underbrace{\mathbf{b}}_{\mbox{q x 1}}}^{\mbox{n x 1}} \quad + \quad
\overbrace{\boldsymbol{\varepsilon}}^{\mbox{n x 1}}
\end{equation}
Because we observe \(\mathbf{y}\), which is a censored version of the true values \(\mathbf{y^{*}}\),
we cannot use proc mixed
to fit the mixed-effects model. Instead, we will
build our own likelihood function based on the likelihood function for the Tobit and
use that with proc nlmixed
to model the censored outcome with fixed and random effects.
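For reference, with fixed effects only, a Tobit model can be requested directly; a minimal sketch using proc qlim might look like the following. It ignores the grouping structure, so it is not the model we ultimately want, and it is shown only for comparison.

/* Hedged sketch: fixed-effects-only Tobit in proc qlim, ignoring
   the grouping by ID. Assumes Simu has been read in (see below). */
proc qlim data=Simu;
  model Ycensr = Xc1 Xc2 Xi1 / censored(ub=16);   /* right censored at 16 */
run;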
Theoretical background
First, let’s look at some background on the Gaussian (normal) distribution to better understand what the likelihood function will be. The probability density function of the Gaussian distribution is given by:
\begin{equation}
PDF(x) = \left( \frac{1}{\sigma \sqrt{2 \pi}} \right) e^{\frac{-(x - \mu)^{2}}{2 \sigma^{2}}}
\end{equation}
and the cumulative distribution function is:
\begin{equation}
CDF(x) = \Phi \left( \frac{x - \mu}{\sigma} \right) = \frac{1}{2} \left( 1 + erf \left( \frac{x - \mu}{\sqrt{2 \sigma^{2}}} \right) \right)
\end{equation}
where \(erf\) is the error function:
\begin{equation}
erf(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^{2}} dt
\end{equation}
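As a quick check on these formulas, SAS's built-in pdf and probnorm functions should reproduce them; here is a small illustrative sketch (the values of x, mu, and sigma are arbitrary, chosen only for demonstration):

/* Illustrative only: compare the normal PDF formula above with SAS built-ins */
data _null_;
  x = 1; mu = 0; sigma = 2;
  pi = constant('pi');
  pdf_manual  = (1 / (sigma*sqrt(2*pi))) * exp( -(x - mu)**2 / (2*sigma**2) );
  pdf_builtin = pdf('NORMAL', x, mu, sigma);
  cdf_builtin = probnorm( (x - mu) / sigma );   /* the CDF via the standard normal */
  put pdf_manual= pdf_builtin= cdf_builtin=;
run;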
Our likelihood function for right censored data is:
\begin{equation}
\mathcal{L}(\theta) = \left\{ \begin{array}{rl}
\left( \frac{1}{\sigma \sqrt{2 \pi}} \right) e^{\frac{-(y_{ij} - \mu_{ij})^{2}}{2 \sigma^{2}}} &\mbox{ if $y_{ij} < y_{u}$} \\
1 - \Phi \left( \frac{y_{ij} - \mu_{ij}}{\sigma} \right) &\mbox{ if $y_{ij} \ge y_{u}$}
\end{array} \right.
\end{equation}
When \(y\) is below the upper censoring value, the likelihood contribution is just the normal PDF. When \(y\) is at or above the upper censoring value, the contribution is the probability of falling at or beyond that point, \(1 - \Phi(\cdot)\).
In this particular data example, it will be:
\begin{equation}
\mathcal{L}(\theta) = \left\{ \begin{array}{rl}
\left( \frac{1}{\sqrt{2 \pi \sigma^{2}}} \right) e^{\frac{-(Ycensr_{ij} - \mu_{ij})^{2}}{2 \sigma^{2}}} &\mbox{ if $Ycensr_{ij} < 16$} \\
1 - \Phi \left( \frac{Ycensr_{ij} - \mu_{ij}}{\sqrt{\sigma^{2}}} \right) &\mbox{ if $Ycensr_{ij} \ge 16$}
\end{array} \right.
\end{equation}
where:
\begin{equation}
E(y_{ij}|b_{j}) = \mu_{ij} = \beta_{0} + \beta_{1}Xc1_{ij} + \beta_{2}Xc2_{ij} + \beta_{3}Xi1_{ij} + b_{0j}
\end{equation}
So the likelihood function depends on the observed outcome, on \(\mu_{ij}\), the conditional mean implied by our model, and on the scale parameter \(\sigma^{2}\), which is estimated from \(\boldsymbol{\varepsilon}\), the part of the outcome our model does not explain.
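To see the two branches of this likelihood numerically, here is a toy data-step sketch (the values of mu and sigma2 are illustrative choices, not estimates from this example):

/* Illustrative only: the two log-likelihood branches at the bound of 16 */
data _null_;
  mu = 4; sigma2 = 75;
  pi = constant('pi');
  y = 12;   /* below the bound: contributes the normal density */
  ll_uncensored = log( (1/sqrt(2*pi*sigma2)) * exp( -(y - mu)**2 / (2*sigma2) ) );
  y = 16;   /* at the bound: contributes the upper tail probability */
  ll_censored = log( 1 - probnorm( (y - mu)/sqrt(sigma2) ) );
  put ll_uncensored= ll_censored=;
run;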
Right censored with random intercept
First we read the data into SAS. The code below pulls the data straight from our website; if you need to access it offline, you can download the file from the same address.
filename tmpsim url 'http://statistics.ats.ucla.edu/stat/data/simu.csv';

DATA Simu;
  INFILE tmpsim DLM=',' firstobs=2;
  input Y Ystar Ycensl Ycensr Xc1 Xc2 Xc3 Xc4 Xi1 Xi2 cat1 cat2 ID;
RUN;
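A quick optional check that the file was read correctly (not part of the original program, but cheap insurance): Ycensr should max out at 16 and Ycensl should bottom out at -5.

/* Optional sanity check on the download */
proc means data=Simu n min max mean;
  var Ystar Ycensr Ycensl;
run;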
Next we fit the true model on the uncensored data; this will serve as a point of comparison. Then we will demonstrate the bias that censoring introduces and how the Tobit model can lessen that bias. Note that throughout this page, only partial output is shown (SAS tends to be very liberal in the quantity of information it prints).
proc mixed data = Simu noclprint;
  class ID;
  model Ystar = Xc1 Xc2 Xi1 / solution;
  random int / subject=ID;
run;
Model Information
Data Set                     WORK.SIMU
Dependent Variable           Ystar
Covariance Structure         Variance Components
Subject Effect               ID
Estimation Method            REML
Residual Variance Method     Profile
Fixed Effects SE Method      Model-Based
Degrees of Freedom Method    Containment

Convergence criteria met.

Covariance Parameter Estimates
Cov Parm     Subject    Estimate
Intercept    ID           0.6974
Residual                 74.9236

Fit Statistics
-2 Res Log Likelihood         27577.9
AIC (smaller is better)       27581.9
AICC (smaller is better)      27581.9
BIC (smaller is better)       27587.3

Solution for Fixed Effects
                       Standard
Effect      Estimate      Error    DF   t Value   Pr > |t|
Intercept     3.9080     0.1987   109     19.67
Now we look at the right censored data using the variable Ycensr. First, we will fit a naive linear mixed model that does not adjust at all for the censoring.
proc mixed data = Simu noclprint;
  class ID;
  model Ycensr = Xc1 Xc2 Xi1 / solution;
  random int / subject=ID;
run;
Covariance Parameter Estimates
Cov Parm     Subject    Estimate
Intercept    ID           0.6020
Residual                 58.6719

Fit Statistics
-2 Res Log Likelihood         26640.2
AIC (smaller is better)       26644.2
AICC (smaller is better)      26644.2
BIC (smaller is better)       26649.6

Solution for Fixed Effects
                       Standard
Effect      Estimate      Error    DF   t Value   Pr > |t|
Intercept     3.5476     0.1773   109     20.01
Now we again look at the right censored variable Ycensr, this time using the Tobit model, i.e., adjusting the likelihood function for the censoring. Note that while the likelihood formula was presented above, we actually model the log of the likelihood.
proc nlmixed data=Simu XTOL=1E-12 method=GAUSS qpoints=100;
  parms beta0=7.3 beta1=-.21 beta2=.96 beta3=3 sigma2_u=.7 sigma2=75;
  bounds sigma2_u sigma2 >= 0;
  pi = constant('pi');
  /* conditional mean: fixed effects plus the random intercept b_0j */
  mu = beta0 + b_0j + beta1*Xc1 + beta2*Xc2 + beta3*Xi1;
  /* uncensored observations contribute the normal PDF */
  if Ycensr < 16 then ll = (1 / (sqrt(2*pi*sigma2))) * exp( -(Ycensr-mu)**2 / (2*sigma2) );
  /* censored observations contribute the upper tail probability */
  if Ycensr >= 16 then ll = 1 - probnorm( (Ycensr - mu) / sqrt(sigma2) );
  L = log(ll);
  model Ycensr ~ general(L);
  random b_0j ~ normal(0, sigma2_u) subject=ID;
run;
quit;
The NLMIXED Procedure

Specifications
Data Set                              WORK.SIMU
Dependent Variable                    Ycensr
Distribution for Dependent Variable   General
Random Effects                        b_0j
Distribution for Random Effects       Normal
Subject Variable                      ID
Optimization Technique                Dual Quasi-Newton
Integration Method                    Adaptive Gaussian Quadrature

Dimensions
Observations Used       3850
Observations Not Used      0
Total Observations      3850
Subjects                 110
Max Obs Per Subject       35
Parameters                 6
Quadrature Points        100

Parameters
sigma2_u   sigma2   beta0   beta1   beta2   beta3    NegLogLike
     0.7       75     7.3   -0.21    0.96       3    12996.1027

Iteration History
Iter   Calls   NegLogLike       Diff    MaxGrad      Slope
   1       3    12811.191   184.9117   35.97883    -952.67
   2       4   12785.7952   25.39578   8.582992    -38.574
   3       5   12780.5915    5.20372   7.955111   -9.94815
   4       6   12778.7543   1.837234   4.220892   -3.00438
   5       7   12777.9572   0.797032   2.717224   -1.43504
   6       8   12776.7405   1.216737   3.510426   -0.50389
   7      10   12776.2001   0.540379   2.099865   -0.82019
   8      12   12775.9692   0.230922   1.423118   -0.31555
   9      14   12773.5356   2.433648   6.822968   -0.28567
  10      16   12772.4011   1.134466   1.619023   -2.85146
  11      18   12772.2431   0.158024   0.633945   -0.35335
  12      20   12772.2156   0.027479   0.100505   -0.04792
  13      22   12772.2142   0.001338   0.043539   -0.00237
  14      24   12772.2142   0.000045   0.004714    -0.0001

NOTE: GCONV convergence criterion satisfied.

Fit Statistics
-2 Log Likelihood             25544
AIC (smaller is better)       25556
AICC (smaller is better)      25556
BIC (smaller is better)       25573

Parameter Estimates
                        Standard
Parameter    Estimate      Error    DF   t Value   Pr > |t|   Alpha      Lower    Upper   Gradient
sigma2_u       0.6463     0.3611   109      1.79     0.0762    0.05   -0.06935   1.3619   -0.00092
sigma2        68.9545     1.7122   109     40.27
Comparing the parameter estimates, we have the “true” estimates obtained from the uncensored Ystar, plus two sets obtained from the censored Ycensr: one from the naive model and one from the Tobit model that accounts for censoring.
Parameter              True      Naive      Tobit
Intercept            3.9080     3.5476     3.7901
Xc1                  0.2146     0.2073     0.2144
Xc2                  0.9657     0.8517     0.9467
Xi1                  3.3870     3.0338     3.3728
Intercept Variance   0.6974     0.6020     0.6463
Residual Variance   74.9236    58.6719    68.9545
The Tobit model is quite close to the true values, particularly for the slopes, and even the intercept is closer to the truth than the naive model. However, it is worth noting that using proc nlmixed is several orders of magnitude slower than using proc mixed. In most cases this is likely not an issue nowadays, but depending on the size of the dataset and how well-behaved the function being optimized is, it could be a consideration.
Right censored with random intercept and random slope
We could also consider a model with both a random intercept and a random slope. While this is straightforward to add in proc mixed, it takes a little more effort in proc nlmixed. The likelihood function remains unchanged, but the model becomes:
\begin{equation} E(y_{ij}|b_{j}) = \mu_{ij} = \beta_{0} + \beta_{1}Xc1_{ij} + \beta_{2}Xc2_{ij} + \beta_{3}Xi1_{ij} + b_{0j} + b_{1j}Xc2_{ij} \end{equation}
with both a random intercept and a random slope for the effect of Xc2. As before, we will fit three models: first the true model, then a naive model, and finally a Tobit model. Note that we are using an unstructured covariance matrix for the random effects; that is, the intercept and slope have their own variances and are allowed to covary, so we are estimating two additional random-effect parameters. We bound only the variances to be greater than or equal to 0; the covariance can be negative or positive, so we do not bound it.
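In proc nlmixed the random-effects covariance matrix is supplied as its lower triangle, so the three parameters in the Tobit code below map to the following matrix, where \(\sigma^{2}_{u1}\), \(\sigma_{u12}\), and \(\sigma^{2}_{u3}\) correspond to sigma2_u1, sigma2_u2, and sigma2_u3 in the program:
\begin{equation}
Cov(b_{0j}, b_{1j}) = \left[ \begin{array}{cc}
\sigma^{2}_{u1} & \sigma_{u12} \\
\sigma_{u12} & \sigma^{2}_{u3}
\end{array} \right]
\end{equation}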
proc mixed data = Simu noclprint;
  class ID;
  model Ystar = Xc1 Xc2 Xi1 / solution;
  random int Xc2 / subject=ID type = un;
run;
The Mixed Procedure

Model Information
Data Set                     WORK.SIMU
Dependent Variable           Ystar
Covariance Structure         Unstructured
Subject Effect               ID
Estimation Method            REML
Residual Variance Method     Profile
Fixed Effects SE Method      Model-Based
Degrees of Freedom Method    Containment

Dimensions
Covariance Parameters             4
Columns in X                      4
Columns in Z Per Subject          2
Subjects                        110
Max Obs Per Subject              35

Convergence criteria met.

Covariance Parameter Estimates
Cov Parm    Subject    Estimate
UN(1,1)     ID           0.7221
UN(2,1)     ID           0.5905
UN(2,2)     ID           0.2077
Residual                74.7233

Fit Statistics
-2 Res Log Likelihood         27571.7
AIC (smaller is better)       27579.7
AICC (smaller is better)      27579.7
BIC (smaller is better)       27590.5

Null Model Likelihood Ratio Test
DF    Chi-Square    Pr > ChiSq
 3         10.77        0.0130

Solution for Fixed Effects
                       Standard
Effect      Estimate      Error    DF   t Value   Pr > |t|
Intercept     3.9165     0.1991   109     19.68
Now we fit the same model on the right censored data, the “naive model”.
proc mixed data = Simu noclprint;
  class ID;
  model Ycensr = Xc1 Xc2 Xi1 / solution;
  random int Xc2 / subject=ID type = un;
run;
Covariance Parameter Estimates
Cov Parm    Subject    Estimate
UN(1,1)     ID           0.6175
UN(2,1)     ID           0.3720
UN(2,2)     ID           0.2505
Residual                58.4294

Fit Statistics
-2 Res Log Likelihood         26635.7
AIC (smaller is better)       26643.7
AICC (smaller is better)      26643.7
BIC (smaller is better)       26654.5

Null Model Likelihood Ratio Test
DF    Chi-Square    Pr > ChiSq
 3         10.00        0.0186

Solution for Fixed Effects
                       Standard
Effect      Estimate      Error    DF   t Value   Pr > |t|
Intercept     3.5533     0.1775   109     20.02
Again, we notice quite a bit of bias creeping into the naive model. Next we will try the Tobit model and see if it does any better.
proc nlmixed data=Simu XTOL=1E-8 method=GAUSS qpoints=20 TECH=TRUREG MAXITER=300;
  parms beta0=3.79 beta1=.20 beta2=.98 beta3=3.36
        sigma2_u1=.75 sigma2_u2=.38 sigma2_u3=.20 sigma2=74;
  /* bound the two variances at 0; the covariance (sigma2_u2) is left unbounded */
  bounds sigma2_u1 sigma2_u3 sigma2 >= 0;
  pi = constant('pi');
  /* conditional mean now includes a random slope b_1j on Xc2 */
  mu = beta0 + b_0j + beta1*Xc1 + beta2*Xc2 + beta3*Xi1 + b_1j*Xc2;
  if Ycensr < 16 then ll = (1 / (sqrt(2*pi*sigma2))) * exp( -(Ycensr-mu)**2 / (2*sigma2) );
  if Ycensr >= 16 then ll = 1 - probnorm( (Ycensr - mu) / sqrt(sigma2) );
  L = log(ll);
  model Ycensr ~ general(L);
  /* the covariance matrix is given as its lower triangle: [var, cov, var] */
  random b_0j b_1j ~ normal([0, 0], [sigma2_u1, sigma2_u2, sigma2_u3]) subject=ID;
run;
quit;
The NLMIXED Procedure

Specifications
Data Set                              WORK.SIMU
Dependent Variable                    Ycensr
Distribution for Dependent Variable   General
Random Effects                        b_0j b_1j
Distribution for Random Effects       Normal
Subject Variable                      ID
Optimization Technique                Trust Region
Integration Method                    Adaptive Gaussian Quadrature

Dimensions
Observations Used       3850
Observations Not Used      0
Total Observations      3850
Subjects                 110
Max Obs Per Subject       35
Parameters                 8
Quadrature Points         20

Parameters
sigma2_u1   sigma2_u2   sigma2_u3   sigma2   beta0   beta1   beta2   beta3    NegLogLike
     0.75        0.38         0.2       74    3.79     0.2    0.98    3.36    12773.9336

Iteration History
Iter   Calls   NegLogLike       Diff    MaxGrad     Radius
   1      23   12773.9286   0.005028   3.271649   -0.00503
   2      35   12773.9187   0.009914   3.248692   -0.00991
   3      47    12773.899    0.01974   3.201888   -0.01974
   4      59   12773.8986   0.000399   3.200929    -0.0004
   5      70   12773.8985    0.00008   3.200737   -0.00008

NOTE: GCONV convergence criterion satisfied.

Fit Statistics
-2 Log Likelihood             25548
AIC (smaller is better)       25564
AICC (smaller is better)      25564
BIC (smaller is better)       25585

Parameter Estimates
                         Standard
Parameter     Estimate      Error    DF   t Value   Pr > |t|   Alpha     Lower    Upper   Gradient
sigma2_u1       0.7469     0.4295   108      1.74     0.0849    0.05   -0.1045   1.5982   1.800145
sigma2_u2       0.3856     0.2685   108      1.44     0.1540    0.05   -0.1467   0.9179   -3.20074
sigma2_u3       0.1992     0.3359   108      0.59     0.5544    0.05   -0.4666   0.8650   0.476851
sigma2         73.9972     2.0114   108     36.79
It is no coincidence that a different optimization technique was used in this example.
The default dual quasi-Newton technique had difficulty converging.
Specifically, there were difficulties (negative eigenvalues, non-positive-definiteness, etc.)
with the Hessian matrix (the square matrix of second partial derivatives, which is used, among
other things, to estimate standard errors). As optimization becomes more difficult, successful
convergence and sensible parameter estimates tend to depend increasingly on the starting values.
It can be useful to fit an improper but computationally easier model first (e.g., as we did with the naive proc mixed) in order to get some estimates to use as start values. Initial runs can also use less numerically intensive methods; for example, you could evaluate the integral at fewer quadrature points to speed up computation until you were certain you had good start values and an optimization problem that was likely to converge.
Then the precision of estimation could be increased; one such two-stage pattern is sketched below. Considerable burden is placed on the analyst to ensure sensible results. In this instance, for example, successful convergence was reached with a variety of parameter estimates depending on the start values used: there was a trade-off between obtaining an estimate of the covariance between random intercepts and slopes that was close to the true value while overestimating the variance of the slopes, and accurately estimating the variance of the slopes while underestimating the covariance.
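Here is a hedged sketch of that two-stage pattern (not part of the original program): a fast, low-precision pass with few quadrature points, started at the naive proc mixed estimates from above, whose estimates could then seed the full qpoints=20 run shown earlier.

/* Hedged sketch: cheap first pass to refine start values.
   Start values are taken from the naive proc mixed fit above. */
proc nlmixed data=Simu qpoints=5 TECH=TRUREG;
  parms beta0=3.55 beta1=.20 beta2=.85 beta3=3.03
        sigma2_u1=.62 sigma2_u2=.37 sigma2_u3=.25 sigma2=58;
  bounds sigma2_u1 sigma2_u3 sigma2 >= 0;
  pi = constant('pi');
  mu = beta0 + b_0j + beta1*Xc1 + beta2*Xc2 + beta3*Xi1 + b_1j*Xc2;
  if Ycensr < 16 then ll = (1 / (sqrt(2*pi*sigma2))) * exp( -(Ycensr-mu)**2 / (2*sigma2) );
  else ll = 1 - probnorm( (Ycensr - mu) / sqrt(sigma2) );
  L = log(ll);
  model Ycensr ~ general(L);
  random b_0j b_1j ~ normal([0, 0], [sigma2_u1, sigma2_u2, sigma2_u3]) subject=ID;
run;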
Parameter      True      Naive      Tobit
Intercept    3.9165     3.5533     3.7921
Xc1          0.2049     0.2027     0.2007
Xc2          0.9670     0.8504     0.9778
Xi1          3.3715     3.0254     3.3613
UN(1,1)      0.7221     0.6175     0.7469
UN(2,1)      0.5905     0.3720     0.3856
UN(2,2)      0.2077     0.2505     0.1992
Residual    74.7233    58.4294    73.9972
Left censored with random intercept
For left censored data, the observed values are the true latent values above the lower bound \(y_{l}\) and are the lower bound otherwise:
\begin{equation}
y_{i} = \left\{ \begin{array}{rl}
y^{*}_{i} &\mbox{if $y^{*}_{i} > y_{l}$} \\
y_{l} &\mbox{if $y^{*}_{i} \le y_{l}$}
\end{array} \right.
\end{equation}
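Analogous to the earlier right-censoring sketch, a left-censored copy could be constructed with max() (hypothetical; the dataset already contains Ycensl, censored at -5):

/* Hypothetical sketch: left-censor the latent variable at -5.
   The data already include Ycensl, so this is illustration only. */
data censor_demo_l;
  set Simu;
  Ycensl_demo = max(Ystar, -5);   /* values at or below -5 are recorded as -5 */
run;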
Our likelihood function for left censored data is:
\begin{equation}
\mathcal{L}(\theta) = \left\{ \begin{array}{rl}
\left( \frac{1}{\sigma \sqrt{2 \pi}} \right) e^{\frac{-(y_{ij} - \mu_{ij})^{2}}{2 \sigma^{2}}} &\mbox{ if $y_{ij} > y_{l}$} \\
\Phi \left( \frac{y_{ij} - \mu_{ij}}{\sigma} \right) &\mbox{ if $y_{ij} \le y_{l}$}
\end{array} \right.
\end{equation}
When \(y\) is greater than the lower censoring value, the likelihood contribution is just the normal PDF; when \(y\) is less than or equal to the lower censoring value, it is the normal CDF. In this particular data example, it will be:
\begin{equation}
\mathcal{L}(\theta) = \left\{ \begin{array}{rl}
\left( \frac{1}{\sqrt{2 \pi \sigma^{2}}} \right) e^{\frac{-(Ycensl_{ij} - \mu_{ij})^{2}}{2 \sigma^{2}}} &\mbox{ if $Ycensl_{ij} > -5$} \\
\Phi \left( \frac{Ycensl_{ij} - \mu_{ij}}{\sqrt{\sigma^{2}}} \right) &\mbox{ if $Ycensl_{ij} \le -5$}
\end{array} \right.
\end{equation}
where:
\begin{equation}
E(y_{ij}|b_{j}) = \mu_{ij} = \beta_{0} + \beta_{1}Xc1_{ij} + \beta_{2}Xc2_{ij} + \beta_{3}Xi1_{ij} + b_{0j}
\end{equation}
The true model is the same as before, using Ystar.
proc mixed data = Simu noclprint;
  class ID;
  model Ystar = Xc1 Xc2 Xi1 / solution;
  random int / subject=ID;
run;
The Mixed Procedure

Convergence criteria met.

Covariance Parameter Estimates
Cov Parm     Subject    Estimate
Intercept    ID           0.6974
Residual                 74.9236

Fit Statistics
-2 Res Log Likelihood         27577.9
AIC (smaller is better)       27581.9
AICC (smaller is better)      27581.9
BIC (smaller is better)       27587.3

Solution for Fixed Effects
                       Standard
Effect      Estimate      Error    DF   t Value   Pr > |t|
Intercept     3.9080     0.1987   109     19.67
Next we turn to the left censored variable, Ycensl, which is censored at -5. The first model is a naive random intercept model.
proc mixed data = Simu noclprint;
  class ID;
  model Ycensl = Xc1 Xc2 Xi1 / solution;
  random int / subject=ID;
run;
Convergence criteria met.

Covariance Parameter Estimates
Cov Parm     Subject    Estimate
Intercept    ID           0.5853
Residual                 54.7300

Fit Statistics
-2 Res Log Likelihood         26373.9
AIC (smaller is better)       26377.9
AICC (smaller is better)      26377.9
BIC (smaller is better)       26383.3

Solution for Fixed Effects
                       Standard
Effect      Estimate      Error    DF   t Value   Pr > |t|
Intercept     4.6307     0.1719   109     26.95
As expected, some bias is introduced when using the censored variable. Predictably, the intercept is much higher than the true value. Also, the residual variance is lower (because the range of the outcome is decreased). The Tobit model is virtually identical to that for right censored data, with a few signs flipped. Note that now the likelihood for censored values is the CDF instead of 1 - CDF, as it was for right censored data.
proc nlmixed data=Simu XTOL=1E-12 method=GAUSS qpoints=100;
  parms beta0=7.3 beta1=-.21 beta2=.96 beta3=3 sigma2_u=.7 sigma2=75;
  bounds sigma2_u sigma2 >= 0;
  pi = constant('pi');
  mu = beta0 + b_0j + beta1*Xc1 + beta2*Xc2 + beta3*Xi1;
  /* uncensored observations contribute the normal PDF */
  if Ycensl > -5 then ll = (1 / (sqrt(2*pi*sigma2))) * exp( -(Ycensl-mu)**2 / (2*sigma2) );
  /* censored observations now contribute the lower tail probability (CDF) */
  if Ycensl <= -5 then ll = probnorm( (Ycensl - mu) / sqrt(sigma2) );
  L = log(ll);
  model Ycensl ~ general(L);
  random b_0j ~ normal(0, sigma2_u) subject=ID;
run;
quit;
The NLMIXED Procedure

Specifications
Data Set                              WORK.SIMU
Dependent Variable                    Ycensl
Distribution for Dependent Variable   General
Random Effects                        b_0j
Distribution for Random Effects       Normal
Subject Variable                      ID
Optimization Technique                Dual Quasi-Newton
Integration Method                    Adaptive Gaussian Quadrature

Dimensions
Observations Used       3850
Observations Not Used      0
Total Observations      3850
Subjects                 110
Max Obs Per Subject       35
Parameters                 6
Quadrature Points        100

Parameters
sigma2_u   sigma2   beta0   beta1   beta2   beta3    NegLogLike
     0.7       75     7.3   -0.21    0.96       3    12833.5413

Iteration History
Iter   Calls   NegLogLike       Diff    MaxGrad      Slope
   1       3   12677.0759   156.4654   29.94779   -832.641
   2       4   12657.6655    19.4104   7.321399    -29.463
   3       5   12653.1398   4.525654    6.40414   -7.93312
   4       6   12651.5071   1.632756   4.086106   -2.76029
   5       7   12650.4971   1.009996   2.818824   -1.66452
   6       9   12648.3248   2.172269   8.042786   -1.29141
   7      10    12646.516   1.808874   3.876156   -2.23033
   8      11   12644.5578   1.958138   7.400694   -1.68317
   9      13   12638.5205   6.037313   2.905391   -2.58085
  10      15    12635.814   2.706522    3.11614   -4.61662
  11      17   12635.4828   0.331181   1.371521   -1.11478
  12      19   12635.3784   0.104405    0.27784   -0.17558
  13      21   12635.3708   0.007562   0.043735   -0.01724
  14      23   12635.3707   0.000155    0.00229   -0.00033
  15      25   12635.3707   7.724E-7   0.000353   -1.67E-6

NOTE: GCONV convergence criterion satisfied.

Fit Statistics
-2 Log Likelihood             25271
AIC (smaller is better)       25283
AICC (smaller is better)      25283
BIC (smaller is better)       25299

Parameter Estimates
                        Standard
Parameter    Estimate      Error    DF   t Value   Pr > |t|   Alpha     Lower    Upper   Gradient
sigma2_u       0.7310     0.3550   109      2.06     0.0418    0.05   0.02752   1.4346   0.000113
sigma2        64.6977     1.6100   109     40.19
And again we can compare the three models and see that across all of the estimates, the Tobit model outperforms the naive model.
Parameter              True      Naive      Tobit
Intercept            3.9080     4.6307     4.1525
Xc1                  0.2146     0.1227     0.1397
Xc2                  0.9657     0.9018     0.9716
Xi1                  3.3870     3.1094     3.3565
Intercept Variance   0.6974     0.5853     0.7310
Residual Variance   74.9236    54.7300    64.6977
Comments
In these examples, it is very clear that accounting for censoring is the best choice. However, in real data, the exact censoring point may not be so clear, nor may the underlying assumption (that the data are normally distributed, with all values beyond the bound recorded at the bound) be as appropriate. These data were simulated and explicitly censored, so they represent a “best case” scenario. Nevertheless, there are plenty of situations where the censoring value is clearly known, such as the upper bound on a test or the lower limit of detection for a biological assay. When this is the case, accounting for the censoring should yield less biased estimates.
Using proc nlmixed allows us to estimate censored models with random effects, but as was apparent from these examples, this can quickly become quite complex. The approach is best suited to a small number of random effects. If you need to model many random effects, a specification of the covariance matrix that constrains it to be positive definite may help with convergence; discussion of techniques for doing this is beyond the scope of this page.
For more reading on this topic see:
- Bruno, G. 2004. “Limited dependent panel data models: a comparative analysis of classical and Bayesian inference among econometric packages.” Computing in Economics and Finance 2004 41, Society for Computational Economics.
- Skrondal, A., and S. Rabe-Hesketh. 2004. Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. Boca Raton, FL: Chapman & Hall/CRC.
- Pendergast, J. F., S. J. Gange, M. A. Newton, M. J. Lindstrom, M. Palta, and M. R. Fisher. 1996. A survey of methods for analyzing clustered binary response data. International Statistical Review 64: 89–118.