Sometimes we need to fit a saturated model. In Stata, this can be done easily using the program desmat, written by John Hendrickx. desmat is not part of official Stata, so it must be installed before use; it can be located by typing search dm73_3 in the command line (see How can I use the search command to search for programs and get additional help? for more information about using search).
Here is an example using a data set on belief in the afterlife from An Introduction to Categorical Data Analysis by Agresti. The data set contains three categorical variables (race, gender and belief) and a cell count.
    use https://stats.idre.ucla.edu/stat/stata/faq/afterlife, clear
    list

           race   gender   belief   count
      1.      1        1        1     371
      2.      1        1        2      49
      3.      1        1        3      74
      4.      1        0        1     250
      5.      1        0        2      45
      6.      1        0        3      71
      7.      0        1        1      64
      8.      0        1        2       9
      9.      0        1        3      15
     10.      0        0        1      25
     11.      0        0        2       5
     12.      0        0        3      13
To generate a saturated model, we can simply do the following. The three predictors grouped with "*" indicate that we want all the main effects, 2-way interactions and the 3-way interaction.
    desmat: poisson count race*gender*belief

    -------------------------------------------------------------------------------
    poisson
    -------------------------------------------------------------------------------
       Dependent variable              count
       Number of observations:            12
       Initial log likelihood:      -665.927
       Log likelihood:               -33.156
       LR chi square:               1265.541
       Model degrees of freedom:          11
       Pseudo R-squared:               0.950
       Prob:                           0.000
    -------------------------------------------------------------------------------
    nr Effect                 Coeff       s.e.
    -------------------------------------------------------------------------------
       count
         race
    1      1                  2.303**    0.210
         gender
    2      1                  0.940**    0.236
         race.gender
    3      1.1               -0.545*     0.250
         belief
    4      2                 -1.609**    0.490
    5      3                 -0.654      0.342
         race.belief
    6      1.2               -0.105      0.516
    7      1.3               -0.605      0.367
         gender.belief
    8      1.2               -0.352      0.606
    9      1.3               -0.797      0.446
         race.gender.belief
    10     1.1.2              0.043      0.645
    11     1.1.3              0.444      0.483
    12   _cons                3.219**    0.200
    -------------------------------------------------------------------------------
    * p < .05
    ** p < .01
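For comparison, roughly the same saturated model can be sketched with Stata's built-in factor-variable notation (available from Stata 11 onward). This is an alternative to desmat, not something desmat itself provides. It assumes the default base levels, which are the lowest category of each variable; the desmat coefficients above are likewise relative to race 0, gender 0 and belief 1, so the estimates should agree.

```stata
* Load the afterlife data used in this example
use https://stats.idre.ucla.edu/stat/stata/faq/afterlife, clear

* The ## operator requests all main effects, the 2-way interactions
* and the 3-way interaction among the three factors
poisson count i.race##i.gender##i.belief
```

With 12 cells and 12 parameters (11 model degrees of freedom plus the constant), the model reproduces the observed counts exactly, which is what makes it saturated.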
The program generates a set of dummy variables named _x_1, _x_2, etc. To see which model term each variable belongs to and how it is parameterized, we can type
    showtrms

    Desmat generated the following design matrix:

    nr  Variables       Term                 Parameterization
        First   Last

     1  _x_1            race                 ind(0)
     2  _x_2            gender               ind(0)
     3  _x_3            race.gender          ind(0).ind(0)
     4  _x_4    _x_5    belief               ind(1)
     5  _x_6    _x_7    race.belief          ind(0).ind(1)
     6  _x_8    _x_9    gender.belief        ind(0).ind(1)
     7  _x_10   _x_11   race.gender.belief   ind(0).ind(0).ind(1)
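desmat has a few options. For example, the desrep option controls how the results are displayed. Below, desrep(exp all) reports exponentiated coefficients together with z statistics, p-values and 95% confidence intervals.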
    desmat: poisson count race*gender*belief, desrep(exp all)

    -------------------------------------------------------------------------------
    poisson
    -------------------------------------------------------------------------------
       Dependent variable              count
       Number of observations:            12
       Initial log likelihood:      -665.927
       Log likelihood:               -33.156
       LR chi square:               1265.541
       Model degrees of freedom:          11
       Pseudo R-squared:               0.950
       Prob:                           0.000
    -------------------------------------------------------------------------------
    nr Effect                 Coeff       s.e.        z     prob   lo 95%   hi 95%
       (exponential parameters)
    -------------------------------------------------------------------------------
       count
         race
    1      1                 10.000**    2.098   10.977    0.000    6.629   15.085
         gender
    2      1                  2.560**    0.604    3.986    0.000    1.612    4.064
         race.gender
    3      1.1                0.580*     0.145   -2.184    0.029    0.355    0.946
         belief
    4      2                  0.200**    0.098   -3.285    0.001    0.077    0.522
    5      3                  0.520      0.178   -1.912    0.056    0.266    1.016
         race.belief
    6      1.2                0.900      0.464   -0.204    0.838    0.327    2.474
    7      1.3                0.546      0.201   -1.646    0.100    0.266    1.122
         gender.belief
    8      1.2                0.703      0.426   -0.582    0.561    0.215    2.304
    9      1.3                0.451      0.201   -1.785    0.074    0.188    1.081
         race.gender.belief
    10     1.1.2              1.044      0.673    0.066    0.947    0.295    3.695
    11     1.1.3              1.558      0.753    0.918    0.359    0.604    4.017
    12   _cons               25.000**    5.000   16.094    0.000   16.893   36.998
    -------------------------------------------------------------------------------
    * p < .05
    ** p < .01
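The exponentiated table can be related back to the first table with a little arithmetic. The sketch below uses the rounded race coefficient (2.303) and standard error (0.210) from the earlier output. It assumes the confidence limits are obtained by exponentiating the endpoints of the usual normal-theory interval, and that the reported standard error is the delta-method value exp(b) times se(b); both assumptions appear consistent with the numbers shown, but they are inferred from the output rather than taken from the desmat documentation.

```stata
* Exponentiated coefficient for race: should be close to 10.000
display exp(2.303)

* Approximate 95% limits: exponentiate b -/+ z*se (compare 6.629 and 15.085)
display exp(2.303 - invnormal(0.975)*0.210)
display exp(2.303 + invnormal(0.975)*0.210)

* Delta-method standard error of exp(b): compare with the reported 2.098
display exp(2.303)*0.210
```

Small differences from the table are expected because the inputs here are rounded to three decimals.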
After fitting a saturated model, we often want to compare it with more restricted models. Issuing lrtest, saving(m0) right after the saturated model stores its log likelihood under the name m0. We can then fit a smaller model and run lrtest, using(m0) to compare the two models with a likelihood-ratio test.
    lrtest, saving(m0)
    desmat: poisson count race belief*gender, desrep(exp all)

    -------------------------------------------------------------------------------
    poisson
    -------------------------------------------------------------------------------
       Dependent variable              count
       Number of observations:            12
       Initial log likelihood:      -665.927
       Log likelihood:               -36.852
       LR chi square:               1258.149
       Model degrees of freedom:           6
       Pseudo R-squared:               0.945
       Prob:                           0.000
    -------------------------------------------------------------------------------
    nr Effect                 Coeff       s.e.        z     prob   lo 95%   hi 95%
       (exponential parameters)
    -------------------------------------------------------------------------------
       count
         race
    1      1                  6.565**    0.616   20.063    0.000    5.463    7.890
         belief
    2      2                  0.182**    0.028  -11.088    0.000    0.135    0.246
    3      3                  0.305**    0.038   -9.513    0.000    0.239    0.390
         gender
    4      1                  1.582**    0.122    5.952    0.000    1.360    1.840
         belief.gender
    5      2.1                0.733      0.152   -1.493    0.136    0.488    1.102
    6      3.1                0.670*     0.114   -2.350    0.019    0.480    0.936
    7    _cons               36.352**    3.682   35.473    0.000   29.806   44.336
    -------------------------------------------------------------------------------
    * p < .05
    ** p < .01

    lrtest, using(m0)

    Poisson:  likelihood-ratio test            chi2(5)     =      7.39
                                               Prob > chi2 =    0.1931
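The likelihood-ratio statistic can be checked by hand from the two log likelihoods printed above. The sketch below is only a verification of the arithmetic, using the rounded values -33.156 (saturated model) and -36.852 (smaller model); the degrees of freedom are the difference in model degrees of freedom, 11 - 6 = 5.

```stata
* LR statistic: twice the difference in log likelihoods (compare 7.39)
display 2*(-33.156 - (-36.852))

* Upper-tail chi-squared probability with 5 df (compare 0.1931)
display chi2tail(5, 7.392)
```

Since the p-value is well above .05, the smaller model does not fit significantly worse than the saturated model.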
Another command that comes with desmat is destest. It performs a Wald test on model terms after a model has been created.
    destest

    Testing all model terms ...
    -------------------------------------------------------------------------------
    Term                   Wald chi2      df    P > chi2
    -------------------------------------------------------------------------------
    race                   402.544**       1       0.000
    belief                 179.902**       2       0.000
    gender                  35.431**       1       0.000
    belief.gender            6.766*        2       0.034
    -------------------------------------------------------------------------------
    * p < .05
    ** p < .01
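A comparable Wald test for a single term can be sketched with built-in commands when the model is fit using factor-variable notation rather than desmat. This is an assumed equivalent, not part of the desmat package: testparm tests that all coefficients of the named term are jointly zero, which for the belief by gender interaction should give a chi-squared statistic with 2 degrees of freedom close to the 6.766 reported above.

```stata
* Refit the smaller model with factor variables (Stata 11 or later)
use https://stats.idre.ucla.edu/stat/stata/faq/afterlife, clear
quietly poisson count i.race i.belief##i.gender

* Joint Wald test of the two belief-by-gender interaction coefficients
testparm i.belief#i.gender
```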
For more information, type help desmat in Stata or visit the webpage on DESMAT for Stata.