How to generate a saturated model using DESMAT?

Sometimes, we need to generate a saturated model. In Stata, this can be done easily using the program desmat, written by John Hendrickx. The command needs to be downloaded before we use it and can be obtained by typing search dm73_3 in the command line (see How can I use the search command to search for programs and get additional help? for more information about using search).

Here is an example using a data set on belief in afterlife from An Introduction To Categorical Analysis by Argresti. There are three categorical variables in the data set.

use https://stats.idre.ucla.edu/stat/stata/faq/afterlife, clear

list 
	      race     gender     belief      count
  1.         1          1          1        371
  2.         1          1          2         49
  3.         1          1          3         74
  4.         1          0          1        250
  5.         1          0          2         45
  6.         1          0          3         71
  7.         0          1          1         64
  8.         0          1          2          9
  9.         0          1          3         15
 10.         0          0          1         25
 11.         0          0          2          5
 12.         0          0          3         13

To generate a saturated model, we can simply do the following. The three predictors grouped with "*" indicate that we want all the main effects, 2-way interactions and the 3-way interaction.

desmat: poisson count race*gender*belief

-------------------------------------------------------------------------------
   poisson
-------------------------------------------------------------------------------
   Dependent variable                                                     count
   Number of observations:                                                   12
   Initial log likelihood:                                             -665.927
   Log likelihood:                                                      -33.156
   LR chi square:                                                      1265.541
   Model degrees of freedom:                                                 11
   Pseudo R-squared:                                                      0.950
   Prob:                                                                  0.000
-------------------------------------------------------------------------------
nr Effect                                                     Coeff        s.e.
-------------------------------------------------------------------------------
   count
     race
1      1                                                      2.303**     0.210
     gender
2      1                                                      0.940**     0.236
     race.gender
3      1.1                                                   -0.545*      0.250
     belief
4      2                                                     -1.609**     0.490
5      3                                                     -0.654       0.342
     race.belief
6      1.2                                                   -0.105       0.516
7      1.3                                                   -0.605       0.367
     gender.belief
8      1.2                                                   -0.352       0.606
9      1.3                                                   -0.797       0.446
     race.gender.belief
10     1.1.2                                                  0.043       0.645
11     1.1.3                                                  0.444       0.483
12   _cons                                                    3.219**     0.200
-------------------------------------------------------------------------------
*  p < .05
** p < .01

A set of dummy variables are generated by the program, and they are named as _x_1, _x_2, etc. To see what they are parameterized for, we can type

showtrms

Desmat generated the following design matrix:

nr   Variables       Term                        Parameterization
     First    Last
 1    _x_1           race                        ind(0)
 2    _x_2           gender                      ind(0)
 3    _x_3           race.gender                 ind(0).ind(0)
 4    _x_4    _x_5   belief                      ind(1)
 5    _x_6    _x_7   race.belief                 ind(0).ind(1)
 6    _x_8    _x_9   gender.belief               ind(0).ind(1)
 7   _x_10   _x_11   race.gender.belief          ind(0).ind(0).ind(1)

There are a few options for desmat. For example, we can use desrep to display the full result of a model.

desmat: poisson count race*gender*belief, desrep(exp all)

-------------------------------------------------------------------------------
   poisson
-------------------------------------------------------------------------------
   Dependent variable                                                     count
   Number of observations:                                                   12
   Initial log likelihood:                                             -665.927
   Log likelihood:                                                      -33.156
   LR chi square:                                                      1265.541
   Model degrees of freedom:                                                 11
   Pseudo R-squared:                                                      0.950
   Prob:                                                                  0.000
-------------------------------------------------------------------------------
nr Effect             Coeff        s.e.       z        prob    lo 95%    hi 95%
(exponential parameters)
-------------------------------------------------------------------------------
   count
     race
1      1             10.000**     2.098    10.977     0.000     6.629    15.085
     gender
2      1              2.560**     0.604     3.986     0.000     1.612     4.064
     race.gender
3      1.1            0.580*      0.145    -2.184     0.029     0.355     0.946
     belief
4      2              0.200**     0.098    -3.285     0.001     0.077     0.522
5      3              0.520       0.178    -1.912     0.056     0.266     1.016
     race.belief
6      1.2            0.900       0.464    -0.204     0.838     0.327     2.474
7      1.3            0.546       0.201    -1.646     0.100     0.266     1.122
     gender.belief
8      1.2            0.703       0.426    -0.582     0.561     0.215     2.304
9      1.3            0.451       0.201    -1.785     0.074     0.188     1.081
     race.gender.belief
10     1.1.2          1.044       0.673     0.066     0.947     0.295     3.695
11     1.1.3          1.558       0.753     0.918     0.359     0.604     4.017
12   _cons           25.000**     5.000    16.094     0.000    16.893    36.998
-------------------------------------------------------------------------------
*  p < .05
** p < .01

One thing that one often wants to do after running a saturated model is to compare it with other models. We can issue the command lrtest to save the likelihood ratio for the saturated model after the saturated model is created. Then we run other smaller models and do the lrtest again using the saved information to compare models.

lrtest, saving(m0)
desmat: poisson count race belief*gender, desrep(exp all)

-------------------------------------------------------------------------------
   poisson
-------------------------------------------------------------------------------
   Dependent variable                                                     count
   Number of observations:                                                   12
   Initial log likelihood:                                             -665.927
   Log likelihood:                                                      -36.852
   LR chi square:                                                      1258.149
   Model degrees of freedom:                                                  6
   Pseudo R-squared:                                                      0.945
   Prob:                                                                  0.000
-------------------------------------------------------------------------------
nr Effect             Coeff        s.e.       z        prob    lo 95%    hi 95%
(exponential parameters)
-------------------------------------------------------------------------------
   count
     race
1      1              6.565**     0.616    20.063     0.000     5.463     7.890
     belief
2      2              0.182**     0.028   -11.088     0.000     0.135     0.246
3      3              0.305**     0.038    -9.513     0.000     0.239     0.390
     gender
4      1              1.582**     0.122     5.952     0.000     1.360     1.840
     belief.gender
5      2.1            0.733       0.152    -1.493     0.136     0.488     1.102
6      3.1            0.670*      0.114    -2.350     0.019     0.480     0.936
7    _cons           36.352**     3.682    35.473     0.000    29.806    44.336
-------------------------------------------------------------------------------
*  p < .05
** p < .01

lrtest, using(m0)

Poisson:  likelihood-ratio test                       chi2(5)     =       7.39
                                                      Prob > chi2 =     0.1931

Another command that comes with desmat is destest. It performs a Wald test on model terms after a model has been created.

destest

Testing all model terms ...
-------------------------------------------------------------------------------
Term                                                Wald chi2      df  P > chi2
-------------------------------------------------------------------------------
race                                                  402.544**     1     0.000
belief                                                179.902**     2     0.000
gender                                                 35.431**     1     0.000
belief.gender                                           6.766*      2     0.034
-------------------------------------------------------------------------------
*  p < .05
** p < .01

For more information, please do help desmat or visit the webpage on DESMAT for Stata.