An Introduction to Categorical Analysis by Alan Agresti Chapter 7: Building and Applying Logit and Log-linear Models

This page has been updated to work with Stata 12.

This chapter makes use of the user written program fitstat program, which is not part of base Stata. Prior to using the fitstat command, it needs to be downloaded by typing search fitstat in the command line (see How can I use the search command to search for programs and get additional help? for more information about using search).

Table 7.2 on page 178 using data on usage of alcohol, cigarettes and marijuana.

use https://stats.idre.ucla.edu/stat/stata/examples/icda/drug, clear

*MODEL 1
quietly glm count i.gender#i.race i.alcohol i.smoke i.marij, fam(poi) 
display "deviance = " e(deviance) "  df=" e(df)
deviance = 1325.1408  df=25

*MODEL 2
quietly glm count i.gender#i.race i.gender#i.alcohol i.gender#i.smoke ///
         i.gender#i.marij i.race#i.alcohol i.race#i.smoke i.race#i.marij ///
         i.alcohol#i.smoke i.alcohol#i.marij i.smoke#i.marij, fam(poi) 
display "deviance = " e(deviance) "  df=" e(df)
deviance = 15.340343  df=16

*MODEL 3
quietly glm count i.gender#i.race#i.alcohol i.gender#i.race#i.smoke ///
          i.gender#i.race#i.marij i.gender#i.alcohol#i.smoke ///
          i.gender#i.alcohol#i.marij i.gender#i.smoke#i.marij ///
          i.race#i.alcohol#i.smoke i.race#i.alcohol#i.marij ///
          i.race#i.smoke#i.marij i.alcohol#i.smoke#i.marij,  fam(poi) 
display "deviance = " e(deviance) "  df=" e(df)
deviance = 5.2720008  df=6
 
*MODEL 4a
quietly glm count i.gender#i.race i.gender#i.alcohol i.gender#i.smoke ///
         i.gender#i.marij i.race#i.alcohol i.race#i.smoke i.race#i.marij ///
         i.alcohol#i.marij i.smoke#i.marij, fam(poi)
display "deviance = " e(deviance) "  df=" e(df)
deviance = 201.19931  df=17
 
*MODEL 4b
quietly glm count i.gender#i.race i.gender#i.alcohol i.gender#i.smoke ///
         i.gender#i.marij i.race#i.alcohol i.race#i.smoke i.race#i.marij ///
         i.alcohol#i.smoke i.smoke#i.marij, fam(poi)
display "deviance = " e(deviance) "  df=" e(df)
deviance = 106.958  df=17

*MODEL 4c
quietly glm count i.gender#i.race i.gender#i.alcohol i.gender#i.smoke ///
         i.gender#i.marij i.race#i.alcohol i.race#i.smoke i.race#i.marij ///
         i.alcohol#i.smoke i.alcohol#i.marij, fam(poi)  
display "deviance = " e(deviance) "  df=" e(df)
deviance = 513.47218  df=17

*MODEL 4d
quietly glm count i.gender#i.race i.gender#i.smoke ///
         i.gender#i.marij i.race#i.alcohol i.race#i.smoke i.race#i.marij ///
         i.alcohol#i.smoke i.alcohol#i.marij i.smoke#i.marij, fam(poi) 
display "deviance = " e(deviance) "  df=" e(df)
deviance = 18.716951  df=17

*MODEL 4e
quietly glm count i.gender#i.race i.gender#i.alcohol i.gender#i.smoke ///
         i.gender#i.marij  i.race#i.smoke i.race#i.marij ///
         i.alcohol#i.smoke i.alcohol#i.marij i.smoke#i.marij, fam(poi) 
display "deviance = " e(deviance) "  df=" e(df)
deviance = 20.320861  df=17

*MODEL 4f
quietly glm count i.gender#i.race i.gender#i.alcohol ///
         i.gender#i.marij i.race#i.alcohol i.race#i.smoke i.race#i.marij ///
         i.alcohol#i.smoke i.alcohol#i.marij i.smoke#i.marij, fam(poi)  
display "deviance = " e(deviance) "  df=" e(df)
deviance = 16.317184  df=17

*MODEL 4g
quietly glm count i.gender#i.race i.gender#i.alcohol i.gender#i.smoke ///
         i.gender#i.marij i.race#i.alcohol i.race#i.marij ///
         i.alcohol#i.smoke i.alcohol#i.marij i.smoke#i.marij, fam(poi) 
display "deviance = " e(deviance) "  df=" e(df)
deviance = 15.783467  df=17

*MODEL 4h 
quietly glm count i.gender#i.race i.gender#i.alcohol i.gender#i.smoke ///
         i.race#i.alcohol i.race#i.smoke i.race#i.marij ///
         i.alcohol#i.smoke i.alcohol#i.marij i.smoke#i.marij, fam(poi) 
display "deviance = " e(deviance) "  df=" e(df)
deviance = 25.161008  df=17

*MODEL 4i
quietly glm count i.gender#i.race i.gender#i.alcohol i.gender#i.smoke ///
         i.gender#i.marij i.race#i.alcohol i.race#i.smoke ///
         i.alcohol#i.smoke i.alcohol#i.marij i.smoke#i.marij, fam(poi) 
display "deviance = " e(deviance) "  df=" e(df)
deviance = 18.928935  df=17

*MODEL 5
quietly glm count i.alcohol#i.smoke i.alcohol#i.marij ///
         i.smoke#i.marij i.alcohol#i.gender i.alcohol#i.race ///
         i.gender#i.marij i.gender#i.race i.marij#i.race, fam(poi) 
display "deviance = " e(deviance) "  df=" e(df)
deviance = 16.73504  df=18

*MODEL 6
quietly glm count i.alcohol#i.smoke i.alcohol#i.marij ///
         i.smoke#i.marij i.alcohol#i.gender i.alcohol#i.race ///
         i.gender#i.marij i.gender#i.race, fam(poi) 
display "deviance = " e(deviance) "  df=" e(df)
deviance = 19.908587  df=19

*MODEL 7
quietly glm count i.alcohol#i.smoke i.alcohol#i.marij ///
         i.smoke#i.marij i.alcohol#i.gender i.alcohol#i.race ///
         i.gender#i.race, fam(poi) 
display "deviance = " e(deviance) "  df=" e(df)
deviance = 28.80508  df=20

Table 7.3 on page 181.

use https://stats.idre.ucla.edu/stat/stata/examples/icda/sex, clear

glm count i.premar i.birth, fam(poi)

Iteration 0:   log likelihood = -111.95331  
Iteration 1:   log likelihood = -109.17365  
Iteration 2:   log likelihood = -109.16457  
Iteration 3:   log likelihood = -109.16457  

Generalized linear models                          No. of obs      =        16
Optimization     : ML                              Residual df     =         9
                                                   Scale parameter =         1
Deviance         =  127.6529373                    (1/df) Deviance =  14.18366
Pearson          =  128.6835978                    (1/df) Pearson  =  14.29818

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =  14.52057
Log likelihood   =  -109.164565                    BIC             =  102.6996

------------------------------------------------------------------------------
             |                 OIM
       count |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      premar |
          2  |  -.9767888   .1216605    -8.03   0.000    -1.215239   -.7383387
          3  |  -.3446024   .0988072    -3.49   0.000     -.538261   -.1509438
          4  |   .5092049   .0805088     6.32   0.000     .3514105    .6669993
             |
       birth |
          2  |   .1885912   .1072271     1.76   0.079      -.02157    .3987523
          3  |   .7118393   .0968283     7.35   0.000     .5220592    .9016194
          4  |   .4565487   .1013576     4.50   0.000     .2578914    .6552061
             |
       _cons |   3.747418   .0962184    38.95   0.000     3.558834    3.936003
------------------------------------------------------------------------------

predict pred
(option mu assumed; predicted mean count)

predict resid, p 
predict h, h 
gen aresid = resid/sqrt(1-h) 
drop h
table premar birth, cont(mean count mean pred mean aresid)

------------------------------------------------------
          |                   birth                   
   premar |         1          2          3          4
----------+-------------------------------------------
        1 |        81         68         60         38
          |  42.41145   51.21382   86.42332    66.9514
          |  7.603182   3.076708  -4.116706   -4.83965
          | 
        2 |        24         26         29         14
          |  15.96868   19.28294   32.53996   25.20842
          |  2.328322   1.811479  -.8114836  -2.756819
          | 
        3 |        18         41         74         42
          |   30.0486    36.2851    61.2311    47.4352
          | -2.681746   .9762287   2.247295  -1.026372
          | 
        4 |        36         57        161        157
          |  70.57127   85.21814   143.8056    111.405
          | -6.063329  -4.603856   2.384558   6.784545
------------------------------------------------------

Table 7.4 on page 184.

glm count i.premar i.birth c.premar#c.birth, fam(poi)

Iteration 0:   log likelihood =  -53.12473  
Iteration 1:   log likelihood = -51.110631  
Iteration 2:   log likelihood = -51.104943  
Iteration 3:   log likelihood = -51.104943  

Generalized linear models                          No. of obs      =        16
Optimization     : ML                              Residual df     =         8
                                                   Scale parameter =         1
Deviance         =  11.53369221                    (1/df) Deviance =  1.441712
Pearson          =   11.5084629                    (1/df) Pearson  =  1.438558

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =  7.388118
Log likelihood   =  -51.1049425                    BIC             = -10.64702

------------------------------------------------------------------------------
             |                 OIM
       count |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      premar |
          2  |  -1.645964   .1347333   -12.22   0.000    -1.910037   -1.381892
          3  |  -1.770023   .1646445   -10.75   0.000     -2.09272   -1.447326
          4  |  -1.753685   .2343172    -7.48   0.000    -2.212938   -1.294432
             |
       birth |
          2  |   -.464105   .1195237    -3.88   0.000    -.6983672   -.2298428
          3  |  -.7245224   .1620067    -4.47   0.000     -1.04205    -.406995
          4  |  -1.879664   .2491019    -7.55   0.000    -2.367895   -1.391433
             |
    c.premar#|
     c.birth |   .2858355    .028238    10.12   0.000     .2304901    .3411809
             |
       _cons |   4.106841   .0895105    45.88   0.000     3.931404    4.282279
------------------------------------------------------------------------------

predict pred2
(option mu assumed; predicted mean count)

table  premar birth, cont(mean count mean pred2)

--------------------------------------------------
          |                 birth                 
   premar |        1         2         3         4
----------+---------------------------------------
        1 |       81        68        60        38
          | 80.85658  67.65406  69.39574  29.09363
          | 
        2 |       24        26        29        14
          | 20.75004   23.1065   31.5435  17.59996
          | 
        3 |       18        41        74        42
          |  24.3937  36.15178  65.68137  48.77315
          | 
        4 |       36        57       161       157
          | 32.99969  65.08765  157.3794  155.5333
--------------------------------------------------

Table 7.5 and calculation on page 186.

use https://stats.idre.ucla.edu/stat/stata/examples/icda/cmh, clear

The command tab3way can be downloaded from the internet by typing search tab3way in the command line (see How can I use the search command to search for programs and get additional help? for more information about using search).

tab3way income satisf female [fw=count]

Frequency weights are based on the expression: count
Table entries are cell frequencies
Missing categories ignored
------------------------------------------------------------
          |                female and satisf                
          | ---------- M ---------    ---------- F ---------
   income |    1     3     4     5       1     3     4     5
----------+-------------------------------------------------
        3 |    1     1     2     1       1     3    11     2
       10 |          3     5     1       2     3    17     3
       20 |                7     3             1     8     5
       35 |          1     9     6             2     4     2
------------------------------------------------------------

quietly poisson count i.female##i.income i.female##i.satisf 
fitstat, saving(m0)

Measures of Fit for poisson of count

Log-Lik Intercept Only:        -96.116   Log-Lik Full Model:            -48.006
D(18):                          96.012   LR(13):                         96.220
                                         Prob > LR:                       0.000
McFadden's R2:                   0.501   McFadden's Adj R2:               0.355
ML (Cox-Snell) R2:               0.951   Cragg-Uhler(Nagelkerke) R2:      0.953
AIC:                             3.875   AIC*n:                         124.012
BIC:                            33.628   BIC':                          -51.165
BIC used by Stata:             144.532   AIC used by Stata:             124.012
(Indices saved in matrix fs_m0)

quietly poisson count i.income##i.satisf i.female##i.income i.female##i.satisf 
fitstat, using(m0)

Measures of Fit for poisson of count

                               Current             Saved        Difference
Model:                         poisson           poisson
N:                                  32                32                 0
Log-Lik Intercept Only         -96.116           -96.116             0.000
Log-Lik Full Model             -41.868           -48.006             6.137
D                               83.737(9)         96.012(18)        12.275(9)
LR                             108.495(22)        96.220(13)        12.275(9)
Prob > LR                        0.000             0.000             0.198
McFadden's R2                    0.564             0.501             0.064
McFadden's Adj R2                0.325             0.355            -0.030
ML (Cox-Snell) R2                0.966             0.951             0.016
Cragg-Uhler(Nagelkerke) R2       0.969             0.953             0.016
AIC                              4.054             3.875             0.179
AIC*n                          129.737           124.012             5.725
BIC                             52.545            33.628            18.917
BIC'                           -32.248           -51.165            18.917
BIC used by Stata              163.449           144.532            18.917
AIC used by Stata              129.737           124.012             5.725

Difference of   18.917 in BIC' provides very strong support for saved model.

Note: p-value for difference in LR is only valid if models are nested.
.

Section 7.3.4.

recode income (3 = 1)(10 = 2)(20 = 3)(35 = 4), gen(score1)
(32 differences between income and score)

gen score2 = 1
replace score2 = satisf - 1 if satisf ~=1
(24 real changes made)

poisson count i.female##i.income i.female##i.satisf c.score1#c.score2

Iteration 0:   log likelihood =  -45.68759  
Iteration 1:   log likelihood = -44.488147  
Iteration 2:   log likelihood = -44.485004  
Iteration 3:   log likelihood = -44.485002  

Poisson regression                                Number of obs   =         32
                                                  LR chi2(14)     =     103.26
                                                  Prob > chi2     =     0.0000
Log likelihood = -44.485002                       Pseudo R2       =     0.5372

-------------------------------------------------------------------------------
        count |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
     1.female |   1.395172   1.208614     1.15   0.248    -.9736688    3.764013
              |
       income |
          10  |  -.5137616   .6911784    -0.74   0.457    -1.868446    .8409231
          20  |  -1.586161   1.030348    -1.54   0.124    -3.605606    .4332843
          35  |  -2.360276   1.476453    -1.60   0.110     -5.25407    .5335184
              |
female#income |
        1 10  |    -.19793   .6440272    -0.31   0.759      -1.4602     1.06434
        1 20  |  -.8761533   .6673139    -1.31   0.189    -2.184064    .4317579
        1 35  |  -1.894792   .6889112    -2.75   0.006    -3.245033   -.5445512
              |
       satisf |
           3  |   .7452832    1.12599     0.66   0.508    -1.461616    2.952182
           4  |   1.233372   1.207738     1.02   0.307    -1.133751    3.600495
           5  |  -.7053711   1.552878    -0.45   0.650    -3.748956    2.338213
              |
female#satisf |
         1 3  |  -.3253637    1.28563    -0.25   0.800    -2.845153    2.194425
         1 4  |  -.1124099   1.201721    -0.09   0.925    -2.467741    2.242921
         1 5  |  -.3043782   1.271368    -0.24   0.811    -2.796214    2.187458
              |
     c.score1#|
     c.score2 |   .3878149   .1547002     2.51   0.012     .0846081    .6910216
              |
        _cons |  -1.354202   1.054484    -1.28   0.199    -3.420953    .7125493
-------------------------------------------------------------------------------


fitstat, using(m0)

Measures of Fit for poisson of count

                               Current             Saved        Difference
Model:                         poisson           poisson
N:                                  32                32                 0
Log-Lik Intercept Only         -96.116           -96.116             0.000
Log-Lik Full Model             -44.485           -48.006             3.521
D                               88.970(17)        96.012(18)         7.042(1)
LR                             103.261(14)        96.220(13)         7.042(1)
Prob > LR                        0.000             0.000             0.008
McFadden's R2                    0.537             0.501             0.037
McFadden's Adj R2                0.381             0.355             0.026
ML (Cox-Snell) R2                0.960             0.951             0.010
Cragg-Uhler(Nagelkerke) R2       0.963             0.953             0.010
AIC                              3.718             3.875            -0.158
AIC*n                          118.970           124.012            -5.042
BIC                             30.052            33.628            -3.576
BIC'                           -54.741           -51.165            -3.576
BIC used by Stata              140.956           144.532            -3.576
AIC used by Stata              118.970           124.012            -5.042

Difference of    3.576 in BIC' provides positive support for current model.

Note: p-value for difference in LR is only valid if models are nested.

test c.score1#c.score2

( 1)  [count]c.score1#c.score2 = 0

           chi2(  1) =    6.28
         Prob > chi2 =    0.0122