An Introduction to Categorical Data Analysis by Alan Agresti Chapter 6: Loglinear Models for Contingency Tables

This unit makes extensive use of the ipf (iterated proportional fitting) command written by Adrian Mander. Type search ipf in Stata to locate and install the command (see "How can I use the search command to search for programs and get additional help?" for more information about using search). We also use the glm command with the Poisson family and log link to obtain the coefficients.

Table 6.1, page 147.

use https://stats.idre.ucla.edu/stat/stata/examples/icda/afterlife, clear

list

     gender   aftlife      freq
1.  females       yes       435
2.  females        no       147
3.     male       yes       375
4.     male        no       134

table gender aftlife [fw=freq], cont(freq)

----------------------
          | belief in 
          | afterlife 
   gender |   no   yes
----------+-----------
     male |  134   375
  females |  147   435
----------------------

ipf [fw=freq], fit(gender+aftlife) save(aftlif) exp nolog

Deleting all matrices......

Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist :  gender 
marginal model 2 varlist :  aftlife 
unique varlist  gender aftlife

-------------------------------------------------------------------
N.B.  structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 1  

Goodness of Fit Tests
---------------------
df = 1
Likelihood Ratio Statistic G^2 =   0.1620 p-value = 0.687
Pearson Statistic          X^2 =   0.1621 p-value = 0.687

   gender    aftlife       Efreq       Ofreq        prob
        0          0   131.09899         134   .12016406
        0          1   377.90101         375   .34638039
        1          0   149.90101         147   .13739781
        1          1   432.09899         435   .39605774
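
Under the independence model fitted here, each expected frequency is the product of its row and column totals divided by the grand total, and G^2 sums 2*O*ln(O/E) over the cells. As a rough hand check (our own arithmetic on the counts and fitted values printed above, not part of the ipf output), the first fitted value and the likelihood-ratio statistic can be reproduced with display:

```stata
* expected count for gender = 0, aftlife = 0: row total * column total / n
display (134 + 375)*(134 + 147)/(134 + 375 + 147 + 435)

* G^2 = 2 * sum of O*ln(O/E), using the Efreq column printed by ipf
display 2*(134*ln(134/131.09899) + 375*ln(375/377.90101) + 147*ln(147/149.90101) + 435*ln(435/432.09899))
```

The first line should agree with the Efreq of 131.09899, and the second should be close to the reported G^2 of 0.1620, up to rounding in the printed fitted values.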

use aftlif, clear

table gender aftlife, cont(mean Efreq)

--------------------------------
          |       aftlife       
   gender |         0          1
----------+---------------------
        0 | 131.09899  377.90101
        1 | 149.90101  432.09899
--------------------------------

generate lefreq = ln(Efreq)
table gender aftlife, cont(mean lefreq)

------------------------------
          |      aftlife      
   gender |        0         1
----------+-------------------
        0 | 4.875953  5.934632
        1 | 5.009975  6.068655
------------------------------

use https://stats.idre.ucla.edu/stat/stata/examples/icda/afterlife, clear

glm freq gender aftlife, fam(pois) link(log)

Generalized linear models                          No. of obs      =         4
Optimization     : ML: Newton-Raphson              Residual df     =         1
                                                   Scale param     =         1
Deviance         =  .1619951194                    (1/df) Deviance =  .1619951
Pearson          =   .162083973                    (1/df) Pearson  =   .162084

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -14.70362649                    AIC             =  8.851813
BIC              = -3.996887964

------------------------------------------------------------------------------
        freq |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |   .1340224   .0606865     2.21   0.027     .0150791    .2529658
     aftlife |    1.05868   .0692336    15.29   0.000     .9229843    1.194375
       _cons |   4.875953   .0678732    71.84   0.000     4.742924    5.008982
------------------------------------------------------------------------------
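
Because the model is linear on the log scale, the glm coefficients are differences between the log fitted values tabulated earlier: the constant is the log fitted value for the gender = 0, aftlife = 0 cell, and each slope is the change in log fitted value when that variable moves from 0 to 1. A quick check with display, using only numbers printed above:

```stata
* constant: log fitted value in the (gender = 0, aftlife = 0) cell
display ln(131.09899)

* gender slope: difference between the rows of the lefreq table
display 5.009975 - 4.875953

* aftlife slope: difference between the columns of the lefreq table
display 5.934632 - 4.875953
```

These should match _cons, gender and aftlife in the coefficient table to the precision of the printed values.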

/* reverse the 0/1 coding so that the other category becomes the reference */
generate g2 = ~gender
generate a2 = ~aftlife
glm freq g2 a2, fam(pois) link(log)

Generalized linear models                          No. of obs      =         4
Optimization     : ML: Newton-Raphson              Residual df     =         1
                                                   Scale param     =         1
Deviance         =  .1619951194                    (1/df) Deviance =  .1619951
Pearson          =   .162083973                    (1/df) Pearson  =   .162084

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -14.70362649                    AIC             =  8.851813
BIC              = -3.996887964

------------------------------------------------------------------------------
        freq |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g2 |  -.1340224   .0606865    -2.21   0.027    -.2529658   -.0150791
          a2 |   -1.05868   .0692336   -15.29   0.000    -1.194375   -.9229843
       _cons |   6.068655   .0451242   134.49   0.000     5.980213    6.157096
------------------------------------------------------------------------------

/* -1/+1 (effect) coding: the difference of the two 0/1 codings */
generate g3 = gender - g2
generate a3 = aftlife - a2
list

       gender   aftlife      freq         g2         a2         g3         a3
  1.  females       yes       435          0          0          1          1
  2.  females        no       147          0          1          1         -1
  3.     male       yes       375          1          0         -1          1
  4.     male        no       134          1          1         -1         -1

glm freq g3 a3, fam(pois) link(log) 

Generalized linear models                          No. of obs      =         4
Optimization     : ML: Newton-Raphson              Residual df     =         1
                                                   Scale param     =         1
Deviance         =  .1619951194                    (1/df) Deviance =  .1619951
Pearson          =   .162083973                    (1/df) Pearson  =   .162084

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -14.70362649                    AIC             =  8.851813
BIC              = -3.996887964

------------------------------------------------------------------------------
        freq |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g3 |   .0670112   .0303432     2.21   0.027     .0075396    .1264829
          a3 |   .5293398   .0346168    15.29   0.000     .4614921    .5971874
       _cons |   5.472304   .0346763   157.81   0.000     5.404339    5.540268
------------------------------------------------------------------------------
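
The three glm fits are reparameterizations of the same independence model, which is why the deviance and log likelihood never change. Reversing the 0/1 coding flips the sign of each slope and moves the constant to the other reference cell. The -1/+1 coding halves each slope and makes the constant the average of the four log fitted values. A short arithmetic check against the printed output (a sketch; nothing here is a new estimate):

```stata
* reversed 0/1 coding: constant is the log fitted value of the old (1, 1) cell
display 4.875953 + .1340224 + 1.05868

* -1/+1 coding: slopes are half the 0/1-coded slopes
display .1340224/2
display 1.05868/2

* -1/+1 coding: constant is the mean of the four log fitted values
display (4.875953 + 5.934632 + 5.009975 + 6.068655)/4
```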

Table 6.3, page 152.

use https://stats.idre.ucla.edu/stat/stata/examples/icda/acm, clear

describe

Contains data from acm.dta
  obs:             8                          
 vars:             4                          28 Nov 2001 14:28
 size:            72 (99.7% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
a               byte   %8.0g       yn         alcohol use
c               byte   %8.0g       yn         cigarette use
m               byte   %8.0g       yn         marijuana use
freq            int    %8.0g                  
-------------------------------------------------------------------------------

list
            a         c         m      freq
  1.      yes        no       yes        44
  2.       no        no       yes         2
  3.       no       yes       yes         3
  4.      yes       yes       yes       911
  5.       no        no        no       279
  6.       no       yes        no        43
  7.      yes        no        no       456
  8.      yes       yes        no       538
  
table c m [fw=freq], by(a)

----------------------
alcohol   |
use and   | marijuana 
cigarette |    use    
use       |  yes    no
----------+-----------
yes       |
      yes |  911   538
       no |   44   456
----------+-----------
no        |
      yes |    3    43
       no |    2   279
----------------------

Table 6.4, page 152, output edited.

/* (A, C, M) */
ipf [fw=freq], fit(a+c+m) exp

        a          c          m       Efreq       
        1          1          1   539.98258      
        1          1          2   740.22612        
        1          2          1   282.09123        
        1          2          2   386.70007       
        2          1          1   90.597385         
        2          1          2   124.19392        
        2          2          1   47.328801         
        2          2          2   64.879898        

/* (AC, M) */
ipf [fw=freq], fit(a*c+m) exp

        a          c          m       Efreq       
        1          1          1    611.1775      
        1          1          2    837.8225    
        1          2          1   210.89631       
        1          2          2   289.10369       
        2          1          1    19.40246         
        2          1          2    26.59754       
        2          2          1   118.52373         
        2          2          2   162.47627        

/* (AM, CM) */
ipf [fw=freq], fit(a*m+c*m) exp

        a          m          c       Efreq       
        1          1          1   909.23958        
        1          1          2   45.760417        
        1          2          1   438.84043        
        1          2          2   555.15957         
        2          1          1   4.7604167         
        2          1          2   .23958333          
        2          2          1   142.15957        
        2          2          2   179.84043

/* (AC, AM, CM) */
ipf [fw=freq], fit(a*c+a*m+c*m) exp

        a          c          m       Efreq      
        1          1          1   910.38316     
        1          1          2   538.61683     
        1          2          1   44.616829       
        1          2          2   455.38327        
        2          1          1   3.6168352         
        2          1          2   42.383171         
        2          2          1   1.3831706         
        2          2          2   279.61673

/* (ACM) */
ipf [fw=freq], fit(a*c*m) exp

        a          c          m       Efreq      
        1          1          1         911       
        1          1          2         538        
        1          2          1          44        
        1          2          2         456         
        2          1          1           3          
        2          1          2          43          
        2          2          1           2
        2          2          2         279

Table 6.6, page 155, output edited.

ipf [fw=freq], fit(a+c+m)

df = 4
Likelihood Ratio Statistic G^2 = 1286.0199 p-value = 0.000
Pearson Statistic          X^2 = 1411.3860 p-value = 0.000

ipf [fw=freq], fit(a+c*m)

df = 3
Likelihood Ratio Statistic G^2 = 534.2117 p-value = 0.000
Pearson Statistic          X^2 = 505.5977 p-value = 0.000

ipf [fw=freq], fit(c+a*m)

df = 3
Likelihood Ratio Statistic G^2 = 939.5626 p-value = 0.000
Pearson Statistic          X^2 = 824.1630 p-value = 0.000

ipf [fw=freq], fit(m+a*c)

df = 3
Likelihood Ratio Statistic G^2 = 843.8267 p-value = 0.000
Pearson Statistic          X^2 = 704.9071 p-value = 0.000


ipf [fw=freq], fit(a*c+a*m)

df = 2
Likelihood Ratio Statistic G^2 = 497.3693 p-value = 0.000
Pearson Statistic          X^2 = 443.7611 p-value = 0.000

ipf [fw=freq], fit(a*c+c*m)

df = 2
Likelihood Ratio Statistic G^2 =  92.0184 p-value = 0.000
Pearson Statistic          X^2 =  80.8148 p-value = 0.000

ipf [fw=freq], fit(a*m+c*m)

df = 2
Likelihood Ratio Statistic G^2 = 187.7543 p-value = 0.000
Pearson Statistic          X^2 = 177.6149 p-value = 0.000

ipf [fw=freq], fit(a*c+a*m+c*m)

df = 1
Likelihood Ratio Statistic G^2 =   0.3740 p-value = 0.541
Pearson Statistic          X^2 =   0.4011 p-value = 0.527

ipf [fw=freq], fit(a*c*m)

df = 0
Likelihood Ratio Statistic G^2 =   0.0000 p-value =     .
Pearson Statistic          X^2 =   0.0000 p-value =     .
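
G^2 values for nested models can be compared directly: the difference in G^2 is itself a likelihood-ratio statistic, with degrees of freedom equal to the difference in residual df. As a sketch using the figures printed above, dropping the a*c term from (AC, AM, CM) to obtain (AM, CM) can be tested with the built-in chi2tail() function:

```stata
* G^2 difference between (AM, CM) and (AC, AM, CM), on 2 - 1 = 1 df
display 187.7543 - 0.3740
display chi2tail(1, 187.7543 - 0.3740)

* p-value for (AC, AM, CM) itself: G^2 = 0.3740 on 8 cells - 7 parameters = 1 df
display chi2tail(1, 0.3740)
```

The last line should reproduce the p-value of 0.541 that ipf reports for the model with all three two-factor terms.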

Table 6.7, page 156.

use https://stats.idre.ucla.edu/stat/stata/examples/icda/acm, clear

/* product terms carrying the two-factor associations */
gen ac=a*c
gen am=a*m
gen cm=c*m

glm freq a c m am cm, fam(poi)

Iteration 0:   log likelihood = -306.78871  
Iteration 1:   log likelihood = -134.68656  
Iteration 2:   log likelihood = -119.80666  
Iteration 3:   log likelihood = -118.41883  
Iteration 4:   log likelihood = -118.39888  
Iteration 5:   log likelihood = -118.39887  
Iteration 6:   log likelihood = -118.39887  

Generalized linear models                          No. of obs      =         8
Optimization     : ML: Newton-Raphson              Residual df     =         2
                                                   Scale parameter =         1
Deviance         =  187.7543029                    (1/df) Deviance =  93.87715
Pearson          =  177.6148606                    (1/df) Pearson  =  88.80743

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -118.3988656                    AIC             =  31.09972
BIC              =  183.5954198

------------------------------------------------------------------------------
        freq |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           a |  -9.377361   .8990551   -10.43   0.000    -11.13948   -7.615246
           c |  -6.213498   .3072696   -20.22   0.000    -6.815735   -5.611261
           m |  -8.077869   .4938394   -16.36   0.000    -9.045777   -7.109962
          am |   4.125088   .4529445     9.11   0.000     3.237333    5.012843
          cm |   3.224309   .1609812    20.03   0.000     2.908792    3.539826
       _cons |   23.13194   .9652276    23.97   0.000     21.24013    25.02375
------------------------------------------------------------------------------

predict fit1
(option mu assumed; predicted mean freq)

/* adjusted residual = Pearson residual / sqrt(1 - leverage) */
predict h1, hat
predict res1, pearson
gen ares1 = res1/sqrt(1-h1)

glm freq a c m ac am cm, fam(poi)

Iteration 0:   log likelihood = -142.34193  
Iteration 1:   log likelihood = -37.961044  
Iteration 2:   log likelihood = -25.867183  
Iteration 3:   log likelihood = -24.719804  
Iteration 4:   log likelihood = -24.708713  
Iteration 5:   log likelihood = -24.708707  
Iteration 6:   log likelihood = -24.708707  

Generalized linear models                          No. of obs      =         8
Optimization     : ML: Newton-Raphson              Residual df     =         1
                                                   Scale parameter =         1
Deviance         =  .3739858701                    (1/df) Deviance =  .3739859
Pearson          =  .4011005168                    (1/df) Pearson  =  .4011005

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -24.70870712                    AIC             =  7.927177
BIC              = -1.705455672

------------------------------------------------------------------------------
        freq |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           a |  -10.56882   .9109278   -11.60   0.000     -12.3542    -8.78343
           c |  -7.918178   .3476245   -22.78   0.000    -8.599509   -7.236846
           m |  -6.358765   .4957275   -12.83   0.000    -7.330373   -5.387157
          ac |   2.054534   .1740643    11.80   0.000     1.713374    2.395694
          am |   2.986014    .464678     6.43   0.000     2.075262    3.896767
          cm |   2.847889   .1638394    17.38   0.000      2.52677    3.169009
       _cons |   23.77119   .9484083    25.06   0.000     21.91234    25.63003
------------------------------------------------------------------------------

predict fit2
(option mu assumed; predicted mean freq)

predict h2, hat
predict res2, pearson
gen ares2 = res2/sqrt(1-h2)

list a c m freq  fit1 fit2 ares1 ares2

     +----------------------------------------------------------------------+
     |   a     c     m   freq       fit1       fit2       ares1       ares2 |
     |----------------------------------------------------------------------|
  1. |  no    no   yes      2   .2395833    1.38317    3.695589    .6333249 |
  2. |  no   yes   yes      3   4.760417    3.61683   -3.695589   -.6333249 |
  3. |  no   yes    no     43   142.1596   42.38317   -12.80459    .6333254 |
  4. | yes    no   yes     44   45.76042   44.61683   -3.695596   -.6333249 |
  5. |  no    no    no    279   179.8404   279.6168    12.80459   -.6333253 |
     |----------------------------------------------------------------------|
  6. | yes    no    no    456   555.1595   455.3832   -12.80459    .6333241 |
  7. | yes   yes    no    538   438.8404   538.6168    12.80459   -.6333285 |
  8. | yes   yes   yes    911   909.2396   910.3832    3.695599    .6333305 |
     +----------------------------------------------------------------------+
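
In the second fit, the coefficients on ac, am and cm are log conditional odds ratios for each pair of variables given the third, so exponentiating them puts the associations on the odds-ratio scale. This holds even with the 1/2 coding of a, c and m, because a product such as a*c differs from the product of 0/1 indicators only by terms absorbed into the main effects and the constant. A minimal sketch using the printed coefficients:

```stata
* conditional odds ratios implied by the (AC, AM, CM) fit
* alcohol-cigarette
display exp(2.054534)
* alcohol-marijuana
display exp(2.986014)
* cigarette-marijuana
display exp(2.847889)
```

Run immediately after the second glm, display exp(_b[ac]) gives the first number without retyping the coefficient.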

Table 6.8, page 159.

use https://stats.idre.ucla.edu/stat/stata/examples/icda/injury, clear

describe

Contains data from injury.dta
  obs:            16                          
 vars:             5                          29 Nov 2001 08:11
 size:           160 (100.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
g               byte   %8.0g       gen        gender
l               byte   %8.0g       loc        location
s               byte   %8.0g       yn         seat-belt
j               byte   %8.0g       yn         injury
freq            int    %8.0g                  
-------------------------------------------------------------------------------

list

            g         l         s         j      freq
  1.   female     urban        no        no      7287
  2.   female     urban        no       yes       996
  3.   female     urban       yes        no     11587
  4.   female     urban       yes       yes       759
  5.   female     rural        no        no      3246
  6.   female     rural        no       yes       973
  7.   female     rural       yes        no      6134
  8.   female     rural       yes       yes       757
  9.     male     urban        no        no     10381
 10.     male     urban        no       yes       812
 11.     male     urban       yes        no     10969
 12.     male     urban       yes       yes       380
 13.     male     rural        no        no      6123
 14.     male     rural        no       yes      1084
 15.     male     rural       yes        no      6693
 16.     male     rural       yes       yes       513
 
table s j [fw=freq], by(g l)

--------------------------
gender,   |
location  |
and       |     injury    
seat-belt |     no     yes
----------+---------------
female    |
urban     |
       no |  7,287     996
      yes | 11,587     759
----------+---------------
female    |
rural     |
       no |  3,246     973
      yes |  6,134     757
----------+---------------
male      |
urban     |
       no | 10,381     812
      yes | 10,969     380
----------+---------------
male      |
rural     |
       no |  6,123   1,084
      yes |  6,693     513
--------------------------

ipf [fw=freq], fit(g*j+g*l+g*s+j*l+j*s+l*s) exp save(inj2)

Deleting all matrices......

Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist :  g j 
marginal model 2 varlist :  g l 
marginal model 3 varlist :  g s 
marginal model 4 varlist :  j l 
marginal model 5 varlist :  j s 
marginal model 6 varlist :  l s 
unique varlist  g j l s

-------------------------------------------------------------------
N.B.  structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 13  

Goodness of Fit Tests
---------------------
df = 13
Likelihood Ratio Statistic G^2 =  23.3510 p-value = 0.038
Pearson Statistic          X^2 =  23.3752 p-value = 0.037

        g          j          l          s       Efreq       Ofreq        prob
        1          1          1          1   7166.3695        7287   .10432308
        1          1          1          2   11748.308       11587   .17102379
        1          1          2          1   3353.8303        3246   .04882275
        1          1          2          2   5985.4936        6134    .0871327
        1          2          1          1   993.01641         996   .01445565
        1          2          1          2   721.30528         759   .01050027
        1          2          2          1   988.78428         973   .01439404
        1          2          2          2   781.89238         757   .01138225
        2          1          1          1   10471.495       10381   .15243682
        2          1          1          2   10837.827       10969   .15776963
        2          1          2          1   6045.3055        6123    .0880034
        2          1          2          2   6811.3709        6693   .09915525
        2          2          1          1   845.11924         812   .01230266
        2          2          1          2   387.55922         380   .00564182
        2          2          2          1   1038.0799        1084   .01511165
        2          2          2          2    518.2432         513   .00754423
        
use inj2, clear

table s j, by(g l) cont(mean Efreq)

--------------------------------
g, l and  |          j          
s         |         1          2
----------+---------------------
1         |
1         |
        1 | 7166.3695  993.01641
        2 | 11748.308  721.30528
----------+---------------------
1         |
2         |
        1 | 3353.8303  988.78428
        2 | 5985.4936  781.89238
----------+---------------------
2         |
1         |
        1 | 10471.495  845.11924
        2 | 10837.827  387.55922
----------+---------------------
2         |
2         |
        1 | 6045.3055  1038.0799
        2 | 6811.3709   518.2432
--------------------------------

use https://stats.idre.ucla.edu/stat/stata/examples/icda/injury, clear

ipf [fw=freq], fit(g*l*s+g*j+j*l+j*s) exp save(inj3)

Deleting all matrices......

Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist :  g l s 
marginal model 2 varlist :  g j 
marginal model 3 varlist :  j l 
marginal model 4 varlist :  j s 
unique varlist  g l s j

-------------------------------------------------------------------
N.B.  structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 12  

Goodness of Fit Tests
---------------------
df = 12
Likelihood Ratio Statistic G^2 =   7.4645 p-value = 0.825
Pearson Statistic          X^2 =   7.4874 p-value = 0.824

        g          l          s          j       Efreq       Ofreq        prob
        1          1          1          1   7273.2141        7287   .10587845
        1          1          1          2   1009.7858         996   .01469977
        1          1          2          1   11632.621       11587   .16933969
        1          1          2          2   713.37784         759   .01038486
        1          2          1          1   3254.6633        3246   .04737915
        1          2          1          2    964.3383         973   .01403817
        1          2          2          1    6093.502        6134   .08870501
        1          2          2          2   797.49773         757   .01160942
        2          1          1          1   10358.931       10381   .15079819
        2          1          1          2   834.06847         812    .0121418
        2          1          2          1   10959.234       10969   .15953699
        2          1          2          2   389.76793         380   .00567397
        2          2          1          1   6150.1915        6123   .08953026
        2          2          1          2   1056.8074        1084   .01538428
        2          2          2          1   6697.6432        6693   .09749968
        2          2          2          2    508.3565         513    .0074003

use inj3, clear

table s j, by(g l) cont(mean Efreq)

--------------------------------
g, l and  |          j          
s         |         1          2
----------+---------------------
1         |
1         |
        1 | 7273.2141  1009.7858
        2 | 11632.621  713.37784
----------+---------------------
1         |
2         |
        1 | 3254.6633   964.3383
        2 |  6093.502  797.49773
----------+---------------------
2         |
1         |
        1 | 10358.931  834.06847
        2 | 10959.234  389.76793
----------+---------------------
2         |
2         |
        1 | 6150.1915  1056.8074
        2 | 6697.6432   508.3565
--------------------------------

Table 6.9, page 160, output edited.

use https://stats.idre.ucla.edu/stat/stata/examples/icda/injury, clear

ipf [fw=freq], fit(g+j+l+s) 

df = 11
Likelihood Ratio Statistic G^2 = 2792.7710 p-value = 0.000
Pearson Statistic          X^2 = 2758.3408 p-value = 0.000

ipf [fw=freq], fit(g*j+g*l+g*s+j*l+j*s+l*s) 

df = 13
Likelihood Ratio Statistic G^2 =  23.3510 p-value = 0.038
Pearson Statistic          X^2 =  23.3752 p-value = 0.037

ipf [fw=freq], fit(g*j*l+g*j*s+g*l*s+j*l*s) 

df = 7
Likelihood Ratio Statistic G^2 =   1.3253 p-value = 0.988
Pearson Statistic          X^2 =   1.3246 p-value = 0.988

ipf [fw=freq], fit(g*j*l+g*s+j*s+l*s) 

df = 10
Likelihood Ratio Statistic G^2 =  18.5693 p-value = 0.046
Pearson Statistic          X^2 =  18.5391 p-value = 0.047

ipf [fw=freq], fit(g*j*s+g*l+j*l+l*s)

df = 10
Likelihood Ratio Statistic G^2 =  22.8468 p-value = 0.011
Pearson Statistic          X^2 =  22.8250 p-value = 0.011

ipf [fw=freq], fit(g*l*s+g*j+j*l+j*s) 

df = 12
Likelihood Ratio Statistic G^2 =   7.4645 p-value = 0.825
Pearson Statistic          X^2 =   7.4874 p-value = 0.824

ipf [fw=freq], fit(j*l*s+g*j+g*l+g*s) 

df = 10
Likelihood Ratio Statistic G^2 =  20.6334 p-value = 0.024
Pearson Statistic          X^2 =  20.6131 p-value = 0.024
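
The df values that ipf prints for this four-way table are worth cross-checking, as its own N.B. line cautions that they may be incorrect. One common check is to count parameters: residual df equals the number of cells minus the number of free parameters in the loglinear model. As a sketch for the model with all two-factor terms (1 constant, 4 main effects and 6 two-factor terms in a 2x2x2x2 table), this count gives 5 residual df where ipf prints 13, so the p-value is recomputed on the counted df:

```stata
* number of cells in the 2 x 2 x 2 x 2 table
local cells = 2^4

* (GJ, GL, GS, JL, JS, LS): constant + 4 main effects + 6 two-factor terms
local k = 1 + 4 + 6

display "residual df = " `cells' - `k'
display "p-value     = " chi2tail(`cells' - `k', 23.3510)
```

Fitting the same model with glm and reading its Residual df line is another way to confirm the count.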