An Introduction to Categorical Analysis by Alan Agresti Chapter 6 – Log-linear Models for Contingency Tables

Afterlife data, table 6.1, p. 147.

data after;
  input female belief f1 b1 f2 b2 count;
cards;
  1  1   0  0  1   1 435
  1  0   0  1  1  -1 147
  0  1   1  0 -1   1 375
  0  0   1  1 -1  -1 134
;
run;

Table 6.1, p. 147.

Note: The parameter estimates from the first model corresponds to the parameters in Set 1, the second model to set 2 and the third model to set 3.

proc genmod data=after;
  model count=belief female/ d=poisson ;
  output out=temp p=predict;
run;
proc genmod data=after desc;
  model count=f1 b1/ d=poisson ;
    output out=temp p=predict;
run;
proc genmod data=after desc;
  model count=b2 f2/ d=poisson ;
run;

The GENMOD Procedure

Model Information Data Set WORK.AFTER Distribution Poisson Link Function Log Dependent Variable count Observations Used 4 Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF Deviance 1 0.1620 0.1620 Scaled Deviance 1 0.1620 0.1620 Pearson Chi-Square 1 0.1621 0.1621 Scaled Pearson X2 1 0.1621 0.1621 Log Likelihood 5164.1959

Algorithm converged. Analysis Of Parameter Estimates

Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept 1 4.8760 0.0679 4.7429 5.0090 5160.87 <.0001 belief 1 1.0587 0.0692 0.9230 1.1944 233.83 <.0001 female 1 0.1340 0.0607 0.0151 0.2530 4.88 0.0272 Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed.

The GENMOD Procedure

Model Information Data Set WORK.AFTER Distribution Poisson Link Function Log Dependent Variable count Observations Used 4 Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF Deviance 1 0.1620 0.1620 Scaled Deviance 1 0.1620 0.1620 Pearson Chi-Square 1 0.1621 0.1621 Scaled Pearson X2 1 0.1621 0.1621 Log Likelihood 5164.1959

Algorithm converged. Analysis Of Parameter Estimates

Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept 1 6.0687 0.0451 5.9802 6.1571 18087.0 <.0001 f1 1 -0.1340 0.0607 -0.2530 -0.0151 4.88 0.0272 b1 1 -1.0587 0.0692 -1.1944 -0.9230 233.83 <.0001 Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed.

The GENMOD Procedure

Model Information Data Set WORK.AFTER Distribution Poisson Link Function Log Dependent Variable count Observations Used 4 Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF Deviance 1 0.1620 0.1620 Scaled Deviance 1 0.1620 0.1620 Pearson Chi-Square 1 0.1621 0.1621 Scaled Pearson X2 1 0.1621 0.1621 Log Likelihood 5164.1959

Algorithm converged. Analysis Of Parameter Estimates

Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept 1 5.4723 0.0347 5.4043 5.5403 24904.4 <.0001 b2 1 0.5293 0.0346 0.4615 0.5972 233.83 <.0001 f2 1 0.0670 0.0303 0.0075 0.1265 4.88 0.0272 Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed.

The observed frequencies and fitted values, table 6.1, p. 147.

proc format;
  value female 1='Female' 0='Male';
  value belief 1='Yes' 0='No';
run;
proc print data=temp;
  format female female. belief belief.;
  var female belief count predict;
run;

Obs female belief count predict

1 Female Yes 435 432.099 2 Female No 147 149.901 3 Male Yes 375 377.901 4 Male No 134 131.099

You can obtain both the fitted values and the G^2 and X^2 statistics from proc freq. In the model statement add the option expected to obtain the fitted values and chisq to obtain the G^2 and X^2 statistics since these values are simply the test statistics for testing the hypothesis of independence in two-way tables.

proc freq data = after ;
  format female female. belief belief.;
  tables female*belief/ norow nocol nopercent expected chisq;
  weight count;
  output out=temp;
run;

The FREQ Procedure
Table of female by belief
female     belief
Frequency|
Expected |No      |Yes     |  Total
———+——–+——–+
Male     |    134 |    375 |    509
         |  131.1 |  377.9 |
———+——–+——–+
Female   |    147 |    435 |    582
         |  149.9 |  432.1 |
———+——–+——–+
Total         281      810     1091
Statistics for Table of female by belief
Statistic                     DF       Value      Prob
——————————————————
Chi-Square                     1      0.1621    0.6872
Likelihood Ratio Chi-Square    1      0.1620    0.6873
Continuity Adj. Chi-Square     1      0.1110    0.7390
Mantel-Haenszel Chi-Square     1      0.1619    0.6874
Phi Coefficient                       0.0122
Contingency Coefficient               0.0122
Cramer’s V                            0.0122
       Fisher’s Exact Test
———————————-
Cell (1,1) Frequency (F)       134
Left-sided Pr <= F          0.6817
Right-sided Pr >= F         0.3693
Table Probability (P)       0.0510
Two-sided Pr <= P           0.7287
Sample Size = 1091

Table 6.2, p. 149 was omitted.

Inputting the drug data set, table 6.3, p. 152.

data drug;
  input alcohol smoke marijuana count;
cards;
1   1   1   911
1   1   0   538
1   0   1    44
1   0   0   456
0   1   1     3
0   1   0    43
0   0   1     2
0   0   0   279
;
run;

Table 6.4, p. 152.

ods listing close;
proc catmod data=drug;
   weight count;
   model alcohol*smoke*marijuana=_response_ / pred=freq noprofile noparm noiter noresponse;
   loglin alcohol smoke marijuana;
   ods output PredictedFreqs=temp1;
run;
quit;
data temp1 (keep=p1 id) ;
  set temp1;
  rename predfunction=p1;
  id = _n_;
run;
proc catmod data=drug;
   weight count;
   model alcohol*smoke*marijuana=_response_ / pred=freq noprofile noparm noiter noresponse;
   loglin alcohol|smoke marijuana;
   ods output PredictedFreqs=temp2;
run;
quit;
data temp2 (keep=p2 id) ;
  set temp2;
  rename predfunction=p2;
  id = _n_;
run;
proc catmod data=drug;
   weight count;
   model alcohol*smoke*marijuana=_response_ / pred=freq noprofile noparm noiter noresponse;
   loglin alcohol|marijuana smoke|marijuana;
   ods output PredictedFreqs=temp3;
run;
quit;
data temp3 (keep=p3 id) ;
  set temp3;
  rename predfunction=p3;
  id = _n_;
run;
proc catmod data=drug;
   weight count;
   model alcohol*smoke*marijuana=_response_ / pred=freq noprofile noparm noiter noresponse;
   loglin alcohol|smoke alcohol|marijuana smoke|marijuana;
   ods output PredictedFreqs=temp4;
run;
quit;
data temp4 (keep=p4 id) ;
  set temp4;
  rename predfunction=p4;
  id = _n_;
run;
proc catmod data=drug;
   weight count;
   model alcohol*smoke*marijuana=_response_ / pred=freq noprofile noparm noiter noresponse;
   loglin alcohol|smoke|marijuana;
   ods output PredictedFreqs=temp5;
run;
quit;
data temp5 ;
  set temp5;
  rename predfunction=p5;
  id = _n_;
run;
ods output close;
ods listing;
data combo;
  merge temp1 temp2 temp3 temp4 temp5;
  by id;
run;
 
data combo;
  set combo;
  rename p1 = A_C_M;
  rename p2 = AC_M;
  rename p3 = AM_CM;
  rename p4 = AC_AM_CM;
  rename p5 = ACM;
  alcohol1=alcohol+0;
  marijuana1= marijuana+0;
  smoke1=smoke+0;
  run;
proc format;
   value yesno 1='Yes' 0='No';
run;
proc print data=combo;
  format alcohol1 yesno. smoke1 yesno. marijuana1 yesno. ;
  var alcohol1 smoke1 marijuana1  A_C_M AC_M AM_CM AC_AM_CM  ACM;
run;

Obs alcohol1 smoke1 marijuana1 A_C_M AC_M AM_CM AC_AM_CM ACM

1 No No No 64.8799 162.4763 179.8404 279.6168 279 2 No No Yes 47.32881 118.5237 0.239583 1.38317 2 3 No Yes No 124.1939 26.59754 142.1596 42.38317 43 4 No Yes Yes 90.59739 19.40246 4.760418 3.61683 3 5 Yes No No 386.7001 289.1037 555.1596 455.3832 456 6 Yes No Yes 282.0912 210.8963 45.76042 44.61683 44 7 Yes Yes No 740.2261 837.8225 438.8404 538.6168 538 8 Yes Yes Yes 539.9826 611.1775 909.2396 910.3832 911

Table 6.5, p. 153 omitted.

Table 6.6, p. 155.

ods listing close;
proc catmod data=drug;
  weight count;
  model alcohol*smoke*marijuana= _response_ ;
  loglin  alcohol smoke marijuana ;
  ods output anova=temp1;
run;
data temp1;
  set temp1;
  model = 'A_C_M';
run;
proc catmod data=drug;
  weight count;
  model alcohol*smoke*marijuana= _response_ ;
  loglin  alcohol smoke|marijuana ;
  ods output anova=temp2;
run;
data temp2;
  set temp2;
  model = 'A_CM';
run;
proc catmod data=drug;
  weight count;
  model alcohol*smoke*marijuana= _response_ ;
  loglin  alcohol|marijuana smoke  ;
  ods output anova=temp3;
run;
data temp3;
  set temp3;
  model = 'AM_C';
run;
proc catmod data=drug;
  weight count;
  model alcohol*smoke*marijuana= _response_ ;
  loglin  marijuana smoke|alcohol  ;
  ods output anova=temp4;
run;
data temp4;
  set temp4;
  model = 'AC_M';
run;
proc catmod data=drug;
  weight count;
  model alcohol*smoke*marijuana= _response_ ;
  loglin  alcohol|marijuana smoke|alcohol  ;
  ods output anova=temp5;
run;
data temp5;
  set temp5;
  model = 'AC_AM';
run;
proc catmod data=drug;
  weight count;
  model alcohol*smoke*marijuana= _response_ ;
  loglin  alcohol|smoke marijuana|smoke  ;
  ods output anova=temp6;
run;
data temp6;
  set temp6;
  model = 'AC_CM';
run;
proc catmod data=drug;
  weight count;
  model alcohol*smoke*marijuana= _response_ ;
  loglin  alcohol|marijuana marijuana|smoke  ;
  ods output anova=temp7;
run;
data temp7;
  set temp7;
  model = 'AM_CM';
run;
proc catmod data=drug;
  weight count;
  model alcohol*smoke*marijuana= _response_ ;
  loglin  alcohol|smoke|marijuana @2;
  ods output anova=temp8;
run;
data temp8;
  set temp8;
  model = 'AC_CM_CM';
run;
proc catmod data=drug;
  weight count;
  model alcohol*smoke*marijuana= _response_ ;
  loglin  alcohol|smoke|marijuana;
  ods output anova=temp9;
run;
data temp9;
  set temp9;
  model = 'ACM';
run;
quit;
ods output close;
ods listing;
data combo;
  set temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9;
  rename chisq=G2;
  rename probchisq=P_value;
run;
proc print data=combo noobs;
  where source='Likelihood Ratio';
  var model G2 df P_value;
run;

Model G2 DF P_value

A_C_M 1286.02 4 <.0001 A_CM 534.21 3 <.0001 AM_C 939.56 3 <.0001 AC_M 843.83 3 <.0001 AC_AM 497.37 2 <.0001 AC_CM 92.02 2 <.0001 AM_CM 187.75 2 <.0001 AC_CM_CM 0.37 1 0.5408 ACM . 0 .

Table 6.7, p. 156

proc genmod data=drug ;
  class alcohol smoke marijuana;
  model count= alcohol smoke marijuana alcohol*marijuana smoke*marijuana / dist=p;
  output out=temp reslik=reslik p=pred;
run;
data temp;
  set temp;
  id = _n_;
  rename reslik=adjres1;
  rename pred=fitted1;
run;
proc genmod data=drug;
  class alcohol smoke marijuana;
  model count= alcohol smoke marijuana alcohol*marijuana smoke*marijuana alcohol*smoke/ dist=p;
  output out=temp1 reslik=reslik p=pred;
run;
data temp1;
  set temp1;
  id = _n_;
  rename reslik=adjres2;
  rename pred=fitted2;
run;
data combo;
  merge temp temp1;
  by id;
run;
proc print data=combo;
  format alcohol yesno. smoke yesno. marijuana yesno.;
  var alcohol smoke marijuana count fitted1 adjres1 fitted2 adjres2;
run;

Obs    alcohol    smoke    marijuana    count    fitted1     adjres1    fitted2     adjres2
 1       Yes       Yes        Yes        911     909.240      3.6955    910.383     0.63332
 2       Yes       Yes        No         538     438.840     12.7451    538.617    -0.63333
 3       Yes       No         Yes         44      45.760     -3.6956     44.617    -0.63336
 4       Yes       No         No         456     555.160    -12.8498    455.383     0.63332
 5       No        Yes        Yes          3       4.761     -3.7093      3.617    -0.63848
 6       No        Yes        No          43     142.160    -13.7941     42.383     0.63329
 7       No        No         Yes          2       0.240      2.3852      1.383     0.60617
 8       No        No         No         279     179.840     12.4902    279.617    -0.63333

Inputting the Injury Data, table 6.8, p. 159.

data injury;
  input G L S I count;
cards;
0 0 0 0  7287
0 0 0 1   996
0 0 1 0 11587
0 0 1 1   759
0 1 0 0  3246
0 1 0 1   973
0 1 1 0  6134
0 1 1 1   757
1 0 0 0 10381
1 0 0 1   812
1 0 1 0 10969
1 0 1 1   380
1 1 0 0  6123
1 1 0 1  1084
1 1 1 0  6693
1 1 1 1   513
;
run;

Table 6.8, p. 159.

ods listing close;
proc catmod data=injury;
  weight count;
  model G*I*L*S= _response_/ pred=freq;
  loglin g|i g|l g|s i|l i|s l|s;
  ods output predictedfreqs=temp1 anova=temp;
run;
quit;
data temp1 (keep=p1 functionnum);
  set temp1;
  rename predfunction=p1;
run;
proc catmod data=injury;
  weight count;
  model G*I*L*S= _response_/ pred=freq;
  loglin g|l|s g|i i|l i|s;
  ods output predictedfreqs=temp2 anova=temp3;
run;
quit;
ods output close;
ods listing;
data temp2;
  set temp2;
  rename predfunction=p2;
run;
data combo;
  merge temp1 temp2;
  by functionnum;
  Male=G+0;
  Location=L+0;
  Seat=S+0;
  Injury=I+0;
  rename obsfunction = observed;
run;
proc format;
  value male 0='Female' 1='Male';
  value location 0='Urban' 1='Rural';
  value Yesno 0='No' 1='Yes';
run;
proc sort data=combo;
  by male location seat injury;
run;
proc print data= combo;
  format male male. location location. seat yesno. injury yesno.;
  var male location seat injury observed p1 p2;
run;

Obs Male Location Seat Injury observed p1 p2

1 Female Urban No No 7287 7166.369 7273.214 2 Female Urban No Yes 996 993.0169 1009.786 3 Female Urban Yes No 11587 11748.31 11632.62 4 Female Urban Yes Yes 759 721.3055 713.3779 5 Female Rural No No 3246 3353.829 3254.662 6 Female Rural No Yes 973 988.7848 964.3382 7 Female Rural Yes No 6134 5985.493 6093.502 8 Female Rural Yes Yes 757 781.8927 797.4979 9 Male Urban No No 10381 10471.5 10358.93 10 Male Urban No Yes 812 845.1187 834.0683 11 Male Urban Yes No 10969 10837.83 10959.23 12 Male Urban Yes Yes 380 387.5588 389.7677 13 Male Rural No No 6123 6045.306 6150.192 14 Male Rural No Yes 1084 1038.08 1056.808 15 Male Rural Yes No 6693 6811.371 6697.644 16 Male Rural Yes Yes 513 518.2429 508.3564

The dissimilarity index, D, for both models, p. 162.

Note: The index for model (GI, GL, GS, IL, IS, LS) is d1 and the index for model (GIL, GIS, GLS, ILS) is d2.

proc sql;
  select  sum( abs(observed-p1) ) / (2*sum(observed)) as d1,
           sum( abs(observed-p2) ) /( 2*sum(observed) ) as d2
  from combo;
quit;

      d1        d2
——————
0.008219  0.002507

Table 6.9, p. 160.

ods listing close;
proc catmod data=injury;
  weight count;
  model G*I*L*S= _response_ ;
  loglin g i l s ;
  ods output anova=temp1;
run;
data temp1;
  set temp1;
  model = 'G_I_L_S';
run;
proc catmod data=injury;
  weight count;
  model G*I*L*S= _response_ ;
  loglin g|i g|l g|s i|l i|s l|s ;
  ods output anova=temp2;
run;
data temp2;
  set temp2;
  model = 'GI_GL_GS_IL_IS_LS';
run;
proc catmod data=injury;
  weight count;
  model G*I*L*S= _response_ ;
  loglin g|i|l g|i|s g|l|s i|l|s ;
  ods output anova=temp3;
run;
data temp3;
  set temp3;
  model = 'GIL_GIS_GLS_ILS';
run;
proc catmod data=injury;
  weight count;
  model G*I*L*S= _response_ ;
  loglin g|i|l g|s i|s l|s ;
  ods output anova=temp4;
run;
data temp4;
  set temp4;
  model = 'GIL_GS_LS_IS';
run;
proc catmod data=injury;
  weight count;
  model G*I*L*S= _response_ ;
  loglin g|i|s  g|l i|l l|s ;
  ods output anova=temp5;
run;
data temp5;
  set temp5;
  model = 'GIS_GL_LI_LS';
run;
proc catmod data=injury;
  weight count;
  model G*I*L*S= _response_ ;
  loglin g|l|s  g|i i|l i|s ;
  ods output anova=temp6;
run;
data temp6;
  set temp6;
  model = 'GLS_GI_IL_IS';
run;
proc catmod data=injury;
  weight count;
  model G*I*L*S= _response_ ;
  loglin i|l|s  g|i g|l g|s ;
  ods output anova=temp7;
run;
data temp7;
  set temp7;
  model = 'ILS_GI_GL_GS';
run;
ods output close;
ods listing;
data combo;
  set temp1 temp2 temp3 temp4 temp5 temp6 temp7;
  rename chisq=G2;
  rename probchisq=P_value;
run;
proc print data=combo;
  where source='Likelihood Ratio';
  var model G2 P_value;
run;

Obs Model G2 P_value

5 G_I_L_S 2792.77 <.0001 16 GI_GL_GS_IL_IS_LS 23.35 0.0003 31 GIL_GIS_GLS_ILS 1.33 0.2496 43 GIL_GS_LS_IS 18.57 0.0010 55 GIS_GL_LI_LS 22.85 0.0001 67 GLS_GI_IL_IS 7.46 0.1133 79 ILS_GI_GL_GS 20.63 0.0004