This unit makes extensive use of the ipf (iterated proportional fitting) command written by Adrian Mander. To locate the command, type search ipf in Stata (see How can I use the search command to search for programs and get additional help? for more information about using search). We will also use the glm command with the Poisson family and log link to obtain the loglinear model coefficients.
Table 6.1, page 147.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/afterlife, clear
list
gender aftlife freq
1. females yes 435
2. females no 147
3. males yes 375
4. males no 134
table gender aftlife [fw=freq], cont(freq)
----------------------
| belief in
| afterlife
gender | no yes
----------+-----------
male | 134 375
females | 147 435
----------------------
ipf [fw=freq], fit(gender+aftlife) save(aftlif) exp nolog
Deleting all matrices......
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : gender
marginal model 2 varlist : aftlife
unique varlist gender aftlife
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 1
Goodness of Fit Tests
---------------------
df = 1
Likelihood Ratio Statistic G^2 = 0.1620 p-value = 0.687
Pearson Statistic X^2 = 0.1621 p-value = 0.687
gender aftlife Efreq Ofreq prob
0 0 131.09899 134 .12016406
0 1 377.90101 375 .34638039
1 0 149.90101 147 .13739781
1 1 432.09899 435 .39605774
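For the independence model the fitted counts have a closed form (row total times column total, divided by the grand total), so the ipf results can be checked by hand. The following is a minimal sketch rather than part of the original example: it re-enters the four cell counts with input, assumes the 0/1 coding shown in the ipf output (gender 1 = females, aftlife 1 = yes), and uses the illustrative names rowtot, coltot, n, efreq, g2term and lr_stat.

```stata
* Self-contained copy of the afterlife counts (0/1 coding is an assumption)
clear
input gender aftlife freq
1 1 435
1 0 147
0 1 375
0 0 134
end
* Margins and grand total
egen double rowtot = total(freq), by(gender)
egen double coltot = total(freq), by(aftlife)
egen double n = total(freq)
* Fitted count under independence: row total * column total / N
generate double efreq = rowtot*coltot/n
list gender aftlife freq efreq
* Likelihood ratio statistic G^2 = 2 * sum( observed * ln(observed/fitted) )
generate double g2term = 2*freq*ln(freq/efreq)
quietly summarize g2term
scalar lr_stat = r(sum)
display scalar(lr_stat)
* p-value on 1 residual degree of freedom
display chi2tail(1, scalar(lr_stat))
```

If the sketch is right, efreq should reproduce the Efreq column above, and the last two lines should display values close to G^2 = 0.1620 and p-value = 0.687.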
use aftlif, clear
table gender aftlife, cont(mean Efreq)
--------------------------------
| aftlife
gender | 0 1
----------+---------------------
0 | 131.09899 377.90101
1 | 149.90101 432.09899
--------------------------------
generate lefreq = ln(Efreq)
table gender aftlife, cont(mean lefreq)
------------------------------
| aftlife
gender | 0 1
----------+-------------------
0 | 4.875953 5.934632
1 | 5.009975 6.068655
------------------------------
use https://stats.idre.ucla.edu/stat/stata/examples/icda/afterlife, clear
glm freq gender aftlife, fam(pois) link(log)
Generalized linear models No. of obs = 4
Optimization : ML: Newton-Raphson Residual df = 1
Scale param = 1
Deviance = .1619951194 (1/df) Deviance = .1619951
Pearson = .162083973 (1/df) Pearson = .162084
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -14.70362649 AIC = 8.851813
BIC = -3.996887964
------------------------------------------------------------------------------
freq | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
gender | .1340224 .0606865 2.21 0.027 .0150791 .2529658
aftlife | 1.05868 .0692336 15.29 0.000 .9229843 1.194375
_cons | 4.875953 .0678732 71.84 0.000 4.742924 5.008982
------------------------------------------------------------------------------
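The glm coefficients are on the log scale, so exponentiating sums of them should give back the fitted counts that ipf saved: _cons is the log fitted count for the reference cell (4.875953 = ln(131.09899)), and adding the gender and aftlife coefficients moves to the other cells. A minimal sketch, assuming the 0/1 coding shown in the ipf output and re-entering the data with input so that it runs on its own:

```stata
* Self-contained copy of the afterlife counts (0/1 coding is an assumption)
clear
input gender aftlife freq
1 1 435
1 0 147
0 1 375
0 0 134
end
quietly glm freq gender aftlife, family(poisson) link(log)
* Fitted count for the reference cell (gender = 0, aftlife = 0)
display exp(_b[_cons])
* Fitted count for gender = 1, aftlife = 0
display exp(_b[_cons] + _b[gender])
* Fitted count for gender = 1, aftlife = 1
display exp(_b[_cons] + _b[gender] + _b[aftlife])
```

The three displayed values should be close to 131.10, 149.90 and 432.10, the Efreq values in the corresponding cells of the ipf table.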
generate g2 = ~gender
generate a2 = ~aftlife
glm freq g2 a2, fam(pois) link(log)
Generalized linear models No. of obs = 4
Optimization : ML: Newton-Raphson Residual df = 1
Scale param = 1
Deviance = .1619951194 (1/df) Deviance = .1619951
Pearson = .162083973 (1/df) Pearson = .162084
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -14.70362649 AIC = 8.851813
BIC = -3.996887964
------------------------------------------------------------------------------
freq | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g2 | -.1340224 .0606865 -2.21 0.027 -.2529658 -.0150791
a2 | -1.05868 .0692336 -15.29 0.000 -1.194375 -.9229843
_cons | 6.068655 .0451242 134.49 0.000 5.980213 6.157096
------------------------------------------------------------------------------
generate g3 = gender - g2
generate a3 = aftlife - a2
list
gender aftlife freq g2 a2 g3 a3
1. females yes 435 0 0 1 1
2. females no 147 0 1 1 -1
3. male yes 375 1 0 -1 1
4. male no 134 1 1 -1 -1
glm freq g3 a3, fam(pois) link(log)
Generalized linear models No. of obs = 4
Optimization : ML: Newton-Raphson Residual df = 1
Scale param = 1
Deviance = .1619951194 (1/df) Deviance = .1619951
Pearson = .162083973 (1/df) Pearson = .162084
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -14.70362649 AIC = 8.851813
BIC = -3.996887964
------------------------------------------------------------------------------
freq | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g3 | .0670112 .0303432 2.21 0.027 .0075396 .1264829
a3 | .5293398 .0346168 15.29 0.000 .4614921 .5971874
_cons | 5.472304 .0346763 157.81 0.000 5.404339 5.540268
------------------------------------------------------------------------------
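The three glm runs fit the same model (the deviance is identical each time); only the coding of the predictors changes. With the +1/-1 coding the coefficients are half of the 0/1 coefficients (.0670112 = .1340224/2), and the constant is the average of the four log fitted counts. A minimal sketch of that check, assuming the 0/1 coding of gender and aftlife and using the illustrative names b_gender, fit, lfit and mean_lfit:

```stata
* Self-contained copy of the afterlife counts (0/1 coding is an assumption)
clear
input gender aftlife freq
1 1 435
1 0 147
0 1 375
0 0 134
end
* +1/-1 coding, equivalent to g3 and a3 above
generate g3 = 2*gender - 1
generate a3 = 2*aftlife - 1
* 0/1 coding: keep the gender coefficient and the mean log fitted count
quietly glm freq gender aftlife, family(poisson) link(log)
scalar b_gender = _b[gender]
predict double fit
generate double lfit = ln(fit)
quietly summarize lfit
scalar mean_lfit = r(mean)
* +1/-1 coding: compare
quietly glm freq g3 a3, family(poisson) link(log)
display scalar(b_gender)/2
display _b[g3]
display scalar(mean_lfit)
display _b[_cons]
```

The first two displayed values should agree with each other, as should the last two.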
Table 6.3, page 152.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/acm, clear
describe
Contains data from acm.dta
obs: 8
vars: 4 28 Nov 2001 14:28
size: 72 (99.7% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
a byte %8.0g yn alcohol use
c byte %8.0g yn cigarette use
m byte %8.0g yn marijuana use
freq int %8.0g
-------------------------------------------------------------------------------
list
a c m freq
1. yes no yes 44
2. no no yes 2
3. no yes yes 3
4. yes yes yes 911
5. no no no 279
6. no yes no 43
7. yes no no 456
8. yes yes no 538
table c m [fw=freq], by(a)
----------------------
alcohol |
use and | marijuana
cigarette | use
use | yes no
----------+-----------
yes |
yes | 911 538
no | 44 456
----------+-----------
no |
yes | 3 43
no | 2 279
----------------------
Table 6.4, page 152, output edited.
/* (A, C, M) */
ipf [fw=freq], fit(a+c+m) exp
a c m Efreq
1 1 1 539.98258
1 1 2 740.22612
1 2 1 282.09123
1 2 2 386.70007
2 1 1 90.597385
2 1 2 124.19392
2 2 1 47.328801
2 2 2 64.879898
/* (AC, M) */
ipf [fw=freq], fit(a*c+m) exp
a c m Efreq
1 1 1 611.1775
1 1 2 837.8225
1 2 1 210.89631
1 2 2 289.10369
2 1 1 19.40246
2 1 2 26.59754
2 2 1 118.52373
2 2 2 162.47627
/* (AM, CM) */
ipf [fw=freq], fit(a*m+c*m) exp
a m c Efreq
1 1 1 909.23958
1 1 2 45.760417
1 2 1 438.84043
1 2 2 555.15957
2 1 1 4.7604167
2 1 2 .23958333
2 2 1 142.15957
2 2 2 179.84043
/* (AC, AM, CM) */
ipf [fw=freq], fit(a*c+a*m+c*m) exp
a c m Efreq
1 1 1 910.38316
1 1 2 538.61683
1 2 1 44.616829
1 2 2 455.38327
2 1 1 3.6168352
2 1 2 42.383171
2 2 1 1.3831706
2 2 2 279.61673
/* (ACM) */
ipf [fw=freq], fit(a*c*m) exp
a c m Efreq
1 1 1 911
1 1 2 538
1 2 1 44
1 2 2 456
2 1 1 3
2 1 2 43
2 2 1 2
2 2 2 279
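Two of the models above have closed-form fitted values, which gives a quick check on the ipf output: under (A, C, M) the fitted count is the product of the three one-way totals divided by N^2, and under (AC, M) it is the AC two-way total times the M total divided by N. The sketch below is an illustration, not part of the original example; it assumes the 1/2 coding shown in the ipf output (1 = yes, 2 = no) and uses the illustrative names n, n_a, n_c, n_m, n_ac, e_indep and e_ac_m.

```stata
* Self-contained copy of the alcohol/cigarette/marijuana counts
* (1 = yes, 2 = no is an assumption taken from the ipf output)
clear
input a c m freq
1 1 1 911
1 1 2 538
1 2 1 44
1 2 2 456
2 1 1 3
2 1 2 43
2 2 1 2
2 2 2 279
end
* Grand total, one-way margins and the AC two-way margin
egen double n    = total(freq)
egen double n_a  = total(freq), by(a)
egen double n_c  = total(freq), by(c)
egen double n_m  = total(freq), by(m)
egen double n_ac = total(freq), by(a c)
* (A, C, M): mutual independence
generate double e_indep = n_a*n_c*n_m/n^2
* (AC, M): M jointly independent of A and C
generate double e_ac_m = n_ac*n_m/n
list a c m freq e_indep e_ac_m
```

The e_indep and e_ac_m columns should match the Efreq columns of the first two ipf runs. The models (AM, CM) and (ACM) also have closed forms, while (AC, AM, CM) does not, which is where an iterative method such as ipf is needed.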
Table 6.6, page 155, output edited.
ipf [fw=freq], fit(a+c+m)
df = 4
Likelihood Ratio Statistic G^2 = 1286.0199 p-value = 0.000
Pearson Statistic X^2 = 1411.3860 p-value = 0.000
ipf [fw=freq], fit(a+c*m)
df = 3
Likelihood Ratio Statistic G^2 = 534.2117 p-value = 0.000
Pearson Statistic X^2 = 505.5977 p-value = 0.000
ipf [fw=freq], fit(c+a*m)
df = 3
Likelihood Ratio Statistic G^2 = 939.5626 p-value = 0.000
Pearson Statistic X^2 = 824.1630 p-value = 0.000
ipf [fw=freq], fit(m+a*c)
df = 3
Likelihood Ratio Statistic G^2 = 843.8267 p-value = 0.000
Pearson Statistic X^2 = 704.9071 p-value = 0.000
ipf [fw=freq], fit(a*c+a*m)
df = 2
Likelihood Ratio Statistic G^2 = 497.3693 p-value = 0.000
Pearson Statistic X^2 = 443.7611 p-value = 0.000
ipf [fw=freq], fit(a*c+c*m)
df = 2
Likelihood Ratio Statistic G^2 = 92.0184 p-value = 0.000
Pearson Statistic X^2 = 80.8148 p-value = 0.000
ipf [fw=freq], fit(a*m+c*m)
df = 2
Likelihood Ratio Statistic G^2 = 187.7543 p-value = 0.000
Pearson Statistic X^2 = 177.6149 p-value = 0.000
ipf [fw=freq], fit(a*c+a*m+c*m)
Likelihood Ratio Statistic G^2 = 0.3740 p-value = 0.541
Pearson Statistic X^2 = 0.4011 p-value = 0.527
ipf [fw=freq], fit(a*c*m)
df = 0
Likelihood Ratio Statistic G^2 = 0.0000 p-value = .
Pearson Statistic X^2 = 0.0000 p-value = .
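Nested models in this list can be compared by differencing their G^2 values. For example, dropping the AC term from (AC, AM, CM) gives (AM, CM), so the difference 187.7543 - 0.3740 is a likelihood ratio test of the AC association. The sketch below is an illustration under the assumption that (AC, AM, CM) has 1 residual df (8 cells minus 7 parameters, as in the glm fit of the same model), so the test has 2 - 1 = 1 df; the scalar names are illustrative.

```stata
* G^2 values copied from the ipf output above
scalar g2_am_cm    = 187.7543
scalar g2_ac_am_cm = 0.3740
* Likelihood ratio test of the AC term: difference in G^2 on 2 - 1 = 1 df
scalar lr_ac = scalar(g2_am_cm) - scalar(g2_ac_am_cm)
display scalar(lr_ac)
display chi2tail(1, scalar(lr_ac))
```

A very small p-value here indicates that the AC term cannot be dropped, which agrees with the poor fit of (AM, CM) on its own.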
Table 6.7, page 156.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/acm, clear
gen ac=a*c
gen am=a*m
gen cm=c*m
glm freq a c m am cm, fam(poi)
Iteration 0: log likelihood = -306.78871
Iteration 1: log likelihood = -134.68656
Iteration 2: log likelihood = -119.80666
Iteration 3: log likelihood = -118.41883
Iteration 4: log likelihood = -118.39888
Iteration 5: log likelihood = -118.39887
Iteration 6: log likelihood = -118.39887
Generalized linear models No. of obs = 8
Optimization : ML: Newton-Raphson Residual df = 2
Scale parameter = 1
Deviance = 187.7543029 (1/df) Deviance = 93.87715
Pearson = 177.6148606 (1/df) Pearson = 88.80743
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -118.3988656 AIC = 31.09972
BIC = 183.5954198
------------------------------------------------------------------------------
freq | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
a | -9.377361 .8990551 -10.43 0.000 -11.13948 -7.615246
c | -6.213498 .3072696 -20.22 0.000 -6.815735 -5.611261
m | -8.077869 .4938394 -16.36 0.000 -9.045777 -7.109962
am | 4.125088 .4529445 9.11 0.000 3.237333 5.012843
cm | 3.224309 .1609812 20.03 0.000 2.908792 3.539826
_cons | 23.13194 .9652276 23.97 0.000 21.24013 25.02375
------------------------------------------------------------------------------
predict fit1
(option mu assumed; predicted mean freq)
predict h1, h
predict res1, p
gen ares1 = res1/sqrt(1-h1)
glm freq a c m ac am cm, fam(poi)
Iteration 0: log likelihood = -142.34193
Iteration 1: log likelihood = -37.961044
Iteration 2: log likelihood = -25.867183
Iteration 3: log likelihood = -24.719804
Iteration 4: log likelihood = -24.708713
Iteration 5: log likelihood = -24.708707
Iteration 6: log likelihood = -24.708707
Generalized linear models No. of obs = 8
Optimization : ML: Newton-Raphson Residual df = 1
Scale parameter = 1
Deviance = .3739858701 (1/df) Deviance = .3739859
Pearson = .4011005168 (1/df) Pearson = .4011005
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -24.70870712 AIC = 7.927177
BIC = -1.705455672
------------------------------------------------------------------------------
freq | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
a | -10.56882 .9109278 -11.60 0.000 -12.3542 -8.78343
c | -7.918178 .3476245 -22.78 0.000 -8.599509 -7.236846
m | -6.358765 .4957275 -12.83 0.000 -7.330373 -5.387157
ac | 2.054534 .1740643 11.80 0.000 1.713374 2.395694
am | 2.986014 .464678 6.43 0.000 2.075262 3.896767
cm | 2.847889 .1638394 17.38 0.000 2.52677 3.169009
_cons | 23.77119 .9484083 25.06 0.000 21.91234 25.63003
------------------------------------------------------------------------------
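Because a, c and m each take two adjacent values, the coefficient on each product term in this model is a conditional log odds ratio, so exponentiating it gives the estimated conditional odds ratio for that pair of variables. A minimal sketch, assuming the 1/2 coding shown in the ipf output (1 = yes, 2 = no) and re-entering the data with input so that it runs on its own:

```stata
* Self-contained copy of the alcohol/cigarette/marijuana counts
* (1 = yes, 2 = no is an assumption taken from the ipf output)
clear
input a c m freq
1 1 1 911
1 1 2 538
1 2 1 44
1 2 2 456
2 1 1 3
2 1 2 43
2 2 1 2
2 2 2 279
end
generate ac = a*c
generate am = a*m
generate cm = c*m
quietly glm freq a c m ac am cm, family(poisson) link(log)
* Estimated conditional odds ratios for AC, AM and CM
display exp(_b[ac])
display exp(_b[am])
display exp(_b[cm])
```

With the coefficients reported above (2.054534, 2.986014 and 2.847889), the three values should be roughly 7.8, 19.8 and 17.3.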
predict fit2
(option mu assumed; predicted mean freq)
predict h2, h
predict res2, p
gen ares2 = res2/sqrt(1-h2)
list a c m freq fit1 fit2 ares1 ares2
+----------------------------------------------------------------------+
| a c m freq fit1 fit2 ares1 ares2 |
|----------------------------------------------------------------------|
1. | no no yes 2 .2395833 1.38317 3.695589 .6333249 |
2. | no yes yes 3 4.760417 3.61683 -3.695589 -.6333249 |
3. | no yes no 43 142.1596 42.38317 -12.80459 .6333254 |
4. | yes no yes 44 45.76042 44.61683 -3.695596 -.6333249 |
5. | no no no 279 179.8404 279.6168 12.80459 -.6333253 |
|----------------------------------------------------------------------|
6. | yes no no 456 555.1595 455.3832 -12.80459 .6333241 |
7. | yes yes no 538 438.8404 538.6168 12.80459 -.6333285 |
8. | yes yes yes 911 909.2396 910.3832 3.695599 .6333305 |
+----------------------------------------------------------------------+
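In the listing above, all eight adjusted residuals for the second model have the same absolute value. That model has a single residual degree of freedom, and the square of that common value (.6333^2) is approximately .4011, the Pearson statistic reported by glm. The sketch below repeats the calculation on a self-contained copy of the data; the 1/2 coding (1 = yes, 2 = no) is an assumption, and the names h, res, ares, ares_sq, res_sq and x2 are illustrative.

```stata
* Self-contained copy of the alcohol/cigarette/marijuana counts
clear
input a c m freq
1 1 1 911
1 1 2 538
1 2 1 44
1 2 2 456
2 1 1 3
2 1 2 43
2 2 1 2
2 2 2 279
end
generate ac = a*c
generate am = a*m
generate cm = c*m
quietly glm freq a c m ac am cm, family(poisson) link(log)
* Adjusted residual = Pearson residual / sqrt(1 - leverage)
predict double h, hat
predict double res, pearson
generate double ares = res/sqrt(1 - h)
generate double ares_sq = ares^2
* Pearson X^2 is the sum of squared Pearson residuals
generate double res_sq = res^2
quietly summarize res_sq
scalar x2 = r(sum)
list a c m ares ares_sq
display scalar(x2)
```

Each value of ares_sq should be close to the displayed Pearson statistic.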
Table 6.8, page 159.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/injury, clear
describe
Contains data from injury.dta
obs: 16
vars: 5 29 Nov 2001 08:11
size: 160 (100.0% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
g byte %8.0g gen gender
l byte %8.0g loc location
s byte %8.0g yn seat-belt
j byte %8.0g yn injury
freq int %8.0g
-------------------------------------------------------------------------------
list
g l s j freq
1. female urban no no 7287
2. female urban no yes 996
3. female urban yes no 11587
4. female urban yes yes 759
5. female rural no no 3246
6. female rural no yes 973
7. female rural yes no 6134
8. female rural yes yes 757
9. male urban no no 10381
10. male urban no yes 812
11. male urban yes no 10969
12. male urban yes yes 380
13. male rural no no 6123
14. male rural no yes 1084
15. male rural yes no 6693
16. male rural yes yes 513
table s j [fw=freq], by(g l)
--------------------------
gender, |
location |
and | injury
seat-belt | no yes
----------+---------------
female |
urban |
no | 7,287 996
yes | 11,587 759
----------+---------------
female |
rural |
no | 3,246 973
yes | 6,134 757
----------+---------------
male |
urban |
no | 10,381 812
yes | 10,969 380
----------+---------------
male |
rural |
no | 6,123 1,084
yes | 6,693 513
--------------------------
ipf [fw=freq], fit(g*j+g*l+g*s+j*l+j*s+l*s) exp save(inj2)
Deleting all matrices......
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : g j
marginal model 2 varlist : g l
marginal model 3 varlist : g s
marginal model 4 varlist : j l
marginal model 5 varlist : j s
marginal model 6 varlist : l s
unique varlist g j l s
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 13
Goodness of Fit Tests
---------------------
df = 13
Likelihood Ratio Statistic G^2 = 23.3510 p-value = 0.038
Pearson Statistic X^2 = 23.3752 p-value = 0.037
g j l s Efreq Ofreq prob
1 1 1 1 7166.3695 7287 .10432308
1 1 1 2 11748.308 11587 .17102379
1 1 2 1 3353.8303 3246 .04882275
1 1 2 2 5985.4936 6134 .0871327
1 2 1 1 993.01641 996 .01445565
1 2 1 2 721.30528 759 .01050027
1 2 2 1 988.78428 973 .01439404
1 2 2 2 781.89238 757 .01138225
2 1 1 1 10471.495 10381 .15243682
2 1 1 2 10837.827 10969 .15776963
2 1 2 1 6045.3055 6123 .0880034
2 1 2 2 6811.3709 6693 .09915525
2 2 1 1 845.11924 812 .01230266
2 2 1 2 387.55922 380 .00564182
2 2 2 1 1038.0799 1084 .01511165
2 2 2 2 518.2432 513 .00754423
use inj2, clear
table s j, by(g l) cont(mean Efreq)
--------------------------------
g, l and | j
s | 1 2
----------+---------------------
1 |
1 |
1 | 7166.3695 993.01641
2 | 11748.308 721.30528
----------+---------------------
1 |
2 |
1 | 3353.8303 988.78428
2 | 5985.4936 781.89238
----------+---------------------
2 |
1 |
1 | 10471.495 845.11924
2 | 10837.827 387.55922
----------+---------------------
2 |
2 |
1 | 6045.3055 1038.0799
2 | 6811.3709 518.2432
--------------------------------
use https://stats.idre.ucla.edu/stat/stata/examples/icda/injury, clear
ipf [fw=freq], fit(g*l*s+g*j+j*l+j*s) exp save(inj3)
Deleting all matrices......
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : g l s
marginal model 2 varlist : g j
marginal model 3 varlist : j l
marginal model 4 varlist : j s
unique varlist g l s j
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 12
Goodness of Fit Tests
---------------------
df = 12
Likelihood Ratio Statistic G^2 = 7.4645 p-value = 0.825
Pearson Statistic X^2 = 7.4874 p-value = 0.824
g l s j Efreq Ofreq prob
1 1 1 1 7273.2141 7287 .10587845
1 1 1 2 1009.7858 996 .01469977
1 1 2 1 11632.621 11587 .16933969
1 1 2 2 713.37784 759 .01038486
1 2 1 1 3254.6633 3246 .04737915
1 2 1 2 964.3383 973 .01403817
1 2 2 1 6093.502 6134 .08870501
1 2 2 2 797.49773 757 .01160942
2 1 1 1 10358.931 10381 .15079819
2 1 1 2 834.06847 812 .0121418
2 1 2 1 10959.234 10969 .15953699
2 1 2 2 389.76793 380 .00567397
2 2 1 1 6150.1915 6123 .08953026
2 2 1 2 1056.8074 1084 .01538428
2 2 2 1 6697.6432 6693 .09749968
2 2 2 2 508.3565 513 .0074003
use inj3, clear
table s j, by(g l) cont(mean Efreq)
--------------------------------
g, l and | j
s | 1 2
----------+---------------------
1 |
1 |
1 | 7273.2141 1009.7858
2 | 11632.621 713.37784
----------+---------------------
1 |
2 |
1 | 3254.6633 964.3383
2 | 6093.502 797.49773
----------+---------------------
2 |
1 |
1 | 10358.931 834.06847
2 | 10959.234 389.76793
----------+---------------------
2 |
2 |
1 | 6150.1915 1056.8074
2 | 6697.6432 508.3565
--------------------------------
Table 6.9, page 160, output edited.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/injury, clear
ipf [fw=freq], fit(g+j+l+s)
df = 11
Likelihood Ratio Statistic G^2 = 2792.7710 p-value = 0.000
Pearson Statistic X^2 = 2758.3408 p-value = 0.000
ipf [fw=freq], fit(g*j+g*l+g*s+j*l+j*s+l*s)
df = 13
Likelihood Ratio Statistic G^2 = 23.3510 p-value = 0.038
Pearson Statistic X^2 = 23.3752 p-value = 0.037
ipf [fw=freq], fit(g*j*l+g*j*s+g*l*s+j*l*s)
df = 7
Likelihood Ratio Statistic G^2 = 1.3253 p-value = 0.988
Pearson Statistic X^2 = 1.3246 p-value = 0.988
ipf [fw=freq], fit(g*j*l+g*s+j*s+l*s)
df = 10
Likelihood Ratio Statistic G^2 = 18.5693 p-value = 0.046
Pearson Statistic X^2 = 18.5391 p-value = 0.047
ipf [fw=freq], fit(g*j*s+g*l+j*l+l*s)
df = 10
Likelihood Ratio Statistic G^2 = 22.8468 p-value = 0.011
Pearson Statistic X^2 = 22.8250 p-value = 0.011
ipf [fw=freq], fit(g*l*s+g*j+j*l+j*s)
df = 12
Likelihood Ratio Statistic G^2 = 7.4645 p-value = 0.825
Pearson Statistic X^2 = 7.4874 p-value = 0.824
ipf [fw=freq], fit(j*l*s+g*j+g*l+g*s)
df = 10
Likelihood Ratio Statistic G^2 = 20.6334 p-value = 0.024
Pearson Statistic X^2 = 20.6131 p-value = 0.024
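The df that ipf prints for the models with interaction terms should be read with care; ipf itself notes that its df may be incorrect. A direct parameter count for a 2x2x2x2 table gives 16 cells, 11 parameters for the model with all two-factor terms (1 constant + 4 main effects + 6 two-factor terms), and one more parameter when the GLS term is added, leaving 5 and 4 residual df respectively. The sketch below recomputes the p-values under that assumption, using the G^2 values from the output above; the scalar names are illustrative.

```stata
* G^2 values copied from the ipf output above
scalar g2_two = 23.3510
scalar g2_gls = 7.4645
* All two-factor terms: 16 cells - 11 parameters = 5 df (assumed count)
display chi2tail(5, scalar(g2_two))
* Adding the GLS three-factor term: 16 - 12 = 4 df (assumed count)
display chi2tail(4, scalar(g2_gls))
* Likelihood ratio test of the GLS term on 5 - 4 = 1 df
display chi2tail(1, scalar(g2_two) - scalar(g2_gls))
```

Under this count the all two-factor model fits poorly (p-value below 0.001), the model with GLS has a p-value of about 0.11, and the GLS term itself is clearly significant. The G^2 and X^2 statistics do not depend on the df and are unaffected.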
