Table 4.1, page 75.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/snoring, clear
tab snoring heart [fw=count], row
+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |         heart
   snoring |         0          1 |     Total
-----------+----------------------+----------
         0 |     1,355         24 |     1,379
           |     98.26       1.74 |    100.00
-----------+----------------------+----------
         2 |       603         35 |       638
           |     94.51       5.49 |    100.00
-----------+----------------------+----------
         4 |       192         21 |       213
           |     90.14       9.86 |    100.00
-----------+----------------------+----------
         5 |       224         30 |       254
           |     88.19      11.81 |    100.00
-----------+----------------------+----------
     Total |     2,374        110 |     2,484
           |     95.57       4.43 |    100.00
reg heart snoring [fw=count]
Source | SS df MS Number of obs = 2484
-------------+------------------------------ F( 1, 2482) = 74.82
Model | 3.07633377 1 3.07633377 Prob > F = 0.0000
Residual | 102.052491 2482 .041117039 R-squared = 0.0293
-------------+------------------------------ Adj R-squared = 0.0289
Total | 105.128824 2483 .042339438 Root MSE = .20277
------------------------------------------------------------------------------
heart | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
snoring | .020038 .0023166 8.65 0.000 .0154954 .0245806
_cons | .0168723 .0051571 3.27 0.001 .0067598 .0269849
------------------------------------------------------------------------------
predict ylin
(option xb assumed; fitted values)
logit heart snoring [fw=count]
Logit estimates Number of obs = 2484
LR chi2(1) = 63.10
Prob > chi2 = 0.0000
Log likelihood = -418.86582 Pseudo R2 = 0.0700
------------------------------------------------------------------------------
heart | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
snoring | .3973366 .0500106 7.95 0.000 .2993176 .4953557
_cons | -3.866248 .1662144 -23.26 0.000 -4.192022 -3.540474
------------------------------------------------------------------------------
predict ylogit
(option p assumed; Pr(heart))
probit heart snoring [fw=count]
Probit estimates Number of obs = 2484
LR chi2(1) = 64.03
Prob > chi2 = 0.0000
Log likelihood = -418.39714 Pseudo R2 = 0.0711
------------------------------------------------------------------------------
heart | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
snoring | .1877705 .02363 7.95 0.000 .1414565 .2340844
_cons | -2.060552 .0704491 -29.25 0.000 -2.198629 -1.922474
------------------------------------------------------------------------------
predict yprob
(option p assumed; Pr(heart))
list if heart==1
     +----------------------------------------------------------+
     | snoring   heart   count       ylin     ylogit      yprob |
     |----------------------------------------------------------|
  1. |       0       1      24   .0168723   .0205074   .0196729 |
  2. |       2       1      35   .0569483   .0442951   .0459933 |
  3. |       4       1      21   .0970243   .0930541   .0951876 |
  4. |       5       1      30   .1170623   .1324389   .1309952 |
     +----------------------------------------------------------+
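The fitted values in the listing can be checked by hand from the coefficients reported above. The lines below are a minimal sketch that assumes those printed coefficients and uses Stata's built-in invlogit() and normal() functions. Each line evaluates one model at snoring = 5 and should agree with the last row of the listing up to rounding.

```stata
* Hand check of the fitted probabilities at snoring = 5,
* using the coefficients printed in the output above.

* Linear probability model: _cons + b*snoring
display .0168723 + .020038*5

* Logit model: inverse logit of the linear predictor
display invlogit(-3.866248 + .3973366*5)

* Probit model: standard normal CDF of the linear predictor
display normal(-2.060552 + .1877705*5)
```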
Figure 4.1 on page 76.
label variable ylin "Linear"
label variable ylogit "Logistic"
label variable yprob "Probit"
graph twoway connect ylin ylogit yprob snoring, ytitle(Predicted Probability)
Section 4.3.2, page 84. An example using the horseshoe crab data.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/crab, clear
glm satell width , family(poisson) nolog
Generalized linear models No. of obs = 173
Optimization : ML: Newton-Raphson Residual df = 171
Scale parameter = 1
Deviance = 567.878575 (1/df) Deviance = 3.320927
Pearson = 544.1570201 (1/df) Pearson = 3.182205
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -461.5881235 AIC = 5.3594
BIC = -313.3342876
------------------------------------------------------------------------------
satell | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
width | .1640451 .0199653 8.22 0.000 .1249137 .2031764
_cons | -3.304757 .5422416 -6.09 0.000 -4.367531 -2.241983
------------------------------------------------------------------------------
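Under the log link the fitted mean is exp(-3.304757 + .1640451*width), so each one-unit increase in width multiplies the expected number of satellites by exp(.1640451), about 1.18. The following sketch assumes the coefficients printed above. The width of 26.3 is only an illustrative value chosen here, not one taken from the output.

```stata
* Multiplicative effect on the mean of a one-unit increase in width
display exp(.1640451)

* Fitted mean number of satellites at an illustrative width of 26.3
display exp(-3.304757 + .1640451*26.3)

* Refit with the eform option to report exponentiated coefficients
glm satell width, family(poisson) nolog eform
```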
Figure 4.3 on page 84.
sort satell width
by satell width: gen count=_N
graph twoway scatter satell width, mlabel(count) mlabcolor(black) ytitle(Number of Satellites) msize(tiny)
glm satell width if width <=33 , family(poisson) nolog
Generalized linear models No. of obs = 172
Optimization : ML: Newton-Raphson Residual df = 170
Scale parameter = 1
Deviance = 567.3393552 (1/df) Deviance = 3.33729
Pearson = 544.0694946 (1/df) Pearson = 3.200409
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -459.4147233 AIC = 5.365287
BIC = -307.7347058
------------------------------------------------------------------------------
satell | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
width | .1699846 .0216086 7.87 0.000 .1276325 .2123368
_cons | -3.461006 .584658 -5.92 0.000 -4.606915 -2.315098
------------------------------------------------------------------------------
glm satell width , family(poisson) link(identity) nolog
Generalized linear models No. of obs = 173
Optimization : ML: Newton-Raphson Residual df = 171
Scale parameter = 1
Deviance = 557.7083301 (1/df) Deviance = 3.261452
Pearson = 542.4855164 (1/df) Pearson = 3.17243
Variance function: V(u) = u [Poisson]
Link function : g(u) = u [Identity]
Standard errors : OIM
Log likelihood = -456.503001 AIC = 5.300613
BIC = -323.5045326
------------------------------------------------------------------------------
satell | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
width | .5494969 .0592926 9.27 0.000 .4332856 .6657082
_cons | -11.53206 1.510399 -7.64 0.000 -14.49239 -8.57173
------------------------------------------------------------------------------
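With the identity link the fitted mean is linear in width, -11.53206 + .5494969*width, so each additional unit of width adds about 0.55 satellites. One consequence is that the fitted mean turns negative for small enough widths. The sketch below assumes the coefficients printed above, and the width of 26.3 is again only an illustrative value.

```stata
* Fitted mean at an illustrative width of 26.3 under the identity link
display -11.53206 + .5494969*26.3

* Width at which the fitted mean crosses zero (roughly 21);
* below this value the identity-link model predicts a negative mean
display 11.53206/.5494969
```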
Figure 4.4 on page 85.
gen a = ceil(width - 23.25) + 1
replace a = 1 if a<=0
replace a = 8 if a >8
sort a
by a: egen smean = mean(satell)
by a: egen wmean=mean(width)
lowess smean wmean
glm satell width, family(poisson) nolog
predict ylog
label variable ylog "Log link"
glm satell width , family(poisson) link(identity) nolog
predict ylin
label variable ylin "Identity link"
graph twoway scatter smean wmean || line ylin ylog width if ylog<=6, legend(off)
Section 4.3.4, page 87. Horseshoe crab data.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/crab, clear
sort width
collapse (sum) satell (count) n, by(width)
glm satell width, fam(poi) lnoff(n) nolog
Generalized linear models No. of obs = 66
Optimization : ML: Newton-Raphson Residual df = 64
Scale parameter = 1
Deviance = 190.0272196 (1/df) Deviance = 2.969175
Pearson = 174.2737318 (1/df) Pearson = 2.723027
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -199.2592131 AIC = 6.098764
BIC = -78.11068387
------------------------------------------------------------------------------
satell | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
width | .1640451 .0199653 8.22 0.000 .1249137 .2031764
_cons | -3.304757 .5422416 -6.09 0.000 -4.367531 -2.241983
n | (exposure)
------------------------------------------------------------------------------
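The lnoff(n) option enters ln(n) in the linear predictor with its coefficient fixed at 1, which is why the grouped fit reproduces the coefficients of the ungrouped model. The same model can be sketched with an explicit offset variable. Here lnn is a variable name introduced for illustration, and offset() is the glm option for an offset that is already on the log scale.

```stata
* Build the log of the group size by hand (lnn is an illustrative name)
gen lnn = ln(n)

* Same grouped Poisson model, with the offset supplied on the log scale
glm satell width, family(poisson) offset(lnn) nolog
```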
Section 4.4.2, page 90. Poisson model checking: the simpler approach.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/crab, clear
gen a = ceil(width - 23.25) + 1
replace a = 1 if a<=0
replace a = 8 if a >8
sort a
collapse (mean) width (sum) satell n, by(a)
glm satell width, fam(poi) lnoff(n)
Generalized linear models No. of obs = 8
Optimization : ML: Newton-Raphson Residual df = 6
Scale parameter = 1
Deviance = 6.516421366 (1/df) Deviance = 1.08607
Pearson = 6.246497819 (1/df) Pearson = 1.041083
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -26.48068535 AIC = 7.120171
BIC = -5.960227884
------------------------------------------------------------------------------
satell | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
width | .1728976 .0212529 8.14 0.000 .1312427 .2145525
_cons | -3.540176 .5765828 -6.14 0.000 -4.670258 -2.410095
n | (exposure)
------------------------------------------------------------------------------
Section 4.4.3, page 90. Model Residuals.
glm satell width, fam(poi) lnoff(n)
Generalized linear models No. of obs = 8
Optimization : ML: Newton-Raphson Residual df = 6
Scale parameter = 1
Deviance = 6.516421366 (1/df) Deviance = 1.08607
Pearson = 6.246497819 (1/df) Pearson = 1.041083
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -26.48068535 AIC = 7.120171
BIC = -5.960227884
------------------------------------------------------------------------------
satell | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
width | .1728976 .0212529 8.14 0.000 .1312427 .2145525
_cons | -3.540176 .5765828 -6.14 0.000 -4.670258 -2.410095
n | (exposure)
------------------------------------------------------------------------------
predict count
(option mu assumed; predicted mean satell)
predict resid, p
predict h, h
gen aresid = resid/sqrt(1-h)
drop h
list
     +---------------------------------------------------------------+
     | a      width   satell    n      count       resid      aresid |
     |---------------------------------------------------------------|
  1. | 1   22.69286       14   14   20.54098   -1.443218   -1.630687 |
  2. | 2   23.84286       20   14   25.05952   -1.010702   -1.106694 |
  3. | 3     24.775       67   28   58.88382    1.057679    1.224653 |
  4. | 4   25.83846      105   39    98.5726    .6473766    .7527656 |
  5. | 5   26.79091       63   22   65.55898   -.3160459   -.3391854 |
     |---------------------------------------------------------------|
  6. | 6    27.7375       93   24    84.2362    .9548676    1.057612 |
  7. | 7   28.66667       71   18   74.18733   -.3700516   -.4229863 |
  8. | 8   30.40714       72   14   77.96057   -.6750725    -1.00809 |
     +---------------------------------------------------------------+
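For a Poisson model the Pearson residual is (observed - fitted)/sqrt(fitted), and the adjusted residual divides it by sqrt(1 - h), where h is the leverage. As a rough check, the Pearson residuals can be rebuilt from the columns in the listing. The variable name resid_check is introduced here for illustration only.

```stata
* Pearson residual by hand: (observed - fitted)/sqrt(fitted),
* where count holds the fitted mean from -predict count-
gen resid_check = (satell - count)/sqrt(count)

* Compare with the residuals produced by -predict resid, p-
list a satell count resid resid_check
```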
Section 4.4.4, pages 92-93. Overdispersion in Poisson regression.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/crab, clear
gen a = ceil(width - 23.25) + 1
replace a = 1 if a<=0
replace a = 8 if a >8
sort a
egen sd = sd(satell), by(a)
gen var=sd*sd
collapse (mean) width satell var (sum) tsat =satell (count) n, by(a)
list
     +-----------------------------------------------+
     | a      width    satell        var   tsat    n |
     |-----------------------------------------------|
  1. | 1   22.69286         1   2.769231     14   14 |
  2. | 2   23.84286   1.42857   8.879121     20   14 |
  3. | 3     24.775   2.39286   6.543651     67   28 |
  4. | 4   25.83846   2.69231   11.37652    105   39 |
  5. | 5   26.79091   2.86364   6.885281     63   22 |
     |-----------------------------------------------|
  6. | 6    27.7375     3.875   8.809782     93   24 |
  7. | 7   28.66667   3.94444   16.87909     71   18 |
  8. | 8   30.40714   5.14286   8.285714     72   14 |
     +-----------------------------------------------+
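A Poisson distribution has variance equal to its mean, so comparing the var and satell columns within each width group gives a quick look at overdispersion. The sketch below computes the ratio directly from the collapsed data. The variable name ratio is introduced here for illustration.

```stata
* Ratio of sample variance to sample mean within each width group;
* values well above 1 point to overdispersion relative to the Poisson
gen ratio = var/satell
list a satell var ratio
```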
use https://stats.idre.ucla.edu/stat/stata/examples/icda/crab, clear
sort width
collapse (sum) satell (count) n, by(width)
glm satell width, fam(poi) lnoff(n)
Generalized linear models No. of obs = 66
Optimization : ML: Newton-Raphson Residual df = 64
Scale parameter = 1
Deviance = 190.0272196 (1/df) Deviance = 2.969175
Pearson = 174.2737318 (1/df) Pearson = 2.723027
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -199.2592131 AIC = 6.098764
BIC = -78.11068387
------------------------------------------------------------------------------
satell | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
width | .1640451 .0199653 8.22 0.000 .1249137 .2031764
_cons | -3.304757 .5422416 -6.09 0.000 -4.367531 -2.241983
n | (exposure)
------------------------------------------------------------------------------
di sqrt(2.723027)
1.6501597
glm satell width, fam(poi) lnoff(n) scale(x2)
Generalized linear models No. of obs = 66
Optimization : ML: Newton-Raphson Residual df = 64
Scale parameter = 1
Deviance = 190.0272196 (1/df) Deviance = 2.969175
Pearson = 174.2737318 (1/df) Pearson = 2.723027
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -199.2592131 AIC = 6.098764
BIC = -78.11068387
------------------------------------------------------------------------------
satell | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
width | .1640451 .032946 4.98 0.000 .0994721 .2286181
_cons | -3.304757 .8947852 -3.69 0.000 -5.058504 -1.55101
n | (exposure)
------------------------------------------------------------------------------
(Standard errors scaled using square root of Pearson X2-based dispersion)
di 4.98^2
24.8004
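The scale(x2) fit keeps the Poisson coefficients and multiplies each standard error by the square root of the Pearson statistic divided by its degrees of freedom, the 1.6501597 computed above. The squared z statistic, 4.98^2 = 24.8004, is the corresponding Wald chi-squared statistic on 1 degree of freedom. The lines below sketch that arithmetic from the printed values and use Stata's chi2tail() function for the p-value.

```stata
* Scaled standard error = Poisson standard error * sqrt(Pearson X2/df)
display .0199653*sqrt(2.723027)

* z statistic for width using the scaled standard error
display .1640451/.032946

* p-value for the Wald chi-squared statistic on 1 degree of freedom
display chi2tail(1, 4.98^2)
```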




