Table 2.1, page 17.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/afterlife, clear
list
+--------------------------+
| gender aftlife freq |
|--------------------------|
1. | females yes 435 |
2. | females no 147 |
3. | male yes 375 |
4. | male no 134 |
+--------------------------+
Notice that gender and aftlife are both numeric variables with value labels attached. To see the underlying numeric codes, add the nolabel option (abbreviated nolab here):
list, nolab
+-------------------------+
| gender aftlife freq |
|-------------------------|
1. | 1 1 435 |
2. | 1 0 147 |
3. | 0 1 375 |
4. | 0 0 134 |
+-------------------------+
Table 2.2, page 18.
tab gender aftlife [fweight=freq]
| belief in afterlife
gender | no yes | Total
-----------+----------------------+----------
male | 134 375 | 509
females | 147 435 | 582
-----------+----------------------+----------
Total | 281 810 | 1,091
Calculation in Section 2.1.2, page 18.
tab gender aftlife [fweight=freq], cell row
+-----------------+
| Key |
|-----------------|
| frequency |
| row percentage |
| cell percentage |
+-----------------+
| belief in afterlife
gender | no yes | Total
-----------+----------------------+----------
male | 134 375 | 509
| 26.33 73.67 | 100.00
| 12.28 34.37 | 46.65
-----------+----------------------+----------
females | 147 435 | 582
| 25.26 74.74 | 100.00
| 13.47 39.87 | 53.35
-----------+----------------------+----------
Total | 281 810 | 1,091
| 25.76 74.24 | 100.00
| 25.76 74.24 | 100.00
Calculation of the difference of proportions in Section 2.2.1, page 20. The Stata command cs is one of the epitab commands, which create tables for epidemiologists; type help epitab for more information. cs is intended for cohort-study data (its case-control counterpart is cc) and reports the risk in each group together with the risk difference and the risk ratio.
cs aftlife gender [fweight=freq]
| gender |
| Exposed Unexposed | Total
-----------------+------------------------+----------
Cases | 435 375 | 810
Noncases | 147 134 | 281
-----------------+------------------------+----------
Total | 582 509 | 1091
| |
Risk | .7474227 .7367387 | .7424381
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Risk difference | .010684 | -.0413721 .0627401
Risk ratio | 1.014502 | .9457309 1.088273
Attr. frac. ex. | .0142944 | -.0573833 .0811133
Attr. frac. pop | .0076766 |
+-----------------------------------------------
chi2(1) = 0.16 Pr>chi2 = 0.6872
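The risk difference and its 95% interval can also be checked by hand from the four counts. The sketch below is illustrative only: the scalar names are ours, and it assumes the usual large-sample (Wald) standard error, which appears to be what cs reports here.

```stata
* proportion believing in an afterlife in each group
scalar pf = 435/582
scalar pm = 375/509
* large-sample standard error of the difference pf - pm
scalar se_d = sqrt(pf*(1-pf)/582 + pm*(1-pm)/509)
scalar lo = (pf - pm) - invnormal(0.975)*se_d
scalar hi = (pf - pm) + invnormal(0.975)*se_d
display pf - pm
display lo
display hi
```

The three displayed values should agree with the Risk difference row above up to rounding.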
Section 2.2.2, Table 2.3 and calculations on pages 20 and 21, including relative risk.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/aspirin, clear
tab group mi [fweight=count]
| mi
group | no yes | Total
-----------+----------------------+----------
aspirin | 10,933 104 | 11,037
placebo | 10,845 189 | 11,034
-----------+----------------------+----------
Total | 21,778 293 | 22,071
cs mi group [fweight=count]
| group |
| Exposed Unexposed | Total
-----------------+------------------------+----------
Cases | 189 104 | 293
Noncases | 10845 10933 | 21778
-----------------+------------------------+----------
Total | 11034 11037 | 22071
| |
Risk | .0171289 .0094229 | .0132753
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Risk difference | .007706 | .0046878 .0107243
Risk ratio | 1.817802 | 1.433031 2.305884
Attr. frac. ex. | .449885 | .3021783 .5663269
Attr. frac. pop | .2901989 |
+-----------------------------------------------
chi2(1) = 25.01 Pr>chi2 = 0.0000
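When only the published counts are at hand, the same analysis can be sketched with csi, the immediate form of cs, which takes the four cell counts directly. The cell order below follows the rows of the table above, with placebo as the "exposed" column; the last line is a by-hand check of the risk ratio and is not part of the original example.

```stata
* csi a b c d: exposed cases, unexposed cases, exposed noncases, unexposed noncases
csi 189 104 10845 10933
* relative risk by hand: risk under placebo divided by risk under aspirin
display (189/11034)/(104/11037)
```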
Sections 2.3.2 and 2.3.3, pages 23-25. Odds ratio for the aspirin study.
logit mi group [fweight=count], or
Logistic regression Number of obs = 22071
LR chi2(1) = 25.37
Prob > chi2 = 0.0000
Log likelihood = -1544.6617 Pseudo R2 = 0.0081
------------------------------------------------------------------------------
mi | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
group | 1.832054 .2250524 4.93 0.000 1.440042 2.33078
------------------------------------------------------------------------------
logit mi group [fweight=count]
Iteration 0: log likelihood = -1557.3477
Iteration 1: log likelihood = -1544.9244
Iteration 2: log likelihood = -1544.6619
Iteration 3: log likelihood = -1544.6617
Logit estimates Number of obs = 22071
LR chi2(1) = 25.37
Prob > chi2 = 0.0000
Log likelihood = -1544.6617 Pseudo R2 = 0.0081
------------------------------------------------------------------------------
mi | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
group | .6054377 .1228416 4.93 0.000 .3646726 .8462028
_cons | -4.65515 .0985233 -47.25 0.000 -4.848252 -4.462048
------------------------------------------------------------------------------
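The two logit runs report the same model on different scales: the odds ratio is the exponentiated coefficient, and for a single 2x2 table it also equals the cross-product ratio of the cell counts. A short check, using only numbers that appear in the output above:

```stata
* cross-product ratio from the aspirin table (placebo vs. aspirin)
display (189*10933)/(104*10845)
* exponentiate the coefficient and the ends of its confidence interval
display exp(.6054377)
display exp(.3646726)
display exp(.8462028)
```

The results should match the Odds Ratio row and its interval up to rounding.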
Table 2.4 and calculations on page 26.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/table2_4, clear
tab smoke mi [fw=count], col
+-------------------+
| Key |
|-------------------|
| frequency |
| column percentage |
+-------------------+
| mi
smoke | control MI | Total
-----------+----------------------+----------
no | 346 90 | 436
| 66.67 34.35 | 55.83
-----------+----------------------+----------
yes | 173 172 | 345
| 33.33 65.65 | 44.17
-----------+----------------------+----------
Total | 519 262 | 781
| 100.00 100.00 | 100.00
logit mi smoke [fw=count], or
Iteration 0: log likelihood = -498.26482
Iteration 1: log likelihood = -461.5178
Iteration 2: log likelihood = -461.1358
Iteration 3: log likelihood = -461.13566
Logit estimates Number of obs = 781
LR chi2(1) = 74.26
Prob > chi2 = 0.0000
Log likelihood = -461.13566 Pseudo R2 = 0.0745
------------------------------------------------------------------------------
mi | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
smoke | 3.822222 .6115027 8.38 0.000 2.793415 5.229936
------------------------------------------------------------------------------
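Because Table 2.4 comes from a case-control design, the odds ratio is the natural summary. As a cross-check that does not appear in the original example, it can be computed from the cell counts by hand or with cci, the immediate form of cc:

```stata
* cross-product ratio: (MI smokers * control nonsmokers) / (MI nonsmokers * control smokers)
display (172*346)/(90*173)
* cci a b c d: exposed cases, unexposed cases, exposed controls, unexposed controls
cci 172 90 173 346
```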
Table 2.5 and calculations on page 31.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/party, clear
tab male party [fw=count], expected
+--------------------+
| Key |
|--------------------|
| frequency |
| expected frequency |
+--------------------+
| party
male | 1 2 3 | Total
-----------+---------------------------------+----------
0 | 279 73 225 | 577
| 261.4 70.7 244.9 | 577.0
-----------+---------------------------------+----------
1 | 165 47 191 | 403
| 182.6 49.3 171.1 | 403.0
-----------+---------------------------------+----------
Total | 444 120 416 | 980
| 444.0 120.0 416.0 | 980.0
tab male party [fw=count], chi2 lrchi2
| party
male | 1 2 3 | Total
-----------+---------------------------------+----------
0 | 279 73 225 | 577
1 | 165 47 191 | 403
-----------+---------------------------------+----------
Total | 444 120 416 | 980
Pearson chi2(2) = 7.0095 Pr = 0.030
likelihood-ratio chi2(2) = 7.0026 Pr = 0.030
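If the data file is not available, the same table can be typed in directly with tabi, the immediate form of tabulate; rows are separated by a backslash. This is a sketch for convenience and is not part of the original example. The last line recomputes the Pearson p-value from the chi-squared tail function.

```stata
* rows are male = 0 and male = 1; columns are party = 1, 2, 3
tabi 279 73 225 \ 165 47 191, chi2 lrchi2 expected
* upper-tail probability of a chi-squared(2) at the Pearson statistic
display chi2tail(2, 7.0095)
```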
Table 2.6 and calculation on page 32. Stata’s tabulate command does not produce adjusted residuals. Nicholas J. Cox has written a module, tabchi, for tabulation and chi-square tasks. You can locate and install it by typing search tabchi (see "How can I use the search command to search for programs and get additional help?" for more information about using search).
tabchi male party [fw=count], a
observed frequency
expected frequency
adjusted residual
-------------------------------------
| party
male | 1 2 3
----------+--------------------------
0 | 279 73 225
| 261.416 70.653 244.931
| 2.293 0.465 -2.618
|
1 | 165 47 191
| 182.584 49.347 171.069
| -2.293 -0.465 2.618
-------------------------------------
Pearson chi2(2) = 7.0095 Pr = 0.030
likelihood-ratio chi2(2) = 7.0026 Pr = 0.030
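An adjusted residual divides the difference between the observed and expected count by its estimated standard error, sqrt(expected * (1 - row proportion) * (1 - column proportion)). The sketch below reproduces the first cell by hand; the scalar names are ours and are used only for illustration.

```stata
* cell male = 0, party = 1: observed 279, row total 577, column total 444, n = 980
scalar mu = 577*444/980
scalar adj = (279 - mu)/sqrt(mu*(1 - 577/980)*(1 - 444/980))
display mu
display adj
```

The two values should agree with the expected frequency and adjusted residual that tabchi reports for that cell.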
gen p1=party-1
cc p1 male [fw=count] if party ~=2
Proportion
| Exposed Unexposed | Total Exposed
-----------------+------------------------+----------------------
Cases | 191 225 | 416 0.4591
Controls | 165 279 | 444 0.3716
-----------------+------------------------+----------------------
Total | 356 504 | 860 0.4140
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Odds ratio | 1.435394 | 1.082895 1.902763 (exact)
Attr. frac. ex. | .3033271 | .0765493 .4744484 (exact)
Attr. frac. pop | .139268 |
+-----------------------------------------------
chi2(1) = 6.78 Pr>chi2 = 0.0092
Section 2.4.6, page 33. Partitioning Chi-squared.
tab male party [fw=count] if party~=3, lrchi2
| party
male | 1 2 | Total
-----------+----------------------+----------
0 | 279 73 | 352
1 | 165 47 | 212
-----------+----------------------+----------
Total | 444 120 | 564
likelihood-ratio chi2(1) = 0.1612 Pr = 0.688
gen np = 0
replace np = 1 if party ==3
preserve
collapse (sum) count, by(male np)
list
+-------------------+
| male np count |
|-------------------|
1. | 0 0 352 |
2. | 0 1 225 |
3. | 1 0 212 |
4. | 1 1 191 |
+-------------------+
tab male np [fw=count] ,lrchi2
| np
male | 0 1 | Total
-----------+----------------------+----------
0 | 352 225 | 577
1 | 212 191 | 403
-----------+----------------------+----------
Total | 564 416 | 980
likelihood-ratio chi2(1) = 6.8414 Pr = 0.009
restore
tab male party [fw=count] ,lrchi2
| party
male | 1 2 3 | Total
-----------+---------------------------------+----------
0 | 279 73 225 | 577
1 | 165 47 191 | 403
-----------+---------------------------------+----------
Total | 444 120 416 | 980
likelihood-ratio chi2(2) = 7.0026 Pr = 0.030
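The point of the partition is that the two one-degree-of-freedom likelihood-ratio statistics add up to the two-degree-of-freedom statistic for the full table. A quick check using the values printed above:

```stata
* component statistics: party 1 vs 2, then parties 1 and 2 combined vs party 3
display 0.1612 + 6.8414
* p-values for the two components, each on 1 degree of freedom
display chi2tail(1, 0.1612)
display chi2tail(1, 6.8414)
```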
Section 2.5.2, Table 2.7 on page 35 and calculation on page 36.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/alcohol, clear
tab alcohol mal [fw=count], r
+----------------+
| Key |
|----------------|
| frequency |
| row percentage |
+----------------+
| mal
alcohol | no yes | Total
-----------+----------------------+----------
0 | 17,066 48 | 17,114
| 99.72 0.28 | 100.00
-----------+----------------------+----------
<1 | 14,464 38 | 14,502
| 99.74 0.26 | 100.00
-----------+----------------------+----------
1-2 | 788 5 | 793
| 99.37 0.63 | 100.00
-----------+----------------------+----------
3-5 | 126 1 | 127
| 99.21 0.79 | 100.00
-----------+----------------------+----------
>=6 | 37 1 | 38
| 97.37 2.63 | 100.00
-----------+----------------------+----------
Total | 32,481 93 | 32,574
| 99.71 0.29 | 100.00
tabchi alcohol mal [fw=count], a
observed frequency
expected frequency
adjusted residual
--------------------------------
| mal
alcohol | no yes
----------+---------------------
0 | 17066 48
| 17065.139 48.861
| 0.179 -0.179
|
<1 | 14464 38
| 14460.596 41.404
| 0.711 -0.711
|
1-2 | 788 5
| 790.736 2.264
| -1.843 1.843
|
3-5 | 126 1
| 126.637 0.363
| -1.062 1.062
|
>=6 | 37 1
| 37.892 0.108
| -2.712 2.712
--------------------------------
3 cells with expected frequency < 5
2 cells with expected frequency < 1
Pearson chi2(4) = 12.0821 Pr = 0.017
likelihood-ratio chi2(4) = 6.2020 Pr = 0.185
The M-squared statistic can be computed manually as follows.
recode alcohol 0 = 0 1 = .5 2 = 1.5 3 = 4 4 = 7, gen(ascore)
corr ascore mal [fw=count]
(obs=32574)
| ascore mal
-------------+------------------
ascore | 1.0000
mal | 0.0142 1.0000
di r(N)*r(rho)^2
6.5701339
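Two follow-up checks may be useful here; neither is part of the original example. The textbook defines M-squared as (n - 1) times the squared correlation, whereas the command above multiplies by n; with n = 32574 the difference is negligible. The statistic is referred to a chi-squared distribution with 1 degree of freedom, so its p-value comes from chi2tail.

```stata
* rescale the value above from n*r^2 to (n-1)*r^2
display 6.5701339*(32574 - 1)/32574
* p-value on 1 degree of freedom
display chi2tail(1, 6.5701339)
```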
Calculation on page 37 in Section 2.5.4.
corr alcohol mal [fw=count]
(obs=32574)
| alcohol mal
-------------+------------------
alcohol | 1.0000
mal | 0.0075 1.0000
di r(N)*r(rho)^2
1.8278158
expand count
(32564 observations created)
egen mrank = rank(alcohol)
corr mrank mal
(obs=32574)
| mrank mal
-------------+------------------
mrank | 1.0000
mal | 0.0033 1.0000
di r(N)*r(rho)^2
.35143832
Section 2.6.2, page 41. Fisher’s Tea Taster.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/fisher_tea, clear
tab pour guess [fw=count] , exact all
| guess
pour | milk tea | Total
-----------+----------------------+----------
milk | 3 1 | 4
tea | 1 3 | 4
-----------+----------------------+----------
Total | 4 4 | 8
Pearson chi2(1) = 2.0000 Pr = 0.157
likelihood-ratio chi2(1) = 2.0930 Pr = 0.148
Cramer's V = 0.5000
gamma = 0.8000 ASE = 0.294
Kendall's tau-b = 0.5000 ASE = 0.306
Fisher's exact = 0.486
1-sided Fisher's exact = 0.243
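The one-sided Fisher p-value is the hypergeometric probability of guessing 3 or 4 of the milk-first cups correctly. As an illustrative check, it can be computed from binomial coefficients with the comb() function, and the whole table can be re-entered with tabi when the data file is not at hand:

```stata
* P(3 correct) + P(4 correct) with 4 milk-first cups among 8
display (comb(4,3)*comb(4,1) + comb(4,4)*comb(4,0))/comb(8,4)
* the same test from typed-in counts
tabi 3 1 \ 1 3, exact
```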
Table 2.9 on page 41. You need to download a module called _GHYPER by Nick Cox; the egen command can then generate the hypergeometric probabilities through its hyper() function (see "How can I use the search command to search for programs and get additional help?" for more information about using search).
clear
set obs 5
obs was 0, now 5
gen n = _n -1
egen prob = hyper(n 4 4 8)
list
+---------------+
| n prob |
|---------------|
1. | 0 .01428571 |
2. | 1 .22857143 |
3. | 2 .51428571 |
4. | 3 .22857143 |
5. | 4 .01428571 |
+---------------+
gen a = sum(prob)
gen pvalue = 1
replace pvalue = 1 - a[_n-1] if _n>=2
list
+-------------------------------------+
| n prob a pvalue |
|-------------------------------------|
1. | 0 .01428571 .0142857 1 |
2. | 1 .22857143 .2428571 .9857143 |
3. | 2 .51428571 .7571428 .7571428 |
4. | 3 .22857143 .9857143 .2428572 |
5. | 4 .01428571 1 .0142857 |
+-------------------------------------+
drop a
gen chi2 = ( (n-2)^2 + (4-n -2)^2 + (4-n-2)^2 + (n-2)^2 ) /2
list
+---------------------------------+
| n prob pvalue chi2 |
|---------------------------------|
1. | 0 .01428571 1 8 |
2. | 1 .22857143 .9857143 2 |
3. | 2 .51428571 .7571428 0 |
4. | 3 .22857143 .2428572 2 |
5. | 4 .01428571 .0142857 8 |
+---------------------------------+
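As an alternative to the user-written egen function, recent versions of Stata include a built-in hypergeometricp(N, K, n, k) function. The sketch below assumes that function is available in your version and that the variables n and prob created above are still in memory; prob_builtin is a name chosen here for illustration.

```stata
* population of 8 cups, 4 poured milk first, 4 guessed as milk first, n of them correct
gen prob_builtin = hypergeometricp(8, 4, 4, n)
list n prob prob_builtin
```

The two probability columns should be identical.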
Figure 2.2 on page 42.
gen y2=0
graph twoway rbar prob y2 chi2, xlabel(0 1 to 8) ytitle(probability)
Section 2.6.4, page 44, using the tea-tasting data. The command exactcc can be located and installed by typing search exactcc at the command line (see "How can I use the search command to search for programs and get additional help?" for more information about using search).
exactcc pour guess [fw=count] , exact
| guess | Proportion
| Exposed Unexposed | Total Exposed
-----------------+------------------------+----------------------
Cases | 3 1 | 4 0.7500
Controls | 1 3 | 4 0.2500
-----------------+------------------------+----------------------
Total | 4 4 | 8 0.5000
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
| | Cornfield's limits
Odds ratio | 9 | .1938699 . Adjusted
| | .4800561 . Unadjusted
| | Exact limits
| | .2117353 626.24
| | Cornfield's limits
Attr. frac. ex. | .8888889 | -4.158098 . Adjusted
| | -1.08309 . Unadjusted
| | Exact limits
| | -3.722879 .9984032
Attr. frac. pop | .6666667 |
+-----------------------------------------------
chi2(1) = 2.00 Pr>chi2 = 0.1573
Yates' adjusted chi2(1) = 0.50 Pr>chi2 = 0.4795
1-sided Fisher's exact P = 0.2429
2-sided Fisher's exact P = 0.4857
2 times 1-sided Fisher's exact P = 0.4857

