An Introduction to Categorical Data Analysis by Alan Agresti, Chapter 2: Two-Way Contingency Tables

Table 2.1, page 17.

use  https://stats.idre.ucla.edu/stat/stata/examples/icda/afterlife, clear

list

     +--------------------------+
     |  gender   aftlife   freq |
     |--------------------------|
  1. | females       yes    435 |
  2. | females        no    147 |
  3. |    male       yes    375 |
  4. |    male        no    134 |
     +--------------------------+

Notice that both gender and aftlife are numeric variables with value labels. To see the underlying numeric codes, list the data without labels:

list, nolab

     +-------------------------+
     | gender   aftlife   freq |
     |-------------------------|
  1. |      1         1    435 |
  2. |      1         0    147 |
  3. |      0         1    375 |
  4. |      0         0    134 |
     +-------------------------+

Table 2.2, page 18.

tab gender aftlife [fweight=freq]

           |  belief in afterlife
    gender |        no        yes |     Total
-----------+----------------------+----------
      male |       134        375 |       509 
   females |       147        435 |       582 
-----------+----------------------+----------
     Total |       281        810 |     1,091

Calculation in Section 2.1.2, page 18.

tab gender aftlife [fweight=freq], cell row 

+-----------------+
| Key             |
|-----------------|
|    frequency    |
| row percentage  |
| cell percentage |
+-----------------+
           |  belief in afterlife
    gender |        no        yes |     Total
-----------+----------------------+----------
      male |       134        375 |       509 
           |     26.33      73.67 |    100.00 
           |     12.28      34.37 |     46.65 
-----------+----------------------+----------
   females |       147        435 |       582 
           |     25.26      74.74 |    100.00 
           |     13.47      39.87 |     53.35 
-----------+----------------------+----------
     Total |       281        810 |     1,091 
           |     25.76      74.24 |    100.00 
           |     25.76      74.24 |    100.00

Calculation on difference of proportions in Section 2.2.1, page 20. The Stata command cs is one of the epitab commands, which create tables for epidemiologists; type help epitab for more information. cs is designed for cohort studies (cc is its case-control counterpart). In the output below, the group coded 1 on gender (females) is treated as exposed.

cs aftlife gender  [fweight=freq]

                 | gender                 |
                 |   Exposed   Unexposed  |     Total
-----------------+------------------------+----------
           Cases |       435         375  |       810
        Noncases |       147         134  |       281
-----------------+------------------------+----------
           Total |       582         509  |      1091
                 |                        |
            Risk |  .7474227    .7367387  |  .7424381
                 |                        |
                 |      Point estimate    |  [95% Conf. Interval]
                 |------------------------+----------------------
 Risk difference |          .010684       | -.0413721    .0627401  
      Risk ratio |         1.014502       |  .9457309    1.088273  
 Attr. frac. ex. |         .0142944       | -.0573833    .0811133  
 Attr. frac. pop |         .0076766       |
                 +-----------------------------------------------
                             chi2(1) =     0.16  Pr>chi2 = 0.6872
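
The risk difference and its confidence interval can also be checked by hand. The lines below are a minimal sketch that uses only the counts from Table 2.2 and the usual Wald formula for a difference of two independent proportions; the scalar names are made up for this illustration.

```stata
* Proportion answering yes among females (435 of 582) and males (375 of 509)
scalar prop_f = 435/582
scalar prop_m = 375/509
* Difference of proportions and its Wald standard error
scalar diff_hat = prop_f - prop_m
scalar se_diff = sqrt(prop_f*(1 - prop_f)/582 + prop_m*(1 - prop_m)/509)
display "difference = " diff_hat
display "95% CI     = " diff_hat - invnormal(0.975)*se_diff ", " diff_hat + invnormal(0.975)*se_diff
```

The displayed values should agree with the Risk difference row of the cs output above.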

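Section 2.2.2, Table 2.3 and calculations on pages 20 and 21, including relative risk. In the cs output below the placebo group is treated as exposed, so the risk ratio compares placebo to aspirin.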

use https://stats.idre.ucla.edu/stat/stata/examples/icda/aspirin, clear

tab group mi [fweight=count]

           |          mi
     group |        no        yes |     Total
-----------+----------------------+----------
   aspirin |    10,933        104 |    11,037 
   placebo |    10,845        189 |    11,034 
-----------+----------------------+----------
     Total |    21,778        293 |    22,071
     
cs mi group [fweight=count]

                 | group                  |
                 |   Exposed   Unexposed  |     Total
-----------------+------------------------+----------
           Cases |       189         104  |       293
        Noncases |     10845       10933  |     21778
-----------------+------------------------+----------
           Total |     11034       11037  |     22071
                 |                        |
            Risk |  .0171289    .0094229  |  .0132753
                 |                        |
                 |      Point estimate    |  [95% Conf. Interval]
                 |------------------------+----------------------
 Risk difference |          .007706       |  .0046878    .0107243  
      Risk ratio |         1.817802       |  1.433031    2.305884  
 Attr. frac. ex. |          .449885       |  .3021783    .5663269  
 Attr. frac. pop |         .2901989       |
                 +-----------------------------------------------
                             chi2(1) =    25.01  Pr>chi2 = 0.0000
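
When only the four cell counts are available, the immediate form csi gives the same table without loading a dataset. This is a sketch under the assumption that the counts are entered in the order exposed cases, unexposed cases, exposed noncases, unexposed noncases, with placebo as the exposed group as in the output above.

```stata
* Relative risk by hand: risk of MI on placebo divided by risk on aspirin
display (189/11034)/(104/11037)

* Same analysis with the immediate command:
* csi exposed-cases unexposed-cases exposed-noncases unexposed-noncases
csi 189 104 10845 10933
display "risk ratio = " r(rr)
```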

Section 2.3.2 and Section 2.3.3, pages 23-25. Odds Ratio for Aspirin Study.

logit mi group [fweight=count], or

Logistic regression                               Number of obs   =      22071
                                                  LR chi2(1)      =      25.37
                                                  Prob > chi2     =     0.0000
Log likelihood = -1544.6617                       Pseudo R2       =     0.0081
------------------------------------------------------------------------------
          mi | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |   1.832054   .2250524     4.93   0.000     1.440042     2.33078
------------------------------------------------------------------------------

logit mi group [fweight=count]

Iteration 0:   log likelihood = -1557.3477
Iteration 1:   log likelihood = -1544.9244
Iteration 2:   log likelihood = -1544.6619
Iteration 3:   log likelihood = -1544.6617
Logit estimates                                   Number of obs   =      22071
                                                  LR chi2(1)      =      25.37
                                                  Prob > chi2     =     0.0000
Log likelihood = -1544.6617                       Pseudo R2       =     0.0081
------------------------------------------------------------------------------
          mi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |   .6054377   .1228416     4.93   0.000     .3646726    .8462028
       _cons |   -4.65515   .0985233   -47.25   0.000    -4.848252   -4.462048
------------------------------------------------------------------------------
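
The two logit runs above are linked by exponentiation: the odds ratio is exp() of the coefficient on group. As a rough check that does not need the model at all, the sketch below computes the sample odds ratio as a cross-product ratio of the Table 2.3 counts, together with the usual large-sample standard error of its log; the scalar names are illustrative only.

```stata
* Sample odds ratio: (placebo MI * aspirin no MI) / (aspirin MI * placebo no MI)
scalar or_hat = (189*10933)/(104*10845)
* Large-sample standard error of the log odds ratio
scalar se_log = sqrt(1/189 + 1/10933 + 1/104 + 1/10845)
display "odds ratio     = " or_hat
display "log odds ratio = " ln(or_hat)
display "95% CI for OR  = " exp(ln(or_hat) - invnormal(0.975)*se_log) ", " exp(ln(or_hat) + invnormal(0.975)*se_log)
```

The log odds ratio and its standard error should match the group row of the coefficient table, and the interval should match the one reported with the or option.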

Table 2.4 and calculations on page 26.

use https://stats.idre.ucla.edu/stat/stata/examples/icda/table2_4, clear

tab smoke mi [fw=count], col

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+
           |          mi
     smoke |   control         MI |     Total
-----------+----------------------+----------
        no |       346         90 |       436 
           |     66.67      34.35 |     55.83 
-----------+----------------------+----------
       yes |       173        172 |       345 
           |     33.33      65.65 |     44.17 
-----------+----------------------+----------
     Total |       519        262 |       781 
           |    100.00     100.00 |    100.00 
           
logit mi smoke [fw=count], or

Iteration 0:   log likelihood = -498.26482
Iteration 1:   log likelihood =  -461.5178
Iteration 2:   log likelihood =  -461.1358
Iteration 3:   log likelihood = -461.13566
Logit estimates                                   Number of obs   =        781
                                                  LR chi2(1)      =      74.26
                                                  Prob > chi2     =     0.0000
Log likelihood = -461.13566                       Pseudo R2       =     0.0745
------------------------------------------------------------------------------
          mi | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       smoke |   3.822222   .6115027     8.38   0.000     2.793415    5.229936
------------------------------------------------------------------------------
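
For a 2x2 case-control table like this one, the built-in immediate command cci is another way to get the odds ratio directly from the counts. The sketch below assumes the counts are entered in the order exposed cases, unexposed cases, exposed controls, unexposed controls, with smokers as the exposed group; the confidence limits that cci reports are computed differently from the logit interval, so they need not match it exactly.

```stata
* cci exposed-cases unexposed-cases exposed-controls unexposed-controls
cci 172 90 173 346
display "odds ratio = " r(or)

* Cross-product ratio by hand
display (172*346)/(90*173)
```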

Table 2.5 and calculations on page 31.

use https://stats.idre.ucla.edu/stat/stata/examples/icda/party, clear

tab male party [fw=count], expected

+--------------------+
| Key                |
|--------------------|
|     frequency      |
| expected frequency |
+--------------------+
           |              party
      male |         1          2          3 |     Total
-----------+---------------------------------+----------
         0 |       279         73        225 |       577 
           |     261.4       70.7      244.9 |     577.0 
-----------+---------------------------------+----------
         1 |       165         47        191 |       403 
           |     182.6       49.3      171.1 |     403.0 
-----------+---------------------------------+----------
     Total |       444        120        416 |       980 
           |     444.0      120.0      416.0 |     980.0 
           
tab male party [fw=count], chi2 lrchi2

           |              party
      male |         1          2          3 |     Total
-----------+---------------------------------+----------
         0 |       279         73        225 |       577 
         1 |       165         47        191 |       403 
-----------+---------------------------------+----------
     Total |       444        120        416 |       980
          Pearson chi2(2) =   7.0095   Pr = 0.030
 likelihood-ratio chi2(2) =   7.0026   Pr = 0.030
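
The same tests can be run straight from the cell counts with the immediate command tabi, which takes the rows of the table separated by backslashes. The sketch below also shows where one expected frequency comes from (row total times column total, divided by the grand total).

```stata
* Expected frequency for male = 0, party = 1
display 577*444/980

* Chi-squared tests from the counts alone
tabi 279 73 225 \ 165 47 191, chi2 lrchi2 expected
display "Pearson chi2 = " r(chi2) "   LR chi2 = " r(chi2_lr)
```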

Table 2.6 and calculation on page 32. Stata’s tabulate command does not produce adjusted residuals, but the user-written tabchi command by Nicholas J. Cox does (option a below). You can locate and download it by typing search tabchi (see the FAQ "How can I use the search command to search for programs and get additional help?" for more information about using search).

tabchi male party [fw=count], a

          observed frequency
          expected frequency
          adjusted residual
-------------------------------------
          |           party          
     male |       1        2        3
----------+--------------------------
        0 |     279       73      225
          | 261.416   70.653  244.931
          |   2.293    0.465   -2.618
          | 
        1 |     165       47      191
          | 182.584   49.347  171.069
          |  -2.293   -0.465    2.618
-------------------------------------
          Pearson chi2(2) =   7.0095   Pr = 0.030
 likelihood-ratio chi2(2) =   7.0026   Pr = 0.030
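
If tabchi is not installed, a single adjusted residual is easy to compute by hand. The sketch below does this for the first cell (male = 0, party = 1), assuming the standard formula: observed minus expected, divided by the square root of expected times (1 - row proportion) times (1 - column proportion). The scalar names are illustrative only.

```stata
scalar n_obs  = 279
* Expected frequency under independence
scalar mu_hat = 577*444/980
* Adjusted (standardized) residual
scalar adjres = (n_obs - mu_hat)/sqrt(mu_hat*(1 - 577/980)*(1 - 444/980))
display "expected          = " mu_hat
display "adjusted residual = " adjres
```

The result should agree with the first cell of the tabchi output to the digits shown there.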
 
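* cc needs a 0/nonzero case variable: p1 is 0 for party 1 and 2 for party 3
gen p1=party-1
* Odds ratio for the 2x2 subtable of party 1 versus party 3 (party 2 excluded)
cc p1 male [fw=count] if party ~=2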

                                                        Proportion
                 |   Exposed   Unexposed  |     Total     Exposed
-----------------+------------------------+----------------------
           Cases |       191         225  |       416      0.4591
        Controls |       165         279  |       444      0.3716
-----------------+------------------------+----------------------
           Total |       356         504  |       860      0.4140
                 |                        |
                 |      Point estimate    |  [95% Conf. Interval]
                 |------------------------+----------------------
      Odds ratio |         1.435394       |  1.082895    1.902763  (exact)
 Attr. frac. ex. |         .3033271       |  .0765493    .4744484  (exact)
 Attr. frac. pop |          .139268       |
                 +-----------------------------------------------
                             chi2(1) =     6.78  Pr>chi2 = 0.0092

Section 2.4.6, page 33. Partitioning Chi-squared.

tab male party [fw=count] if party~=3, lrchi2

           |         party
      male |         1          2 |     Total
-----------+----------------------+----------
         0 |       279         73 |       352 
         1 |       165         47 |       212 
-----------+----------------------+----------
     Total |       444        120 |       564 
 likelihood-ratio chi2(1) =   0.1612   Pr = 0.688
 
gen np = 0
replace np = 1 if party ==3
preserve
collapse (sum) count, by(male np)
list

     +-------------------+
     | male   np   count |
     |-------------------|
  1. |    0    0     352 |
  2. |    0    1     225 |
  3. |    1    0     212 |
  4. |    1    1     191 |
     +-------------------+

tab male np [fw=count] ,lrchi2
           |          np
      male |         0          1 |     Total
-----------+----------------------+----------
         0 |       352        225 |       577 
         1 |       212        191 |       403 
-----------+----------------------+----------
     Total |       564        416 |       980
 likelihood-ratio chi2(1) =   6.8414   Pr = 0.009
 
restore
tab male party [fw=count] ,lrchi2
           |              party
      male |         1          2          3 |     Total
-----------+---------------------------------+----------
         0 |       279         73        225 |       577 
         1 |       165         47        191 |       403 
-----------+---------------------------------+----------
     Total |       444        120        416 |       980 
 likelihood-ratio chi2(2) =   7.0026   Pr = 0.030
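
The point of the partition is that the two 1-df likelihood-ratio statistics add up to the 2-df statistic for the full table (0.1612 + 6.8414 = 7.0026). The sketch below checks this from the cell counts with tabi, so it does not depend on the preserve, collapse, and restore steps; the scalar names are made up for this illustration.

```stata
* Component 1: party 1 versus party 2
quietly tabi 279 73 \ 165 47, lrchi2
scalar g2_a = r(chi2_lr)
* Component 2: parties 1 and 2 combined versus party 3
quietly tabi 352 225 \ 212 191, lrchi2
scalar g2_b = r(chi2_lr)
* Full 2x3 table
quietly tabi 279 73 225 \ 165 47 191, lrchi2
display "sum of components = " g2_a + g2_b
display "full-table G2     = " r(chi2_lr)
```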

Section 2.5.2, Table 2.7 on page 35 and calculation on page 36.

use https://stats.idre.ucla.edu/stat/stata/examples/icda/alcohol, clear

tab alcohol mal [fw=count], r

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+
           |          mal
   alcohol |        no        yes |     Total
-----------+----------------------+----------
         0 |    17,066         48 |    17,114 
           |     99.72       0.28 |    100.00 
-----------+----------------------+----------
        <1 |    14,464         38 |    14,502 
           |     99.74       0.26 |    100.00 
-----------+----------------------+----------
       1-2 |       788          5 |       793 
           |     99.37       0.63 |    100.00 
-----------+----------------------+----------
       3-5 |       126          1 |       127 
           |     99.21       0.79 |    100.00 
-----------+----------------------+----------
       >=6 |        37          1 |        38 
           |     97.37       2.63 |    100.00 
-----------+----------------------+----------
     Total |    32,481         93 |    32,574 
           |     99.71       0.29 |    100.00 
           
tabchi alcohol mal [fw=count], a 

          observed frequency
          expected frequency
          adjusted residual
--------------------------------
          |         mal         
  alcohol |        no        yes
----------+---------------------
        0 |     17066         48
          | 17065.139     48.861
          |     0.179     -0.179
          | 
       <1 |     14464         38
          | 14460.596     41.404
          |     0.711     -0.711
          | 
      1-2 |       788          5
          |   790.736      2.264
          |    -1.843      1.843
          | 
      3-5 |       126          1
          |   126.637      0.363
          |    -1.062      1.062
          | 
      >=6 |        37          1
          |    37.892      0.108
          |    -2.712      2.712
--------------------------------
3 cells with expected frequency < 5
2 cells with expected frequency < 1
          Pearson chi2(4) =  12.0821   Pr = 0.017
 likelihood-ratio chi2(4) =   6.2020   Pr = 0.185

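The M-squared statistic can be computed manually as follows. Agresti defines M-squared as (n - 1) times the squared correlation; the display command below multiplies by n instead, which makes no practical difference here because n = 32,574.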

recode alcohol 0 = 0 1 = .5 2 = 1.5 3 = 4 4 = 7, gen(ascore)
corr ascore mal [fw=count]
(obs=32574)

             |   ascore      mal
-------------+------------------
      ascore |   1.0000
         mal |   0.0142   1.0000

di r(N)*r(rho)^2
6.5701339
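
As a variation, the sketch below uses the (n - 1) multiplier and adds a p-value from the chi-squared distribution with 1 degree of freedom via chi2tail(). It reloads the data so that it runs on its own; the scalar name M2 is illustrative.

```stata
use https://stats.idre.ucla.edu/stat/stata/examples/icda/alcohol, clear
* Midpoint scores for the alcohol categories, as in the text
recode alcohol 0 = 0 1 = .5 2 = 1.5 3 = 4 4 = 7, gen(ascore)
quietly corr ascore mal [fw=count]
* M-squared = (n - 1) * r^2, referred to chi-squared with 1 df
scalar M2 = (r(N) - 1)*r(rho)^2
display "M-squared = " M2
display "p-value   = " chi2tail(1, M2)
```

The statistic should round to 6.57, with a p-value of about 0.01.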

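Calculation on page 37 in Section 2.5.4. Using the category codes of alcohol as equally spaced scores gives M-squared = 1.83; using midranks instead gives M-squared = 0.35.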

corr alcohol mal [fw=count]
(obs=32574)

             |  alcohol      mal
-------------+------------------
     alcohol |   1.0000
         mal |   0.0075   1.0000

di r(N)*r(rho)^2
1.8278158

expand count
(32564 observations created)

egen mrank = rank(alcohol)
corr mrank mal
(obs=32574)

             |    mrank      mal
-------------+------------------
       mrank |   1.0000
         mal |   0.0033   1.0000

di r(N)*r(rho)^2
.35143832

Section 2.6.2, page 41. Fisher’s Tea Taster.

use https://stats.idre.ucla.edu/stat/stata/examples/icda/fisher_tea, clear

tab pour guess [fw=count] , exact all

           |         guess
      pour |      milk        tea |     Total
-----------+----------------------+----------
      milk |         3          1 |         4 
       tea |         1          3 |         4 
-----------+----------------------+----------
     Total |         4          4 |         8 
          Pearson chi2(1) =   2.0000   Pr = 0.157
 likelihood-ratio chi2(1) =   2.0930   Pr = 0.148
               Cramer's V =   0.5000
                    gamma =   0.8000  ASE = 0.294
          Kendall's tau-b =   0.5000  ASE = 0.306
           Fisher's exact =                 0.486
   1-sided Fisher's exact =                 0.243
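
The Fisher p-values can be reproduced from the four counts alone. The sketch below uses the immediate command tabi and then recomputes the one-sided p-value by hand as P(3) + P(4) under the hypergeometric distribution, using the built-in comb() function.

```stata
* Fisher's exact test from the cell counts
tabi 3 1 \ 1 3, exact
display "2-sided = " r(p_exact) "   1-sided = " r(p1_exact)

* One-sided p-value by hand: P(n11 = 3) + P(n11 = 4)
display (comb(4,3)*comb(4,1) + comb(4,4)*comb(4,0))/comb(8,4)
```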

Table 2.9 on page 41. You first need to download the module _GHYPER by Nick Cox; the egen command can then generate the hypergeometric probabilities with its hyper() function (see the FAQ "How can I use the search command to search for programs and get additional help?" for more information about using search).

clear
set obs 5
obs was 0, now 5

gen n = _n -1
egen prob = hyper(n 4 4 8)
list

     +---------------+
     | n        prob |
     |---------------|
  1. | 0   .01428571 |
  2. | 1   .22857143 |
  3. | 2   .51428571 |
  4. | 3   .22857143 |
  5. | 4   .01428571 |
     +---------------+
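
If the _GHYPER module is not available, the same probabilities can be built from the built-in comb() function. This is a sketch of an alternative to the egen call, not the method used above: it writes the hypergeometric probability as C(4,n) C(4,4-n) / C(8,4) and recreates n and prob under the same names so that the remaining commands still apply.

```stata
clear
set obs 5
gen n = _n - 1
* Hypergeometric probability of n correct "milk first" guesses out of 4
gen prob = comb(4, n)*comb(4, 4 - n)/comb(8, 4)
list
```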
     
* a is the cumulative probability P(n11 <= n); pvalue is the right-tail probability P(n11 >= n)
gen a = sum(prob)
gen pvalue = 1
replace pvalue = 1 - a[_n-1] if _n>=2
list

     +-------------------------------------+
     | n        prob          a     pvalue |
     |-------------------------------------|
  1. | 0   .01428571   .0142857          1 |
  2. | 1   .22857143   .2428571   .9857143 |
  3. | 2   .51428571   .7571428   .7571428 |
  4. | 3   .22857143   .9857143   .2428572 |
  5. | 4   .01428571          1   .0142857 |
     +-------------------------------------+
     
drop a
gen chi2 = ( (n-2)^2 + (4-n -2)^2 + (4-n-2)^2 + (n-2)^2 ) /2
list

     +---------------------------------+
     | n        prob     pvalue   chi2 |
     |---------------------------------|
  1. | 0   .01428571          1      8 |
  2. | 1   .22857143   .9857143      2 |
  3. | 2   .51428571   .7571428      0 |
  4. | 3   .22857143   .2428572      2 |
  5. | 4   .01428571   .0142857      8 |
     +---------------------------------+

Figure 2.2 on page 42.

gen y2=0
graph twoway rbar prob y2 chi2, xlabel(0 1 to 8) ytitle(probability)

Section 2.6.4, page 44, using the tea-tasting data. The exactcc command can be downloaded by typing search exactcc at the command line (see the FAQ "How can I use the search command to search for programs and get additional help?" for more information about using search).

exactcc pour guess [fw=count] , exact

                 | guess                  |             Proportion
                 |   Exposed   Unexposed  |     Total     Exposed
-----------------+------------------------+----------------------
           Cases |         3           1  |         4      0.7500
        Controls |         1           3  |         4      0.2500
-----------------+------------------------+----------------------
           Total |         4           4  |         8      0.5000
                 |                        |
                 |      Point estimate    |  [95% Conf. Interval]
                 |------------------------+----------------------
                 |                        | Cornfield's limits
      Odds ratio |                9       |  .1938699           .  Adjusted
                 |                        |  .4800561           .  Unadjusted
                 |                        | Exact limits
                 |                        |  .2117353      626.24
                 |                        | Cornfield's limits
 Attr. frac. ex. |         .8888889       | -4.158098           .  Adjusted
                 |                        |  -1.08309           .  Unadjusted
                 |                        | Exact limits
                 |                        | -3.722879    .9984032
 Attr. frac. pop |         .6666667       |
                 +-----------------------------------------------
                             chi2(1) =     2.00  Pr>chi2 = 0.1573
             Yates' adjusted chi2(1) =     0.50  Pr>chi2 = 0.4795
                                1-sided Fisher's exact P = 0.2429
                                2-sided Fisher's exact P = 0.4857
                        2 times 1-sided Fisher's exact P = 0.4857
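
The built-in immediate command cci offers a quick cross-check that does not require a download. The sketch below assumes the counts are entered as exposed cases, unexposed cases, exposed controls, unexposed controls; whether its confidence limits agree digit for digit with the exactcc output is not verified here.

```stata
* Sample odds ratio by hand: (3*3)/(1*1)
display (3*3)/(1*1)

* Odds ratio and Fisher's exact p-values from the counts
cci 3 1 1 3, exact
display "odds ratio = " r(or)
```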