Methods Matter: Improving Causal Inference in Educational and Social Science Research
by Richard J. Murnane and John B. Willett
Chapter 12: Dealing with Bias in Treatment Effects Estimated from Nonexperimental Data

use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/catholic, clear

Descriptive statistics for mathematics score (math12) and type of high school (catholic). Note: this output does not appear in the text.

sum math12 catholic, detail

          12th grade standardized mathematics score
-------------------------------------------------------------
      Percentiles      Smallest
 1%        32.88          29.88
 5%        35.46          30.14
10%        37.54          30.42       Obs                5671
25%        43.53          30.55       Sum of Wgt.        5671

50%        51.33                      Mean           51.05124
                        Largest       Std. Dev.      9.502415
75%        58.61          70.94
90%        63.67          71.08       Variance        90.2959
95%        65.98          71.12       Skewness      -.0567201
99%        69.33          71.37       Kurtosis       2.072073

                    attended catholic hs?
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                5671
25%            0              0       Sum of Wgt.        5671

50%            0                      Mean           .1043908
                        Largest       Std. Dev.      .3057938
75%            0              1
90%            1              1       Variance       .0935098
95%            1              1       Skewness       2.587653
99%            1              1       Kurtosis        7.69595

table catholic, contents(mean math12 sd math12 freq)

----------------------------------------------------
attended  |
catholic  |
hs?       | mean(math12)    sd(math12)         Freq.
----------+-----------------------------------------
       no |     50.64465      9.534295         5,079
      yes |     54.53951      8.463153           592
----------------------------------------------------
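
The contents() option belongs to the table syntax in use when this page was written. As a hedged alternative, the same means, standard deviations, and group sizes can be requested with tabstat. The sketch below assumes the catholic dataset loaded above is still in memory. The version 16: prefix on the second command is an assumption for readers on Stata 17 or later, where table was redesigned; it is not part of the original page and is unnecessary (and unavailable) in older releases.

```stata
* Mean, SD, and group size of math12 by school type (same content as the table above)
tabstat math12, by(catholic) statistics(mean sd n)

* Assumption: Stata 17 or later. Runs the original command under version control.
version 16: table catholic, contents(mean math12 sd math12 freq)
```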

Descriptive statistics for family income (faminc8). (Not shown in text.)

sum faminc8, detail

           total annual family income in 8th grade
-------------------------------------------------------------
      Percentiles      Smallest
 1%            2              1
 5%            5              1
10%            7              1       Obs                5671
25%            8              1       Sum of Wgt.        5671

50%           10                      Mean           9.526186
                        Largest       Std. Dev.      2.217688
75%           11             12
90%           12             12       Variance       4.918141
95%           12             12       Skewness      -1.268464
99%           12             12       Kurtosis       4.447905

Various methods of examining the relationship between catholic and faminc8. (Not shown in text.)

by catholic, sort: sum faminc8, detail

------------------------------------------------------------------------------------------------------
-> catholic = no

           total annual family income in 8th grade
-------------------------------------------------------------
      Percentiles      Smallest
 1%            2              1
 5%            5              1
10%            6              1       Obs                5079
25%            8              1       Sum of Wgt.        5079

50%           10                      Mean           9.428825
                        Largest       Std. Dev.       2.25239
75%           11             12
90%           12             12       Variance       5.073261
95%           12             12       Skewness      -1.214205
99%           12             12       Kurtosis       4.255522

------------------------------------------------------------------------------------------------------
-> catholic = yes

           total annual family income in 8th grade
-------------------------------------------------------------
      Percentiles      Smallest
 1%            4              1
 5%            7              2
10%            8              4       Obs                 592
25%           10              4       Sum of Wgt.         592

50%           11                      Mean           10.36149
                        Largest       Std. Dev.       1.67728
75%           11             12
90%           12             12       Variance       2.813269
95%           12             12       Skewness      -1.784059
99%           12             12       Kurtosis       7.343344


tab faminc8 catholic, chi2

   total annual |
  family income | attended catholic hs?
   in 8th grade |        no        yes |     Total
----------------+----------------------+----------
           none |        17          1 |        18 
         <$1000 |        41          1 |        42 
    $1000-$2999 |        84          0 |        84 
    $3000-$4999 |        79          6 |        85 
    $5000-$7499 |       138          6 |       144 
     7500-$9999 |       169          6 |       175 
  $10000-$14999 |       427         20 |       447 
  $15000-$19999 |       410         31 |       441 
  $20000-$24999 |       608         47 |       655 
  $25000-$34999 |     1,137        130 |     1,267 
   35000-$49999 |     1,221        198 |     1,419 
   50000-$74999 |       748        146 |       894 
----------------+----------------------+----------
          Total |     5,079        592 |     5,671 

         Pearson chi2(11) = 111.4057   Pr = 0.000

pwcorr faminc8 catholic, sig

             |  faminc8 catholic
-------------+------------------
     faminc8 |   1.0000 
             |
             |
    catholic |   0.1286   1.0000 
             |   0.0000
             |

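Collapse faminc8 into three income strata (catfaminc8), then cross-tabulate the original and stratified variables to check the cut points. With at(1,9,11,13) and icodes, income codes 1-8 become stratum 0, codes 9-10 become stratum 1, and codes 11-12 become stratum 2. (Not shown in text.)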

egen catfaminc8=cut(faminc8), at(1,9,11,13) icodes

tab catfaminc8

 catfaminc8 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,436       25.32       25.32
          1 |      1,922       33.89       59.21
          2 |      2,313       40.79      100.00
------------+-----------------------------------
      Total |      5,671      100.00

tab faminc8 catfaminc8

   total annual |
  family income |            catfaminc8
   in 8th grade |         0          1          2 |     Total
----------------+---------------------------------+----------
           none |        18          0          0 |        18 
         <$1000 |        42          0          0 |        42 
    $1000-$2999 |        84          0          0 |        84 
    $3000-$4999 |        85          0          0 |        85 
    $5000-$7499 |       144          0          0 |       144 
     7500-$9999 |       175          0          0 |       175 
  $10000-$14999 |       447          0          0 |       447 
  $15000-$19999 |       441          0          0 |       441 
  $20000-$24999 |         0        655          0 |       655 
  $25000-$34999 |         0      1,267          0 |     1,267 
   35000-$49999 |         0          0      1,419 |     1,419 
   50000-$74999 |         0          0        894 |       894 
----------------+---------------------------------+----------
          Total |     1,436      1,922      2,313 |     5,671
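
The stratum boundaries can also be checked programmatically rather than by reading the cross-tabulation. The sketch below rebuilds the strata with recode and stops with an error if the two codings ever disagree. The variable name chk_catfaminc8 is hypothetical and is dropped at the end so that later output is unaffected.

```stata
* Rebuild the three income strata explicitly from the 1-12 income codes
recode faminc8 (1/8 = 0) (9/10 = 1) (11/12 = 2), generate(chk_catfaminc8)

* Halts with an error if any observation is coded differently by egen cut()
assert chk_catfaminc8 == catfaminc8

* Remove the temporary check variable
drop chk_catfaminc8
```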

Table 12.1 on page 293.

* Sample variance of faminc8 in each income category.
tabstat faminc8, by(catfaminc8) statistics(var)

Summary for variables: faminc8
     by categories of: catfaminc8 

catfaminc8 |  variance
-----------+----------
         0 |  3.063001
         1 |  .2247694
         2 |  .2372228
-----------+----------
     Total |  4.918141
----------------------

* Sample mean of faminc8 by income category and school type.
table catfaminc8 catholic, contents(mean faminc8)

------------------------------------------
catfaminc |     attended catholic hs?     
8         |             no             yes
----------+-------------------------------
        0 |  6.32967042923  6.774647712708
        1 | 9.651576042175  9.734463691711
        2 | 11.37988853455   11.4244184494
------------------------------------------

* Tests for differences in family income by school type within each income category.
by catfaminc8, sort : ttest faminc8, by(catholic) 

------------------------------------------------------------------------------------------------------
-> catfaminc8 = 0

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1365     6.32967    .0475499    1.756773    6.236392    6.422949
     yes |      71    6.774648    .1862445    1.569324    6.403195    7.146101
---------+--------------------------------------------------------------------
combined |    1436    6.351671    .0461845    1.750143    6.261075    6.442268
---------+--------------------------------------------------------------------
    diff |           -.4449776    .2127872               -.8623851   -.0275701
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.0912
Ho: diff = 0                                     degrees of freedom =     1434

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0183         Pr(|T| > |t|) = 0.0367          Pr(T > t) = 0.9817

------------------------------------------------------------------------------------------------------
-> catfaminc8 = 1

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1745    9.651576    .0114094    .4766077    9.629198    9.673954
     yes |     177    9.734463    .0332883    .4428714    9.668768    9.800159
---------+--------------------------------------------------------------------
combined |    1922    9.659209    .0108141    .4740985       9.638    9.680418
---------+--------------------------------------------------------------------
    diff |           -.0828873     .037361               -.1561597    -.009615
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.2186
Ho: diff = 0                                     degrees of freedom =     1920

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0133         Pr(|T| > |t|) = 0.0266          Pr(T > t) = 0.9867

------------------------------------------------------------------------------------------------------
-> catfaminc8 = 2

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1969    11.37989    .0109408    .4854821    11.35843    11.40135
     yes |     344    11.42442    .0266872    .4949744    11.37193    11.47691
---------+--------------------------------------------------------------------
combined |    2313    11.38651    .0101272    .4870552    11.36665    11.40637
---------+--------------------------------------------------------------------
    diff |           -.0445303     .028453               -.1003264    .0112657
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -1.5650
Ho: diff = 0                                     degrees of freedom =     2311

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0589         Pr(|T| > |t|) = 0.1177          Pr(T > t) = 0.9411


tab catfaminc8 catholic, row

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           | attended catholic hs?
catfaminc8 |        no        yes |     Total
-----------+----------------------+----------
         0 |     1,365         71 |     1,436 
           |     95.06       4.94 |    100.00 
-----------+----------------------+----------
         1 |     1,745        177 |     1,922 
           |     90.79       9.21 |    100.00 
-----------+----------------------+----------
         2 |     1,969        344 |     2,313 
           |     85.13      14.87 |    100.00 
-----------+----------------------+----------
     Total |     5,079        592 |     5,671 
           |     89.56      10.44 |    100.00 


* Average math achievement, by school type and income category.
table catfaminc8 catholic, contents(mean math12)

------------------------------
          | attended catholic 
catfaminc |        hs?        
8         |       no       yes
----------+-------------------
        0 | 46.77358  50.53563
        1 | 50.33842  53.85616
        2 | 53.59964   55.7175
------------------------------

* Tests for differences in average math achievement by school type within each income category.
by catfaminc8, sort : ttest math12, by(catholic)

------------------------------------------------------------------------------------------------------
-> catfaminc8 = 0

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1365    46.77358    .2409728     8.90296    46.30086     47.2463
     yes |      71    50.53563    1.003933    8.459293    48.53335    52.53792
---------+--------------------------------------------------------------------
combined |    1436    46.95959    .2352876    8.916128    46.49804    47.42113
---------+--------------------------------------------------------------------
    diff |           -3.762051    1.081144               -5.882845   -1.641258
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -3.4797
Ho: diff = 0                                     degrees of freedom =     1434

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0003         Pr(|T| > |t|) = 0.0005          Pr(T > t) = 0.9997

------------------------------------------------------------------------------------------------------
-> catfaminc8 = 1

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1745    50.33842    .2228944    9.311012    49.90126    50.77559
     yes |     177    53.85616    .6445502    8.575183    52.58412     55.1282
---------+--------------------------------------------------------------------
combined |    1922    50.66238    .2121188    9.299418    50.24637    51.07838
---------+--------------------------------------------------------------------
    diff |           -3.517734    .7293671               -4.948169   -2.087299
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -4.8230
Ho: diff = 0                                     degrees of freedom =     1920

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

------------------------------------------------------------------------------------------------------
-> catfaminc8 = 2

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1969    53.59964    .2060271    9.142124    53.19559    54.00369
     yes |     344     55.7175    .4384348    8.131754    54.85514    56.57986
---------+--------------------------------------------------------------------
combined |    2313    53.91462    .1877359    9.028905    53.54647    54.28277
---------+--------------------------------------------------------------------
    diff |           -2.117861    .5258916               -3.149129   -1.086592
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -4.0272
Ho: diff = 0                                     degrees of freedom =     2311

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0001          Pr(T > t) = 1.0000

Figure 12.1 on page 297.

sort catholic catfaminc8
by catholic catfaminc8: egen n = count(id)
by catholic catfaminc8: egen mmath12 = mean(math12)     

twoway (scatter mmath12 catholic [aweight=n] if catfaminc8==0, connect(l) msymbol(S)) ///
	(scatter mmath12 catholic [aweight=n] if catfaminc8==1, connect(l) msymbol(S)) ///
	(scatter mmath12 catholic [aweight=n] if catfaminc8==2, connect(l) msymbol(S)) ///
	(lfit math12 catholic [aweight=n]),     ///
	xlabel(0 "Public" 1 "Catholic") xscale(range(-.25 1.25)) ///
	legend(label(1 `"faminc8 is "Lo""') label(2 `"faminc8 is "Med""') ///
	label(3 `"faminc8 is "Hi""') label(4 `"Unstratified"')) ///
	xtitle("Type of High School") ytitle("12th Grade Mathematics Achievement") ///
	scheme(s2mono)

A simplified graph that provides information similar to that in Figure 12.1 can be produced using the syntax shown below. (Not shown in the text.)

twoway  (lfit math12 catholic if catfaminc8==0) ///
	(lfit math12 catholic if catfaminc8==1) ///
	(lfit math12 catholic if catfaminc8==2) ///
	(lfit math12 catholic), ///
	xlabel(0 "Public" 1 "Catholic") xscale(range(-.25 1.25)) ///
	legend(label(1 `"faminc8 is "Lo""') label(2 `"faminc8 is "Med""') ///
	label(3 `"faminc8 is "Hi""') label(4 `"Unstratified"')) ///
	xtitle("Type of High School") ytitle("12th Grade Mathematics Achievement") ///
	scheme(s2mono)

OLS regression model of math12 on catholic. This regression corresponds to the "Unstratified" line in Figure 12.1. (Not shown in the text.)

regress  math12 catholic

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  1,  5669) =   90.48
       Model |   8043.1077     1   8043.1077           Prob > F      =  0.0000
    Residual |  503934.635  5669  88.8930385           R-squared     =  0.0157
-------------+------------------------------           Adj R-squared =  0.0155
       Total |  511977.743  5670  90.2958982           Root MSE      =  9.4283

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |    3.89486   .4094621     9.51   0.000     3.092157    4.697562
       _cons |   50.64465   .1322954   382.81   0.000      50.3853      50.904
------------------------------------------------------------------------------
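
With a single 0/1 predictor, the OLS slope is the difference between the two group means and the intercept is the mean of the group coded 0. The sketch below checks this against the descriptive table shown earlier (54.53951 - 50.64465 = 3.89486). The scalar names m_pub and m_cath are hypothetical.

```stata
* Fit the unstratified model quietly; e() results stay available afterwards
quietly regress math12 catholic

* Group means of math12 for public (0) and Catholic (1) students
quietly summarize math12 if catholic == 0
scalar m_pub = r(mean)
quietly summarize math12 if catholic == 1
scalar m_cath = r(mean)

* The slope should equal the Catholic-minus-public gap, the intercept the public mean
display "slope = " _b[catholic] "   mean gap = " (scalar(m_cath) - scalar(m_pub))
display "_cons = " _b[_cons] "   public mean = " scalar(m_pub)
```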

OLS regression of math12 on catholic, stratifying by catfaminc8. These regressions correspond to the information shown in Figure 12.1. (Not shown in text.)

by catfaminc8, sort: regress  math12 catholic

------------------------------------------------------------------------------------------------------
-> catfaminc8 = 0

      Source |       SS       df       MS              Number of obs =    1436
-------------+------------------------------           F(  1,  1434) =   12.11
       Model |  955.181769     1  955.181769           Prob > F      =  0.0005
    Residual |  113123.499  1434  78.8866802           R-squared     =  0.0084
-------------+------------------------------           Adj R-squared =  0.0077
       Total |  114078.681  1435  79.4973388           Root MSE      =  8.8818

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   3.762051   1.081144     3.48   0.001     1.641258    5.882845
       _cons |   46.77358   .2404006   194.57   0.000     46.30201    47.24516
------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------
-> catfaminc8 = 1

      Source |       SS       df       MS              Number of obs =    1922
-------------+------------------------------           F(  1,  1920) =   23.26
       Model |  1988.57183     1  1988.57183           Prob > F      =  0.0000
    Residual |  164137.924  1920  85.4885019           R-squared     =  0.0120
-------------+------------------------------           Adj R-squared =  0.0115
       Total |  166126.496  1921  86.4791752           Root MSE      =   9.246

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   3.517734   .7293671     4.82   0.000     2.087299    4.948169
       _cons |   50.33842   .2213381   227.43   0.000     49.90434    50.77251
------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------
-> catfaminc8 = 2

      Source |       SS       df       MS              Number of obs =    2313
-------------+------------------------------           F(  1,  2311) =   16.22
       Model |  1313.47946     1  1313.47946           Prob > F      =  0.0001
    Residual |  187163.381  2311  80.9880488           R-squared     =  0.0070
-------------+------------------------------           Adj R-squared =  0.0065
       Total |   188476.86  2312  81.5211333           Root MSE      =  8.9993

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   2.117861   .5258916     4.03   0.000     1.086592    3.149129
       _cons |   53.59964   .2028092   264.29   0.000     53.20193    53.99735
------------------------------------------------------------------------------
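
One common way to summarize the three within-stratum coefficients is a weighted average, weighting either by total stratum size or by the number of Catholic-school students in each stratum. The sketch below is an illustration that this page itself does not run. It uses the coefficients and counts printed above, and the variable names b and nobs created by statsby are hypothetical. The two display lines should come out at roughly 3.01 and 2.73, both smaller than the unstratified estimate of 3.89.

```stata
* Weights = total stratum sizes (1,436; 1,922; 2,313)
display (1436*3.762051 + 1922*3.517734 + 2313*2.117861) / 5671

* Weights = Catholic-school counts per stratum (71; 177; 344)
display (71*3.762051 + 177*3.517734 + 344*2.117861) / 592

* Same size-weighted average computed from the data instead of typed-in numbers
preserve
statsby b=_b[catholic] nobs=e(N), by(catfaminc8) clear: regress math12 catholic
summarize b [aweight=nobs]
restore
```

preserve and restore are needed because statsby with the clear option replaces the data in memory with one row per stratum.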

Descriptive statistics for math achievement (math8). (Not shown in text.)

sum math8, detail

          8th grade standardized mathematics score
-------------------------------------------------------------
      Percentiles      Smallest
 1%        35.95          34.48
 5%        37.89          34.49
10%        39.42          34.52       Obs                5671
25%        43.45          34.52       Sum of Wgt.        5671

50%        50.45                      Mean           51.48952
                        Largest       Std. Dev.      9.683425
75%        58.56           77.2
90%        65.39           77.2       Variance       93.76872
95%        68.89           77.2       Skewness       .4078902
99%        74.04           77.2       Kurtosis       2.319295

Several methods of examining the relationship between math8 and catholic. (Not shown in text.)

corr math8 catholic
(obs=5671)

             |    math8 catholic
-------------+------------------
       math8 |   1.0000
    catholic |   0.0765   1.0000


ttest math8, by(catholic)

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    5079    51.23648    .1367773    9.747724    50.96834    51.50462
     yes |     592    53.66039    .3628002     8.82731    52.94785    54.37292
---------+--------------------------------------------------------------------
combined |    5671    51.48952    .1285876    9.683425    51.23743     51.7416
---------+--------------------------------------------------------------------
    diff |           -2.423907    .4193447               -3.245983   -1.601831
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -5.7802
Ho: diff = 0                                     degrees of freedom =     5669

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

Create a categorical variable for prior math achievement (catmath8), and examine the relationship between catmath8 and math8. (Not shown in text.)

egen catmath8=cut(math8), at(30,38,44,51,80) icodes
tab catmath8

   catmath8 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        304        5.36        5.36
          1 |      1,236       21.80       27.16
          2 |      1,421       25.06       52.21
          3 |      2,710       47.79      100.00
------------+-----------------------------------
      Total |      5,671      100.00


table catmath8, contents(mean math8 sd math8 freq)

-------------------------------------------------
 catmath8 | mean(math8)    sd(math8)        Freq.
----------+--------------------------------------
        0 |    36.78859     .8564365          304
        1 |    41.10199     1.722423        1,236
        2 |    47.53923     2.045117        1,421
        3 |     59.9476      6.27689        2,710
-------------------------------------------------

Check for balance in math8 within strata (catmath8), by catholic. (Not shown in the text.)

table catmath8 catholic, contents(mean math8 sd math8 freq)

------------------------------
          | attended catholic 
          |        hs?        
 catmath8 |       no       yes
----------+-------------------
        0 | 36.80332  36.30556
          | .8559109  .7666504
          |      295         9
          | 
        1 | 41.09058   41.2438
          | 1.718102  1.778788
          |    1,144        92
          | 
        2 | 47.49826  47.92955
          | 2.040288  2.057497
          |    1,286       135
          | 
        3 | 60.01815  59.48112
          | 6.348762  5.765806
          |    2,354       356
------------------------------

by catmath8, sort : ttest math8, by(catholic) 

------------------------------------------------------------------------------------------------------
-> catmath8 = 0

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     295    36.80332    .0498331    .8559109    36.70525     36.9014
     yes |       9    36.30556    .2555501    .7666504    35.71626    36.89486
---------+--------------------------------------------------------------------
combined |     304    36.78859      .04912    .8564365    36.69193    36.88525
---------+--------------------------------------------------------------------
    diff |            .4977666    .2888636               -.0706738    1.066207
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =   1.7232
Ho: diff = 0                                     degrees of freedom =      302

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9571         Pr(|T| > |t|) = 0.0859          Pr(T > t) = 0.0429

------------------------------------------------------------------------------------------------------
-> catmath8 = 1

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1144    41.09059    .0507967    1.718102    40.99092    41.19025
     yes |      92     41.2438    .1854515    1.778788    40.87543    41.61218
---------+--------------------------------------------------------------------
combined |    1236    41.10199    .0489926    1.722423    41.00587    41.19811
---------+--------------------------------------------------------------------
    diff |           -.1532187    .1866807               -.5194654     .213028
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -0.8208
Ho: diff = 0                                     degrees of freedom =     1234

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.2060         Pr(|T| > |t|) = 0.4119          Pr(T > t) = 0.7940

------------------------------------------------------------------------------------------------------
-> catmath8 = 2

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1286    47.49826    .0568946    2.040288    47.38664    47.60987
     yes |     135    47.92956    .1770811    2.057497    47.57932    48.27979
---------+--------------------------------------------------------------------
combined |    1421    47.53923    .0542527    2.045117    47.43281    47.64566
---------+--------------------------------------------------------------------
    diff |           -.4312974    .1847346               -.7936796   -.0689152
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.3347
Ho: diff = 0                                     degrees of freedom =     1419

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0098         Pr(|T| > |t|) = 0.0197          Pr(T > t) = 0.9902

------------------------------------------------------------------------------------------------------
-> catmath8 = 3

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    2354    60.01815    .1308536    6.348762    59.76155    60.27475
     yes |     356    59.48112    .3055871    5.765806    58.88013    60.08211
---------+--------------------------------------------------------------------
combined |    2710     59.9476    .1205757     6.27689    59.71117    60.18403
---------+--------------------------------------------------------------------
    diff |            .5370243    .3568614               -.1627239    1.236773
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =   1.5049
Ho: diff = 0                                     degrees of freedom =     2708

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9338         Pr(|T| > |t|) = 0.1325          Pr(T > t) = 0.0662

Table 12.1 on page 301.

table catmath8 catholic , contents(mean math12 freq) by(catfaminc8)

------------------------------
catfaminc | attended catholic 
8 and     |        hs?        
catmath8  |       no       yes
----------+-------------------
0         |
        0 | 36.80514     42.57
          |      142         1
          | 
        1 | 40.99247   41.7019
          |      433        21
          | 
        2 | 47.12156  48.65308
          |      385        13
          | 
        3 | 56.11869  56.58972
          |      405        36
----------+-------------------
1         |
        0 | 37.94156    39.775
          |       96         2
          | 
        1 | 41.92456  44.56454
          |      390        33
          | 
        2 |  47.9487  50.13551
          |      469        49
          | 
        3 | 57.41727  59.41634
          |      790        93
----------+-------------------
2         |
        0 | 39.78667  40.40334
          |       57         6
          | 
        1 |  42.7458  44.22737
          |      321        38
          | 
        2 | 49.17894  50.70644
          |      432        73
          | 
        3 | 58.93283  59.65723
          |    1,159       227
------------------------------
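
In Stata 17 and later the table command was redesigned and no longer accepts the contents() option, so the syntax above may fail on a newer installation. The following is a rough sketch of two workarounds, assuming the same variables are in memory; the layout of the new-style output will differ from the listing above.

```stata
* Sketch for Stata 17 or later (assumed); output layout differs from the
* older listing shown above.
* Rows: income stratum by prior-achievement stratum; columns: school type.
table (catfaminc8 catmath8) (catholic), ///
	statistic(mean math12) statistic(frequency) nototals

* Alternatively, run the original syntax under version control.
version 16: table catmath8 catholic, contents(mean math12 freq) by(catfaminc8)
```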

The t-tests shown in Table 12.1 on page 301 can be reproduced using the following syntax. (Note: most of the output was omitted to save space.)

bysort catfaminc8 catmath8: ttest math12, by(catholic) 

------------------------------------------------------------------------------------------------------
-> catfaminc8 = 0, catmath8 = 0

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     142    36.80514    .3391017    4.040863    36.13476    37.47552
     yes |       1       42.57           .           .           .           .
---------+--------------------------------------------------------------------
combined |     143    36.84545           .           .           .           .
---------+--------------------------------------------------------------------
    diff |           -5.764859           .                       .           .
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =        .
Ho: diff = 0                                     degrees of freedom =      141

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) =      .         Pr(|T| > |t|) =      .          Pr(T > t) =      .

------------------------------------------------------------------------------------------------------
-> catfaminc8 = 0, catmath8 = 1

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     433    40.99247    .2466134    5.131692    40.50776    41.47718
     yes |      21     41.7019    1.018852    4.668968    39.57662    43.82719
---------+--------------------------------------------------------------------
combined |     454    41.02529    .2397602    5.108636    40.55411    41.49647
---------+--------------------------------------------------------------------
    diff |           -.7094334    1.142284               -2.954279    1.535412
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -0.6211
Ho: diff = 0                                     degrees of freedom =      452

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.2674         Pr(|T| > |t|) = 0.5349          Pr(T > t) = 0.7326

------------------------------------------------------------------------------------------------------
-> catfaminc8 = 0, catmath8 = 2

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     385    47.12156    .2927101    5.743387    46.54604    47.69707
     yes |      13    48.65308    1.413799    5.097526    45.57267    51.73348
---------+--------------------------------------------------------------------
combined |     398    47.17158    .2869264    5.724165     46.6075    47.73567
---------+--------------------------------------------------------------------
    diff |           -1.531519    1.614382                -4.70535    1.642312
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -0.9487
Ho: diff = 0                                     degrees of freedom =      396

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.1717         Pr(|T| > |t|) = 0.3434          Pr(T > t) = 0.8283
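
Each stratum-level test can be checked from its summary statistics alone. As a minimal sketch, the immediate command ttesti takes the group sizes, means, and standard deviations directly; the numbers below are copied from the catfaminc8 = 0, catmath8 = 1 output above. This also shows why the first stratum reports missing values: with a single Catholic school student there is no standard deviation to supply for that group.

```stata
* Immediate form: ttesti n1 mean1 sd1 n2 mean2 sd2
* Values copied from the catfaminc8 = 0, catmath8 = 1 stratum above.
ttesti 433 40.99247 5.131692 21 41.7019 4.668968

* The pooled-variance t statistic and its degrees of freedom are left in r().
display "t = " %7.4f r(t) "   df = " r(df_t)

* With only 21 Catholic school students in this stratum, an unequal-variance
* (Welch) test is a reasonable sensitivity check.
ttesti 433 40.99247 5.131692 21 41.7019 4.668968, unequal
```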

Estimate the relationship between math12 and catholic separately in each of the strata (catfaminc8 and catmath8) and save the results to a new dataset (cathslopes2.dta). (Note: this output does not appear in the text and most of the output was omitted to save space.)

sort catfaminc8 catmath8
statsby diff=_b[catholic] n=e(N), by(catfaminc8 catmath8) noisily sav(cathslopes2, replace): ///
	regress math12 catholic
	
statsby: First call to regress with data as is:

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  1,  5669) =   90.48
       Model |   8043.1077     1   8043.1077           Prob > F      =  0.0000
    Residual |  503934.635  5669  88.8930385           R-squared     =  0.0157
-------------+------------------------------           Adj R-squared =  0.0155
       Total |  511977.743  5670  90.2958982           Root MSE      =  9.4283

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |    3.89486   .4094621     9.51   0.000     3.092157    4.697562
       _cons |   50.64465   .1322954   382.81   0.000      50.3853      50.904
------------------------------------------------------------------------------

statsby legend:

      command:  regress math12 catholic
         diff:  _b[catholic]
            n:  e(N)
           by:  catfaminc8 catmath8

Statsby groups

running (regress math12 catholic) on group 1

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =     143
-------------+------------------------------           F(  1,   141) =    2.02
       Model |  33.0011957     1  33.0011957           Prob > F      =  0.1573
    Residual |  2302.32862   141  16.3285718           R-squared     =  0.0141
-------------+------------------------------           Adj R-squared =  0.0071
       Total |  2335.32981   142  16.4459846           Root MSE      =  4.0409

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   5.764859   4.055066     1.42   0.157    -2.251729    13.78145
       _cons |   36.80514   .3391017   108.54   0.000     36.13476    37.47552
------------------------------------------------------------------------------

running (regress math12 catholic) on group 2

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =     454
-------------+------------------------------           F(  1,   452) =    0.39
       Model |  10.0803278     1  10.0803278           Prob > F      =  0.5349
    Residual |  11812.3885   452  26.1336029           R-squared     =  0.0009
-------------+------------------------------           Adj R-squared = -0.0014
       Total |  11822.4689   453  26.0981652           Root MSE      =  5.1121

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   .7094334   1.142284     0.62   0.535    -1.535412    2.954279
       _cons |   40.99247    .245672   166.86   0.000     40.50967    41.47527
------------------------------------------------------------------------------

Graph the resulting slopes. Note that the entire block of syntax should be run at once. (Not shown in the text.)

preserve
	use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/cathslopes2, clear
	list
	histogram diff, bin(6) frequency kdensity kdenopts(gaussian)
restore

     +---------------------------------------+
     | catfam~8   catmath8       diff      n |
     |---------------------------------------|
  1. |        0          0   5.764859    143 |
  2. |        0          1   .7094334    454 |
  3. |        0          2   1.531519    398 |
  4. |        0          3    .471031    441 |
  5. |        1          0   1.833437     98 |
     |---------------------------------------|
  6. |        1          1   2.639981    423 |
  7. |        1          2   2.186811    518 |
  8. |        1          3   1.999078    883 |
  9. |        2          0   .6166673     63 |
 10. |        2          1   1.481574    359 |
     |---------------------------------------|
 11. |        2          2   1.527503    505 |
 12. |        2          3   .7243947   1386 |
     +---------------------------------------+
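
The twelve stratum-specific differences can be combined into a single summary by weighting each one by its stratum size. The sketch below is one way to do that, with the values typed in from the listing above so that it does not depend on the saved file; weighting by the total number of students per stratum (n) is an assumption made here, and weighting by the number of Catholic school students would answer a slightly different question. Like the block above, it should be run all at once.

```stata
* Stratum-specific differences and sample sizes, copied from the listing above.
preserve
clear
input catfaminc8 catmath8 diff n
0 0 5.764859 143
0 1 .7094334 454
0 2 1.531519 398
0 3 .471031 441
1 0 1.833437 98
1 1 2.639981 423
1 2 2.186811 518
1 3 1.999078 883
2 0 .6166673 63
2 1 1.481574 359
2 2 1.527503 505
2 3 .7243947 1386
end
* Weight each stratum's difference by its number of students.
summarize diff [aweight = n]
display "Weighted average difference: " %6.4f r(mean)
restore
```

With these twelve rows the weighted average works out to about 1.50 points, well below the unadjusted difference of 3.89 from the full-sample regression.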

Similar to model A from Table 12.3 on page 306, but with dummy variables representing the catfaminc8 by catmath8 interaction (with one group omitted as the reference category). (Not shown in text.)

xi:  regress math12 catholic i.catfaminc8*i.catmath8
i.catfaminc8      _Icatfaminc_0-2     (naturally coded; _Icatfaminc_0 omitted)
i.catmath8        _Icatmath8_0-3      (naturally coded; _Icatmath8_0 omitted)
i.ca~c8*i.ca~h8   _IcatXcat_#_#       (coded as above)

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F( 12,  5658) =  710.06
       Model |  307674.539    12  25639.5449           Prob > F      =  0.0000
    Residual |  204303.204  5658  36.1087317           R-squared     =  0.6010
-------------+------------------------------           Adj R-squared =  0.6001
       Total |  511977.743  5670  90.2958982           Root MSE      =  6.0091

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.328632   .2639492     5.03   0.000     .8111899    1.846073
_Icatfamin~1 |   1.115701   .7880213     1.42   0.157    -.4291226    2.660525
_Icatfamin~2 |   2.882697   .9089585     3.17   0.002      1.10079    4.664604
_Icatmath8_1 |   4.127667   .5763251     7.16   0.000     2.997848    5.257485
_Icatmath8_2 |   10.29202    .585901    17.57   0.000     9.143432    11.44061
_Icatmath8_3 |   19.21252   .5785983    33.21   0.000     18.07825    20.34679
_IcatXca~1_1 |  -.0526632   .8865024    -0.06   0.953    -1.790548    1.685221
_IcatXca~1_2 |  -.2140083   .8840602    -0.24   0.809    -1.947105    1.519089
_IcatXca~1_3 |   .3234948   .8624064     0.38   0.708    -1.367152    2.014142
_IcatXca~2_1 |  -1.084544   1.002914    -1.08   0.280    -3.050639    .8815522
_IcatXca~2_2 |  -.8031993   .9939466    -0.81   0.419    -2.751716    1.145317
_IcatXca~2_3 |  -.0975123   .9662284    -0.10   0.920     -1.99169    1.796666
       _cons |   36.83616   .5025057    73.30   0.000     35.85106    37.82127
------------------------------------------------------------------------------

The above model can also be specified using the factor variable syntax introduced in Stata 11.

regress math12 catholic catfaminc8##catmath8

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F( 12,  5658) =  710.06
       Model |  307674.539    12  25639.5449           Prob > F      =  0.0000
    Residual |  204303.204  5658  36.1087317           R-squared     =  0.6010
-------------+------------------------------           Adj R-squared =  0.6001
       Total |  511977.743  5670  90.2958982           Root MSE      =  6.0091

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.328632   .2639492     5.03   0.000     .8111899    1.846073
             |
  catfaminc8 |
          1  |   1.115701   .7880213     1.42   0.157    -.4291226    2.660525
          2  |   2.882697   .9089585     3.17   0.002      1.10079    4.664604
             |
    catmath8 |
          1  |   4.127667   .5763251     7.16   0.000     2.997848    5.257485
          2  |   10.29202    .585901    17.57   0.000     9.143432    11.44061
          3  |   19.21252   .5785983    33.21   0.000     18.07825    20.34679
             |
  catfaminc8#|
    catmath8 |
        1 1  |  -.0526632   .8865024    -0.06   0.953    -1.790548    1.685221
        1 2  |  -.2140083   .8840602    -0.24   0.809    -1.947105    1.519089
        1 3  |   .3234948   .8624064     0.38   0.708    -1.367152    2.014142
        2 1  |  -1.084544   1.002914    -1.08   0.280    -3.050639    .8815522
        2 2  |  -.8031993   .9939466    -0.81   0.419    -2.751716    1.145317
        2 3  |  -.0975123   .9662284    -0.10   0.920     -1.99169    1.796666
             |
       _cons |   36.83616   .5025057    73.30   0.000     35.85106    37.82127
------------------------------------------------------------------------------
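
None of the six interaction coefficients is individually distinguishable from zero, but individual t tests do not say whether they matter jointly. A minimal sketch of a joint Wald test, assuming the data used above are still in memory:

```stata
* Refit quietly so the test does not depend on which model was run last.
quietly regress math12 catholic catfaminc8##catmath8

* Joint Wald test that all six interaction coefficients are zero.
testparm i.catfaminc8#i.catmath8
```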

Table 12.3 on page 306, the Stratified, Fully Crossed model. Note the noomit option of the xi command is used so that a full set of dummy variables is created (i.e. one for each category). Then the constant is suppressed (i.e. noconstant) so that all dummy variables can be included.

xi i.catfaminc8*i.catmath8, noomit
i.ca~c8*i.ca~h8   _IcatXcat_#_#       (coded as above)

regress math12 catholic _IcatXcat_0_0-_IcatXcat_2_3, noconstant

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F( 13,  5658) =32141.38
       Model |  15087598.6    13  1160584.51           Prob > F      =  0.0000
    Residual |  204303.204  5658  36.1087317           R-squared     =  0.9866
-------------+------------------------------           Adj R-squared =  0.9866
       Total |  15291901.8  5671  2696.50887           Root MSE      =  6.0091

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.328632   .2639492     5.03   0.000     .8111899    1.846073
_IcatXca~0_0 |   36.83616   .5025057    73.30   0.000     35.85106    37.82127
_IcatXca~0_1 |   40.96383    .282283   145.12   0.000     40.41045    41.51721
_IcatXca~0_2 |   47.12819     .30133   156.40   0.000     46.53746    47.71891
_IcatXca~0_3 |   56.04868   .2869555   195.32   0.000     55.48614    56.61123
_IcatXca~1_0 |   37.95186     .60703    62.52   0.000     36.76185    39.14188
_IcatXca~1_1 |   42.02687    .292895   143.49   0.000     41.45268    42.60105
_IcatXca~1_2 |   48.02988   .2652007   181.11   0.000     47.50998    48.54977
_IcatXca~1_3 |   57.48788   .2041227   281.63   0.000     57.08772    57.88804
_IcatXca~2_0 |   39.71886   .7574869    52.44   0.000      38.2339    41.20382
_IcatXca~2_1 |   42.76198    .318374   134.31   0.000     42.13785    43.38612
_IcatXca~2_2 |   49.20768   .2701078   182.18   0.000     48.67817     49.7372
_IcatXca~2_3 |   58.83387   .1670966   352.09   0.000     58.50629    59.16144
------------------------------------------------------------------------------

The above model (model A from Table 12.3) can also be estimated using the factor variable syntax introduced in Stata 11. Note again that all of the groups are included and the intercept (constant) is omitted.

regress math12 catholic ibn.catfaminc8#ibn.catmath8, noconstant

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F( 13,  5658) =32141.38
       Model |  15087598.6    13  1160584.51           Prob > F      =  0.0000
    Residual |  204303.204  5658  36.1087317           R-squared     =  0.9866
-------------+------------------------------           Adj R-squared =  0.9866
       Total |  15291901.8  5671  2696.50887           Root MSE      =  6.0091

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.328632   .2639492     5.03   0.000     .8111899    1.846073
             |
  catfaminc8#|
    catmath8 |
        0 0  |   36.83616   .5025057    73.30   0.000     35.85106    37.82127
        0 1  |   40.96383    .282283   145.12   0.000     40.41045    41.51721
        0 2  |   47.12819     .30133   156.40   0.000     46.53746    47.71891
        0 3  |   56.04868   .2869555   195.32   0.000     55.48614    56.61123
        1 0  |   37.95186     .60703    62.52   0.000     36.76185    39.14188
        1 1  |   42.02687    .292895   143.49   0.000     41.45268    42.60105
        1 2  |   48.02988   .2652007   181.11   0.000     47.50998    48.54977
        1 3  |   57.48788   .2041227   281.63   0.000     57.08772    57.88804
        2 0  |   39.71886   .7574869    52.44   0.000      38.2339    41.20382
        2 1  |   42.76198    .318374   134.31   0.000     42.13785    43.38612
        2 2  |   49.20768   .2701078   182.18   0.000     48.67817     49.7372
        2 3  |   58.83387   .1670966   352.09   0.000     58.50629    59.16144
------------------------------------------------------------------------------
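
The no-constant model is a reparameterization of the model with a constant: the coefficient on catholic and the residual sum of squares are identical, and each cell coefficient equals the constant plus the relevant main effects and interaction from the earlier output. The R-squared rises from 0.6010 to 0.9866 only because Stata computes it from the uncentered total sum of squares when the constant is suppressed. As a sketch, one cell can be rebuilt with lincom, assuming the data are still in memory:

```stata
* Refit the model with a constant, then rebuild one cell from its parts.
quietly regress math12 catholic catfaminc8##catmath8

* Cell catfaminc8 = 1, catmath8 = 1:
* constant + income effect + achievement effect + interaction.
lincom _cons + 1.catfaminc8 + 1.catmath8 + 1.catfaminc8#1.catmath8
```

The estimate should match the "1 1" row of the no-constant output above (42.02687).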

Table 12.3 on page 306, the Linear Main Effects, Two-way Interaction model.

* Recode faminc8 so that the values are actual mid-values of income in $1000:
recode faminc8 (1=0) (2=.5) (3=2) (4=4) (5=6.25) (6=8.75) ///
	(7=12.5) (8=17.5) (9=22.5) (10=30) (11=42.5) (12=62.5), gen(inc8)
(5586 differences between faminc8 and inc8)

gen mathfam = math8*inc8

regress math12 inc8 math8 mathfam catholic 

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  4,  5666) = 3259.30
       Model |  356877.886     4  89219.4715           Prob > F      =  0.0000
    Residual |  155099.857  5666  27.3737834           R-squared     =  0.6971
-------------+------------------------------           Adj R-squared =  0.6968
       Total |  511977.743  5670  90.2958982           Root MSE      =   5.232

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        inc8 |   .1638722   .0218124     7.51   0.000     .1211115    .2066329
       math8 |   .8721913   .0160066    54.49   0.000     .8408123    .9035703
     mathfam |   -.002435   .0004171    -5.84   0.000    -.0032527   -.0016173 
    catholic |   1.658869   .2295556     7.23   0.000     1.208852    2.108886
       _cons |   4.827092   .8004556     6.03   0.000     3.257892    6.396291
------------------------------------------------------------------------------

The above model (model B from Table 12.3) can also be estimated using the factor variable syntax introduced in Stata 11. Note that it is still necessary to recode faminc8 into inc8, but it is not necessary to create the interaction term.

regress math12 c.inc8##c.math8 catholic 

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  4,  5666) = 3259.30
       Model |  356877.886     4  89219.4715           Prob > F      =  0.0000
    Residual |  155099.857  5666  27.3737834           R-squared     =  0.6971
-------------+------------------------------           Adj R-squared =  0.6968
       Total |  511977.743  5670  90.2958982           Root MSE      =   5.232

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        inc8 |   .1638722   .0218124     7.51   0.000     .1211115    .2066329
       math8 |   .8721913   .0160066    54.49   0.000     .8408123    .9035703
             |
      c.inc8#|
     c.math8 |   -.002435   .0004171    -5.84   0.000    -.0032527   -.0016173
             |
    catholic |   1.658869   .2295556     7.23   0.000     1.208852    2.108886
       _cons |   4.827092   .8004556     6.03   0.000     3.257892    6.396291
------------------------------------------------------------------------------
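
Because of the interaction, the coefficient on math8 is the slope only when inc8 is zero; at other incomes the slope is the math8 coefficient plus the interaction coefficient times inc8. One advantage of the factor variable syntax is that margins can report these slopes directly. The sketch below picks three illustrative income values from the recode above (2, 30, and 62.5, in $1000); the choice of values is an assumption made for the example.

```stata
quietly regress math12 c.inc8##c.math8 catholic

* Slope of math12 on math8 at three illustrative income values (in $1000).
margins, dydx(math8) at(inc8 = (2 30 62.5))

* The same slope computed by hand at inc8 = 30.
display _b[math8] + _b[c.inc8#c.math8]*30
```

At inc8 = 30 the hand calculation is .8721913 - .002435*30, or about 0.799.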

Table 12.4, Model A: Initial specification, with linear main effect of inc8, on page 312.

logit catholic inc8 math8 mathfam

Iteration 0:   log likelihood = -1897.6568  
Iteration 1:   log likelihood = -1840.7214  
Iteration 2:   log likelihood = -1837.6029  
Iteration 3:   log likelihood = -1837.5922  
Iteration 4:   log likelihood = -1837.5922  

Logistic regression                               Number of obs   =       5671
                                                  LR chi2(3)      =     120.13
                                                  Prob > chi2     =     0.0000
Log likelihood = -1837.5922                       Pseudo R2       =     0.0317

------------------------------------------------------------------------------
    catholic |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        inc8 |   .0618026   .0140542     4.40   0.000     .0342569    .0893482
       math8 |   .0429594    .011135     3.86   0.000     .0211352    .0647836
     mathfam |   -.000734   .0002615    -2.81   0.005    -.0012466   -.0002214
       _cons |  -5.208846   .5863848    -8.88   0.000    -6.358139   -4.059553
------------------------------------------------------------------------------

Table 12.4, Model B: Final specification, with quadratic main effect of inc8, on page 312.

gen inc8sq = inc8*inc8

logit catholic inc8 math8 mathfam inc8sq

Iteration 0:   log likelihood = -1897.6568  
Iteration 1:   log likelihood = -1838.7904  
Iteration 2:   log likelihood = -1833.5513  
Iteration 3:   log likelihood = -1833.5413  
Iteration 4:   log likelihood = -1833.5413  

Logistic regression                               Number of obs   =       5671
                                                  LR chi2(4)      =     128.23
                                                  Prob > chi2     =     0.0000
Log likelihood = -1833.5413                       Pseudo R2       =     0.0338

------------------------------------------------------------------------------
    catholic |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        inc8 |   .0869049    .017354     5.01   0.000     .0528918     .120918
       math8 |   .0355965   .0119779     2.97   0.003     .0121202    .0590728
     mathfam |  -.0005647   .0002821    -2.00   0.045    -.0011175   -.0000119
      inc8sq |  -.0004382   .0001569    -2.79   0.005    -.0007458   -.0001306
       _cons |  -5.362148   .6190447    -8.66   0.000    -6.575453   -4.148842
------------------------------------------------------------------------------

predict p
(option pr assumed; Pr(catholic))
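
Models A and B are nested, so the value of adding the quadratic term can be checked with a likelihood-ratio test as well as with the z statistic on inc8sq. A minimal sketch, assuming inc8, mathfam, and inc8sq have been created as above; the names modelA and modelB are arbitrary labels chosen here.

```stata
* Fit and store the linear specification (Model A).
quietly logit catholic inc8 math8 mathfam
estimates store modelA

* Fit and store the quadratic specification (Model B).
quietly logit catholic inc8 math8 mathfam inc8sq
estimates store modelB

* Likelihood-ratio test of the one added parameter.
lrtest modelA modelB
```

From the log likelihoods shown above, the test statistic is 2*(1837.5922 - 1833.5413), or about 8.10 on 1 degree of freedom. Model B remains the active set of estimates after this block.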

Model B from Table 12.4 on page 312 can also be estimated using the factor variable syntax introduced in Stata 11. Note that it is not necessary to create the squared term before running this model.

logit catholic inc8 math8 mathfam c.inc8#c.inc8

Iteration 0:   log likelihood = -1897.6568  
Iteration 1:   log likelihood = -1838.7904  
Iteration 2:   log likelihood = -1833.5513  
Iteration 3:   log likelihood = -1833.5413  
Iteration 4:   log likelihood = -1833.5413  

Logistic regression                               Number of obs   =       5671
                                                  LR chi2(4)      =     128.23
                                                  Prob > chi2     =     0.0000
Log likelihood = -1833.5413                       Pseudo R2       =     0.0338

------------------------------------------------------------------------------
    catholic |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        inc8 |   .0869049    .017354     5.01   0.000     .0528918     .120918
       math8 |   .0355965   .0119779     2.97   0.003     .0121202    .0590728
     mathfam |  -.0005647   .0002821    -2.00   0.045    -.0011175   -.0000119
             |
      c.inc8#|
      c.inc8 |  -.0004382   .0001569    -2.79   0.005    -.0007458   -.0001306
             |
       _cons |  -5.362148   .6190447    -8.66   0.000    -6.575453   -4.148842
------------------------------------------------------------------------------

predict p
(option pr assumed; Pr(catholic))
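
The propensity score that predict returns is the inverse logit of the fitted linear index. As a sketch of what happens behind the scenes, it can be rebuilt by hand; this assumes Model B is the most recent estimation and that p already exists, and the names xb_hat and p_manual are hypothetical.

```stata
* Fitted linear index (log odds of attending a Catholic high school).
predict double xb_hat, xb

* Convert log odds to a probability: exp(xb)/(1 + exp(xb)).
generate double p_manual = invlogit(xb_hat)

* p is stored as a float, so compare with a tolerance rather than exactly.
assert reldif(p, p_manual) < 1e-6
summarize p p_manual
```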

Detailed summary statistics for the propensity score variable p. (Not shown in text.)

sum p, detail

                        Pr(catholic)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0208345       .0164257
 5%     .0320812        .016906
10%     .0408222       .0170297       Obs                5671
25%     .0672965        .017208       Sum of Wgt.        5671

50%     .1056115                      Mean           .1043908
                        Largest       Std. Dev.      .0440799
75%      .142168       .1729462
90%     .1643515       .1729462       Variance        .001943
95%     .1647264       .1729462       Skewness      -.1636008
99%     .1652305       .1729462       Kurtosis        1.83253

Figure 12.2, Panel A: Full Sample, shown on page 315.

histogram p, kdensity kdenopts(gaussian) xlabel(0(.1).2) ///
	ytitle(Frequency) xtitle(Estimated Propensity Scores)

Summary statistics for the propensity score variable p, by catholic.

by catholic, sort: sum p, detail

-------------------------------------------------------------------------------------------
-> catholic = no

                        Pr(catholic)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0204874       .0164257
 5%     .0304734        .016906
10%     .0398502       .0170297       Obs                5079
25%     .0643826        .017208       Sum of Wgt.        5079

50%     .1018312                      Mean           .1022535
                        Largest       Std. Dev.      .0442736
75%     .1395716       .1729462
90%     .1642913       .1729462       Variance       .0019602
95%     .1647052       .1729462       Skewness      -.1037761
99%     .1652047       .1729462       Kurtosis       1.814756

-------------------------------------------------------------------------------------------
-> catholic = yes

                        Pr(catholic)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0311486       .0221945
 5%     .0498571       .0255137
10%      .066539       .0260665       Obs                 592
25%     .0935338       .0266655       Sum of Wgt.         592

50%     .1307598                      Mean            .122727
                        Largest       Std. Dev.      .0377261
75%     .1636715       .1654167
90%     .1645418       .1659938       Variance       .0014233
95%     .1648288       .1668626       Skewness      -.6233737
99%     .1652885       .1729462       Kurtosis       2.378922

Figure 12.2, Panel B: By catholic, shown on page 315.

histogram p, kdensity kdenopts(gaussian) by(catholic, cols(1) legend(off)) ///
	xlabel(0(.1).2) ytitle(Frequency) xtitle(Estimated Propensity Scores)
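
The histograms give a visual impression of overlap between the two groups; the overlap can also be checked numerically. The sketch below counts public school students whose estimated propensity score falls outside the range observed among Catholic school students. It assumes catholic is coded 0/1 and that p holds the Model B propensity scores; the scalar names p_lo and p_hi are hypothetical.

```stata
* Range of estimated propensity scores among Catholic school students.
summarize p if catholic == 1, meanonly
scalar p_lo = r(min)
scalar p_hi = r(max)

* Public school students outside that range have no comparable
* Catholic school student in terms of the estimated propensity score.
count if catholic == 0 & (p < scalar(p_lo) | p > scalar(p_hi))

* Minimum, maximum, and group size side by side.
tabstat p, by(catholic) statistics(min max n)
```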

Stratifying on propensity scores, discussed on pages 316-317. This uses the same set of variables as Model A from Table 12.4. Note that pscore is a user-written command and must be downloaded before use; for more information, see our FAQ page "How do I use search to search for programs and additional help?" (Not shown in text.)

pscore catholic inc8 math8 mathfam, logit pscore(p) blockid(b) numblo(5)


**************************************************** 
Algorithm to estimate the propensity score 
**************************************************** 


The treatment is catholic

   attended |
   catholic |
        hs? |      Freq.     Percent        Cum.
------------+-----------------------------------
         no |      5,079       89.56       89.56
        yes |        592       10.44      100.00
------------+-----------------------------------
      Total |      5,671      100.00



Estimation of the propensity score 

Iteration 0:   log likelihood = -1897.6568
Iteration 1:   log likelihood = -1840.7214
Iteration 2:   log likelihood = -1837.6047
Iteration 3:   log likelihood = -1837.5922
Iteration 4:   log likelihood = -1837.5922

Logistic regression                               Number of obs   =       5671
                                                  LR chi2(3)      =     120.13
                                                  Prob > chi2     =     0.0000
Log likelihood = -1837.5922                       Pseudo R2       =     0.0317

------------------------------------------------------------------------------
    catholic |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        inc8 |   .0618026   .0140542     4.40   0.000     .0342569    .0893482
       math8 |   .0429594    .011135     3.86   0.000     .0211352    .0647836
     mathfam |   -.000734   .0002615    -2.81   0.005    -.0012466   -.0002214
       _cons |  -5.208846   .5863848    -8.88   0.000    -6.358139   -4.059553
------------------------------------------------------------------------------



Description of the estimated propensity score 

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0300386       .0241574
 5%     .0400463       .0250049
10%     .0488683       .0252239       Obs                5671
25%     .0700201         .02554       Sum of Wgt.        5671

50%     .1023014                      Mean           .1043908
                        Largest       Std. Dev.      .0442227
75%     .1299765       .1898257
90%     .1795338       .1898437       Variance       .0019556
95%     .1835134       .1899649       Skewness       .3693322
99%      .187957       .1900232       Kurtosis       2.215181



****************************************************** 
Step 1: Identification of the optimal number of blocks 
Use option detail if you want more detailed output 
****************************************************** 


The final number of blocks is 4

This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks



********************************************************** 
Step 2: Test of balancing property of the propensity score 
Use option detail if you want more detailed output 
********************************************************** 

Variable inc8 is not balanced in block 4

Variable mathfam is not balanced in block 4

The balancing property is not satisfied 

Try a different specification of the propensity score 

  Inferior |
  of block | attended catholic hs?
of pscore  |        no        yes |     Total
-----------+----------------------+----------
         0 |       588         18 |       606 
       .05 |     1,002         56 |     1,058 
      .075 |     1,010        113 |     1,123 
        .1 |     2,479        405 |     2,884 
-----------+----------------------+----------
     Total |     5,079        592 |     5,671 



******************************************* 
End of the algorithm to estimate the pscore 
*******************************************

Estimate the propensity score blocks shown in Table 12.5 on page 318. This specification adds inc8sq to the model above, which did not satisfy the balancing property. (Output not shown in text.)

* drop propensity score variables if they already exist
* (capture suppresses the error when a variable is not present)
capture drop p
capture drop b

pscore catholic inc8 inc8sq math8 mathfam, logit pscore(p) blockid(b) numblo(5)


**************************************************** 
Algorithm to estimate the propensity score 
**************************************************** 


The treatment is catholic

   attended |
   catholic |
        hs? |      Freq.     Percent        Cum.
------------+-----------------------------------
         no |      5,079       89.56       89.56
        yes |        592       10.44      100.00
------------+-----------------------------------
      Total |      5,671      100.00



Estimation of the propensity score 

Iteration 0:   log likelihood = -1897.6568
Iteration 1:   log likelihood = -1838.7904
Iteration 2:   log likelihood = -1833.6223
Iteration 3:   log likelihood = -1833.5413
Iteration 4:   log likelihood = -1833.5413

Logistic regression                               Number of obs   =       5671
                                                  LR chi2(4)      =     128.23
                                                  Prob > chi2     =     0.0000
Log likelihood = -1833.5413                       Pseudo R2       =     0.0338

------------------------------------------------------------------------------
    catholic |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        inc8 |   .0869049    .017354     5.01   0.000     .0528918     .120918
      inc8sq |  -.0004382   .0001569    -2.79   0.005    -.0007458   -.0001306
       math8 |   .0355965   .0119779     2.97   0.003     .0121202    .0590728
     mathfam |  -.0005647   .0002821    -2.00   0.045    -.0011175   -.0000119
       _cons |  -5.362148   .6190447    -8.66   0.000    -6.575453   -4.148842
------------------------------------------------------------------------------



Description of the estimated propensity score 

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0208345       .0164257
 5%     .0320812        .016906
10%     .0408223       .0170297       Obs                5671
25%     .0672965        .017208       Sum of Wgt.        5671

50%     .1056115                      Mean           .1043908
                        Largest       Std. Dev.      .0440799
75%      .142168       .1729462
90%     .1643515       .1729462       Variance        .001943
95%     .1647264       .1729462       Skewness      -.1636008
99%     .1652305       .1729462       Kurtosis        1.83253



****************************************************** 
Step 1: Identification of the optimal number of blocks 
Use option detail if you want more detailed output 
****************************************************** 


The final number of blocks is 6

This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks



********************************************************** 
Step 2: Test of balancing property of the propensity score 
Use option detail if you want more detailed output 
********************************************************** 


The balancing property is satisfied 


This table shows the inferior bound, the number of treated
and the number of controls for each block 

  Inferior |
  of block | attended catholic hs?
of pscore  |        no        yes |     Total
-----------+----------------------+----------
         0 |       810         31 |       841 
       .05 |       741         45 |       786 
      .075 |       928        100 |     1,028 
        .1 |       786         87 |       873 
      .125 |       810        145 |       955 
       .15 |     1,004        184 |     1,188 
-----------+----------------------+----------
     Total |     5,079        592 |     5,671 



******************************************* 
End of the algorithm to estimate the pscore 
*******************************************

Variable means by block from Table 12.5 on page 318. Within each cell of the table below, the rows are, in order, the frequency and the means of p, inc8, math8, and math12. Note that for Block 3, the average mathematics achievement for catholic students is listed as 49.63 in the book, but is 51.56 in the table below. Based on communication with the authors, this appears to be a typographic error in the book.

table b catholic, contents(freq mean p mean inc8 mean math8 mean math12)

--------------------------------
Number of |attended catholic hs?
block     |        no        yes
----------+---------------------
        1 |       810         31
          | .03562671   .0397066
          |  8.466666   9.814516
          |  43.16351   44.67839
          |  42.74021   45.34968
          | 
        2 |       741         45
          | .06206016  .06352629
          |  18.13968   17.52778
          |  47.44714   49.45711
          |  47.14545   50.21756
          | 
        3 |       928        100
          |  .0875975  .08860363
          |  26.64197     26.565
          |  48.80288    49.6273
          |  48.79251      51.56
          | 
        4 |       786         87
          |  .1138969  .11401803
          |  33.34605   33.36207
          |  52.61875    52.9077
          |  52.02316   54.26402
          | 
        5 |       810        145
          | .13605543  .13692428
          |  40.72839   41.46552
          |  55.15296   54.78959
          |  54.71558   56.54048
          | 
        6 |     1,004        184
          | .16283171  .16266777
          |  57.33815   58.36956
          |  58.55379   57.85957
          |  56.95275    57.3175
--------------------------------

Tests for differences in academic achievement by catholic, in each block, shown in Table 12.5 on page 318. Note that the signs of the differences are reversed because ttest computes mean(no) - mean(yes), but the magnitudes are the same. The discrepancy with the book in the mean for Block 3, discussed above, carries over.

by b, sort: ttest math12, by(catholic) 

-------------------------------------------------------------------------------------------
-> b = 1

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     810    42.74021    .2449484    6.971353     42.2594    43.22102
     yes |      31    45.34968    1.310109    7.294381    42.67408    48.02528
---------+--------------------------------------------------------------------
combined |     841     42.8364    .2412525    6.996321    42.36287    43.30993
---------+--------------------------------------------------------------------
    diff |           -2.609468    1.277988               -5.117896   -.1010391
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.0419
Ho: diff = 0                                     degrees of freedom =      839

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0207         Pr(|T| > |t|) = 0.0415          Pr(T > t) = 0.9793

-------------------------------------------------------------------------------------------
-> b = 2

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     741    47.14545    .2882466    7.846452    46.57957    47.71133
     yes |      45    50.21756    1.136082    7.621067    47.92793    52.50718
---------+--------------------------------------------------------------------
combined |     786    47.32134    .2804101     7.86149    46.77089    47.87178
---------+--------------------------------------------------------------------
    diff |           -3.072103    1.202757                -5.43311   -.7110971
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.5542
Ho: diff = 0                                     degrees of freedom =      784

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0054         Pr(|T| > |t|) = 0.0108          Pr(T > t) = 0.9946

-------------------------------------------------------------------------------------------
-> b = 3

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     928    48.79251    .2754558    8.391235    48.25192     49.3331
     yes |     100       51.56    .8071014    8.071014    49.95854    53.16146
---------+--------------------------------------------------------------------
combined |    1028    49.06172    .2618947    8.396983    48.54781    49.57563
---------+--------------------------------------------------------------------
    diff |           -2.767489    .8799826                -4.49426   -1.040718
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -3.1449
Ho: diff = 0                                     degrees of freedom =     1026

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0009         Pr(|T| > |t|) = 0.0017          Pr(T > t) = 0.9991

-------------------------------------------------------------------------------------------
-> b = 4

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     786    52.02316    .3402795    9.539971    51.35519    52.69112
     yes |      87    54.26402    .9397039    8.764975    52.39595    56.13209
---------+--------------------------------------------------------------------
combined |     873    52.24647    .3210069    9.484653    51.61644    52.87651
---------+--------------------------------------------------------------------
    diff |           -2.240868    1.069585               -4.340133   -.1416024
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.0951
Ho: diff = 0                                     degrees of freedom =      871

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0182         Pr(|T| > |t|) = 0.0365          Pr(T > t) = 0.9818

-------------------------------------------------------------------------------------------
-> b = 5

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     810    54.71558    .2588964    7.368319    54.20739    55.22377
     yes |     145    56.54048    .5606502    6.751122    55.43232    57.64865
---------+--------------------------------------------------------------------
combined |     955    54.99266    .2363535     7.30405    54.52883    55.45649
---------+--------------------------------------------------------------------
    diff |           -1.824902    .6563147               -3.112891   -.5369135
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.7805
Ho: diff = 0                                     degrees of freedom =      953

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0028         Pr(|T| > |t|) = 0.0055          Pr(T > t) = 0.9972

-------------------------------------------------------------------------------------------
-> b = 6

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1004    56.95275    .2789432    8.838582    56.40537    57.50013
     yes |     184     57.3175    .6020763    8.166961     56.1296     58.5054
---------+--------------------------------------------------------------------
combined |    1188    57.00924    .2534465    8.735635    56.51199     57.5065
---------+--------------------------------------------------------------------
    diff |           -.3647511    .7007456                -1.73959    1.010088
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -0.5205
Ho: diff = 0                                     degrees of freedom =     1186

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.3014         Pr(|T| > |t|) = 0.6028          Pr(T > t) = 0.6986

Weighted average ATT shown in Table 12.5 on page 318. Note that atts is part of the same user-written package as pscore and that the set seed command was used so that the results of the bootstrap can be replicated.

set seed 53156
atts math12 catholic, pscore(p) blockid(b) bootstrap

ATT estimation with the Stratification method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      592        5079       1.727       0.347       4.975

---------------------------------------------------------





Bootstrapping of standard errors 

command:      atts math12 catholic , pscore(p) blockid(b)
statistic:    atts       = r(atts)

Bootstrap statistics                              Number of obs    =      5671
                                                  Replications     =        50

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
        atts |    50   1.72731  -.044933  .3138169   1.096672   2.357949   (N)
             |                                       1.135532   2.304237   (P)
             |                                       1.273374   2.393047  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected



ATT estimation with the Stratification method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      592        5079       1.727       0.314       5.504

---------------------------------------------------------
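
As a check on the arithmetic, the stratification ATT can be reproduced by hand from the block-level output above. This is a sketch that assumes atts weights each block's difference in mean math12 (catholic minus non-catholic) by that block's share of the 592 treated students; the scalar name att is ours and is not part of the package. Under that assumption the result is approximately 1.727, in line with the estimate reported by atts.

```stata
* Block differences (catholic minus non-catholic) and treated counts
* are copied from the t-test output above; run this from a .do file.
scalar att = (31*2.609468 + 45*3.072103 + 100*2.767489 + ///
              87*2.240868 + 145*1.824902 + 184*.3647511) / 592
display "Weighted-average ATT: " %6.3f scalar(att)
```

Because the weights are the treated counts, the large sixth block (184 treated students, difference of about 0.36) pulls the average well below the differences seen in the first five blocks.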

The following few examples demonstrate different methods of analyzing the same data, treating the propensity scores as an optimal composite covariate.

Method A: Controlling for block by estimating the relationship between math12 and catholic separately in each block.

sort b

statsby _b[catholic] e(N), by(b) noisily sav(CathSlopes3,replace): regress math12 catholic

statsby: First call to regress with data as is:

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  1,  5669) =   90.48
       Model |   8043.1077     1   8043.1077           Prob > F      =  0.0000
    Residual |  503934.635  5669  88.8930385           R-squared     =  0.0157
-------------+------------------------------           Adj R-squared =  0.0155
       Total |  511977.743  5670  90.2958982           Root MSE      =  9.4283

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |    3.89486   .4094621     9.51   0.000     3.092157    4.697562
       _cons |   50.64465   .1322954   382.81   0.000      50.3853      50.904
------------------------------------------------------------------------------

statsby legend:

      command:  regress math12 catholic
      _stat_1:  _b[catholic]
      _stat_2:  e(N)
           by:  b

Statsby groups

running (regress math12 catholic) on group 1

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =     841
-------------+------------------------------           F(  1,   839) =    4.17
       Model |  203.308037     1  203.308037           Prob > F      =  0.0415
    Residual |  40913.4431   839  48.7645329           R-squared     =  0.0049
-------------+------------------------------           Adj R-squared =  0.0038
       Total |  41116.7512   840  48.9485133           Root MSE      =  6.9832

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   2.609468   1.277988     2.04   0.041     .1010391    5.117896
       _cons |   42.74021   .2453633   174.19   0.000     42.25861    43.22181
------------------------------------------------------------------------------

running (regress math12 catholic) on group 2

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =     786
-------------+------------------------------           F(  1,   784) =    6.52
       Model |  400.386878     1  400.386878           Prob > F      =  0.0108
    Residual |  48114.9879   784  61.3711581           R-squared     =  0.0083
-------------+------------------------------           Adj R-squared =  0.0070
       Total |  48515.3748   785  61.8030252           Root MSE      =   7.834

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   3.072103   1.202757     2.55   0.011     .7110971     5.43311
       _cons |   47.14545   .2877882   163.82   0.000     46.58053    47.71038
------------------------------------------------------------------------------

running (regress math12 catholic) on group 3

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =    1028
-------------+------------------------------           F(  1,  1026) =    9.89
       Model |  691.395779     1  691.395779           Prob > F      =  0.0017
    Residual |  71721.6733  1026   69.904165           R-squared     =  0.0095
-------------+------------------------------           Adj R-squared =  0.0086
       Total |   72413.069  1027  70.5093175           Root MSE      =  8.3609

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   2.767489   .8799826     3.14   0.002     1.040718     4.49426
       _cons |   48.79251    .274459   177.78   0.000     48.25395    49.33108
------------------------------------------------------------------------------

running (regress math12 catholic) on group 4

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =     873
-------------+------------------------------           F(  1,   871) =    4.39
       Model |  393.332574     1  393.332574           Prob > F      =  0.0365
    Residual |  78050.6082   871  89.6103423           R-squared     =  0.0050
-------------+------------------------------           Adj R-squared =  0.0039
       Total |  78443.9407   872  89.9586476           Root MSE      =  9.4663

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   2.240868   1.069585     2.10   0.036     .1416024    4.340133
       _cons |   52.02316   .3376508   154.07   0.000     51.36045    52.68586
------------------------------------------------------------------------------

running (regress math12 catholic) on group 5

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =     955
-------------+------------------------------           F(  1,   953) =    7.73
       Model |   409.57076     1   409.57076           Prob > F      =  0.0055
    Residual |  50485.5136   953  52.9753553           R-squared     =  0.0080
-------------+------------------------------           Adj R-squared =  0.0070
       Total |  50895.0844   954  53.3491451           Root MSE      =  7.2784

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.824902   .6563147     2.78   0.006     .5369135    3.112891
       _cons |   54.71558   .2557375   213.95   0.000     54.21371    55.21745
------------------------------------------------------------------------------

running (regress math12 catholic) on group 6

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =    1188
-------------+------------------------------           F(  1,  1186) =    0.27
       Model |  20.6884702     1  20.6884702           Prob > F      =  0.6028
    Residual |  90560.8493  1186  76.3582203           R-squared     =  0.0002
-------------+------------------------------           Adj R-squared = -0.0006
       Total |  90581.5377  1187  76.3113208           Root MSE      =  8.7383

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   .3647511   .7007456     0.52   0.603    -1.010088     1.73959
       _cons |   56.95275   .2757789   206.52   0.000     56.41168    57.49382
------------------------------------------------------------------------------

Distribution of coefficients for catholic (predicting math12) across the propensity score blocks. Note that the syntax shown below is meant to be run from a .do file, and the whole block of syntax should be run at once. (Not shown in text.)

preserve
use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/CathSlopes3, clear
list
histogram _stat_1, bin(3) frequency kdensity kdenopts(gaussian)
restore

     +------------------------+
     | b    _stat_1   _stat_2 |
     |------------------------|
  1. | 1   2.609468       841 |
  2. | 2   3.072104       786 |
  3. | 3   2.767489      1028 |
  4. | 4   2.240868       873 |
  5. | 5   1.824902       955 |
     |------------------------|
  6. | 6   .3647511      1188 |
     +------------------------+

Method B: Estimate the relationship between math12 and catholic in all blocks at the same time, using fixed effects for the blocks. Note that this model includes the intercept and dummy variables for blocks 2 to 6. (Not shown in text.)

xi:  regress math12 catholic i.b
i.b               _Ib_1-6             (naturally coded; _Ib_1 omitted)

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  6,  5664) =  326.67
       Model |  131623.108     6  21937.1846           Prob > F      =  0.0000
    Residual |  380354.635  5664  67.1530076           R-squared     =  0.2571
-------------+------------------------------           Adj R-squared =  0.2563
       Total |  511977.743  5670  90.2958982           Root MSE      =  8.1947

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.761271   .3595793     4.90   0.000     1.056358    2.466184
       _Ib_2 |   4.449025   .4066192    10.94   0.000     3.651895    5.246154
       _Ib_3 |   6.118917   .3816345    16.03   0.000     5.370767    6.867066
       _Ib_4 |   9.299475   .3965866    23.45   0.000     8.522013    10.07694
       _Ib_5 |   11.95377   .3897119    30.67   0.000     11.18978    12.71775
       _Ib_6 |   13.96498   .3717204    37.57   0.000     13.23626    14.69369
       _cons |   42.77148   .2828863   151.20   0.000     42.21691    43.32604
------------------------------------------------------------------------------

This model can also be estimated using the factor variable syntax introduced in Stata 11. (Not shown in text.)

regress math12 catholic i.b

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  6,  5664) =  326.67
       Model |  131623.108     6  21937.1846           Prob > F      =  0.0000
    Residual |  380354.635  5664  67.1530076           R-squared     =  0.2571
-------------+------------------------------           Adj R-squared =  0.2563
       Total |  511977.743  5670  90.2958982           Root MSE      =  8.1947

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.761271   .3595793     4.90   0.000     1.056358    2.466184
             |
           b |
          2  |   4.449025   .4066192    10.94   0.000     3.651895    5.246154
          3  |   6.118917   .3816345    16.03   0.000     5.370767    6.867066
          4  |   9.299475   .3965866    23.45   0.000     8.522013    10.07694
          5  |   11.95377   .3897119    30.67   0.000     11.18978    12.71775
          6  |   13.96498   .3717204    37.57   0.000     13.23626    14.69369
             |
       _cons |   42.77148   .2828863   151.20   0.000     42.21691    43.32604
------------------------------------------------------------------------------

An equivalent model with no intercept and a fixed effect for each block. (Not shown in text.)

xi i.b, noomit
regress math12 catholic _Ib_1-_Ib_6, noconstant

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  7,  5664) =31721.90
       Model |  14911547.1     7  2130221.02           Prob > F      =  0.0000
    Residual |  380354.635  5664  67.1530076           R-squared     =  0.9751
-------------+------------------------------           Adj R-squared =  0.9751
       Total |  15291901.8  5671  2696.50887           Root MSE      =  8.1947

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.761271   .3595793     4.90   0.000     1.056358    2.466184
       _Ib_1 |   42.77148   .2828863   151.20   0.000     42.21691    43.32604
       _Ib_2 |    47.2205   .2930191   161.15   0.000     46.64607    47.79493
       _Ib_3 |   48.89039   .2579679   189.52   0.000     48.38468    49.39611
       _Ib_4 |   52.07095   .2796537   186.20   0.000     51.52272    52.61918
       _Ib_5 |   54.72524    .270736   202.14   0.000      54.1945    55.25599
       _Ib_6 |   56.73645   .2441879   232.35   0.000     56.25775    57.21515
------------------------------------------------------------------------------

This model can also be estimated using the factor variable syntax introduced in Stata 11. Note that the dummy variables do not need to be created using xi.

regress math12 catholic ibn.b, noconstant

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  7,  5664) =31721.90
       Model |  14911547.1     7  2130221.02           Prob > F      =  0.0000
    Residual |  380354.635  5664  67.1530076           R-squared     =  0.9751
-------------+------------------------------           Adj R-squared =  0.9751
       Total |  15291901.8  5671  2696.50887           Root MSE      =  8.1947

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.761271   .3595793     4.90   0.000     1.056358    2.466184
             |
           b |
          1  |   42.77148   .2828863   151.20   0.000     42.21691    43.32604
          2  |    47.2205   .2930191   161.15   0.000     46.64607    47.79493
          3  |   48.89039   .2579679   189.52   0.000     48.38468    49.39611
          4  |   52.07095   .2796537   186.20   0.000     51.52272    52.61918
          5  |   54.72524    .270736   202.14   0.000      54.1945    55.25599
          6  |   56.73645   .2441879   232.35   0.000     56.25775    57.21515
------------------------------------------------------------------------------

Method C: Controlling for propensities (as a linear effect). (Not shown in text.)

regress math12 catholic p

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  2,  5668) =  996.64
       Model |  133204.471     2  66602.2355           Prob > F      =  0.0000
    Residual |  378773.272  5668  66.8266182           R-squared     =  0.2602
-------------+------------------------------           Adj R-squared =  0.2599
       Total |  511977.743  5670  90.2958982           Root MSE      =  8.1748

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.690306   .3586574     4.71   0.000        .9872    2.393411
           p |   107.6782   2.488099    43.28   0.000     102.8006    112.5559
       _cons |   39.63417   .2790795   142.02   0.000     39.08707    40.18127
------------------------------------------------------------------------------

Create propensity score blocks for Table 12.6 on page 320. (Output not shown in text.)

* drop p and b from previous examples
drop p b

pscore catholic inc8 inc8sq math8 mathfam fhowfar mhowfar fight8 nohw8 ///
	disrupt8 riskdrop8, logit pscore(p) blockid(b) numblo(10)
	
**************************************************** 
Algorithm to estimate the propensity score 
**************************************************** 


The treatment is catholic

   attended |
   catholic |
        hs? |      Freq.     Percent        Cum.
------------+-----------------------------------
         no |      5,079       89.56       89.56
        yes |        592       10.44      100.00
------------+-----------------------------------
      Total |      5,671      100.00



Estimation of the propensity score 

Iteration 0:   log likelihood = -1897.6568
Iteration 1:   log likelihood = -1814.3485
Iteration 2:   log likelihood = -1804.4342
Iteration 3:   log likelihood = -1804.1259
Iteration 4:   log likelihood = -1804.1254

Logistic regression                               Number of obs   =       5671
                                                  LR chi2(10)     =     187.06
                                                  Prob > chi2     =     0.0000
Log likelihood = -1804.1254                       Pseudo R2       =     0.0493

------------------------------------------------------------------------------
    catholic |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        inc8 |   .0544244   .0190915     2.85   0.004     .0170058     .091843
      inc8sq |  -.0001894   .0001732    -1.09   0.274     -.000529    .0001501
       math8 |   .0215572   .0123655     1.74   0.081    -.0026787    .0457932
     mathfam |  -.0004537   .0002873    -1.58   0.114    -.0010169    .0001095
     fhowfar |   .1963326   .0866025     2.27   0.023     .0265949    .3660703
     mhowfar |   .0256765    .086921     0.30   0.768    -.1446855    .1960384
      fight8 |  -.4742975   .3246254    -1.46   0.144    -1.110552    .1619566
       nohw8 |  -.6880268   .1760058    -3.91   0.000    -1.032992   -.3430618
    disrupt8 |   .6927506   .3858711     1.80   0.073    -.0635429    1.449044
   riskdrop8 |  -.3033031   .0843134    -3.60   0.000    -.4685543   -.1380518
       _cons |  -4.981792    .703233    -7.08   0.000    -6.360104   -3.603481
------------------------------------------------------------------------------


Description of the estimated propensity score 

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0108842       .0025956
 5%     .0204415       .0032377
10%     .0313083       .0036331       Obs                5671
25%     .0595311       .0047923       Sum of Wgt.        5671

50%     .1072919                      Mean           .1043908
                        Largest       Std. Dev.      .0530622
75%     .1453336       .2551539
90%     .1744188       .2571622       Variance       .0028156
95%     .1858031       .2619453       Skewness       .0612084
99%     .2138622        .262511       Kurtosis       2.084156



****************************************************** 
Step 1: Identification of the optimal number of blocks 
Use option detail if you want more detailed output 
****************************************************** 


The final number of blocks is 5

This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks



********************************************************** 
Step 2: Test of balancing property of the propensity score 
Use option detail if you want more detailed output 
********************************************************** 


The balancing property is satisfied 


This table shows the inferior bound, the number of treated
and the number of controls for each block 

  Inferior |
  of block | attended catholic hs?
of pscore  |        no        yes |     Total
-----------+----------------------+----------
         0 |     1,089         34 |     1,123 
       .05 |     1,431        110 |     1,541 
        .1 |     1,599        253 |     1,852 
       .15 |       829        160 |       989 
        .2 |       131         35 |       166 
-----------+----------------------+----------
     Total |     5,079        592 |     5,671 



******************************************* 
End of the algorithm to estimate the pscore 
*******************************************
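
The propensity score that pscore saves in p is the fitted probability from the logit model reported above. A minimal sketch of reproducing it by hand follows; the variable names p_check and p_diff are hypothetical and are dropped at the end. (Not shown in text.)

```stata
* Fit the same logit model that pscore reports above
logit catholic inc8 inc8sq math8 mathfam fhowfar mhowfar fight8 nohw8 ///
	disrupt8 riskdrop8

* Fitted probability of attending a Catholic high school
predict double p_check, pr

* Compare with the score saved by pscore; differences should be negligible
gen double p_diff = abs(p - p_check)
summarize p_diff

* Clean up the illustrative variables
drop p_check p_diff
```

If the two sets of scores agree, the maximum of p_diff should be zero apart from rounding in storage.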

Distribution of estimated propensity scores based on the model for Table 12.6. (Not shown in text.)

sum p, detail

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0108842       .0025956
 5%     .0204415       .0032377
10%     .0313083       .0036331       Obs                5671
25%     .0595311       .0047923       Sum of Wgt.        5671

50%     .1072919                      Mean           .1043908
                        Largest       Std. Dev.      .0530622
75%     .1453336       .2551539
90%     .1744188       .2571622       Variance       .0028156
95%     .1858031       .2619453       Skewness       .0612084
99%     .2138622        .262511       Kurtosis       2.084156


histogram p, frequency kdensity kdenopts(gaussian) xlabel(0(.1).3) ///
	ytitle(Frequency) xtitle(Estimated Propensity Scores)

Distribution of propensity scores by catholic.

by catholic, sort: sum p, detail

-------------------------------------------------------------------------------------------
-> catholic = no

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0104881       .0025956
 5%     .0195553       .0032377
10%     .0297857       .0036331       Obs                5079
25%     .0565767       .0047923       Sum of Wgt.        5079

50%     .1008954                      Mean           .1012761
                        Largest       Std. Dev.      .0529831
75%     .1440137       .2551539
90%      .173256       .2571622       Variance       .0028072
95%     .1824525       .2619453       Skewness       .1216262
99%     .2133106        .262511       Kurtosis       2.085038

-------------------------------------------------------------------------------------------
-> catholic = yes

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0229674       .0099903
 5%     .0470903        .011172
10%     .0699177        .018644       Obs                 592
25%     .1003173       .0196516       Sum of Wgt.         592

50%     .1355502                      Mean           .1311124
                        Largest       Std. Dev.      .0457582
75%      .169457       .2266057
90%     .1842547        .228063       Variance       .0020938
95%     .2031689       .2391276       Skewness      -.3031523
99%     .2224912       .2462637       Kurtosis       2.624687


histogram p, frequency kdensity kdenopts(gaussian) by(catholic, cols(1) legend(off)) ///
	xlabel(0(.1).3) ytitle(Frequency) xtitle(Estimated Propensity Scores)
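
The by-group summaries above show that the propensity scores of students who attended a Catholic high school range from about .0100 to .2463, while those of students who did not range from about .0026 to .2625. A few comparison students therefore fall outside the range of scores observed among the treated. A sketch of counting them, assuming the region of common support is defined by the minimum and maximum score among the treated (the local macro names lo and hi are hypothetical):

```stata
* Range of estimated propensity scores among Catholic school students
quietly summarize p if catholic == 1
local lo = r(min)
local hi = r(max)

* Comparison students whose scores fall outside that range
count if catholic == 0 & (p < `lo' | p > `hi')
```

Because the block uses local macros, it should be run all at once from a .do file.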

Descriptive statistics for Table 12.6 on page 320.

table b catholic, contents(freq mean p mean math12)

--------------------------------
Number of |attended catholic hs?
block     |        no        yes
----------+---------------------
        1 |     1,089         34
          | .03032015  .03453844
          |  43.66365   46.01353
          | 
        2 |     1,431        110
          | .07522784  .07848568
          |  48.85303   51.00237
          | 
        3 |     1,599        253
          |  .1270797  .12911088
          |  53.62299   55.38316
          | 
        4 |       829        160
          | .17195554  .17329525
          |  56.86899   57.34556
          | 
        5 |       131         35
          | .21343681  .21195736
          |  52.50557   55.01257
--------------------------------
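
Within each block the mean propensity scores of the two groups are close (for example, .0303 versus .0345 in block 1). A sketch of checking this more formally with a within-block t-test on p, in the spirit of the check that pscore describes in Step 1; this is an illustration and not necessarily the exact test that pscore performs. (Not shown in text.)

```stata
* Compare mean propensity scores of the two groups within each block
by b, sort: ttest p, by(catholic)
```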

Tests for differences in academic achievement by catholic, in each block, shown in Table 12.6 on page 320. Note that the signs of the differences are reversed because ttest computes mean(no) - mean(yes), but the magnitudes are the same.

by b, sort: ttest math12, by(catholic) 

-------------------------------------------------------------------------------------------
-> b = 1

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1089    43.66365    .2303222    7.600632    43.21172    44.11557
     yes |      34    46.01353    1.288496    7.513157    43.39206    48.63499
---------+--------------------------------------------------------------------
combined |    1123    43.73479    .2269499     7.60536     43.2895    44.18008
---------+--------------------------------------------------------------------
    diff |           -2.349884    1.323244               -4.946197    .2464296
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -1.7759
Ho: diff = 0                                     degrees of freedom =     1121

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0380         Pr(|T| > |t|) = 0.0760          Pr(T > t) = 0.9620

-------------------------------------------------------------------------------------------
-> b = 2

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1431    48.85303    .2310661    8.740896    48.39977     49.3063
     yes |     110    51.00236    .7979284    8.368744     49.4209    52.58383
---------+--------------------------------------------------------------------
combined |    1541    49.00646    .2223837    8.729799    48.57025    49.44266
---------+--------------------------------------------------------------------
    diff |           -2.149331    .8622945               -3.840727   -.4579344
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.4926
Ho: diff = 0                                     degrees of freedom =     1539

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0064         Pr(|T| > |t|) = 0.0128          Pr(T > t) = 0.9936

-------------------------------------------------------------------------------------------
-> b = 3

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1599    53.62299    .2155456     8.61913    53.20021    54.04577
     yes |     253    55.38316    .5098506     8.10967    54.37905    56.38727
---------+--------------------------------------------------------------------
combined |    1852    53.86344     .199154    8.570566    53.47285    54.25403
---------+--------------------------------------------------------------------
    diff |           -1.760173    .5786011               -2.894952   -.6253928
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -3.0421
Ho: diff = 0                                     degrees of freedom =     1850

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0012         Pr(|T| > |t|) = 0.0024          Pr(T > t) = 0.9988

-------------------------------------------------------------------------------------------
-> b = 4

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     829    56.86899    .2909555    8.377295    56.29789    57.44008
     yes |     160    57.34556    .5922284    7.491162    56.17591    58.51521
---------+--------------------------------------------------------------------
combined |     989    56.94609    .2619749    8.238684      56.432    57.46018
---------+--------------------------------------------------------------------
    diff |           -.4765759    .7116066               -1.873012    .9198599
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -0.6697
Ho: diff = 0                                     degrees of freedom =      987

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.2516         Pr(|T| > |t|) = 0.5032          Pr(T > t) = 0.7484

-------------------------------------------------------------------------------------------
-> b = 5

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     131    52.50557    .6960753    7.966946    51.12847    53.88267
     yes |      35    55.01257    1.310023    7.750203    52.35028    57.67486
---------+--------------------------------------------------------------------
combined |     166    53.03416    .6181866    7.964777    51.81358    54.25473
---------+--------------------------------------------------------------------
    diff |           -2.506999    1.507463               -5.483536    .4695381
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -1.6631
Ho: diff = 0                                     degrees of freedom =      164

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0491         Pr(|T| > |t|) = 0.0982          Pr(T > t) = 0.9509

ATT shown in Table 12.6 on page 320.

set seed 7492
atts math12 catholic, pscore(p) blockid(b) bootstrap



ATT estimation with the Stratification method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      592        5079       1.564       0.353       4.424

---------------------------------------------------------



Bootstrapping of standard errors 

command:      atts math12 catholic , pscore(p) blockid(b)
statistic:    atts       = r(atts)

Bootstrap statistics                              Number of obs    =      5671
                                                  Replications     =        50

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
        atts |    50  1.563586  .0258251  .3812768   .7973821   2.329791   (N)
             |                                       .7701139   2.242676   (P)
             |                                        .605822   2.242676  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected



ATT estimation with the Stratification method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      592        5079       1.564       0.381       4.101

---------------------------------------------------------
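
The stratification estimate can be read as a weighted average of the within-block differences in mean math12 (yes minus no), with each block weighted by its number of treated students. A sketch that checks this arithmetic, first with the figures printed in the block t-tests above and then from the data, assuming b is still in memory. The names m, n, diff, wdiff and the local macro num are hypothetical. (Not shown in text.)

```stata
* Using the block differences and treated counts printed in the t tests above
display (34*2.349884 + 110*2.149331 + 253*1.760173 ///
	+ 160*.4765759 + 35*2.506999) / 592

* The same calculation from the data
preserve
* Mean of math12 and number of students in each block-by-catholic cell
collapse (mean) m = math12 (count) n = math12, by(b catholic)
* One row per block: m0 n0 for catholic == 0, m1 n1 for catholic == 1
reshape wide m n, i(b) j(catholic)
* Within-block difference in means, yes minus no
gen diff = m1 - m0
* Weight each difference by the number of treated students in the block
gen wdiff = n1 * diff
quietly summarize wdiff
local num = r(sum)
quietly summarize n1
display "Weighted average of block differences = " `num' / r(sum)
restore
```

The first display gives approximately 1.5636, which matches the ATT reported by atts. The block should be run all at once from a .do file, since it uses preserve, restore and a local macro.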

Additional methods of controlling for propensity scores, using the propensity model with the additional covariates (i.e., the propensity model from Table 12.6).

Method A: Controlling for block by estimating the relationship between math12 and catholic separately in each block.

sort b
statsby _b[catholic] e(N), by(b) noisily sav(Cathslopes4,replace): regress math12 catholic

statsby: First call to regress with data as is:

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  1,  5669) =   90.48
       Model |   8043.1077     1   8043.1077           Prob > F      =  0.0000
    Residual |  503934.635  5669  88.8930385           R-squared     =  0.0157
-------------+------------------------------           Adj R-squared =  0.0155
       Total |  511977.743  5670  90.2958982           Root MSE      =  9.4283

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |    3.89486   .4094621     9.51   0.000     3.092157    4.697562
       _cons |   50.64465   .1322954   382.81   0.000      50.3853      50.904
------------------------------------------------------------------------------

statsby legend:

      command:  regress math12 catholic
      _stat_1:  _b[catholic]
      _stat_2:  e(N)
           by:  b

Statsby groups

running (regress math12 catholic) on group 1

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =    1123
-------------+------------------------------           F(  1,  1121) =    3.15
       Model |  182.062217     1  182.062217           Prob > F      =  0.0760
    Residual |  64716.1076  1121  57.7306936           R-squared     =  0.0028
-------------+------------------------------           Adj R-squared =  0.0019
       Total |  64898.1698  1122  57.8415061           Root MSE      =  7.5981

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   2.349884   1.323244     1.78   0.076    -.2464296    4.946197
       _cons |   43.66365   .2302446   189.64   0.000     43.21189     44.1154
------------------------------------------------------------------------------

running (regress math12 catholic) on group 2

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =    1541
-------------+------------------------------           F(  1,  1539) =    6.21
       Model |  471.884989     1  471.884989           Prob > F      =  0.0128
    Residual |   116890.58  1539  75.9522939           R-squared     =  0.0040
-------------+------------------------------           Adj R-squared =  0.0034
       Total |  117362.465  1540  76.2093931           Root MSE      =  8.7151

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   2.149331   .8622945     2.49   0.013     .4579344    3.840727
       _cons |   48.85303   .2303831   212.05   0.000     48.40113    49.30493
------------------------------------------------------------------------------

running (regress math12 catholic) on group 3

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =    1852
-------------+------------------------------           F(  1,  1850) =    9.25
       Model |  676.765989     1  676.765989           Prob > F      =  0.0024
    Residual |  135287.688  1850  73.1284801           R-squared     =  0.0050
-------------+------------------------------           Adj R-squared =  0.0044
       Total |  135964.454  1851  73.4545944           Root MSE      =  8.5515

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.760173   .5786011     3.04   0.002     .6253928    2.894952
       _cons |   53.62299   .2138548   250.74   0.000     53.20357    54.04241
------------------------------------------------------------------------------

running (regress math12 catholic) on group 4

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =     989
-------------+------------------------------           F(  1,   987) =    0.45
       Model |  30.4608727     1  30.4608727           Prob > F      =  0.5032
    Residual |   67030.949   987  67.9138288           R-squared     =  0.0005
-------------+------------------------------           Adj R-squared = -0.0006
       Total |  67061.4099   988   67.875921           Root MSE      =   8.241

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   .4765759   .7116066     0.67   0.503    -.9198599    1.873012
       _cons |   56.86899   .2862212   198.69   0.000     56.30731    57.43066
------------------------------------------------------------------------------

running (regress math12 catholic) on group 5

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =     166
-------------+------------------------------           F(  1,   164) =    2.77
       Model |  173.595927     1  173.595927           Prob > F      =  0.0982
    Residual |  10293.6209   164  62.7659812           R-squared     =  0.0166
-------------+------------------------------           Adj R-squared =  0.0106
       Total |  10467.2168   165  63.4376779           Root MSE      =  7.9225

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   2.506999   1.507463     1.66   0.098    -.4695381    5.483536
       _cons |   52.50557   .6921919    75.85   0.000     51.13882    53.87233
------------------------------------------------------------------------------

Distribution of coefficients for catholic (predicting math12) across the propensity score blocks. Note that the syntax shown below is meant to be run from a .do file; the block of syntax should be run all at once. (Not shown in text.)

preserve
use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/CathSlopes4, clear
list
histogram _stat_1, bin(3) frequency kdensity kdenopts(gaussian)
restore

     +------------------------+
     | b    _stat_1   _stat_2 |
     |------------------------|
  1. | 1   2.349884      1123 |
  2. | 2   2.149331      1541 |
  3. | 3   1.760173      1852 |
  4. | 4   .4765759       989 |
  5. | 5   2.506999       166 |
     +------------------------+

Method B: Estimate the relationship between math12 and catholic in all blocks at the same time, using fixed effects for the blocks. Note that this model includes the intercept and dummy variables for blocks 2 to 5. (Not shown in text.)

xi: regress math12 catholic i.b
i.b               _Ib_1-5             (naturally coded; _Ib_1 omitted)

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  5,  5665) =  337.52
       Model |  117512.027     5  23502.4055           Prob > F      =  0.0000
    Residual |  394465.715  5665  69.6320768           R-squared     =  0.2295
-------------+------------------------------           Adj R-squared =  0.2288
       Total |  511977.743  5670  90.2958982           Root MSE      =  8.3446

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.580998    .367602     4.30   0.000     .8603572    2.301639
       _Ib_2 |   5.206677     .32775    15.89   0.000     4.564162    5.849193
       _Ib_3 |   9.960542    .318012    31.32   0.000     9.337117    10.58397
       _Ib_4 |   13.00339   .3670815    35.42   0.000     12.28377    13.72301
       _Ib_5 |   9.013889   .6970521    12.93   0.000       7.6474    10.38038
       _cons |   43.68692   .2492575   175.27   0.000     43.19828    44.17556
------------------------------------------------------------------------------

This model can also be estimated using the factor variable syntax introduced in Stata 11. (Not shown in text.)

regress math12 catholic i.b

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  5,  5665) =  337.52
       Model |  117512.027     5  23502.4055           Prob > F      =  0.0000
    Residual |  394465.715  5665  69.6320768           R-squared     =  0.2295
-------------+------------------------------           Adj R-squared =  0.2288
       Total |  511977.743  5670  90.2958982           Root MSE      =  8.3446

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.580998    .367602     4.30   0.000     .8603572    2.301639
             |
           b |
          2  |   5.206677     .32775    15.89   0.000     4.564162    5.849193
          3  |   9.960542    .318012    31.32   0.000     9.337117    10.58397
          4  |   13.00339   .3670815    35.42   0.000     12.28377    13.72301
          5  |   9.013889   .6970521    12.93   0.000       7.6474    10.38038
             |
       _cons |   43.68692   .2492575   175.27   0.000     43.19828    44.17556
------------------------------------------------------------------------------

An equivalent model with no intercept and a fixed effect for each block. (Not shown in text.)

xi i.b, noomit
regress math12 catholic _Ib_1-_Ib_5, noconstant

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  6,  5665) =35657.50
       Model |  14897436.1     6  2482906.01           Prob > F      =  0.0000
    Residual |  394465.715  5665  69.6320768           R-squared     =  0.9742
-------------+------------------------------           Adj R-squared =  0.9742
       Total |  15291901.8  5671  2696.50887           Root MSE      =  8.3446

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.580998    .367602     4.30   0.000     .8603572    2.301639
       _Ib_1 |   43.68692   .2492575   175.27   0.000     43.19828    44.17556
       _Ib_2 |    48.8936   .2141841   228.28   0.000     48.47372    49.31348
       _Ib_3 |   53.64747   .2003001   267.84   0.000      53.2548    54.04013
       _Ib_4 |   56.69031   .2719252   208.48   0.000     56.15724    57.22339
       _Ib_5 |   52.70081   .6522864    80.79   0.000     51.42208    53.97954
------------------------------------------------------------------------------

This model can also be estimated using the factor variable syntax introduced in Stata 11. Note that the dummy variables do not need to be created using xi.

regress math12 catholic ibn.b, noconstant

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  6,  5665) =35657.50
       Model |  14897436.1     6  2482906.01           Prob > F      =  0.0000
    Residual |  394465.715  5665  69.6320768           R-squared     =  0.9742
-------------+------------------------------           Adj R-squared =  0.9742
       Total |  15291901.8  5671  2696.50887           Root MSE      =  8.3446

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.580998    .367602     4.30   0.000     .8603572    2.301639
             |
           b |
          1  |   43.68692   .2492575   175.27   0.000     43.19828    44.17556
          2  |    48.8936   .2141841   228.28   0.000     48.47372    49.31348
          3  |   53.64747   .2003001   267.84   0.000      53.2548    54.04013
          4  |   56.69031   .2719252   208.48   0.000     56.15724    57.22339
          5  |   52.70081   .6522864    80.79   0.000     51.42208    53.97954
------------------------------------------------------------------------------
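
The block coefficients in the no-intercept model equal the intercept plus the corresponding block coefficient from the model with an intercept. For block 2, for example, 43.68692 + 5.206677 = 48.8936. A minimal sketch of recovering these values with lincom after the intercept model. (Not shown in text.)

```stata
* Refit the model with an intercept; block 1 is the omitted category
quietly regress math12 catholic i.b

* Intercept plus block coefficient gives the no-intercept block coefficient
lincom _cons + 2.b
lincom _cons + 5.b
```

The coefficient for catholic, the residual sum of squares and the Root MSE are identical in the two parameterizations. Only the R-squared differs, because with noconstant Stata computes it from the uncentered total sum of squares.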

Method C: Controlling for propensities (as a linear effect). (Not shown in text.)

regress math12 catholic p

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  2,  5668) =  802.96
       Model |   113033.68     2  56516.8401           Prob > F      =  0.0000
    Residual |  398944.063  5668  70.3853321           R-squared     =  0.2208
-------------+------------------------------           Adj R-squared =  0.2205
       Total |  511977.743  5670  90.2958982           Root MSE      =  8.3896

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.438686   .3698603     3.89   0.000     .7136181    2.163753
           p |   82.32179   2.131477    38.62   0.000     78.14328     86.5003
       _cons |   42.30742   .2458801   172.07   0.000      41.8254    42.78944
------------------------------------------------------------------------------

Controlling for selection using nearest-neighbor matching (with random draws). Discussed on page 323. The command attnd is part of the same user-written package as pscore and atts.

attnd math12 catholic, pscore(p) comsup detail matchvar(neighbor) matchdta(pickdat3) id(id)


****************************************************************
Estimation of the ATT with the nearest neighbor matching method 
Random draw version 
****************************************************************


 Note: the common support option has been selected
 The region of common support is [.00999032, .24626373]


 The outcome is math12

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      math12 |      5617    51.14784    9.475516      29.88      71.37


 The treatment is catholic

   attended |
   catholic |
        hs? |      Freq.     Percent        Cum.
------------+-----------------------------------
         no |      5,025       89.46       89.46
        yes |        592       10.54      100.00
------------+-----------------------------------
      Total |      5,617      100.00


 The distribution of the pscore is

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0133107       .0099903
 5%     .0224748       .0100709
10%     .0323934       .0101176       Obs                5617
25%     .0604995       .0102946       Sum of Wgt.        5617

50%      .107828                      Mean           .1049281
                        Largest       Std. Dev.      .0522626
75%      .145359       .2322851
90%     .1742154       .2384602       Variance       .0027314
95%     .1851969       .2391276       Skewness       .0475229
99%     .2127406       .2462637       Kurtosis       2.030977


 The program is searching the nearest neighbor of each treated unit. 
 This operation may take a while.

 **************************************************** 
 Forward search
 

 **************************************************** 
 Backward search
 


 **************************************************** 
 Choice between backward or forward match
 


**************************************************** 
Display of final results 
**************************************************** 


The number of treated is
  592


The number of treated which have been matched is 
  592


Average absolute pscore difference between treated and controls

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       PSDIF |       592    .0000432    .0003268          0   .0078036


Average outcome of the matched treated

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      math12 |       592    54.53951    8.463153      32.92      71.08
 


Average outcome of the matched controls

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
      math12 |     553         592    53.61822   8.913623      32.05      70.79
(553 real changes made)
(592 real changes made)



ATT estimation with Nearest Neighbor Matching method 
(random draw version)
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT    Std. Err.          t
---------------------------------------------------------

      592         553       0.921        0.537      1.716

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches



***************************************************************************** 
End of the estimation with the nearest neighbor matching (random draw) method 
*****************************************************************************

Inspect the neighbors based on the above model and estimate the ATE. Note that the syntax shown below is meant to be run from a .do file, and the whole block should be run at once. The syntax is shown twice: first as a single block, then with the output. (Discussed on page 323.)

* Syntax alone
preserve
 use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/pickdat3, clear
 merge id using "https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/catholic.dta" , ///
 	unique update
 sort p catholic
 list id p catholic faminc8 math8 fhowfar mhowfar fight8 nohw8 disrupt8 riskdrop8 ///
 	if p<.012 & neighbor==1
 * Estimate ATE directly
 ttest math12 if neighbor==1, by(catholic)
restore


* Syntax with output
preserve
use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/pickdat3, clear
merge id using "https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/catholic.dta" , ///
	unique update
id was float now double

sort p catholic
list id p catholic faminc8 math8 fhowfar mhowfar fight8 nohw8 disrupt8 riskdrop8 ///
	if p<.012 & neighbor==1
	
      +----------------------------------------------------------------------------------+
   1. |      id |        p | catholic |       faminc8 | math8 |    fhowfar |     mhowfar |
      | 1485802 | .0099903 |      yes |   $3000-$4999 | 42.02 |    coll <4 |  postsec ed |
      |----------------------------------------------------------------------------------|
      |      fight8       |      nohw8       |      disrupt8       |      riskdr~8       |
      |       never       |        yes       |            no       |             3       |
      +----------------------------------------------------------------------------------+

      +----------------------------------------------------------------------------------+
   2. |      id |        p | catholic |       faminc8 | math8 |    fhowfar |     mhowfar |
      |  709436 | .0100709 |       no |   $3000-$4999 | 52.16 |    hs grad |     hs grad |
      |----------------------------------------------------------------------------------|
      |      fight8       |      nohw8       |      disrupt8       |      riskdr~8       |
      |       never       |        yes       |            no       |             2       |
      +----------------------------------------------------------------------------------+

      +----------------------------------------------------------------------------------+
  14. |      id |        p | catholic |       faminc8 | math8 |    fhowfar |     mhowfar |
      | 6873825 | .0111274 |       no |   $5000-$7499 | 39.05 | postsec ed |  postsec ed |
      |----------------------------------------------------------------------------------|
      |      fight8       |      nohw8       |      disrupt8       |      riskdr~8       |
      |       never       |        yes       |            no       |             4       |
      +----------------------------------------------------------------------------------+

      +----------------------------------------------------------------------------------+
  15. |      id |        p | catholic |       faminc8 | math8 |    fhowfar |     mhowfar |
      | 1485892 |  .011172 |      yes | $10000-$14999 | 42.36 |    hs grad | junior coll |
      |----------------------------------------------------------------------------------|
      |      fight8       |      nohw8       |      disrupt8       |      riskdr~8       |
      |       never       |        yes       |            no       |             2       |
      +----------------------------------------------------------------------------------+

* Estimate ATE directly
ttest math12 if neighbor==1, by(catholic)

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     553    53.50092    .3811649    8.963455    52.75221    54.24963
     yes |     592    54.53951    .3478334    8.463153    53.85637    55.22265
---------+--------------------------------------------------------------------
combined |    1145     54.0379    .2577003    8.720023    53.53229    54.54352
---------+--------------------------------------------------------------------
    diff |           -1.038588    .5150099               -2.049059   -.0281169
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.0166
Ho: diff = 0                                     degrees of freedom =     1143

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0220         Pr(|T| > |t|) = 0.0440          Pr(T > t) = 0.9780

restore
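
The t test above and the attnd estimate reported earlier differ slightly, and the printed means show where the gap comes from. A worked sketch, using only numbers from the output (the symbols are notation introduced here for illustration, not the text's):

```latex
\begin{align*}
\text{attnd:}\quad & \bar{Y}_{\text{treated}} - \bar{Y}^{\,\text{weighted}}_{\text{controls}}
   = 54.53951 - 53.61822 \approx 0.921 \\
\text{ttest:}\quad & \bar{Y}_{\text{treated}} - \bar{Y}^{\,\text{unweighted}}_{\text{controls}}
   = 54.53951 - 53.50092 \approx 1.039
\end{align*}
```

The attnd output lists 553 matched controls carrying a total weight of 592, which suggests that a control matched to more than one treated student counts more than once there, whereas ttest gives each of the 553 controls equal weight. This reading is an inference from the output, not something stated in the text.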

Model estimated using inverse propensity score weighting, discussed starting on page 327. The inverse probability weights are calculated from the propensity scores estimated for the previous model.

gen pscorewgt=1/p
replace pscorewgt=1/(1-p) if catholic==0
(5079 real changes made)

Estimate the ATE "by hand" using the Imbens and Wooldridge method detailed in footnote 29 on page 327.

gen pmath12=pscorewgt*math12
total pmath12 if catholic==0

Total estimation                    Number of obs    =    5079

--------------------------------------------------------------
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
     pmath12 |     288700   894.6781      286946.1      290454
--------------------------------------------------------------

total pmath12 if catholic==1

Total estimation                    Number of obs    =     592

--------------------------------------------------------------
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
     pmath12 |   297285.7   8240.164      281102.1    313469.3
--------------------------------------------------------------

total pscorewgt if catholic==0

Total estimation                    Number of obs    =    5079

--------------------------------------------------------------
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
   pscorewgt |   5671.273   4.753054      5661.955    5680.591
--------------------------------------------------------------

total pscorewgt if catholic==1

Total estimation                    Number of obs    =     592

--------------------------------------------------------------
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
   pscorewgt |   5675.911   192.6692      5297.512    6054.311
--------------------------------------------------------------

* calculate the ATE
display 297285.7/5675.911 - 288700/5671.273 
1.4710589
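
The by-hand calculation above can be written as a single formula. The notation is an assumption made here for illustration: $Y_i$ is math12, $D_i$ is catholic, and $\hat{p}_i$ is the estimated propensity score p, so pscorewgt equals $1/\hat{p}_i$ when catholic==1 and $1/(1-\hat{p}_i)$ when catholic==0.

```latex
\[
\widehat{\mathrm{ATE}}
= \frac{\sum_{i} D_i\,Y_i/\hat{p}_i}{\sum_{i} D_i/\hat{p}_i}
- \frac{\sum_{i} (1-D_i)\,Y_i/(1-\hat{p}_i)}{\sum_{i} (1-D_i)/(1-\hat{p}_i)}
= \frac{297285.7}{5675.911} - \frac{288700}{5671.273}
\approx 52.377 - 50.906 = 1.471
\]
```

Each ratio is a weighted mean of math12 within one group: the numerators are the totals of pmath12 and the denominators are the totals of pscorewgt.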

Estimate the ATE using analytic weights. (Note this is the same Imbens and Wooldridge estimator as above, with a different method of calculation.)

sum math12 if catholic==0 [aw=pscorewgt]

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
      math12 |    5079  5671.27272    50.90568   9.525339      29.88      71.37

sum math12 if catholic==1 [aw=pscorewgt]

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
      math12 |     592   5675.9113    52.37674    9.03198      32.92      71.08

display 52.37674-50.90568
1.47106
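
To avoid retyping the two means, the same difference can be computed from the results that summarize stores. This is a minimal sketch, assuming pscorewgt has been generated as above; the scalar names m1 and m0 are hypothetical and chosen only for illustration.

```stata
* Weighted mean of math12 among students who attended a Catholic high school
quietly summarize math12 if catholic==1 [aw=pscorewgt]
scalar m1 = r(mean)

* Weighted mean of math12 among students who did not
quietly summarize math12 if catholic==0 [aw=pscorewgt]
scalar m0 = r(mean)

* Difference in weighted means (the ATE estimate)
display m1 - m0
```

Working from r(mean) keeps full precision rather than the rounded values shown in the output.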

Estimate the ATE using WLS with analytic weights. (Note this is the same Imbens and Wooldridge estimator as above, with a different method of calculation.)

regress math12 catholic [aw=pscorewgt]
(sum of wgt is   1.1347e+04)

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  1,  5669) =   35.63
       Model |  3068.00754     1  3068.00754           Prob > F      =  0.0000
    Residual |  488129.279  5669   86.105006           R-squared     =  0.0062
-------------+------------------------------           Adj R-squared =  0.0061
       Total |  491197.287  5670  86.6309148           Root MSE      =  9.2793

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.471053   .2464418     5.97   0.000     .9879331    1.954174
       _cons |   50.90568   .1742963   292.06   0.000       50.564    51.24737
------------------------------------------------------------------------------

Estimate the ATE using WLS by transformation. (Note this is the same Imbens and Wooldridge estimator as above, with a different method of calculation.)

gen w=sqrt(pscorewgt)
gen wmath12 = w*math12
gen wcatholic = w*catholic
regress wmath12 w wcatholic, noconstant

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  2,  5669) =87838.97
       Model |  30267327.8     2  15133663.9           Prob > F      =  0.0000
    Residual |  976704.778  5669  172.288724           R-squared     =  0.9687
-------------+------------------------------           Adj R-squared =  0.9687
       Total |  31244032.6  5671  5509.43971           Root MSE      =  13.126

------------------------------------------------------------------------------
     wmath12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           w |   50.90568   .1742963   292.06   0.000       50.564    51.24737
   wcatholic |   1.471053   .2464418     5.97   0.000      .987933    1.954173
------------------------------------------------------------------------------
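
The reason the transformed regression reproduces the weighted one can be seen from the least-squares criterion. As a sketch, let $\omega_i$ stand for pscorewgt, $Y_i$ for math12, and $D_i$ for catholic (notation assumed here), so that the Stata variable w is $\sqrt{\omega_i}$:

```latex
\[
\sum_{i} \omega_i\,\bigl(Y_i - \beta_0 - \beta_1 D_i\bigr)^2
= \sum_{i} \bigl(\sqrt{\omega_i}\,Y_i - \beta_0\,\sqrt{\omega_i} - \beta_1\,\sqrt{\omega_i}\,D_i\bigr)^2
\]
```

The right-hand side is the ordinary least-squares criterion for regressing wmath12 on w and wcatholic with no constant. The coefficient on w therefore plays the role of _cons in the weighted regression, and the coefficient on wcatholic is the ATE estimate.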

Kernel density of inc8 without weighting. (Not shown in text.)

twoway kdensity inc8 if catholic==1 || ///
	kdensity inc8 if catholic==0, ///
	legend(off) scheme(lean1)

Kernel density of inc8 with inverse propensity weighting. (Not shown in text.)

twoway kdensity inc8 if catholic==1 [aw=pscorewgt] || ///
	kdensity inc8 if catholic==0 [aw=pscorewgt], ///
	legend(off) scheme(lean1)

Figure 13.3, Part A on page 327.

twoway kdensity math8 if catholic==1 || ///
	kdensity math8 if catholic==0, ///
	legend(off) scheme(lean1)

Figure 13.3, Part B on page 327.

twoway kdensity math8 if catholic==1 [aw=pscorewgt] || ///
	kdensity math8 if catholic==0 [aw=pscorewgt], ///
	legend(off) scheme(lean1)