use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/catholic, clear
Descriptive statistics for mathematics score (math12) and type of high school (catholic). Note: this output does not appear in the text.
sum math12 catholic, detail

                 12th grade standardized mathematics score
-------------------------------------------------------------
      Percentiles      Smallest
 1%        32.88          29.88
 5%        35.46          30.14
10%        37.54          30.42       Obs                5671
25%        43.53          30.55       Sum of Wgt.        5671

50%        51.33                      Mean           51.05124
                        Largest       Std. Dev.      9.502415
75%        58.61          70.94
90%        63.67          71.08       Variance        90.2959
95%        65.98          71.12       Skewness      -.0567201
99%        69.33          71.37       Kurtosis       2.072073

                    attended catholic hs?
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                5671
25%            0              0       Sum of Wgt.        5671

50%            0                      Mean           .1043908
                        Largest       Std. Dev.      .3057938
75%            0              1
90%            1              1       Variance       .0935098
95%            1              1       Skewness       2.587653
99%            1              1       Kurtosis        7.69595

table catholic, contents(mean math12 sd math12 freq)

----------------------------------------------------
attended  |
catholic  |
hs?       | mean(math12)    sd(math12)         Freq.
----------+-----------------------------------------
       no |     50.64465      9.534295         5,079
      yes |     54.53951      8.463153           592
----------------------------------------------------
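The `table ..., contents()` form used above is the older `table` syntax; `table` was redesigned in Stata 17. As a sketch that should run across versions (assuming the dataset is still reachable at the URL used on this page), `tabstat` reports the same group means, SDs, and counts, and two `summarize` calls give the raw Catholic-minus-public gap. The scalar names are illustrative.

```stata
* Group means, SDs, and counts of math12 by school type.
use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/catholic, clear
tabstat math12, by(catholic) statistics(mean sd n)

* Store each group mean (illustrative scalar names) and take the difference.
quietly summarize math12 if catholic == 0
scalar mean_public = r(mean)
quietly summarize math12 if catholic == 1
scalar mean_catholic = r(mean)
display "Catholic minus public difference in math12: " mean_catholic - mean_public
```

The difference should equal 54.53951 - 50.64465 = 3.89486, the same gap that the unstratified regression later on this page reports as its `catholic` coefficient.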
Descriptive statistics for family income (faminc8). (Not shown in text.)
sum faminc8, detail

              total annual family income in 8th grade
-------------------------------------------------------------
      Percentiles      Smallest
 1%            2              1
 5%            5              1
10%            7              1       Obs                5671
25%            8              1       Sum of Wgt.        5671

50%           10                      Mean           9.526186
                        Largest       Std. Dev.      2.217688
75%           11             12
90%           12             12       Variance       4.918141
95%           12             12       Skewness      -1.268464
99%           12             12       Kurtosis       4.447905
Various methods of examining the relationship between catholic and faminc8. (Not shown in text.)
by catholic, sort: sum faminc8, detail

------------------------------------------------------------------------------
-> catholic = no

              total annual family income in 8th grade
-------------------------------------------------------------
      Percentiles      Smallest
 1%            2              1
 5%            5              1
10%            6              1       Obs                5079
25%            8              1       Sum of Wgt.        5079

50%           10                      Mean           9.428825
                        Largest       Std. Dev.       2.25239
75%           11             12
90%           12             12       Variance       5.073261
95%           12             12       Skewness      -1.214205
99%           12             12       Kurtosis       4.255522

------------------------------------------------------------------------------
-> catholic = yes

              total annual family income in 8th grade
-------------------------------------------------------------
      Percentiles      Smallest
 1%            4              1
 5%            7              2
10%            8              4       Obs                 592
25%           10              4       Sum of Wgt.         592

50%           11                      Mean           10.36149
                        Largest       Std. Dev.       1.67728
75%           11             12
90%           12             12       Variance       2.813269
95%           12             12       Skewness      -1.784059
99%           12             12       Kurtosis       7.343344

tab faminc8 catholic, chi2

   total annual |
  family income | attended catholic hs?
   in 8th grade |        no        yes |     Total
----------------+----------------------+----------
           none |        17          1 |        18
         <$1000 |        41          1 |        42
    $1000-$2999 |        84          0 |        84
    $3000-$4999 |        79          6 |        85
    $5000-$7499 |       138          6 |       144
     7500-$9999 |       169          6 |       175
  $10000-$14999 |       427         20 |       447
  $15000-$19999 |       410         31 |       441
  $20000-$24999 |       608         47 |       655
  $25000-$34999 |     1,137        130 |     1,267
   35000-$49999 |     1,221        198 |     1,419
   50000-$74999 |       748        146 |       894
----------------+----------------------+----------
          Total |     5,079        592 |     5,671

          Pearson chi2(11) = 111.4057   Pr = 0.000

pwcorr faminc8 catholic, sig

             |  faminc8 catholic
-------------+------------------
     faminc8 |   1.0000
             |
             |
    catholic |   0.1286   1.0000
             |   0.0000
             |
Categorize faminc8 into catfaminc8, and examine the relationship between the two variables. (Not shown in text.)
egen catfaminc8=cut(faminc8), at(1,9,11,13) icodes
tab catfaminc8

 catfaminc8 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,436       25.32       25.32
          1 |      1,922       33.89       59.21
          2 |      2,313       40.79      100.00
------------+-----------------------------------
      Total |      5,671      100.00

tab faminc8 catfaminc8

   total annual |
  family income |            catfaminc8
   in 8th grade |         0          1          2 |     Total
----------------+---------------------------------+----------
           none |        18          0          0 |        18
         <$1000 |        42          0          0 |        42
    $1000-$2999 |        84          0          0 |        84
    $3000-$4999 |        85          0          0 |        85
    $5000-$7499 |       144          0          0 |       144
     7500-$9999 |       175          0          0 |       175
  $10000-$14999 |       447          0          0 |       447
  $15000-$19999 |       441          0          0 |       441
  $20000-$24999 |         0        655          0 |       655
  $25000-$34999 |         0      1,267          0 |     1,267
   35000-$49999 |         0          0      1,419 |     1,419
   50000-$74999 |         0          0        894 |       894
----------------+---------------------------------+----------
          Total |     1,436      1,922      2,313 |     5,671
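`egen cut()` with `at(1,9,11,13)` forms the half-open intervals [1,9), [9,11), and [11,13). Because faminc8 is coded 1 to 12 (see the summary above), these intervals correspond to codes 1-8, 9-10, and 11-12, and `icodes` numbers them 0, 1, 2. As a sketch, `recode` can rebuild the same grouping explicitly and `assert` that both versions agree. The variable name `catfaminc8_chk` is illustrative.

```stata
* Rebuild the income categories two ways and check that they agree.
use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/catholic, clear
egen catfaminc8 = cut(faminc8), at(1,9,11,13) icodes

* Explicit version: codes 1-8 -> 0, 9-10 -> 1, 11-12 -> 2.
recode faminc8 (1/8 = 0) (9/10 = 1) (11/12 = 2), generate(catfaminc8_chk)

* Stops with an error if any observation is classified differently.
assert catfaminc8 == catfaminc8_chk
tab faminc8 catfaminc8_chk
```

Writing the cut points out in `recode` also makes clear that the upper bound 13 in `at()` exists only so the top category can include code 12.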
Table 12.1 on page 293.
* Sample variance of faminc8 in each income category.
tabstat faminc8, by(catfaminc8) statistics(var)

Summary for variables: faminc8
     by categories of: catfaminc8

catfaminc8 |  variance
-----------+----------
         0 |  3.063001
         1 |  .2247694
         2 |  .2372228
-----------+----------
     Total |  4.918141
----------------------

* Sample mean of faminc8 by income category and school type.
table catfaminc8 catholic, contents(mean faminc8)

------------------------------------------
catfaminc |      attended catholic hs?
8         |              no            yes
----------+-------------------------------
        0 |   6.32967042923  6.774647712708
        1 |  9.651576042175  9.734463691711
        2 |  11.37988853455   11.4244184494
------------------------------------------

* Tests for differences in family income by school type within each income category.
by catfaminc8, sort : ttest faminc8, by(catholic)

------------------------------------------------------------------------------
-> catfaminc8 = 0

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1365     6.32967    .0475499    1.756773    6.236392    6.422949
     yes |      71    6.774648    .1862445    1.569324    6.403195    7.146101
---------+--------------------------------------------------------------------
combined |    1436    6.351671    .0461845    1.750143    6.261075    6.442268
---------+--------------------------------------------------------------------
    diff |           -.4449776    .2127872               -.8623851   -.0275701
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.0912
Ho: diff = 0                                     degrees of freedom =     1434

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0183         Pr(|T| > |t|) = 0.0367          Pr(T > t) = 0.9817

------------------------------------------------------------------------------
-> catfaminc8 = 1

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1745    9.651576    .0114094    .4766077    9.629198    9.673954
     yes |     177    9.734463    .0332883    .4428714    9.668768    9.800159
---------+--------------------------------------------------------------------
combined |    1922    9.659209    .0108141    .4740985       9.638    9.680418
---------+--------------------------------------------------------------------
    diff |           -.0828873     .037361               -.1561597    -.009615
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.2186
Ho: diff = 0                                     degrees of freedom =     1920

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0133         Pr(|T| > |t|) = 0.0266          Pr(T > t) = 0.9867

------------------------------------------------------------------------------
-> catfaminc8 = 2

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1969    11.37989    .0109408    .4854821    11.35843    11.40135
     yes |     344    11.42442    .0266872    .4949744    11.37193    11.47691
---------+--------------------------------------------------------------------
combined |    2313    11.38651    .0101272    .4870552    11.36665    11.40637
---------+--------------------------------------------------------------------
    diff |           -.0445303     .028453               -.1003264    .0112657
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -1.5650
Ho: diff = 0                                     degrees of freedom =     2311

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0589         Pr(|T| > |t|) = 0.1177          Pr(T > t) = 0.9411

tab catfaminc8 catholic, row

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           | attended catholic hs?
catfaminc8 |        no        yes |     Total
-----------+----------------------+----------
         0 |     1,365         71 |     1,436
           |     95.06       4.94 |    100.00
-----------+----------------------+----------
         1 |     1,745        177 |     1,922
           |     90.79       9.21 |    100.00
-----------+----------------------+----------
         2 |     1,969        344 |     2,313
           |     85.13      14.87 |    100.00
-----------+----------------------+----------
     Total |     5,079        592 |     5,671
           |     89.56      10.44 |    100.00

* Average math achievement, by school type and income category.
table catfaminc8 catholic, contents(mean math12)

------------------------------
          | attended catholic
catfaminc |        hs?
8         |       no       yes
----------+-------------------
        0 | 46.77358  50.53563
        1 | 50.33842  53.85616
        2 | 53.59964   55.7175
------------------------------

* Tests for differences in average math achievement by school type within each income category.
by catfaminc8, sort : ttest math12, by(catholic)

------------------------------------------------------------------------------
-> catfaminc8 = 0

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1365    46.77358    .2409728     8.90296    46.30086     47.2463
     yes |      71    50.53563    1.003933    8.459293    48.53335    52.53792
---------+--------------------------------------------------------------------
combined |    1436    46.95959    .2352876    8.916128    46.49804    47.42113
---------+--------------------------------------------------------------------
    diff |           -3.762051    1.081144               -5.882845   -1.641258
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -3.4797
Ho: diff = 0                                     degrees of freedom =     1434

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0003         Pr(|T| > |t|) = 0.0005          Pr(T > t) = 0.9997

------------------------------------------------------------------------------
-> catfaminc8 = 1

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1745    50.33842    .2228944    9.311012    49.90126    50.77559
     yes |     177    53.85616    .6445502    8.575183    52.58412     55.1282
---------+--------------------------------------------------------------------
combined |    1922    50.66238    .2121188    9.299418    50.24637    51.07838
---------+--------------------------------------------------------------------
    diff |           -3.517734    .7293671               -4.948169   -2.087299
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -4.8230
Ho: diff = 0                                     degrees of freedom =     1920

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

------------------------------------------------------------------------------
-> catfaminc8 = 2

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1969    53.59964    .2060271    9.142124    53.19559    54.00369
     yes |     344     55.7175    .4384348    8.131754    54.85514    56.57986
---------+--------------------------------------------------------------------
combined |    2313    53.91462    .1877359    9.028905    53.54647    54.28277
---------+--------------------------------------------------------------------
    diff |           -2.117861    .5258916               -3.149129   -1.086592
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -4.0272
Ho: diff = 0                                     degrees of freedom =     2311

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0001          Pr(T > t) = 1.0000
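The equal-variance t test depends only on the two group sizes, means, and standard deviations. As a sketch, the immediate command `ttesti` can reproduce the catfaminc8 = 0 math12 test from the summary numbers printed above, and the pooled-variance formula can be evaluated by hand as a cross-check. The scalar names are illustrative.

```stata
* Pooled two-sample t test from summary statistics
* (n, mean, sd for public, then Catholic; catfaminc8 = 0, math12).
ttesti 1365 46.77358 8.90296 71 50.53563 8.459293

* Same statistic by hand:
*   s_p^2 = ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1 + n2 - 2)
*   t     = (m1 - m2) / sqrt(s_p^2 * (1/n1 + 1/n2))
scalar sp2 = ((1365 - 1)*8.90296^2 + (71 - 1)*8.459293^2) / (1365 + 71 - 2)
scalar t_by_hand = (46.77358 - 50.53563) / sqrt(sp2 * (1/1365 + 1/71))
display "pooled variance = " sp2 "   t = " t_by_hand
```

Both should give t of about -3.4797. The pooled variance (about 78.887) equals the residual mean square of the catfaminc8 = 0 regression of math12 on catholic further down this page. This is why that regression's t statistic for `catholic` has the same magnitude as this test; only the sign differs, because `ttest` reports mean(no) - mean(yes).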
Figure 12.1 on page 297.
sort catholic catfaminc8
by catholic catfaminc8: egen n = count(id)
by catholic catfaminc8: egen mmath12 = mean(math12)
twoway (scatter mmath12 catholic [aweight=n] if catfaminc8==0, connect(l) msymbol(S)) ///
       (scatter mmath12 catholic [aweight=n] if catfaminc8==1, connect(l) msymbol(S)) ///
       (scatter mmath12 catholic [aweight=n] if catfaminc8==2, connect(l) msymbol(S)) ///
       (lfit math12 catholic [aweight=n]), ///
       xlabel(0 "Public" 1 "Catholic") xscale(range(-.25 1.25)) ///
       legend(label(1 `"faminc8 is "Lo""') label(2 `"faminc8 is "Med""') ///
              label(3 `"faminc8 is "Hi""') label(4 `"Unstratified"')) ///
       xtitle("Type of High School") ytitle("12th Grade Mathematics Achievement") ///
       scheme(s2mono)
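The plotted points in Figure 12.1 are cell means of math12 for each school type within each income category, with marker size proportional to cell size. As a sketch, `collapse` can produce those six points as a small dataset for inspection. This assumes the same data URL and the same `egen cut()` grouping used earlier on this page.

```stata
* One row per income category x school type: mean math12 and cell size.
use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/catholic, clear
egen catfaminc8 = cut(faminc8), at(1,9,11,13) icodes
collapse (mean) mmath12 = math12 (count) n = math12, by(catfaminc8 catholic)
list catfaminc8 catholic mmath12 n, sepby(catfaminc8) noobs
```

The means should match the `table catfaminc8 catholic, contents(mean math12)` output for Table 12.1, and the counts should match the `tab catfaminc8 catholic, row` frequencies.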
A simplified graph that provides information similar to that in Figure 12.1 can be produced using the syntax shown below. (Not shown in the text.)
twoway (lfit math12 catholic if catfaminc8==0) ///
       (lfit math12 catholic if catfaminc8==1) ///
       (lfit math12 catholic if catfaminc8==2) ///
       (lfit math12 catholic), ///
       xlabel(0 "Public" 1 "Catholic") xscale(range(-.25 1.25)) ///
       legend(label(1 `"faminc8 is "Lo""') label(2 `"faminc8 is "Med""') ///
              label(3 `"faminc8 is "Hi""') label(4 `"Unstratified"')) ///
       xtitle("Type of High School") ytitle("12th Grade Mathematics Achievement") ///
       scheme(s2mono)
OLS regression model of math12 on catholic. This regression corresponds to the "Unstratified" line in Figure 12.1. (Not shown in the text.)
regress math12 catholic

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  1,  5669) =   90.48
       Model |   8043.1077     1   8043.1077           Prob > F      =  0.0000
    Residual |  503934.635  5669  88.8930385           R-squared     =  0.0157
-------------+------------------------------           Adj R-squared =  0.0155
       Total |  511977.743  5670  90.2958982           Root MSE      =  9.4283

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |    3.89486   .4094621     9.51   0.000     3.092157    4.697562
       _cons |   50.64465   .1322954   382.81   0.000      50.3853      50.904
------------------------------------------------------------------------------
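With a single 0/1 regressor, the OLS slope is exactly the difference in group means (54.53951 - 50.64465 = 3.89486). Its t statistic equals the pooled, equal-variance two-sample t statistic, apart from sign. The sketch below checks both facts. It assumes the dataset URL used above, and the scalar names are illustrative.

```stata
* Compare the OLS slope on catholic with the two-sample t test.
use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/catholic, clear
quietly regress math12 catholic
scalar b_ols = _b[catholic]
scalar t_ols = _b[catholic] / _se[catholic]

* ttest is r-class, so the regression's e() results stay available.
quietly ttest math12, by(catholic)
display "OLS slope: " b_ols "   mean(yes) - mean(no): " r(mu_2) - r(mu_1)
display "OLS t: " t_ols "   ttest t (no - yes): " r(t)
```

`r(mu_1)` and `r(mu_2)` are the means of the first (catholic = 0) and second (catholic = 1) groups. The t statistics differ only in sign because `ttest` subtracts in the order mean(no) - mean(yes).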
OLS regression of math12 on catholic, stratifying by catfaminc8. These regressions correspond to the information shown in Figure 12.1. (Not shown in text.)
by catfaminc8, sort: regress math12 catholic

------------------------------------------------------------------------------
-> catfaminc8 = 0

      Source |       SS       df       MS              Number of obs =    1436
-------------+------------------------------           F(  1,  1434) =   12.11
       Model |  955.181769     1  955.181769           Prob > F      =  0.0005
    Residual |  113123.499  1434  78.8866802           R-squared     =  0.0084
-------------+------------------------------           Adj R-squared =  0.0077
       Total |  114078.681  1435  79.4973388           Root MSE      =  8.8818

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   3.762051   1.081144     3.48   0.001     1.641258    5.882845
       _cons |   46.77358   .2404006   194.57   0.000     46.30201    47.24516
------------------------------------------------------------------------------

------------------------------------------------------------------------------
-> catfaminc8 = 1

      Source |       SS       df       MS              Number of obs =    1922
-------------+------------------------------           F(  1,  1920) =   23.26
       Model |  1988.57183     1  1988.57183           Prob > F      =  0.0000
    Residual |  164137.924  1920  85.4885019           R-squared     =  0.0120
-------------+------------------------------           Adj R-squared =  0.0115
       Total |  166126.496  1921  86.4791752           Root MSE      =   9.246

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   3.517734   .7293671     4.82   0.000     2.087299    4.948169
       _cons |   50.33842   .2213381   227.43   0.000     49.90434    50.77251
------------------------------------------------------------------------------

------------------------------------------------------------------------------
-> catfaminc8 = 2

      Source |       SS       df       MS              Number of obs =    2313
-------------+------------------------------           F(  1,  2311) =   16.22
       Model |  1313.47946     1  1313.47946           Prob > F      =  0.0001
    Residual |  187163.381  2311  80.9880488           R-squared     =  0.0070
-------------+------------------------------           Adj R-squared =  0.0065
       Total |   188476.86  2312  81.5211333           Root MSE      =  8.9993

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   2.117861   .5258916     4.03   0.000     1.086592    3.149129
       _cons |   53.59964   .2028092   264.29   0.000     53.20193    53.99735
------------------------------------------------------------------------------
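One simple way to summarize the three stratum-specific slopes above in a single number is to average them, weighting each by its stratum size. This weighting is an illustrative choice, not necessarily the one used in the text. The sketch enters the slopes and stratum sizes copied from the output above, so it runs without the dataset.

```stata
* Stratum slopes (b) and stratum sizes (n) copied from the output above.
clear
input catfaminc8 b n
0 3.762051 1436
1 3.517734 1922
2 2.117861 2313
end

* With analytic weights, summarize reports the n-weighted mean of b.
summarize b [aweight = n]
display "Size-weighted average of stratum slopes: " %6.4f r(mean)
```

The result is roughly 3.01, smaller than the unstratified slope of 3.89486. Part of the raw Catholic/public gap therefore moves with family income.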
Descriptive statistics for 8th grade mathematics achievement (math8). (Not shown in text.)
sum math8, detail

                 8th grade standardized mathematics score
-------------------------------------------------------------
      Percentiles      Smallest
 1%        35.95          34.48
 5%        37.89          34.49
10%        39.42          34.52       Obs                5671
25%        43.45          34.52       Sum of Wgt.        5671

50%        50.45                      Mean           51.48952
                        Largest       Std. Dev.      9.683425
75%        58.56           77.2
90%        65.39           77.2       Variance       93.76872
95%        68.89           77.2       Skewness       .4078902
99%        74.04           77.2       Kurtosis       2.319295
Several methods of examining the relationship between math8 and catholic. (Not shown in text.)
corr math8 catholic
(obs=5671)

             |    math8 catholic
-------------+------------------
       math8 |   1.0000
    catholic |   0.0765   1.0000

ttest math8, by(catholic)

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    5079    51.23648    .1367773    9.747724    50.96834    51.50462
     yes |     592    53.66039    .3628002     8.82731    52.94785    54.37292
---------+--------------------------------------------------------------------
combined |    5671    51.48952    .1285876    9.683425    51.23743     51.7416
---------+--------------------------------------------------------------------
    diff |           -2.423907    .4193447               -3.245983   -1.601831
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -5.7802
Ho: diff = 0                                     degrees of freedom =     5669

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
Create a categorical variable for prior math achievement (catmath8), and examine the relationship between catmath8 and math8. (Not shown in text.)
egen catmath8=cut(math8), at(30,38,44,51,80) icodes
tab catmath8

   catmath8 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        304        5.36        5.36
          1 |      1,236       21.80       27.16
          2 |      1,421       25.06       52.21
          3 |      2,710       47.79      100.00
------------+-----------------------------------
      Total |      5,671      100.00

table catmath8, contents(mean math8 sd math8 freq)

-------------------------------------------------
 catmath8 |  mean(math8)     sd(math8)      Freq.
----------+--------------------------------------
        0 |     36.78859      .8564365        304
        1 |     41.10199      1.722423      1,236
        2 |     47.53923      2.045117      1,421
        3 |      59.9476       6.27689      2,710
-------------------------------------------------
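The cut points `at(30,38,44,51,80)` define the bins [30,38), [38,44), [44,51), and [51,80). `egen cut()` sets values outside the outermost points to missing, and the observed math8 range (34.48 to 77.2) lies inside them, so no student should lose a category. As a sketch, assuming the same data URL, an `assert` can confirm this, and `tabstat` can show each bin's observed range.

```stata
* Check that every math8 score received a catmath8 code, then show bin ranges.
use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/catholic, clear
egen catmath8 = cut(math8), at(30,38,44,51,80) icodes
assert !missing(catmath8)
tabstat math8, by(catmath8) statistics(min max n)
```

If `assert` fails, some scores fall outside the chosen cut points and the `at()` list should be widened.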
Check for balance in math8 within strata (catmath8), by catholic. (Not shown in the text.)
table catmath8 catholic, contents(mean math8 sd math8 freq)

------------------------------
          | attended catholic
          |        hs?
 catmath8 |       no       yes
----------+-------------------
        0 | 36.80332  36.30556
          | .8559109  .7666504
          |      295         9
          |
        1 | 41.09058   41.2438
          | 1.718102  1.778788
          |    1,144        92
          |
        2 | 47.49826  47.92955
          | 2.040288  2.057497
          |    1,286       135
          |
        3 | 60.01815  59.48112
          | 6.348762  5.765806
          |    2,354       356
------------------------------

by catmath8, sort : ttest math8, by(catholic)

------------------------------------------------------------------------------
-> catmath8 = 0

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     295    36.80332    .0498331    .8559109    36.70525     36.9014
     yes |       9    36.30556    .2555501    .7666504    35.71626    36.89486
---------+--------------------------------------------------------------------
combined |     304    36.78859      .04912    .8564365    36.69193    36.88525
---------+--------------------------------------------------------------------
    diff |            .4977666    .2888636               -.0706738    1.066207
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =   1.7232
Ho: diff = 0                                     degrees of freedom =      302

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9571         Pr(|T| > |t|) = 0.0859          Pr(T > t) = 0.0429

------------------------------------------------------------------------------
-> catmath8 = 1

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1144    41.09059    .0507967    1.718102    40.99092    41.19025
     yes |      92     41.2438    .1854515    1.778788    40.87543    41.61218
---------+--------------------------------------------------------------------
combined |    1236    41.10199    .0489926    1.722423    41.00587    41.19811
---------+--------------------------------------------------------------------
    diff |           -.1532187    .1866807               -.5194654     .213028
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -0.8208
Ho: diff = 0                                     degrees of freedom =     1234

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.2060         Pr(|T| > |t|) = 0.4119          Pr(T > t) = 0.7940

------------------------------------------------------------------------------
-> catmath8 = 2

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1286    47.49826    .0568946    2.040288    47.38664    47.60987
     yes |     135    47.92956    .1770811    2.057497    47.57932    48.27979
---------+--------------------------------------------------------------------
combined |    1421    47.53923    .0542527    2.045117    47.43281    47.64566
---------+--------------------------------------------------------------------
    diff |           -.4312974    .1847346               -.7936796   -.0689152
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.3347
Ho: diff = 0                                     degrees of freedom =     1419

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0098         Pr(|T| > |t|) = 0.0197          Pr(T > t) = 0.9902

------------------------------------------------------------------------------
-> catmath8 = 3

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    2354    60.01815    .1308536    6.348762    59.76155    60.27475
     yes |     356    59.48112    .3055871    5.765806    58.88013    60.08211
---------+--------------------------------------------------------------------
combined |    2710     59.9476    .1205757     6.27689    59.71117    60.18403
---------+--------------------------------------------------------------------
    diff |            .5370243    .3568614               -.1627239    1.236773
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =   1.5049
Ho: diff = 0                                     degrees of freedom =     2708

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9338         Pr(|T| > |t|) = 0.1325          Pr(T > t) = 0.0662
Table 12.1 on page 301.
table catmath8 catholic , contents(mean math12 freq) by(catfaminc8)

------------------------------
catfaminc | attended catholic
8 and     |        hs?
catmath8  |       no       yes
----------+-------------------
0         |
        0 | 36.80514     42.57
          |      142         1
          |
        1 | 40.99247   41.7019
          |      433        21
          |
        2 | 47.12156  48.65308
          |      385        13
          |
        3 | 56.11869  56.58972
          |      405        36
----------+-------------------
1         |
        0 | 37.94156    39.775
          |       96         2
          |
        1 | 41.92456  44.56454
          |      390        33
          |
        2 |  47.9487  50.13551
          |      469        49
          |
        3 | 57.41727  59.41634
          |      790        93
----------+-------------------
2         |
        0 | 39.78667  40.40334
          |       57         6
          |
        1 |  42.7458  44.22737
          |      321        38
          |
        2 | 49.17894  50.70644
          |      432        73
          |
        3 | 58.93283  59.65723
          |    1,159       227
------------------------------
The t-tests shown in Table 12.1 on page 301 can be reproduced using the following syntax. (Note: most of the output was omitted to save space.)
bysort catfaminc8 catmath8: ttest math12, by(catholic)

------------------------------------------------------------------------------
-> catfaminc8 = 0, catmath8 = 0

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     142    36.80514    .3391017    4.040863    36.13476    37.47552
     yes |       1       42.57           .           .           .           .
---------+--------------------------------------------------------------------
combined |     143    36.84545           .           .           .           .
---------+--------------------------------------------------------------------
    diff |           -5.764859           .                       .           .
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =        .
Ho: diff = 0                                     degrees of freedom =      141

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) =      .         Pr(|T| > |t|) =      .          Pr(T > t) =      .

------------------------------------------------------------------------------
-> catfaminc8 = 0, catmath8 = 1

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     433    40.99247    .2466134    5.131692    40.50776    41.47718
     yes |      21     41.7019    1.018852    4.668968    39.57662    43.82719
---------+--------------------------------------------------------------------
combined |     454    41.02529    .2397602    5.108636    40.55411    41.49647
---------+--------------------------------------------------------------------
    diff |           -.7094334    1.142284               -2.954279    1.535412
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -0.6211
Ho: diff = 0                                     degrees of freedom =      452

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.2674         Pr(|T| > |t|) = 0.5349          Pr(T > t) = 0.7326

------------------------------------------------------------------------------
-> catfaminc8 = 0, catmath8 = 2

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     385    47.12156    .2927101    5.743387    46.54604    47.69707
     yes |      13    48.65308    1.413799    5.097526    45.57267    51.73348
---------+--------------------------------------------------------------------
combined |     398    47.17158    .2869264    5.724165     46.6075    47.73567
---------+--------------------------------------------------------------------
    diff |           -1.531519    1.614382                -4.70535    1.642312
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -0.9487
Ho: diff = 0                                     degrees of freedom =      396

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.1717         Pr(|T| > |t|) = 0.3434          Pr(T > t) = 0.8283
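In the first stratum above there is only one Catholic student, so `ttest` reports missing values for the standard error and t statistic. As a sketch, assuming the same data URL and the `egen cut()` groupings from this page, the code below lists every income-by-prior-achievement stratum with fewer than two students in either school type. The new variable names are illustrative.

```stata
* Count Catholic and public students within each stratum and flag sparse cells.
use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/catholic, clear
egen catfaminc8 = cut(faminc8), at(1,9,11,13) icodes
egen catmath8 = cut(math8), at(30,38,44,51,80) icodes
bysort catfaminc8 catmath8: egen n_catholic = total(catholic == 1)
bysort catfaminc8 catmath8: egen n_public = total(catholic == 0)

* tag() marks one observation per stratum so each stratum is listed once.
egen first_in_cell = tag(catfaminc8 catmath8)
list catfaminc8 catmath8 n_public n_catholic ///
    if first_in_cell & (n_catholic < 2 | n_public < 2), noobs
```

Judging from the cell counts in the table above, only the catfaminc8 = 0, catmath8 = 0 stratum should be listed.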
Estimate the relationship between math12 and catholic separately in each of the strata (catfaminc8 and catmath8) and save the results to a new dataset (cathslopes2.dta). (Note: this output does not appear in the text and most of the output was omitted to save space.)
sort catfaminc8 catmath8
statsby diff=_b[catholic] n=e(N), by(catfaminc8 catmath8) noisily sav(cathslopes2, replace): ///
    regress math12 catholic

statsby: First call to regress with data as is:

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  1,  5669) =   90.48
       Model |   8043.1077     1   8043.1077           Prob > F      =  0.0000
    Residual |  503934.635  5669  88.8930385           R-squared     =  0.0157
-------------+------------------------------           Adj R-squared =  0.0155
       Total |  511977.743  5670  90.2958982           Root MSE      =  9.4283

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |    3.89486   .4094621     9.51   0.000     3.092157    4.697562
       _cons |   50.64465   .1322954   382.81   0.000      50.3853      50.904
------------------------------------------------------------------------------

      statsby legend:
      command:  regress math12 catholic
         diff:  _b[catholic]
            n:  e(N)
           by:  catfaminc8 catmath8

Statsby groups
running (regress math12 catholic) on group 1

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =     143
-------------+------------------------------           F(  1,   141) =    2.02
       Model |  33.0011957     1  33.0011957           Prob > F      =  0.1573
    Residual |  2302.32862   141  16.3285718           R-squared     =  0.0141
-------------+------------------------------           Adj R-squared =  0.0071
       Total |  2335.32981   142  16.4459846           Root MSE      =  4.0409

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   5.764859   4.055066     1.42   0.157    -2.251729    13.78145
       _cons |   36.80514   .3391017   108.54   0.000     36.13476    37.47552
------------------------------------------------------------------------------
running (regress math12 catholic) on group 2

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =     454
-------------+------------------------------           F(  1,   452) =    0.39
       Model |  10.0803278     1  10.0803278           Prob > F      =  0.5349
    Residual |  11812.3885   452  26.1336029           R-squared     =  0.0009
-------------+------------------------------           Adj R-squared = -0.0014
       Total |  11822.4689   453  26.0981652           Root MSE      =  5.1121

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   .7094334   1.142284     0.62   0.535    -1.535412    2.954279
       _cons |   40.99247    .245672   166.86   0.000     40.50967    41.47527
------------------------------------------------------------------------------
Graph the resulting slopes. Note that the entire block of syntax should be run all at once, so that preserve and restore execute together. (Not shown in the text.)
preserve
use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/cathslopes2, clear
list
histogram diff, bin(6) frequency kdensity kdenopts(gaussian)
restore

     +---------------------------------------+
     | catfam~8   catmath8       diff      n |
     |---------------------------------------|
  1. |        0          0   5.764859    143 |
  2. |        0          1   .7094334    454 |
  3. |        0          2   1.531519    398 |
  4. |        0          3    .471031    441 |
  5. |        1          0   1.833437     98 |
     |---------------------------------------|
  6. |        1          1   2.639981    423 |
  7. |        1          2   2.186811    518 |
  8. |        1          3   1.999078    883 |
  9. |        2          0   .6166673     63 |
 10. |        2          1   1.481574    359 |
     |---------------------------------------|
 11. |        2          2   1.527503    505 |
 12. |        2          3   .7243947   1386 |
     +---------------------------------------+
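The 12 slopes listed above can be averaged in more than one way. The sketch below compares two weightings, entering the listed `diff` and `n` values directly so it runs without downloading cathslopes2. The number of Catholic students per stratum (`n_cath`, an illustrative name) is copied from the `table catmath8 catholic, contents(mean math12 freq) by(catfaminc8)` output above. Weighting (1) uses stratum size. Weighting (2) uses n*p*(1-p), where p is the Catholic share of the stratum; this is the weighting that OLS with a full set of stratum dummies implicitly applies to the stratum-specific differences.

```stata
* Stratum slopes and sizes from the listing above; n_cath from the table above.
clear
input catfaminc8 catmath8 diff n n_cath
0 0 5.764859   143   1
0 1 .7094334   454  21
0 2 1.531519   398  13
0 3 .471031    441  36
1 0 1.833437    98   2
1 1 2.639981   423  33
1 2 2.186811   518  49
1 3 1.999078   883  93
2 0 .6166673    63   6
2 1 1.481574   359  38
2 2 1.527503   505  73
2 3 .7243947  1386 227
end

* (1) Weight each stratum slope by stratum size.
summarize diff [aweight = n]
scalar avg_by_size = r(mean)

* (2) Weight by n*p*(1-p) = n_cath*(n - n_cath)/n, the within-stratum
*     variance of catholic times the stratum size.
generate w_ols = n_cath * (n - n_cath) / n
summarize diff [aweight = w_ols]
scalar avg_ols_weights = r(mean)

display "size-weighted: " %6.4f avg_by_size "   n*p*(1-p)-weighted: " %6.4f avg_ols_weights
```

The size-weighted average comes out near 1.50. The n*p*(1-p)-weighted average comes out near 1.3286, which matches the `catholic` coefficient (1.328632) in the regression with the full catfaminc8-by-catmath8 interaction below. That model fits one intercept per stratum. The two averages differ because strata in which nearly all students attend public school, such as the catmath8 = 0 cells, get little weight under the regression weighting.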
Similar to model A from Table 12.3 on page 306, but with dummy variables representing the catfaminc8 by catmath8 interaction (with one group omitted as the reference category). (Not shown in text.)
xi: regress math12 catholic i.catfaminc8*i.catmath8
i.catfaminc8      _Icatfaminc_0-2     (naturally coded; _Icatfaminc_0 omitted)
i.catmath8        _Icatmath8_0-3      (naturally coded; _Icatmath8_0 omitted)
i.ca~c8*i.ca~h8   _IcatXcat_#_#       (coded as above)

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F( 12,  5658) =  710.06
       Model |  307674.539    12  25639.5449           Prob > F      =  0.0000
    Residual |  204303.204  5658  36.1087317           R-squared     =  0.6010
-------------+------------------------------           Adj R-squared =  0.6001
       Total |  511977.743  5670  90.2958982           Root MSE      =  6.0091

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.328632   .2639492     5.03   0.000     .8111899    1.846073
_Icatfamin~1 |   1.115701   .7880213     1.42   0.157    -.4291226    2.660525
_Icatfamin~2 |   2.882697   .9089585     3.17   0.002      1.10079    4.664604
_Icatmath8_1 |   4.127667   .5763251     7.16   0.000     2.997848    5.257485
_Icatmath8_2 |   10.29202    .585901    17.57   0.000     9.143432    11.44061
_Icatmath8_3 |   19.21252   .5785983    33.21   0.000     18.07825    20.34679
_IcatXca~1_1 |  -.0526632   .8865024    -0.06   0.953    -1.790548    1.685221
_IcatXca~1_2 |  -.2140083   .8840602    -0.24   0.809    -1.947105    1.519089
_IcatXca~1_3 |   .3234948   .8624064     0.38   0.708    -1.367152    2.014142
_IcatXca~2_1 |  -1.084544   1.002914    -1.08   0.280    -3.050639    .8815522
_IcatXca~2_2 |  -.8031993   .9939466    -0.81   0.419    -2.751716    1.145317
_IcatXca~2_3 |  -.0975123   .9662284    -0.10   0.920     -1.99169    1.796666
       _cons |   36.83616   .5025057    73.30   0.000     35.85106    37.82127
------------------------------------------------------------------------------
The above model can also be specified using the factor variable syntax introduced in Stata 11.
regress math12 catholic catfaminc8##catmath8

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F( 12,  5658) =  710.06
       Model |  307674.539    12  25639.5449           Prob > F      =  0.0000
    Residual |  204303.204  5658  36.1087317           R-squared     =  0.6010
-------------+------------------------------           Adj R-squared =  0.6001
       Total |  511977.743  5670  90.2958982           Root MSE      =  6.0091

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.328632   .2639492     5.03   0.000     .8111899    1.846073
             |
  catfaminc8 |
          1  |   1.115701   .7880213     1.42   0.157    -.4291226    2.660525
          2  |   2.882697   .9089585     3.17   0.002      1.10079    4.664604
             |
    catmath8 |
          1  |   4.127667   .5763251     7.16   0.000     2.997848    5.257485
          2  |   10.29202    .585901    17.57   0.000     9.143432    11.44061
          3  |   19.21252   .5785983    33.21   0.000     18.07825    20.34679
             |
 catfaminc8#|
    catmath8 |
        1 1  |  -.0526632   .8865024    -0.06   0.953    -1.790548    1.685221
        1 2  |  -.2140083   .8840602    -0.24   0.809    -1.947105    1.519089
        1 3  |   .3234948   .8624064     0.38   0.708    -1.367152    2.014142
        2 1  |  -1.084544   1.002914    -1.08   0.280    -3.050639    .8815522
        2 2  |  -.8031993   .9939466    -0.81   0.419    -2.751716    1.145317
        2 3  |  -.0975123   .9662284    -0.10   0.920     -1.99169    1.796666
             |
       _cons |   36.83616   .5025057    73.30   0.000     35.85106    37.82127
------------------------------------------------------------------------------
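To confirm that the xi: and factor-variable specifications fit the same model, one can fit both quietly and compare the residual sum of squares and the catholic coefficient, which should agree up to rounding. This is a sketch; the scalar names rss_xi and b_xi are arbitrary.

```stata
* Fit the model with xi: and store the fit statistics
quietly xi: regress math12 catholic i.catfaminc8*i.catmath8
scalar rss_xi = e(rss)
scalar b_xi   = _b[catholic]

* Refit with factor-variable syntax and compare
quietly regress math12 catholic catfaminc8##catmath8
display "Difference in residual SS:       " rss_xi - e(rss)
display "Difference in catholic estimate: " b_xi - _b[catholic]
```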
Table 12.3 on page 306, the Stratified, Fully Crossed model. Note that the noomit option of the xi command is used so that a full set of dummy variables is created (i.e., one for each of the twelve catfaminc8 by catmath8 cells). The constant is then suppressed (the noconstant option) so that all of the dummy variables can be included without perfect collinearity.
xi i.catfaminc8*i.catmath8, noomit
i.ca~c8*i.ca~h8   _IcatXcat_#_#       (coded as above)

regress math12 catholic _IcatXcat_0_0-_IcatXcat_2_3, noconstant

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F( 13,  5658) =32141.38
       Model |  15087598.6    13  1160584.51           Prob > F      =  0.0000
    Residual |  204303.204  5658  36.1087317           R-squared     =  0.9866
-------------+------------------------------           Adj R-squared =  0.9866
       Total |  15291901.8  5671  2696.50887           Root MSE      =  6.0091

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.328632   .2639492     5.03   0.000     .8111899    1.846073
_IcatXca~0_0 |   36.83616   .5025057    73.30   0.000     35.85106    37.82127
_IcatXca~0_1 |   40.96383    .282283   145.12   0.000     40.41045    41.51721
_IcatXca~0_2 |   47.12819     .30133   156.40   0.000     46.53746    47.71891
_IcatXca~0_3 |   56.04868   .2869555   195.32   0.000     55.48614    56.61123
_IcatXca~1_0 |   37.95186     .60703    62.52   0.000     36.76185    39.14188
_IcatXca~1_1 |   42.02687    .292895   143.49   0.000     41.45268    42.60105
_IcatXca~1_2 |   48.02988   .2652007   181.11   0.000     47.50998    48.54977
_IcatXca~1_3 |   57.48788   .2041227   281.63   0.000     57.08772    57.88804
_IcatXca~2_0 |   39.71886   .7574869    52.44   0.000      38.2339    41.20382
_IcatXca~2_1 |   42.76198    .318374   134.31   0.000     42.13785    43.38612
_IcatXca~2_2 |   49.20768   .2701078   182.18   0.000     48.67817     49.7372
_IcatXca~2_3 |   58.83387   .1670966   352.09   0.000     58.50629    59.16144
------------------------------------------------------------------------------
The above model (model A from Table 12.3) can also be estimated using the factor variable syntax introduced in Stata 11. Note again that all of the groups are included and the intercept (constant) is omitted.
regress math12 catholic ibn.catfaminc8#ibn.catmath8, noconstant

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F( 13,  5658) =32141.38
       Model |  15087598.6    13  1160584.51           Prob > F      =  0.0000
    Residual |  204303.204  5658  36.1087317           R-squared     =  0.9866
-------------+------------------------------           Adj R-squared =  0.9866
       Total |  15291901.8  5671  2696.50887           Root MSE      =  6.0091

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.328632   .2639492     5.03   0.000     .8111899    1.846073
             |
 catfaminc8#|
    catmath8 |
        0 0  |   36.83616   .5025057    73.30   0.000     35.85106    37.82127
        0 1  |   40.96383    .282283   145.12   0.000     40.41045    41.51721
        0 2  |   47.12819     .30133   156.40   0.000     46.53746    47.71891
        0 3  |   56.04868   .2869555   195.32   0.000     55.48614    56.61123
        1 0  |   37.95186     .60703    62.52   0.000     36.76185    39.14188
        1 1  |   42.02687    .292895   143.49   0.000     41.45268    42.60105
        1 2  |   48.02988   .2652007   181.11   0.000     47.50998    48.54977
        1 3  |   57.48788   .2041227   281.63   0.000     57.08772    57.88804
        2 0  |   39.71886   .7574869    52.44   0.000      38.2339    41.20382
        2 1  |   42.76198    .318374   134.31   0.000     42.13785    43.38612
        2 2  |   49.20768   .2701078   182.18   0.000     48.67817     49.7372
        2 3  |   58.83387   .1670966   352.09   0.000     58.50629    59.16144
------------------------------------------------------------------------------
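The cell-means model and the reference-coded model (catfaminc8##catmath8) are two parameterizations of the same fit, so each cell coefficient above should equal the reference model's constant plus the relevant main effects and interaction. For example, the cell catfaminc8 = 0, catmath8 = 1 is 36.83616 + 4.127667 = 40.96383. The sketch below rebuilds two cells with lincom; the choice of cells is arbitrary.

```stata
* Refit the reference-coded model, then rebuild two cell coefficients from it
quietly regress math12 catholic catfaminc8##catmath8

* Cell (catfaminc8 = 0, catmath8 = 1): constant + catmath8 = 1 main effect
lincom _b[_cons] + _b[1.catmath8]

* Cell (catfaminc8 = 2, catmath8 = 3): both main effects plus the interaction
lincom _b[_cons] + _b[2.catfaminc8] + _b[3.catmath8] + _b[2.catfaminc8#3.catmath8]
```

Both estimates should match the corresponding rows of the cell-means output (40.96383 and 58.83387).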
Table 12.3 on page 306, the Linear Main Effects, Two-way Interaction model.
* Recode faminc8 so that each value is the midpoint of its income category, in thousands of dollars:
recode faminc8 (1=0) (2=.5) (3=2) (4=4) (5=6.25) (6=8.75) ///
    (7=12.5) (8=17.5) (9=22.5) (10=30) (11=42.5) (12=62.5), gen(inc8)
(5586 differences between faminc8 and inc8)

gen mathfam = math8*inc8

regress math12 inc8 math8 mathfam catholic

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  4,  5666) = 3259.30
       Model |  356877.886     4  89219.4715           Prob > F      =  0.0000
    Residual |  155099.857  5666  27.3737834           R-squared     =  0.6971
-------------+------------------------------           Adj R-squared =  0.6968
       Total |  511977.743  5670  90.2958982           Root MSE      =   5.232

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        inc8 |   .1638722   .0218124     7.51   0.000     .1211115    .2066329
       math8 |   .8721913   .0160066    54.49   0.000     .8408123    .9035703
     mathfam |   -.002435   .0004171    -5.84   0.000    -.0032527   -.0016173
    catholic |   1.658869   .2295556     7.23   0.000     1.208852    2.108886
       _cons |   4.827092   .8004556     6.03   0.000     3.257892    6.396291
------------------------------------------------------------------------------
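Because the recode maps each of the twelve faminc8 categories to a single midpoint, a quick sanity check is that inc8 is constant within every faminc8 category. The sketch below assumes the data are in memory after the recode above; note that bysort re-sorts the data as a side effect.

```stata
* Show the inc8 value(s) attached to each faminc8 category
tabstat inc8, by(faminc8) statistics(min max n)

* Stop with an error if any category maps to more than one inc8 value
bysort faminc8 (inc8): assert inc8[1] == inc8[_N]
```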
The above model (model B from Table 12.3) can also be estimated using the factor variable syntax introduced in Stata 11. Note that it is still necessary to recode faminc8 into inc8, but it is not necessary to create the interaction term.
regress math12 c.inc8##c.math8 catholic

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  4,  5666) = 3259.30
       Model |  356877.886     4  89219.4715           Prob > F      =  0.0000
    Residual |  155099.857  5666  27.3737834           R-squared     =  0.6971
-------------+------------------------------           Adj R-squared =  0.6968
       Total |  511977.743  5670  90.2958982           Root MSE      =   5.232

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        inc8 |   .1638722   .0218124     7.51   0.000     .1211115    .2066329
       math8 |   .8721913   .0160066    54.49   0.000     .8408123    .9035703
             |
     c.inc8#|
     c.math8 |   -.002435   .0004171    -5.84   0.000    -.0032527   -.0016173
             |
    catholic |   1.658869   .2295556     7.23   0.000     1.208852    2.108886
       _cons |   4.827092   .8004556     6.03   0.000     3.257892    6.396291
------------------------------------------------------------------------------
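One payoff of the factor-variable form is that margins knows the math8 slope depends on inc8 through the interaction: the slope is .8721913 - .002435 × inc8, so at inc8 = 30 it is .8721913 - .07305 = .7991413. The sketch below reports that conditional slope at three incomes; the values 12.5, 30, and 62.5 are illustrative picks from the recoded midpoints, not values used in the book.

```stata
quietly regress math12 c.inc8##c.math8 catholic

* Conditional slope of math8 at three family incomes (in $1000s)
margins, dydx(math8) at(inc8 = (12.5 30 62.5))

* The same quantity at inc8 = 30, computed directly from the coefficients
lincom _b[math8] + 30*_b[c.inc8#c.math8]
```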
Table 12.4, Model A: Initial specification, with linear main effect of inc8, on page 312.
logit catholic inc8 math8 mathfam

Iteration 0:   log likelihood = -1897.6568
Iteration 1:   log likelihood = -1840.7214
Iteration 2:   log likelihood = -1837.6029
Iteration 3:   log likelihood = -1837.5922
Iteration 4:   log likelihood = -1837.5922

Logistic regression                               Number of obs   =       5671
                                                  LR chi2(3)      =     120.13
                                                  Prob > chi2     =     0.0000
Log likelihood = -1837.5922                       Pseudo R2       =     0.0317

------------------------------------------------------------------------------
    catholic |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        inc8 |   .0618026   .0140542     4.40   0.000     .0342569    .0893482
       math8 |   .0429594    .011135     3.86   0.000     .0211352    .0647836
     mathfam |   -.000734   .0002615    -2.81   0.005    -.0012466   -.0002214
       _cons |  -5.208846   .5863848    -8.88   0.000    -6.358139   -4.059553
------------------------------------------------------------------------------
Table 12.4, Model B: Final specification, with quadratic main effect of inc8, on page 312.
gen inc8sq = inc8*inc8

logit catholic inc8 math8 mathfam inc8sq

Iteration 0:   log likelihood = -1897.6568
Iteration 1:   log likelihood = -1838.7904
Iteration 2:   log likelihood = -1833.5513
Iteration 3:   log likelihood = -1833.5413
Iteration 4:   log likelihood = -1833.5413

Logistic regression                               Number of obs   =       5671
                                                  LR chi2(4)      =     128.23
                                                  Prob > chi2     =     0.0000
Log likelihood = -1833.5413                       Pseudo R2       =     0.0338

------------------------------------------------------------------------------
    catholic |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        inc8 |   .0869049    .017354     5.01   0.000     .0528918     .120918
       math8 |   .0355965   .0119779     2.97   0.003     .0121202    .0590728
     mathfam |  -.0005647   .0002821    -2.00   0.045    -.0011175   -.0000119
      inc8sq |  -.0004382   .0001569    -2.79   0.005    -.0007458   -.0001306
       _cons |  -5.362148   .6190447    -8.66   0.000    -6.575453   -4.148842
------------------------------------------------------------------------------

predict p
(option pr assumed; Pr(catholic))
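The propensity score p is simply the inverse logit of the linear predictor from Model B. The sketch below rebuilds it by hand from the stored coefficients and checks it against predict; the variable names xb_check and p_check are hypothetical helpers, not part of the original example.

```stata
* Linear predictor from the Model B coefficients (logit results still in memory)
gen double xb_check = _b[_cons] + _b[inc8]*inc8 + _b[math8]*math8 ///
    + _b[mathfam]*mathfam + _b[inc8sq]*inc8sq

* Propensity score: Pr(catholic = 1) = exp(xb)/(1 + exp(xb))
gen double p_check = invlogit(xb_check)

* p was stored by predict as a float, hence the relative tolerance
assert reldif(p, p_check) < 1e-6
summarize p p_check
```

Drop xb_check and p_check afterwards if you do not need them.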
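Model B from Table 12.4 on page 312 can also be estimated using the factor variable syntax introduced in Stata 11. Note that it is not necessary to create the squared term before running this model. If you have already run the previous block, drop p before running this one, because predict will not overwrite an existing variable.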
logit catholic inc8 math8 mathfam c.inc8#c.inc8

Iteration 0:   log likelihood = -1897.6568
Iteration 1:   log likelihood = -1838.7904
Iteration 2:   log likelihood = -1833.5513
Iteration 3:   log likelihood = -1833.5413
Iteration 4:   log likelihood = -1833.5413

Logistic regression                               Number of obs   =       5671
                                                  LR chi2(4)      =     128.23
                                                  Prob > chi2     =     0.0000
Log likelihood = -1833.5413                       Pseudo R2       =     0.0338

------------------------------------------------------------------------------
    catholic |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        inc8 |   .0869049    .017354     5.01   0.000     .0528918     .120918
       math8 |   .0355965   .0119779     2.97   0.003     .0121202    .0590728
     mathfam |  -.0005647   .0002821    -2.00   0.045    -.0011175   -.0000119
             |
     c.inc8#|
      c.inc8 |  -.0004382   .0001569    -2.79   0.005    -.0007458   -.0001306
             |
       _cons |  -5.362148   .6190447    -8.66   0.000    -6.575453   -4.148842
------------------------------------------------------------------------------

predict p
(option pr assumed; Pr(catholic))
Detailed summary statistics for the propensity score variable p. (Not shown in text.)
sum p, detail

                        Pr(catholic)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0208345       .0164257
 5%     .0320812        .016906
10%     .0408222       .0170297       Obs                5671
25%     .0672965        .017208       Sum of Wgt.        5671

50%     .1056115                      Mean           .1043908
                        Largest       Std. Dev.      .0440799
75%      .142168       .1729462
90%     .1643515       .1729462       Variance        .001943
95%     .1647264       .1729462       Skewness      -.1636008
99%     .1652305       .1729462       Kurtosis        1.83253
Figure 12.2, Panel A: Full Sample, shown on page 315.
histogram p, kdensity kdenopts(gaussian) xlabel(0(.1).2) ///
    ytitle(Frequency) xtitle(Estimated Propensity Scores)
Summary statistics for the propensity score variable p, by catholic.
by catholic, sort: sum p, detail

-------------------------------------------------------------------------------------------
-> catholic = no

                        Pr(catholic)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0204874       .0164257
 5%     .0304734        .016906
10%     .0398502       .0170297       Obs                5079
25%     .0643826        .017208       Sum of Wgt.        5079

50%     .1018312                      Mean           .1022535
                        Largest       Std. Dev.      .0442736
75%     .1395716       .1729462
90%     .1642913       .1729462       Variance       .0019602
95%     .1647052       .1729462       Skewness      -.1037761
99%     .1652047       .1729462       Kurtosis       1.814756

-------------------------------------------------------------------------------------------
-> catholic = yes

                        Pr(catholic)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0311486       .0221945
 5%     .0498571       .0255137
10%      .066539       .0260665       Obs                 592
25%     .0935338       .0266655       Sum of Wgt.         592

50%     .1307598                      Mean            .122727
                        Largest       Std. Dev.      .0377261
75%     .1636715       .1654167
90%     .1645418       .1659938       Variance       .0014233
95%     .1648288       .1668626       Skewness      -.6233737
99%     .1652885       .1729462       Kurtosis       2.378922
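The same comparison can be condensed into one side-by-side table with tabstat. The sketch below also counts public-school students whose score falls outside the range observed among Catholic-school students, as a rough look at how well the two distributions overlap; this overlap count is an illustrative addition, not a step taken in the book.

```stata
* Side-by-side summary of the propensity score by treatment group
tabstat p, by(catholic) statistics(n mean sd min max)

* Range of p among Catholic-school students
quietly summarize p if catholic == 1

* Public-school students whose score lies outside that range
count if catholic == 0 & !inrange(p, r(min), r(max))
```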
Figure 12.2, Panel B: By catholic, shown on page 315.
histogram p, kdensity kdenopts(gaussian) by(catholic, cols(1) legend(off)) ///
    xlabel(0(.1).2) ytitle(Frequency) xtitle(Estimated Propensity Scores)
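If you prefer to see both groups on a single set of axes rather than in stacked panels, the two kernel density estimates can be overlaid with twoway. This is an alternative display, not the book's Figure 12.2; the legend wording is an assumption.

```stata
* Overlay the propensity-score densities for the two groups
twoway (kdensity p if catholic == 0, kernel(gaussian)) ///
       (kdensity p if catholic == 1, kernel(gaussian)), ///
       legend(order(1 "Did not attend Catholic HS" 2 "Attended Catholic HS")) ///
       xtitle(Estimated Propensity Scores) ytitle(Density)
```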
Stratifying on propensity scores, discussed on pages 316-317. This uses the same set of variables as Model A from Table 12.4. Note that pscore is a user-written command and must be downloaded before use; for more information, see our FAQ page "How do I use search to search for programs and additional help?" (Not shown in text.)
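A small guard like the one below checks whether pscore is already installed before you run the examples, and opens search results for it if not. It is only a convenience sketch; install the package from the search results as described in the FAQ.

```stata
* Is pscore already on the ado-path? (which returns an error if not)
capture which pscore
if _rc {
    display as text "pscore not found; searching for it"
    search pscore
}
```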
pscore catholic inc8 math8 mathfam, logit pscore(p) blockid(b) numblo(5)

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is catholic

   attended |
   catholic |
        hs? |      Freq.     Percent        Cum.
------------+-----------------------------------
         no |      5,079       89.56       89.56
        yes |        592       10.44      100.00
------------+-----------------------------------
      Total |      5,671      100.00

Estimation of the propensity score

Iteration 0:   log likelihood = -1897.6568
Iteration 1:   log likelihood = -1840.7214
Iteration 2:   log likelihood = -1837.6047
Iteration 3:   log likelihood = -1837.5922
Iteration 4:   log likelihood = -1837.5922

Logistic regression                               Number of obs   =       5671
                                                  LR chi2(3)      =     120.13
                                                  Prob > chi2     =     0.0000
Log likelihood = -1837.5922                       Pseudo R2       =     0.0317

------------------------------------------------------------------------------
    catholic |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        inc8 |   .0618026   .0140542     4.40   0.000     .0342569    .0893482
       math8 |   .0429594    .011135     3.86   0.000     .0211352    .0647836
     mathfam |   -.000734   .0002615    -2.81   0.005    -.0012466   -.0002214
       _cons |  -5.208846   .5863848    -8.88   0.000    -6.358139   -4.059553
------------------------------------------------------------------------------

Description of the estimated propensity score

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0300386       .0241574
 5%     .0400463       .0250049
10%     .0488683       .0252239       Obs                5671
25%     .0700201         .02554       Sum of Wgt.        5671

50%     .1023014                      Mean           .1043908
                        Largest       Std. Dev.      .0442227
75%     .1299765       .1898257
90%     .1795338       .1898437       Variance       .0019556
95%     .1835134       .1899649       Skewness       .3693322
99%      .187957       .1900232       Kurtosis       2.215181

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 4

This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

Variable inc8 is not balanced in block 4

Variable mathfam is not balanced in block 4

The balancing property is not satisfied

Try a different specification of the propensity score

  Inferior |
  of block |  attended catholic hs?
 of pscore |        no        yes |     Total
-----------+----------------------+----------
         0 |       588         18 |       606
       .05 |     1,002         56 |     1,058
      .075 |     1,010        113 |     1,123
        .1 |     2,479        405 |     2,884
-----------+----------------------+----------
     Total |     5,079        592 |     5,671

*******************************************
End of the algorithm to estimate the pscore
*******************************************
Estimate the propensity score blocks shown in Table 12.5 on page 318. (Output not shown in text.)
* drop the propensity score variables if they already exist
capture drop p b
pscore catholic inc8 inc8sq math8 mathfam, logit pscore(p) blockid(b) numblo(5)

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is catholic

   attended |
   catholic |
        hs? |      Freq.     Percent        Cum.
------------+-----------------------------------
         no |      5,079       89.56       89.56
        yes |        592       10.44      100.00
------------+-----------------------------------
      Total |      5,671      100.00

Estimation of the propensity score

Iteration 0:   log likelihood = -1897.6568
Iteration 1:   log likelihood = -1838.7904
Iteration 2:   log likelihood = -1833.6223
Iteration 3:   log likelihood = -1833.5413
Iteration 4:   log likelihood = -1833.5413

Logistic regression                               Number of obs   =       5671
                                                  LR chi2(4)      =     128.23
                                                  Prob > chi2     =     0.0000
Log likelihood = -1833.5413                       Pseudo R2       =     0.0338

------------------------------------------------------------------------------
    catholic |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        inc8 |   .0869049    .017354     5.01   0.000     .0528918     .120918
      inc8sq |  -.0004382   .0001569    -2.79   0.005    -.0007458   -.0001306
       math8 |   .0355965   .0119779     2.97   0.003     .0121202    .0590728
     mathfam |  -.0005647   .0002821    -2.00   0.045    -.0011175   -.0000119
       _cons |  -5.362148   .6190447    -8.66   0.000    -6.575453   -4.148842
------------------------------------------------------------------------------

Description of the estimated propensity score

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0208345       .0164257
 5%     .0320812        .016906
10%     .0408223       .0170297       Obs                5671
25%     .0672965        .017208       Sum of Wgt.        5671

50%     .1056115                      Mean           .1043908
                        Largest       Std. Dev.      .0440799
75%      .142168       .1729462
90%     .1643515       .1729462       Variance        .001943
95%     .1647264       .1729462       Skewness      -.1636008
99%     .1652305       .1729462       Kurtosis        1.83253

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 6

This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

The balancing property is satisfied

This table shows the inferior bound, the number of treated
and the number of controls for each block

  Inferior |
  of block |  attended catholic hs?
 of pscore |        no        yes |     Total
-----------+----------------------+----------
         0 |       810         31 |       841
       .05 |       741         45 |       786
      .075 |       928        100 |     1,028
        .1 |       786         87 |       873
      .125 |       810        145 |       955
       .15 |     1,004        184 |     1,188
-----------+----------------------+----------
     Total |     5,079        592 |     5,671

*******************************************
End of the algorithm to estimate the pscore
*******************************************
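The block identifier b created by pscore can be cross-tabulated against treatment status to reproduce the counts in the table above; blocks are numbered 1 through 6 in the order of their lower bounds. A minimal check, assuming the pscore run above has just completed:

```stata
* Treated and control counts in each propensity-score block
tabulate b catholic

* Spot-check one cell: block 1 should contain 31 Catholic-school students
count if b == 1 & catholic == 1
```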
Variable means by block from Table 12.5 on page 318. Note that for Block 3, the average mathematics achievement for catholic students is listed as 49.63 in the book, but is 51.56 in the table below. Based on communication with the authors, this appears to be a typographic error in the book.
table b catholic, contents(freq mean p mean inc8 mean math8 mean math12)

---------------------------------
Number of | attended catholic hs?
block     |         no        yes
----------+----------------------
        1 |        810         31
          |  .03562671   .0397066
          |   8.466666   9.814516
          |   43.16351   44.67839
          |   42.74021   45.34968
          |
        2 |        741         45
          |  .06206016  .06352629
          |   18.13968   17.52778
          |   47.44714   49.45711
          |   47.14545   50.21756
          |
        3 |        928        100
          |   .0875975  .08860363
          |   26.64197     26.565
          |   48.80288    49.6273
          |   48.79251      51.56
          |
        4 |        786         87
          |   .1138969  .11401803
          |   33.34605   33.36207
          |   52.61875    52.9077
          |   52.02316   54.26402
          |
        5 |        810        145
          |  .13605543  .13692428
          |   40.72839   41.46552
          |   55.15296   54.78959
          |   54.71558   56.54048
          |
        6 |      1,004        184
          |  .16283171  .16266777
          |   57.33815   58.36956
          |   58.55379   57.85957
          |   56.95275    57.3175
---------------------------------
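To double-check the block 3 discrepancy noted above, the two relevant means can be recomputed directly. The book's 49.63 coincides with the block 3 mean of math8 (49.6273) for Catholic-school students, which is consistent with, but does not prove, a transcription slip.

```stata
* Block 3, Catholic-school students: mean of 12th- and 8th-grade math scores
summarize math12 math8 if b == 3 & catholic == 1
```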
Tests for differences in academic achievement by catholic, in each block, shown in Table 12.5 on page 318. Note that the signs of the differences are reversed relative to the book, because ttest reports mean(no) - mean(yes); the magnitudes are the same. The error in the block 3 mean discussed above also persists.
by b, sort: ttest math12, by(catholic)

-------------------------------------------------------------------------------------------
-> b = 1

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     810    42.74021    .2449484    6.971353     42.2594    43.22102
     yes |      31    45.34968    1.310109    7.294381    42.67408    48.02528
---------+--------------------------------------------------------------------
combined |     841     42.8364    .2412525    6.996321    42.36287    43.30993
---------+--------------------------------------------------------------------
    diff |           -2.609468    1.277988               -5.117896   -.1010391
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.0419
Ho: diff = 0                                     degrees of freedom =      839

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0207         Pr(|T| > |t|) = 0.0415          Pr(T > t) = 0.9793

-------------------------------------------------------------------------------------------
-> b = 2

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     741    47.14545    .2882466    7.846452    46.57957    47.71133
     yes |      45    50.21756    1.136082    7.621067    47.92793    52.50718
---------+--------------------------------------------------------------------
combined |     786    47.32134    .2804101     7.86149    46.77089    47.87178
---------+--------------------------------------------------------------------
    diff |           -3.072103    1.202757                -5.43311   -.7110971
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.5542
Ho: diff = 0                                     degrees of freedom =      784

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0054         Pr(|T| > |t|) = 0.0108          Pr(T > t) = 0.9946

-------------------------------------------------------------------------------------------
-> b = 3

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     928    48.79251    .2754558    8.391235    48.25192     49.3331
     yes |     100       51.56    .8071014    8.071014    49.95854    53.16146
---------+--------------------------------------------------------------------
combined |    1028    49.06172    .2618947    8.396983    48.54781    49.57563
---------+--------------------------------------------------------------------
    diff |           -2.767489    .8799826                -4.49426   -1.040718
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -3.1449
Ho: diff = 0                                     degrees of freedom =     1026

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0009         Pr(|T| > |t|) = 0.0017          Pr(T > t) = 0.9991

-------------------------------------------------------------------------------------------
-> b = 4

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     786    52.02316    .3402795    9.539971    51.35519    52.69112
     yes |      87    54.26402    .9397039    8.764975    52.39595    56.13209
---------+--------------------------------------------------------------------
combined |     873    52.24647    .3210069    9.484653    51.61644    52.87651
---------+--------------------------------------------------------------------
    diff |           -2.240868    1.069585               -4.340133   -.1416024
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.0951
Ho: diff = 0                                     degrees of freedom =      871

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0182         Pr(|T| > |t|) = 0.0365          Pr(T > t) = 0.9818

-------------------------------------------------------------------------------------------
-> b = 5

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     810    54.71558    .2588964    7.368319    54.20739    55.22377
     yes |     145    56.54048    .5606502    6.751122    55.43232    57.64865
---------+--------------------------------------------------------------------
combined |     955    54.99266    .2363535     7.30405    54.52883    55.45649
---------+--------------------------------------------------------------------
    diff |           -1.824902    .6563147               -3.112891   -.5369135
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.7805
Ho: diff = 0                                     degrees of freedom =      953

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0028         Pr(|T| > |t|) = 0.0055          Pr(T > t) = 0.9972

-------------------------------------------------------------------------------------------
-> b = 6

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1004    56.95275    .2789432    8.838582    56.40537    57.50013
     yes |     184     57.3175    .6020763    8.166961     56.1296     58.5054
---------+--------------------------------------------------------------------
combined |    1188    57.00924    .2534465    8.735635    56.51199     57.5065
---------+--------------------------------------------------------------------
    diff |           -.3647511    .7007456                -1.73959    1.010088
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -0.5205
Ho: diff = 0                                     degrees of freedom =     1186

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.3014         Pr(|T| > |t|) = 0.6028          Pr(T > t) = 0.6986
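If you want the differences with the sign used in the book (Catholic minus public), you can loop over the blocks and flip the sign using the group means that ttest returns in r(mu_1) (catholic = 0) and r(mu_2) (catholic = 1). A minimal sketch:

```stata
* Catholic-minus-public difference in math12 within each block
forvalues k = 1/6 {
    quietly ttest math12 if b == `k', by(catholic)
    display "Block `k': difference = " %9.6f (r(mu_2) - r(mu_1)) ///
        "   t = " %7.4f (-r(t))
}
```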
Weighted average ATT shown in Table 12.5 on page 318. Note that atts is part of the same user-written package as pscore and that the set seed command was used so that the results of the bootstrap can be replicated.
set seed 53156
atts math12 catholic, pscore(p) blockid(b) bootstrap

 ATT estimation with the Stratification method
 Analytical standard errors

---------------------------------------------------------
 n. treat.   n. contr.       ATT     Std. Err.         t
---------------------------------------------------------

       592        5079       1.727       0.347       4.975

---------------------------------------------------------

 Bootstrapping of standard errors

command:      atts math12 catholic , pscore(p) blockid(b)
statistic:    atts       = r(atts)

Bootstrap statistics                              Number of obs    =      5671
                                                  Replications     =        50

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
        atts |    50   1.72731  -.044933  .3138169   1.096672   2.357949   (N)
             |                                       1.135532   2.304237   (P)
             |                                       1.273374   2.393047  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

 ATT estimation with the Stratification method
 Bootstrapped standard errors

---------------------------------------------------------
 n. treat.   n. contr.       ATT     Std. Err.         t
---------------------------------------------------------

       592        5079       1.727       0.314       5.504

---------------------------------------------------------
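The point estimate can be reproduced by hand as the within-block differences in mean math12 (Catholic minus public), weighted by the number of Catholic-school students in each block. Using the block results above: (31 × 2.609468 + 45 × 3.072103 + 100 × 2.767489 + 87 × 2.240868 + 145 × 1.824902 + 184 × 0.3647511) / 592 ≈ 1.7273. The sketch below does the same from the data; it assumes the standard stratification estimator (which matches the 1.727 reported by atts here), and the scalar names are arbitrary. It does not reproduce the standard errors.

```stata
* Stratification ATT: block differences weighted by the number of treated
scalar att_num = 0
scalar att_den = 0
forvalues k = 1/6 {
    quietly summarize math12 if b == `k' & catholic == 1
    scalar n_t = r(N)
    scalar m_t = r(mean)
    quietly summarize math12 if b == `k' & catholic == 0
    scalar att_num = att_num + n_t*(m_t - r(mean))
    scalar att_den = att_den + n_t
}
display "Stratified ATT = " att_num/att_den
```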
The following few examples demonstrate different methods of analyzing the same data, treating the propensity scores as an optimal composite covariate.
Method A: Controlling for block by estimating the relationship between math12 and catholic separately in each block.
sort b
statsby _b[catholic] e(N), by(b) noisily sav(CathSlopes3,replace): regress math12 catholic

statsby: First call to regress with data as is:

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  1,  5669) =   90.48
       Model |   8043.1077     1   8043.1077           Prob > F      =  0.0000
    Residual |  503934.635  5669  88.8930385           R-squared     =  0.0157
-------------+------------------------------           Adj R-squared =  0.0155
       Total |  511977.743  5670  90.2958982           Root MSE      =  9.4283

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |    3.89486   .4094621     9.51   0.000     3.092157    4.697562
       _cons |   50.64465   .1322954   382.81   0.000      50.3853      50.904
------------------------------------------------------------------------------

statsby legend:
      command:  regress math12 catholic
      _stat_1:  _b[catholic]
      _stat_2:  e(N)
           by:  b

Statsby groups
running (regress math12 catholic) on group 1

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =     841
-------------+------------------------------           F(  1,   839) =    4.17
       Model |  203.308037     1  203.308037           Prob > F      =  0.0415
    Residual |  40913.4431   839  48.7645329           R-squared     =  0.0049
-------------+------------------------------           Adj R-squared =  0.0038
       Total |  41116.7512   840  48.9485133           Root MSE      =  6.9832

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   2.609468   1.277988     2.04   0.041     .1010391    5.117896
       _cons |   42.74021   .2453633   174.19   0.000     42.25861    43.22181
------------------------------------------------------------------------------
running (regress math12 catholic) on group 2

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =     786
-------------+------------------------------           F(  1,   784) =    6.52
       Model |  400.386878     1  400.386878           Prob > F      =  0.0108
    Residual |  48114.9879   784  61.3711581           R-squared     =  0.0083
-------------+------------------------------           Adj R-squared =  0.0070
       Total |  48515.3748   785  61.8030252           Root MSE      =   7.834

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   3.072103   1.202757     2.55   0.011     .7110971     5.43311
       _cons |   47.14545   .2877882   163.82   0.000     46.58053    47.71038
------------------------------------------------------------------------------
running (regress math12 catholic) on group 3

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =    1028
-------------+------------------------------           F(  1,  1026) =    9.89
       Model |  691.395779     1  691.395779           Prob > F      =  0.0017
    Residual |  71721.6733  1026   69.904165           R-squared     =  0.0095
-------------+------------------------------           Adj R-squared =  0.0086
       Total |   72413.069  1027  70.5093175           Root MSE      =  8.3609

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   2.767489   .8799826     3.14   0.002     1.040718     4.49426
       _cons |   48.79251    .274459   177.78   0.000     48.25395    49.33108
------------------------------------------------------------------------------
running (regress math12 catholic) on group 4

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =     873
-------------+------------------------------           F(  1,   871) =    4.39
       Model |  393.332574     1  393.332574           Prob > F      =  0.0365
    Residual |  78050.6082   871  89.6103423           R-squared     =  0.0050
-------------+------------------------------           Adj R-squared =  0.0039
       Total |  78443.9407   872  89.9586476           Root MSE      =  9.4663

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   2.240868   1.069585     2.10   0.036     .1416024    4.340133
       _cons |   52.02316   .3376508   154.07   0.000     51.36045    52.68586
------------------------------------------------------------------------------
running (regress math12 catholic) on group 5

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =     955
-------------+------------------------------           F(  1,   953) =    7.73
       Model |   409.57076     1   409.57076           Prob > F      =  0.0055
    Residual |  50485.5136   953  52.9753553           R-squared     =  0.0080
-------------+------------------------------           Adj R-squared =  0.0070
       Total |  50895.0844   954  53.3491451           Root MSE      =  7.2784

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.824902   .6563147     2.78   0.006     .5369135    3.112891
       _cons |   54.71558   .2557375   213.95   0.000     54.21371    55.21745
------------------------------------------------------------------------------
running (regress math12 catholic) on group 6

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =    1188
-------------+------------------------------           F(  1,  1186) =    0.27
       Model |  20.6884702     1  20.6884702           Prob > F      =  0.6028
    Residual |  90560.8493  1186  76.3582203           R-squared     =  0.0002
-------------+------------------------------           Adj R-squared = -0.0006
       Total |  90581.5377  1187  76.3113208           Root MSE      =  8.7383

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   .3647511   .7007456     0.52   0.603    -1.010088     1.73959
       _cons |   56.95275   .2757789   206.52   0.000     56.41168    57.49382
------------------------------------------------------------------------------
Distribution of coefficients for catholic (predicting math12) across the propensity score blocks. The syntax below is meant to be run from a .do file, and the whole block should be run at once. (Not shown in text.)
preserve
use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/CathSlopes3, clear
list
histogram _stat_1, bin(3) frequency kdensity kdenopts(gaussian)
restore

     +------------------------+
     | b    _stat_1   _stat_2 |
     |------------------------|
  1. | 1   2.609468       841 |
  2. | 2   3.072104       786 |
  3. | 3   2.767489      1028 |
  4. | 4   2.240868       873 |
  5. | 5   1.824902       955 |
     |------------------------|
  6. | 6   .3647511      1188 |
     +------------------------+
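The listed slopes can also be summarized numerically. The sketch below is an illustration added here, not part of the original analysis. It re-enters the six block slopes and block sizes exactly as listed above. It then compares their unweighted mean with a mean weighted by block size. The variable names slope and n are hypothetical.

```stata
* Block-specific catholic slopes and block sizes, copied from the list above
clear
input b slope n
1 2.609468  841
2 3.072104  786
3 2.767489 1028
4 2.240868  873
5 1.824902  955
6 .3647511 1188
end

* Unweighted mean of the six block slopes
summarize slope
scalar mean_slope = r(mean)

* Mean weighted by block size (analytic weights)
summarize slope [aweight = n]
scalar wmean_slope = r(mean)

display "unweighted: " %6.4f mean_slope "   weighted by block N: " %6.4f wmean_slope
```

With the listed values, the unweighted mean is about 2.15 and the size-weighted mean about 2.04. The weighted version gives more influence to block 6, the largest block, whose slope is close to zero.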
Method B: Estimate the relationship between math12 and catholic in all blocks at the same time, using fixed effects for the blocks. Note that this model includes the intercept and dummy variables for blocks 2 to 6. (Not shown in text.)
xi: regress math12 catholic i.b
i.b               _Ib_1-6             (naturally coded; _Ib_1 omitted)

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  6,  5664) =  326.67
       Model |  131623.108     6  21937.1846           Prob > F      =  0.0000
    Residual |  380354.635  5664  67.1530076           R-squared     =  0.2571
-------------+------------------------------           Adj R-squared =  0.2563
       Total |  511977.743  5670  90.2958982           Root MSE      =  8.1947

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.761271   .3595793     4.90   0.000     1.056358    2.466184
       _Ib_2 |   4.449025   .4066192    10.94   0.000     3.651895    5.246154
       _Ib_3 |   6.118917   .3816345    16.03   0.000     5.370767    6.867066
       _Ib_4 |   9.299475   .3965866    23.45   0.000     8.522013    10.07694
       _Ib_5 |   11.95377   .3897119    30.67   0.000     11.18978    12.71775
       _Ib_6 |   13.96498   .3717204    37.57   0.000     13.23626    14.69369
       _cons |   42.77148   .2828863   151.20   0.000     42.21691    43.32604
------------------------------------------------------------------------------
This model can also be estimated using the factor variable syntax introduced in Stata 11. (Not shown in text.)
regress math12 catholic i.b

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  6,  5664) =  326.67
       Model |  131623.108     6  21937.1846           Prob > F      =  0.0000
    Residual |  380354.635  5664  67.1530076           R-squared     =  0.2571
-------------+------------------------------           Adj R-squared =  0.2563
       Total |  511977.743  5670  90.2958982           Root MSE      =  8.1947

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.761271   .3595793     4.90   0.000     1.056358    2.466184
             |
           b |
          2  |   4.449025   .4066192    10.94   0.000     3.651895    5.246154
          3  |   6.118917   .3816345    16.03   0.000     5.370767    6.867066
          4  |   9.299475   .3965866    23.45   0.000     8.522013    10.07694
          5  |   11.95377   .3897119    30.67   0.000     11.18978    12.71775
          6  |   13.96498   .3717204    37.57   0.000     13.23626    14.69369
             |
       _cons |   42.77148   .2828863   151.20   0.000     42.21691    43.32604
------------------------------------------------------------------------------
An equivalent model with no intercept and a fixed effect for each block. (Not shown in text.)
xi i.b, noomit
regress math12 catholic _Ib_1-_Ib_6, noconstant

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  7,  5664) = 31721.90
       Model |  14911547.1     7  2130221.02           Prob > F      =  0.0000
    Residual |  380354.635  5664  67.1530076           R-squared     =  0.9751
-------------+------------------------------           Adj R-squared =  0.9751
       Total |  15291901.8  5671  2696.50887           Root MSE      =  8.1947

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.761271   .3595793     4.90   0.000     1.056358    2.466184
       _Ib_1 |   42.77148   .2828863   151.20   0.000     42.21691    43.32604
       _Ib_2 |    47.2205   .2930191   161.15   0.000     46.64607    47.79493
       _Ib_3 |   48.89039   .2579679   189.52   0.000     48.38468    49.39611
       _Ib_4 |   52.07095   .2796537   186.20   0.000     51.52272    52.61918
       _Ib_5 |   54.72524    .270736   202.14   0.000      54.1945    55.25599
       _Ib_6 |   56.73645   .2441879   232.35   0.000     56.25775    57.21515
------------------------------------------------------------------------------
This model can also be estimated using the factor variable syntax introduced in Stata 11. Note that the dummy variables do not need to be created using xi.
regress math12 catholic ibn.b, noconstant

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  7,  5664) = 31721.90
       Model |  14911547.1     7  2130221.02           Prob > F      =  0.0000
    Residual |  380354.635  5664  67.1530076           R-squared     =  0.9751
-------------+------------------------------           Adj R-squared =  0.9751
       Total |  15291901.8  5671  2696.50887           Root MSE      =  8.1947

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.761271   .3595793     4.90   0.000     1.056358    2.466184
             |
           b |
          1  |   42.77148   .2828863   151.20   0.000     42.21691    43.32604
          2  |    47.2205   .2930191   161.15   0.000     46.64607    47.79493
          3  |   48.89039   .2579679   189.52   0.000     48.38468    49.39611
          4  |   52.07095   .2796537   186.20   0.000     51.52272    52.61918
          5  |   54.72524    .270736   202.14   0.000      54.1945    55.25599
          6  |   56.73645   .2441879   232.35   0.000     56.25775    57.21515
------------------------------------------------------------------------------
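The claim that the intercept-plus-dummies model and the no-intercept model are equivalent can be checked on any data set. The sketch below uses Stata's bundled auto data rather than the Catholic-school data. As stand-ins chosen for illustration, foreign plays the role of catholic and rep78 plays the role of the block variable b. The sketch confirms two things:

- The treatment coefficient is identical in both parameterizations.
- _cons plus a level's dummy equals that level's own intercept in the no-constant model.

The second point mirrors 42.77148 + 4.449025 = 47.2205 for block 2 above.

```stata
* Illustration only: auto data, foreign stands in for catholic, rep78 for b
sysuse auto, clear

* Parameterization 1: intercept plus dummies for levels 2-5 of rep78
regress price foreign i.rep78
scalar slope_1 = _b[foreign]
* Intercept implied for rep78 == 3: base intercept plus the level-3 shift
lincom _cons + 3.rep78
scalar int3_1 = r(estimate)

* Parameterization 2: no intercept, one dummy for every level of rep78
regress price foreign ibn.rep78, noconstant
scalar slope_2 = _b[foreign]
scalar int3_2  = _b[3.rep78]

display "foreign: " slope_1 " vs " slope_2
display "rep78 = 3 intercept: " int3_1 " vs " int3_2
```

Only the labeling of the block effects changes between the two models. The residual sum of squares and the treatment coefficient do not. The R-squared differs because, without a constant, Stata measures it against zero rather than against the mean.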
Method C: Controlling for propensities (as a linear effect). (Not shown in text.)
regress math12 catholic p

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  2,  5668) =  996.64
       Model |  133204.471     2  66602.2355           Prob > F      =  0.0000
    Residual |  378773.272  5668  66.8266182           R-squared     =  0.2602
-------------+------------------------------           Adj R-squared =  0.2599
       Total |  511977.743  5670  90.2958982           Root MSE      =  8.1748

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.690306   .3586574     4.71   0.000        .9872    2.393411
           p |   107.6782   2.488099    43.28   0.000     102.8006    112.5559
       _cons |   39.63417   .2790795   142.02   0.000     39.08707    40.18127
------------------------------------------------------------------------------
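Methods B and C land close to each other and well below the unadjusted difference of 3.89486. A small simulation can make the logic concrete. The sketch below generates hypothetical data in which a single confounder x raises both the chance of treatment and the outcome; the true treatment effect is 2 by construction. It then compares three estimates:

- the naive regression;
- block fixed effects (an analogue of Method B);
- a linear control for the estimated propensity score (an analogue of Method C).

The sample size, the coefficients and the use of five quintile blocks are all assumptions chosen for illustration. The quintile blocks stand in for the blocks chosen by pscore.

```stata
* Hypothetical simulated data: x confounds treatment t and outcome y
clear
set seed 2024
set obs 20000
generate x = rnormal()
generate byte t = runiform() < invlogit(-2 + x)
* True effect of t on y is 2
generate y = 50 + 2*t + 4*x + rnormal(0, 8)

* Propensity score from a logit model, then five quintile blocks
quietly logit t x
predict phat, pr
xtile blk = phat, nquantiles(5)

* Naive difference in means
quietly regress y t
scalar b_naive = _b[t]
* Analogue of Method B: block fixed effects
quietly regress y t i.blk
scalar b_blocks = _b[t]
* Analogue of Method C: linear control for the propensity score
quietly regress y t phat
scalar b_linear = _b[t]

display "naive: " %6.3f b_naive "  blocks: " %6.3f b_blocks "  linear p: " %6.3f b_linear
```

Neither adjustment is exact here, so expect both adjusted estimates to be much closer to 2 than the naive one, but not equal to it:

- Five blocks leave some confounding within each block.
- y is linear in x rather than in the propensity score.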
Create propensity score blocks for Table 12.6 on page 320. (Output not shown in text.)
* drop p and b from previous examples
drop p b
pscore catholic inc8 inc8sq math8 mathfam fhowfar mhowfar fight8 nohw8 ///
    disrupt8 riskdrop8, logit pscore(p) blockid(b) numblo(10)

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is catholic

   attended |
   catholic |
        hs? |      Freq.     Percent        Cum.
------------+-----------------------------------
         no |      5,079       89.56       89.56
        yes |        592       10.44      100.00
------------+-----------------------------------
      Total |      5,671      100.00

Estimation of the propensity score

Iteration 0:   log likelihood = -1897.6568
Iteration 1:   log likelihood = -1814.3485
Iteration 2:   log likelihood = -1804.4342
Iteration 3:   log likelihood = -1804.1259
Iteration 4:   log likelihood = -1804.1254

Logistic regression                               Number of obs   =       5671
                                                  LR chi2(10)     =     187.06
                                                  Prob > chi2     =     0.0000
Log likelihood = -1804.1254                       Pseudo R2       =     0.0493

------------------------------------------------------------------------------
    catholic |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        inc8 |   .0544244   .0190915     2.85   0.004     .0170058     .091843
      inc8sq |  -.0001894   .0001732    -1.09   0.274     -.000529    .0001501
       math8 |   .0215572   .0123655     1.74   0.081    -.0026787    .0457932
     mathfam |  -.0004537   .0002873    -1.58   0.114    -.0010169    .0001095
     fhowfar |   .1963326   .0866025     2.27   0.023     .0265949    .3660703
     mhowfar |   .0256765    .086921     0.30   0.768    -.1446855    .1960384
      fight8 |  -.4742975   .3246254    -1.46   0.144    -1.110552    .1619566
       nohw8 |  -.6880268   .1760058    -3.91   0.000    -1.032992   -.3430618
    disrupt8 |   .6927506   .3858711     1.80   0.073    -.0635429    1.449044
   riskdrop8 |  -.3033031   .0843134    -3.60   0.000    -.4685543   -.1380518
       _cons |  -4.981792    .703233    -7.08   0.000    -6.360104   -3.603481
------------------------------------------------------------------------------

Description of the estimated propensity score

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0108842       .0025956
 5%     .0204415       .0032377
10%     .0313083       .0036331       Obs                5671
25%     .0595311       .0047923       Sum of Wgt.        5671

50%     .1072919                      Mean           .1043908
                        Largest       Std. Dev.      .0530622
75%     .1453336       .2551539
90%     .1744188       .2571622       Variance       .0028156
95%     .1858031       .2619453       Skewness       .0612084
99%     .2138622        .262511       Kurtosis       2.084156

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 5

This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

The balancing property is satisfied

This table shows the inferior bound, the number of treated
and the number of controls for each block

  Inferior |
  of block | attended catholic hs?
 of pscore |        no        yes |     Total
-----------+----------------------+----------
         0 |     1,089         34 |     1,123
       .05 |     1,431        110 |     1,541
        .1 |     1,599        253 |     1,852
       .15 |       829        160 |       989
        .2 |       131         35 |       166
-----------+----------------------+----------
     Total |     5,079        592 |     5,671

*******************************************
End of the algorithm to estimate the pscore
*******************************************
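Step 2 of the pscore output reports that the balancing property is satisfied. Roughly, this means that within each block the covariate means do not differ significantly between treated and control cases. The sketch below imitates that kind of check with built-in commands on hypothetical simulated data, using two covariates and five quintile blocks. It is not pscore's own algorithm, which also chooses and splits the blocks. The names x1, x2, t, phat and blk are made up for the example.

```stata
* Hypothetical simulated data with two covariates and a binary treatment t
clear
set seed 4321
set obs 10000
generate x1 = rnormal()
generate x2 = rnormal()
generate byte t = runiform() < invlogit(-1.5 + 0.6*x1 - 0.4*x2)

* Estimate the propensity score and cut it into five quintile blocks
quietly logit t x1 x2
predict phat, pr
xtile blk = phat, nquantiles(5)
tabulate blk t

* Within each block, compare covariate means of controls and treated
forvalues k = 1/5 {
    foreach v of varlist x1 x2 {
        quietly ttest `v' if blk == `k', by(t)
        display "block `k', `v': diff (control - treated) = " ///
            %7.3f (r(mu_1) - r(mu_2)) "   p = " %6.4f r(p)
    }
}
```

Small p-values in some block would suggest that the blocks are too coarse or that the propensity model needs to be respecified. Under pscore's approach, that is when the balancing property would be reported as not satisfied.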
Distribution of estimated propensity scores based on the model for Table 12.6. (Not shown in text.)
sum p, detail

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0108842       .0025956
 5%     .0204415       .0032377
10%     .0313083       .0036331       Obs                5671
25%     .0595311       .0047923       Sum of Wgt.        5671

50%     .1072919                      Mean           .1043908
                        Largest       Std. Dev.      .0530622
75%     .1453336       .2551539
90%     .1744188       .2571622       Variance       .0028156
95%     .1858031       .2619453       Skewness       .0612084
99%     .2138622        .262511       Kurtosis       2.084156

histogram p, kdensity kdenopts(gaussian) xlabel(0(.1).3) ///
    ytitle(Frequency) xtitle(Estimated Propensity Scores)
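The mean estimated propensity score, .1043908, matches the proportion of students who attended a Catholic high school. The mean of catholic reported in the descriptive statistics at the top of this page is also .1043908. This is a general property of a maximum-likelihood logit model that includes a constant: the fitted probabilities average to the observed share of 1s. The sketch below checks the property on Stata's bundled auto data. It uses logit foreign weight mpg as a stand-in model, which is an illustration and not the propensity model used here.

```stata
* Illustration on the bundled auto data (not the Catholic-school data)
sysuse auto, clear
quietly logit foreign weight mpg
predict phat, pr

* Mean of the fitted probabilities
quietly summarize phat
scalar mean_phat = r(mean)
* Observed share of foreign cars
quietly summarize foreign
scalar share_foreign = r(mean)

display "mean fitted probability = " %9.7f mean_phat "   observed share = " %9.7f share_foreign
```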
Distribution of propensity scores by catholic.
by catholic, sort: sum p, detail

-------------------------------------------------------------------------------------------
-> catholic = no

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0104881       .0025956
 5%     .0195553       .0032377
10%     .0297857       .0036331       Obs                5079
25%     .0565767       .0047923       Sum of Wgt.        5079

50%     .1008954                      Mean           .1012761
                        Largest       Std. Dev.      .0529831
75%     .1440137       .2551539
90%      .173256       .2571622       Variance       .0028072
95%     .1824525       .2619453       Skewness       .1216262
99%     .2133106        .262511       Kurtosis       2.085038

-------------------------------------------------------------------------------------------
-> catholic = yes

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0229674       .0099903
 5%     .0470903        .011172
10%     .0699177        .018644       Obs                 592
25%     .1003173       .0196516       Sum of Wgt.         592

50%     .1355502                      Mean           .1311124
                        Largest       Std. Dev.      .0457582
75%      .169457       .2266057
90%     .1842547        .228063       Variance       .0020938
95%     .2031689       .2391276       Skewness      -.3031523
99%     .2224912       .2462637       Kurtosis       2.624687

histogram p, kdensity kdenopts(gaussian) by(catholic, cols(1) legend(off)) ///
    xlabel(0(.1).3) ytitle(Frequency) xtitle(Estimated Propensity Scores)
Descriptive statistics for Table 12.6 on page 320.
table b catholic, contents(freq mean p mean math12)

--------------------------------
Number of |attended catholic hs?
    block |        no        yes
----------+---------------------
        1 |     1,089         34
          | .03032015  .03453844
          |  43.66365   46.01353
          |
        2 |     1,431        110
          | .07522784  .07848568
          |  48.85303   51.00237
          |
        3 |     1,599        253
          |  .1270797  .12911088
          |  53.62299   55.38316
          |
        4 |       829        160
          | .17195554  .17329525
          |  56.86899   57.34556
          |
        5 |       131         35
          | .21343681  .21195736
          |  52.50557   55.01257
--------------------------------
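The block table can be read as a rough calibration check. If the propensity model fits well, the share of Catholic-school students within a block should be close to the block's average propensity score. The sketch below re-enters the counts and group means of p from the table above. It then computes both quantities. The variable names are hypothetical.

```stata
* Counts and mean propensity scores by block, copied from the table above
clear
input b n_no n_yes p_no p_yes
1 1089  34 .03032015 .03453844
2 1431 110 .07522784 .07848568
3 1599 253 .1270797  .12911088
4  829 160 .17195554 .17329525
5  131  35 .21343681 .21195736
end

* Observed share of Catholic-school students in each block
generate share_yes = n_yes / (n_no + n_yes)
* Average propensity score in each block (size-weighted mean of the two groups)
generate p_block = (n_no*p_no + n_yes*p_yes) / (n_no + n_yes)

list b share_yes p_block, noobs
```

With these numbers, the two quantities track each other fairly closely:

| Block | Observed share | Average p |
|---|---|---|
| 1 | .030 | .030 |
| 2 | .071 | .075 |
| 3 | .137 | .127 |
| 4 | .162 | .172 |
| 5 | .211 | .213 |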
Tests for differences in academic achievement by catholic within each block, shown in Table 12.6 on page 318. Note that the signs of the differences are reversed relative to the text, because ttest computes mean(no) - mean(yes); the magnitudes are the same.
by b, sort: ttest math12, by(catholic)

-------------------------------------------------------------------------------------------
-> b = 1

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1089    43.66365    .2303222    7.600632    43.21172    44.11557
     yes |      34    46.01353    1.288496    7.513157    43.39206    48.63499
---------+--------------------------------------------------------------------
combined |    1123    43.73479    .2269499     7.60536     43.2895    44.18008
---------+--------------------------------------------------------------------
    diff |           -2.349884    1.323244               -4.946197    .2464296
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -1.7759
Ho: diff = 0                                     degrees of freedom =     1121

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0380         Pr(|T| > |t|) = 0.0760          Pr(T > t) = 0.9620

-------------------------------------------------------------------------------------------
-> b = 2

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1431    48.85303    .2310661    8.740896    48.39977     49.3063
     yes |     110    51.00236    .7979284    8.368744     49.4209    52.58383
---------+--------------------------------------------------------------------
combined |    1541    49.00646    .2223837    8.729799    48.57025    49.44266
---------+--------------------------------------------------------------------
    diff |           -2.149331    .8622945               -3.840727   -.4579344
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.4926
Ho: diff = 0                                     degrees of freedom =     1539

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0064         Pr(|T| > |t|) = 0.0128          Pr(T > t) = 0.9936

-------------------------------------------------------------------------------------------
-> b = 3

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |    1599    53.62299    .2155456     8.61913    53.20021    54.04577
     yes |     253    55.38316    .5098506     8.10967    54.37905    56.38727
---------+--------------------------------------------------------------------
combined |    1852    53.86344     .199154    8.570566    53.47285    54.25403
---------+--------------------------------------------------------------------
    diff |           -1.760173    .5786011               -2.894952   -.6253928
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -3.0421
Ho: diff = 0                                     degrees of freedom =     1850

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0012         Pr(|T| > |t|) = 0.0024          Pr(T > t) = 0.9988

-------------------------------------------------------------------------------------------
-> b = 4

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     829    56.86899    .2909555    8.377295    56.29789    57.44008
     yes |     160    57.34556    .5922284    7.491162    56.17591    58.51521
---------+--------------------------------------------------------------------
combined |     989    56.94609    .2619749    8.238684      56.432    57.46018
---------+--------------------------------------------------------------------
    diff |           -.4765759    .7116066               -1.873012    .9198599
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -0.6697
Ho: diff = 0                                     degrees of freedom =      987

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.2516         Pr(|T| > |t|) = 0.5032          Pr(T > t) = 0.7484

-------------------------------------------------------------------------------------------
-> b = 5

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     131    52.50557    .6960753    7.966946    51.12847    53.88267
     yes |      35    55.01257    1.310023    7.750203    52.35028    57.67486
---------+--------------------------------------------------------------------
combined |     166    53.03416    .6181866    7.964777    51.81358    54.25473
---------+--------------------------------------------------------------------
    diff |           -2.506999    1.507463               -5.483536    .4695381
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -1.6631
Ho: diff = 0                                     degrees of freedom =      164

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0491         Pr(|T| > |t|) = 0.0982          Pr(T > t) = 0.9509
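The sign flip arises because ttest, by(catholic) reports mean(no) - mean(yes): the group with the smaller value of the by() variable comes first. Regressing the outcome on the 0/1 indicator gives the same difference with the opposite sign. Under equal variances it also gives the same absolute t statistic, as the block regressions under Method A further down this page show. The sketch below checks this on Stata's bundled auto data, with foreign standing in for catholic as an assumption for illustration.

```stata
* Illustration on the bundled auto data: foreign (0/1) stands in for catholic
sysuse auto, clear

* Equal-variance two-sample t test: diff = mean(domestic) - mean(foreign)
ttest price, by(foreign)
scalar diff_ttest = r(mu_1) - r(mu_2)
scalar t_ttest    = r(t)

* Regression on the indicator: coefficient = mean(foreign) - mean(domestic)
regress price foreign
scalar b_reg = _b[foreign]
scalar t_reg = _b[foreign] / _se[foreign]

display "ttest diff = " diff_ttest "   regress coef = " b_reg
display "ttest t = " t_ttest "   regress t = " t_reg
```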
ATT shown in Table 12.6 on page 320.
set seed 7492
atts math12 catholic, pscore(p) blockid(b) bootstrap

ATT estimation with the Stratification method
Analytical standard errors

---------------------------------------------------------
 n. treat.   n. contr.       ATT     Std. Err.         t
---------------------------------------------------------

       592        5079       1.564       0.353       4.424

---------------------------------------------------------

Bootstrapping of standard errors

command:      atts math12 catholic , pscore(p) blockid(b)
statistic:    atts       = r(atts)

Bootstrap statistics                              Number of obs    =      5671
                                                  Replications     =        50

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
        atts |    50  1.563586  .0258251  .3812768   .7973821   2.329791   (N)
             |                                        .7701139   2.242676   (P)
             |                                         .605822   2.242676  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Stratification method
Bootstrapped standard errors

---------------------------------------------------------
 n. treat.   n. contr.       ATT     Std. Err.         t
---------------------------------------------------------

       592        5079       1.564       0.381       4.101

---------------------------------------------------------
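The stratification ATT reported by atts can be reproduced by hand from the block-level results above. It is the average of the within-block differences in math12 (Catholic minus non-Catholic), weighted by the number of Catholic-school (treated) students in each block. The sketch below re-enters two sets of numbers:

- the block differences from the t tests, with signs flipped to Catholic minus non-Catholic;
- the treated counts from the pscore block table.

The variable names are hypothetical.

```stata
* Within-block differences (Catholic minus non-Catholic) and treated counts,
* copied from the t tests and the pscore block table above
clear
input b diff n_treated
1 2.349884  34
2 2.149331 110
3 1.760173 253
4 .4765759 160
5 2.506999  35
end

* Stratification ATT: mean of the block differences, weighted by treated counts
quietly summarize diff [aweight = n_treated]
scalar att_strat = r(mean)

display "ATT by stratification = " %8.6f att_strat
```

The result should agree with the point estimate of 1.563586 shown in the bootstrap table. Only the standard errors depend on the analytic or bootstrap method.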
Additional methods of controlling for the propensity score, this time using the propensity model with additional covariates (i.e., the propensity model from Table 12.6).
Method A: Controlling for block by estimating the relationship between math12 and catholic separately in each block.
sort b
statsby _b[catholic] e(N), by(b) noisily sav(Cathslopes4,replace): regress math12 catholic

statsby: First call to regress with data as is:

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  1,  5669) =   90.48
       Model |   8043.1077     1   8043.1077           Prob > F      =  0.0000
    Residual |  503934.635  5669  88.8930385           R-squared     =  0.0157
-------------+------------------------------           Adj R-squared =  0.0155
       Total |  511977.743  5670  90.2958982           Root MSE      =  9.4283

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |    3.89486   .4094621     9.51   0.000     3.092157    4.697562
       _cons |   50.64465   .1322954   382.81   0.000      50.3853      50.904
------------------------------------------------------------------------------

statsby legend:
      command:  regress math12 catholic
      _stat_1:  _b[catholic]
      _stat_2:  e(N)
           by:  b

Statsby groups
running (regress math12 catholic) on group 1

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =    1123
-------------+------------------------------           F(  1,  1121) =    3.15
       Model |  182.062217     1  182.062217           Prob > F      =  0.0760
    Residual |  64716.1076  1121  57.7306936           R-squared     =  0.0028
-------------+------------------------------           Adj R-squared =  0.0019
       Total |  64898.1698  1122  57.8415061           Root MSE      =  7.5981

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   2.349884   1.323244     1.78   0.076    -.2464296    4.946197
       _cons |   43.66365   .2302446   189.64   0.000     43.21189     44.1154
------------------------------------------------------------------------------
running (regress math12 catholic) on group 2

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =    1541
-------------+------------------------------           F(  1,  1539) =    6.21
       Model |  471.884989     1  471.884989           Prob > F      =  0.0128
    Residual |   116890.58  1539  75.9522939           R-squared     =  0.0040
-------------+------------------------------           Adj R-squared =  0.0034
       Total |  117362.465  1540  76.2093931           Root MSE      =  8.7151

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   2.149331   .8622945     2.49   0.013     .4579344    3.840727
       _cons |   48.85303   .2303831   212.05   0.000     48.40113    49.30493
------------------------------------------------------------------------------
running (regress math12 catholic) on group 3

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =    1852
-------------+------------------------------           F(  1,  1850) =    9.25
       Model |  676.765989     1  676.765989           Prob > F      =  0.0024
    Residual |  135287.688  1850  73.1284801           R-squared     =  0.0050
-------------+------------------------------           Adj R-squared =  0.0044
       Total |  135964.454  1851  73.4545944           Root MSE      =  8.5515

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.760173   .5786011     3.04   0.002     .6253928    2.894952
       _cons |   53.62299   .2138548   250.74   0.000     53.20357    54.04241
------------------------------------------------------------------------------
running (regress math12 catholic) on group 4

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =     989
-------------+------------------------------           F(  1,   987) =    0.45
       Model |  30.4608727     1  30.4608727           Prob > F      =  0.5032
    Residual |   67030.949   987  67.9138288           R-squared     =  0.0005
-------------+------------------------------           Adj R-squared = -0.0006
       Total |  67061.4099   988   67.875921           Root MSE      =   8.241

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   .4765759   .7116066     0.67   0.503    -.9198599    1.873012
       _cons |   56.86899   .2862212   198.69   0.000     56.30731    57.43066
------------------------------------------------------------------------------
running (regress math12 catholic) on group 5

. regress math12 catholic

      Source |       SS       df       MS              Number of obs =     166
-------------+------------------------------           F(  1,   164) =    2.77
       Model |  173.595927     1  173.595927           Prob > F      =  0.0982
    Residual |  10293.6209   164  62.7659812           R-squared     =  0.0166
-------------+------------------------------           Adj R-squared =  0.0106
       Total |  10467.2168   165  63.4376779           Root MSE      =  7.9225

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   2.506999   1.507463     1.66   0.098    -.4695381    5.483536
       _cons |   52.50557   .6921919    75.85   0.000     51.13882    53.87233
------------------------------------------------------------------------------
Distribution of coefficients for catholic (predicting math12) across the propensity score blocks. The syntax below is meant to be run from a .do file, and the whole block should be run at once. (Not shown in text.)
preserve
use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/CathSlopes4, clear
list
histogram _stat_1, bin(3) frequency kdensity kdenopts(gaussian)
restore

     +------------------------+
     | b    _stat_1   _stat_2 |
     |------------------------|
  1. | 1   2.349884      1123 |
  2. | 2   2.149331      1541 |
  3. | 3   1.760173      1852 |
  4. | 4   .4765759       989 |
  5. | 5   2.506999       166 |
     +------------------------+
Method B: Estimate the relationship between math12 and catholic in all blocks at the same time, using fixed effects for the blocks. Note that this model includes the intercept and dummy variables for blocks 2 to 5. (Not shown in text.)
xi: regress math12 catholic i.b
i.b               _Ib_1-5             (naturally coded; _Ib_1 omitted)

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  5,  5665) =  337.52
       Model |  117512.027     5  23502.4055           Prob > F      =  0.0000
    Residual |  394465.715  5665  69.6320768           R-squared     =  0.2295
-------------+------------------------------           Adj R-squared =  0.2288
       Total |  511977.743  5670  90.2958982           Root MSE      =  8.3446

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.580998    .367602     4.30   0.000     .8603572    2.301639
       _Ib_2 |   5.206677     .32775    15.89   0.000     4.564162    5.849193
       _Ib_3 |   9.960542    .318012    31.32   0.000     9.337117    10.58397
       _Ib_4 |   13.00339   .3670815    35.42   0.000     12.28377    13.72301
       _Ib_5 |   9.013889   .6970521    12.93   0.000       7.6474    10.38038
       _cons |   43.68692   .2492575   175.27   0.000     43.19828    44.17556
------------------------------------------------------------------------------
This model can also be estimated using the factor variable syntax introduced in Stata 11. (Not shown in text.)
regress math12 catholic i.b

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  5,  5665) =  337.52
       Model |  117512.027     5  23502.4055           Prob > F      =  0.0000
    Residual |  394465.715  5665  69.6320768           R-squared     =  0.2295
-------------+------------------------------           Adj R-squared =  0.2288
       Total |  511977.743  5670  90.2958982           Root MSE      =  8.3446

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.580998    .367602     4.30   0.000     .8603572    2.301639
             |
           b |
          2  |   5.206677     .32775    15.89   0.000     4.564162    5.849193
          3  |   9.960542    .318012    31.32   0.000     9.337117    10.58397
          4  |   13.00339   .3670815    35.42   0.000     12.28377    13.72301
          5  |   9.013889   .6970521    12.93   0.000       7.6474    10.38038
             |
       _cons |   43.68692   .2492575   175.27   0.000     43.19828    44.17556
------------------------------------------------------------------------------
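The block fixed-effects estimate can also be rebuilt from the block-level results, but with different weights than the ATT. With a 0/1 treatment and a full set of block dummies, OLS weights each block's difference by n_yes*n_no/(n_yes + n_no). That is the block size times the within-block variance of the treatment indicator. Relative to the ATT weights (the treated counts), these weights are proportional to each block's share of non-Catholic students. Blocks with a smaller proportion of Catholic-school students therefore count for relatively more. The sketch below re-enters the block differences from the t tests and the group sizes from the block table, then computes both weightings. The variable names are hypothetical.

```stata
* Block differences (Catholic minus non-Catholic) and group sizes, copied from above
clear
input b diff n_yes n_no
1 2.349884  34 1089
2 2.149331 110 1431
3 1.760173 253 1599
4 .4765759 160  829
5 2.506999  35  131
end

* Fixed-effects weight: block size times the variance of the 0/1 catholic indicator
generate w_fe = n_yes*n_no / (n_yes + n_no)
quietly summarize diff [aweight = w_fe]
scalar b_fe = r(mean)

* For comparison: ATT weights (number of treated cases per block)
quietly summarize diff [aweight = n_yes]
scalar b_att = r(mean)

display "FE-weighted = " %8.6f b_fe "   ATT-weighted = " %8.6f b_att
```

The FE-weighted average should reproduce the catholic coefficient of 1.580998 from the regression above. The treated-count weights should reproduce the 1.563586 reported by atts.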
An equivalent model with no intercept and a fixed effect for each block. (Not shown in text.)
xi i.b, noomit
regress math12 catholic _Ib_1-_Ib_5, noconstant

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  6,  5665) = 35657.50
       Model |  14897436.1     6  2482906.01           Prob > F      =  0.0000
    Residual |  394465.715  5665  69.6320768           R-squared     =  0.9742
-------------+------------------------------           Adj R-squared =  0.9742
       Total |  15291901.8  5671  2696.50887           Root MSE      =  8.3446

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.580998    .367602     4.30   0.000     .8603572    2.301639
       _Ib_1 |   43.68692   .2492575   175.27   0.000     43.19828    44.17556
       _Ib_2 |    48.8936   .2141841   228.28   0.000     48.47372    49.31348
       _Ib_3 |   53.64747   .2003001   267.84   0.000      53.2548    54.04013
       _Ib_4 |   56.69031   .2719252   208.48   0.000     56.15724    57.22339
       _Ib_5 |   52.70081   .6522864    80.79   0.000     51.42208    53.97954
------------------------------------------------------------------------------
This model can also be estimated with the factor-variable syntax introduced in Stata 11, so the dummy variables do not need to be created with xi first. The ibn. prefix requests an indicator for every level of b, with no base level omitted.
regress math12 catholic ibn.b, noconstant

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  6,  5665) =35657.50
       Model |  14897436.1     6  2482906.01           Prob > F      =  0.0000
    Residual |  394465.715  5665  69.6320768           R-squared     =  0.9742
-------------+------------------------------           Adj R-squared =  0.9742
       Total |  15291901.8  5671  2696.50887           Root MSE      =  8.3446

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.580998    .367602     4.30   0.000     .8603572    2.301639
             |
           b |
          1  |   43.68692   .2492575   175.27   0.000     43.19828    44.17556
          2  |    48.8936   .2141841   228.28   0.000     48.47372    49.31348
          3  |   53.64747   .2003001   267.84   0.000      53.2548    54.04013
          4  |   56.69031   .2719252   208.48   0.000     56.15724    57.22339
          5  |   52.70081   .6522864    80.79   0.000     51.42208    53.97954
------------------------------------------------------------------------------
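A quick arithmetic check makes the equivalence concrete: in the noconstant model, each block's coefficient should equal _cons from the i.b model plus that block's coefficient (block 1 is the omitted base level). The Python sketch below is not part of the original analysis; it only copies coefficients from the regression outputs above, and the helper name implied_intercepts is hypothetical.

```python
# Coefficients copied from the regress outputs above.
cons = 43.68692  # _cons from the model with i.b (block 1 is the base level)
block_effects = {1: 0.0, 2: 5.206677, 3: 9.960542, 4: 13.00339, 5: 9.013889}

# Block coefficients from the noconstant model (one indicator per block)
block_intercepts = {1: 43.68692, 2: 48.8936, 3: 53.64747, 4: 56.69031, 5: 52.70081}


def implied_intercepts(cons, effects):
    """Return the intercept for each block implied by the base-level parameterization."""
    return {blk: cons + eff for blk, eff in effects.items()}


implied = implied_intercepts(cons, block_effects)
for blk in sorted(implied):
    # The two columns agree up to the rounding in Stata's displayed output
    print(blk, round(implied[blk], 5), block_intercepts[blk])
```

The catholic coefficient (1.580998) and its standard error are identical in both parameterizations; only the way the block effects are expressed changes.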
Method C: Controlling for propensities (as a linear effect). (Not shown in text.)
regress math12 catholic p

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  2,  5668) =  802.96
       Model |   113033.68     2  56516.8401           Prob > F      =  0.0000
    Residual |  398944.063  5668  70.3853321           R-squared     =  0.2208
-------------+------------------------------           Adj R-squared =  0.2205
       Total |  511977.743  5670  90.2958982           Root MSE      =  8.3896

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.438686   .3698603     3.89   0.000     .7136181    2.163753
           p |   82.32179   2.131477    38.62   0.000     78.14328     86.5003
       _cons |   42.30742   .2458801   172.07   0.000      41.8254    42.78944
------------------------------------------------------------------------------
Controlling for selection using nearest-neighbor matching (with random draws). Discussed on page 323. The command attnd is part of the same user-written package as pscore and atts.
attnd math12 catholic, pscore(p) comsup detail matchvar(neighbor) matchdta(pickdat3) id(id)

****************************************************************
Estimation of the ATT with the nearest neighbor matching method
Random draw version
****************************************************************

Note: the common support option has been selected
The region of common support is [.00999032, .24626373]

The outcome is math12

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      math12 |      5617    51.14784    9.475516      29.88      71.37

The treatment is catholic

   attended |
   catholic |
        hs? |      Freq.     Percent        Cum.
------------+-----------------------------------
         no |      5,025       89.46       89.46
        yes |        592       10.54      100.00
------------+-----------------------------------
      Total |      5,617      100.00

The distribution of the pscore is

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0133107       .0099903
 5%     .0224748       .0100709
10%     .0323934       .0101176       Obs                5617
25%     .0604995       .0102946       Sum of Wgt.        5617

50%      .107828                      Mean           .1049281
                        Largest       Std. Dev.      .0522626
75%      .145359       .2322851
90%     .1742154       .2384602       Variance       .0027314
95%     .1851969       .2391276       Skewness       .0475229
99%     .2127406       .2462637       Kurtosis       2.030977

The program is searching the nearest neighbor of each treated unit.
This operation may take a while.

****************************************************
Forward search
****************************************************
Backward search
****************************************************
Choice between backward or forward match
****************************************************
Display of final results
****************************************************

The number of treated is 592
The number of treated which have been matched is 592

Average absolute pscore difference between treated and controls

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       PSDIF |       592    .0000432    .0003268          0   .0078036

Average outcome of the matched treated

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      math12 |       592    54.53951    8.463153      32.92      71.08

Average outcome of the matched controls

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
      math12 |     553         592    53.61822   8.913623      32.05      70.79

(553 real changes made)
(592 real changes made)

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.       ATT   Std. Err.           t
---------------------------------------------------------

      592         553     0.921       0.537       1.716

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches

*****************************************************************************
End of the estimation with the nearest neighbor matching (random draw) method
*****************************************************************************
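The core idea of nearest-neighbor matching on the propensity score can be sketched as follows, under simplifying assumptions: each treated unit is matched, with replacement, to the control whose score is closest, exact ties are broken by a random draw (our reading of the "random draw version"), and the ATT is the mean of the treated-minus-matched-control differences. This Python sketch is an illustration, not attnd's implementation: it ignores the common-support restriction and the forward/backward search, and the function name nn_match_att and the toy data are hypothetical.

```python
import random


def nn_match_att(treated, controls, seed=0):
    """Simplified nearest-neighbor matching on the propensity score.

    treated, controls: lists of (pscore, outcome) pairs (hypothetical input format).
    Each treated unit is matched, with replacement, to the control whose
    pscore is closest; exact ties are broken by a random draw.
    Returns (att, matches), where matches[i] is the control index used for treated unit i.
    """
    rng = random.Random(seed)
    diffs, matches = [], []
    for p_t, y_t in treated:
        dist = [abs(p_t - p_c) for p_c, _ in controls]
        best = min(dist)
        tied = [j for j, d in enumerate(dist) if d == best]
        j = rng.choice(tied)  # random draw among equally close controls
        matches.append(j)
        diffs.append(y_t - controls[j][1])
    return sum(diffs) / len(diffs), matches


# Hypothetical toy data: (pscore, outcome)
treated = [(0.10, 55.0), (0.12, 57.0), (0.30, 52.0)]
controls = [(0.11, 50.0), (0.25, 51.0), (0.40, 49.0)]
att, matches = nn_match_att(treated, controls)
print(att, matches)
```

In the toy example the first control is matched to two treated units, so it enters the ATT twice; this is how 553 distinct controls can serve as matches for 592 treated students.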
Inspect the neighbors identified by the above model and estimate the ATE. Because the syntax below is run from a .do file, the block of syntax should be run all at once. The syntax is shown twice: once as a single block, then with its output. (Discussed on page 323.)
* Syntax alone
preserve
use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/pickdat3, clear
merge id using "https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/catholic.dta" , ///
   unique update
sort p catholic
list id p catholic faminc8 math8 fhowfar mhowfar fight8 nohw8 disrupt8 riskdrop8 ///
   if p<.012 & neighbor==1
* Estimate ATE directly
ttest math12 if neighbor==1, by(catholic)
restore

* Syntax with output
preserve
use https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/pickdat3, clear
merge id using "https://stats.idre.ucla.edu/stat/stata/examples/methods_matter/chapter12/catholic.dta" , ///
   unique update
id was float now double
sort p catholic
list id p catholic faminc8 math8 fhowfar mhowfar fight8 nohw8 disrupt8 riskdrop8 ///
   if p<.012 & neighbor==1

     +-----------------------------------------------------------------------------------+
  1. |      id |        p | catholic |       faminc8 | math8 |    fhowfar |     mhowfar |
     | 1485802 | .0099903 |      yes |   $3000-$4999 | 42.02 |    coll <4 |  postsec ed |
     |-----------------------------------------------------------------------------------|
     |       fight8       |        nohw8       |       disrupt8       |       riskdr~8      |
     |        never       |          yes       |             no       |              3      |
     +-----------------------------------------------------------------------------------+

     +-----------------------------------------------------------------------------------+
  2. |      id |        p | catholic |       faminc8 | math8 |    fhowfar |     mhowfar |
     |  709436 | .0100709 |       no |   $3000-$4999 | 52.16 |    hs grad |     hs grad |
     |-----------------------------------------------------------------------------------|
     |       fight8       |        nohw8       |       disrupt8       |       riskdr~8      |
     |        never       |          yes       |             no       |              2      |
     +-----------------------------------------------------------------------------------+

     +-----------------------------------------------------------------------------------+
 14. |      id |        p | catholic |       faminc8 | math8 |    fhowfar |     mhowfar |
     | 6873825 | .0111274 |       no |   $5000-$7499 | 39.05 | postsec ed |  postsec ed |
     |-----------------------------------------------------------------------------------|
     |       fight8       |        nohw8       |       disrupt8       |       riskdr~8      |
     |        never       |          yes       |             no       |              4      |
     +-----------------------------------------------------------------------------------+

     +-----------------------------------------------------------------------------------+
 15. |      id |        p | catholic |       faminc8 | math8 |    fhowfar |     mhowfar |
     | 1485892 |  .011172 |      yes | $10000-$14999 | 42.36 |    hs grad | junior coll |
     |-----------------------------------------------------------------------------------|
     |       fight8       |        nohw8       |       disrupt8       |       riskdr~8      |
     |        never       |          yes       |             no       |              2      |
     +-----------------------------------------------------------------------------------+

* Estimate ATE directly
ttest math12 if neighbor==1, by(catholic)

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |     553    53.50092    .3811649    8.963455    52.75221    54.24963
     yes |     592    54.53951    .3478334    8.463153    53.85637    55.22265
---------+--------------------------------------------------------------------
combined |    1145     54.0379    .2577003    8.720023    53.53229    54.54352
---------+--------------------------------------------------------------------
    diff |           -1.038588    .5150099               -2.049059   -.0281169
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.0166
Ho: diff = 0                                     degrees of freedom =     1143

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0220         Pr(|T| > |t|) = 0.0440          Pr(T > t) = 0.9780
restore
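The ATT reported by attnd (0.921) and the difference from ttest (-1.039, computed as mean(no) - mean(yes)) are not the same number. The outputs suggest why: attnd weights each of the 553 matched controls by the number of treated students it was matched to (total weight 592, weighted mean 53.61822), whereas ttest on neighbor==1 counts each of those controls once (mean 53.50092). The Python check below uses only figures copied from the two outputs and reproduces both numbers.

```python
# Figures copied from the attnd and ttest outputs above.
mean_treated = 54.53951              # 592 matched Catholic-school students
mean_controls_weighted = 53.61822    # 553 controls, weighted by times matched (total weight 592)
mean_controls_unweighted = 53.50092  # the same 553 controls, each counted once by ttest

att_attnd = mean_treated - mean_controls_weighted
diff_ttest = mean_treated - mean_controls_unweighted  # ttest reports the negative of this

print(round(att_attnd, 3))   # 0.921
print(round(diff_ttest, 3))  # 1.039
```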
Model estimated using inverse propensity score weighting, discussed starting on page 327. The inverse probability weights are calculated from the propensity scores estimated for the previous model.
gen pscorewgt=1/p
replace pscorewgt=1/(1-p) if catholic==0
(5079 real changes made)
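The two lines above implement the usual inverse-probability weights for the ATE: 1/p for Catholic-school students and 1/(1-p) for public-school students. A minimal Python sketch of the same rule follows; the function name ipw_weight is hypothetical. It also shows why these weights can be large: the Catholic-school student with the smallest propensity score (.0099903, listed in the matching output above) receives a weight of roughly 100.

```python
def ipw_weight(p, treated):
    """ATE inverse-propensity weight: 1/p if treated, 1/(1 - p) if not."""
    if not 0.0 < p < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    return 1.0 / p if treated else 1.0 / (1.0 - p)


print(ipw_weight(0.25, True))       # 4.0 (hypothetical score)
print(ipw_weight(0.25, False))      # 1.3333333333333333
print(ipw_weight(0.0099903, True))  # smallest score in the data; weight of about 100
```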
Estimate the ATE "by hand" using the Imbens and Wooldridge method detailed in footnote 29 on page 327.
gen pmath12=pscorewgt*math12
total pmath12 if catholic==0

Total estimation                    Number of obs    =    5079

--------------------------------------------------------------
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
     pmath12 |     288700   894.6781      286946.1      290454
--------------------------------------------------------------

total pmath12 if catholic==1

Total estimation                    Number of obs    =     592

--------------------------------------------------------------
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
     pmath12 |   297285.7   8240.164      281102.1    313469.3
--------------------------------------------------------------

total pscorewgt if catholic==0

Total estimation                    Number of obs    =    5079

--------------------------------------------------------------
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
   pscorewgt |   5671.273   4.753054      5661.955    5680.591
--------------------------------------------------------------

total pscorewgt if catholic==1

Total estimation                    Number of obs    =     592

--------------------------------------------------------------
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
   pscorewgt |   5675.911   192.6692      5297.512    6054.311
--------------------------------------------------------------

* calculate the ATE
display 297285.7/5675.911 - 288700/5671.273
1.4710589
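The by-hand calculation is a difference of two weighted means, each a ratio of a weighted total of math12 to the total of the weights in that group. The Python sketch below writes that normalized IPW estimator for raw data (the function name ipw_ate and the toy data are hypothetical) and then repeats the page's display calculation from the reported totals.

```python
def ipw_ate(y, d, p):
    """Normalized IPW estimate of the ATE: weighted mean of y among treated
    (weights 1/p) minus weighted mean among untreated (weights 1/(1 - p))."""
    num1 = den1 = num0 = den0 = 0.0
    for yi, di, pi in zip(y, d, p):
        if di == 1:
            w = 1.0 / pi
            num1 += w * yi
            den1 += w
        else:
            w = 1.0 / (1.0 - pi)
            num0 += w * yi
            den0 += w
    return num1 / den1 - num0 / den0


# Hypothetical toy data: every propensity score is 0.5, so all weights equal 2
print(ipw_ate([10.0, 20.0, 5.0, 7.0], [1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5]))  # 9.0

# Reproduce the page's display command from the reported totals
mu1 = 297285.7 / 5675.911   # weighted mean of math12, catholic == 1
mu0 = 288700.0 / 5671.273   # weighted mean of math12, catholic == 0
print(mu1 - mu0)            # about 1.47106
```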
Estimate the ATE using analytic weights. (Note this is the same Imbens and Wooldridge estimator as above, with a different method of calculation.)
sum math12 if catholic==0 [aw=pscorewgt]

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
      math12 |    5079  5671.27272    50.90568   9.525339      29.88      71.37

sum math12 if catholic==1 [aw=pscorewgt]

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
      math12 |     592   5675.9113    52.37674    9.03198      32.92      71.08

display 52.37674-50.90568
1.47106
Estimate the ATE using WLS with analytic weights. (Note this is the same Imbens and Wooldridge estimator as above, with a different method of calculation.)
regress math12 catholic [aw=pscorewgt]
(sum of wgt is   1.1347e+04)

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  1,  5669) =   35.63
       Model |  3068.00754     1  3068.00754           Prob > F      =  0.0000
    Residual |  488129.279  5669   86.105006           R-squared     =  0.0062
-------------+------------------------------           Adj R-squared =  0.0061
       Total |  491197.287  5670  86.6309148           Root MSE      =  9.2793

------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    catholic |   1.471053   .2464418     5.97   0.000     .9879331    1.954174
       _cons |   50.90568   .1742963   292.06   0.000       50.564    51.24737
------------------------------------------------------------------------------
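Why does WLS give the same number? With a single binary regressor, the weighted least-squares intercept is the weighted mean of the control group and the slope is the difference between the treated and control weighted means, which is what the weighted sum commands compute. The Python sketch below (hypothetical function wls_simple, toy data) shows this with the closed-form weighted simple regression, and also that multiplying all weights by a constant leaves the coefficients unchanged. That matters here because Stata rescales analytic weights to sum to the number of observations; the reported sum of weights, 1.1347e+04, is the sum of the two group totals, 5671.273 + 5675.911.

```python
def wls_simple(y, x, w):
    """Weighted least squares of y on a constant and x; returns (intercept, slope)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical toy data: x is a 0/1 treatment indicator
y = [10.0, 20.0, 5.0, 7.0]
x = [1, 1, 0, 0]
w = [1.0, 3.0, 2.0, 2.0]
# Weighted means: treated (1*10 + 3*20)/4 = 17.5, control (2*5 + 2*7)/4 = 6.0
print(wls_simple(y, x, w))                     # (6.0, 11.5)
print(wls_simple(y, x, [2 * wi for wi in w]))  # same coefficients after rescaling
```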
Estimate the ATE using WLS by transformation. (Note this is the same Imbens and Wooldridge estimator as above, with a different method of calculation.)
gen w=sqrt(pscorewgt)
gen wmath12 = w*math12
gen wcatholic = w*catholic
regress wmath12 w wcatholic, noconstant

      Source |       SS       df       MS              Number of obs =    5671
-------------+------------------------------           F(  2,  5669) =87838.97
       Model |  30267327.8     2  15133663.9           Prob > F      =  0.0000
    Residual |  976704.778  5669  172.288724           R-squared     =  0.9687
-------------+------------------------------           Adj R-squared =  0.9687
       Total |  31244032.6  5671  5509.43971           Root MSE      =  13.126

------------------------------------------------------------------------------
     wmath12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           w |   50.90568   .1742963   292.06   0.000       50.564    51.24737
   wcatholic |   1.471053   .2464418     5.97   0.000      .987933    1.954173
------------------------------------------------------------------------------
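Weighted least squares with weights w is equivalent to OLS after multiplying the outcome and every regressor, including the constant, by sqrt(w); that is why w takes the place of the constant and noconstant is specified. The Python sketch below (hypothetical function name, toy data) solves the 2x2 normal equations of the transformed regression directly. In the Stata output, the coefficients and standard errors match the [aw=pscorewgt] regression, while the residual sum of squares is larger by a factor of about 2.0009, which equals the raw sum of weights (11347.184) divided by N (5671); this is consistent with Stata rescaling analytic weights to sum to N in the weighted regression.

```python
import math


def wls_by_transformation(y, x, w):
    """OLS without a constant of sqrt(w)*y on sqrt(w) and sqrt(w)*x.

    Solves the 2x2 normal equations with Cramer's rule and returns
    (coefficient on sqrt(w), coefficient on sqrt(w)*x), which are the
    WLS intercept and slope.
    """
    r = [math.sqrt(wi) for wi in w]
    a = r                                    # transformed constant column
    b = [ri * xi for ri, xi in zip(r, x)]    # transformed regressor
    t = [ri * yi for ri, yi in zip(r, y)]    # transformed outcome
    saa = sum(ai * ai for ai in a)
    sab = sum(ai * bi for ai, bi in zip(a, b))
    sbb = sum(bi * bi for bi in b)
    sat = sum(ai * ti for ai, ti in zip(a, t))
    sbt = sum(bi * ti for bi, ti in zip(b, t))
    det = saa * sbb - sab * sab
    return (sbb * sat - sab * sbt) / det, (saa * sbt - sab * sat) / det


# Hypothetical toy data: weighted means are 17.5 (x = 1) and 6.0 (x = 0)
c0, c1 = wls_by_transformation([10.0, 20.0, 5.0, 7.0], [1, 1, 0, 0], [1.0, 3.0, 2.0, 2.0])
print(round(c0, 6), round(c1, 6))  # 6.0 11.5
```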
Kernel density of inc8 without weighting. (Not shown in text.)
twoway kdensity inc8 if catholic==1 || ///
       kdensity inc8 if catholic==0, ///
       legend(off) scheme(lean1)
Kernel density of inc8 with inverse propensity weighting. (Not shown in text.)
twoway kdensity inc8 if catholic==1 [aw=pscorewgt] || ///
       kdensity inc8 if catholic==0 [aw=pscorewgt], ///
       legend(off) scheme(lean1)
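The [aw=pscorewgt] versions reweight each group's kernel density estimate, so observations with large inverse-propensity weights count for more. A generic weighted kernel density estimator divides a weighted sum of kernels by h times the total weight. The Python sketch below uses the textbook Epanechnikov kernel and a user-supplied bandwidth h; both are assumptions, and because kdensity chooses its own kernel scaling and default bandwidth, the sketch will not reproduce Stata's curves exactly. The function names are hypothetical.

```python
def epanechnikov(u):
    """Textbook Epanechnikov kernel: 0.75 * (1 - u**2) for |u| < 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def weighted_kde(data, weights, grid, h):
    """Weighted kernel density estimate at each grid point, bandwidth h.

    f(g) = sum_i w_i * K((g - x_i) / h) / (h * sum_i w_i)
    """
    total_w = sum(weights)
    return [
        sum(wi * epanechnikov((g - xi) / h) for xi, wi in zip(data, weights)) / (h * total_w)
        for g in grid
    ]


# Hypothetical toy data: the second observation carries three times the weight
dens = weighted_kde([0.0, 1.0], [1.0, 3.0], [0.0, 1.0], h=0.5)
print(dens)  # [0.375, 1.125]

# The estimate integrates to about 1 (Riemann sum over a fine grid)
step = 0.001
grid = [-1.0 + i * step for i in range(3001)]
print(sum(weighted_kde([0.0, 1.0], [1.0, 3.0], grid, h=0.5)) * step)
```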
Figure 13.3, Part A on page 327.
twoway kdensity math8 if catholic==1 || ///
       kdensity math8 if catholic==0, ///
       legend(off) scheme(lean1)
Figure 13.3, Part B on page 327.
twoway kdensity math8 if catholic==1 [aw=pscorewgt] || ///
       kdensity math8 if catholic==0 [aw=pscorewgt], ///
       legend(off) scheme(lean1)