You can download the dataset for this chapter by clicking here: https://stats.idre.ucla.edu/wp-content/uploads/2016/02/angrist.sas7bdat.
Descriptive statistics for the all variables. The variable read is the average reading score for each school, size is the cohort size for each school, intended_classsize is the average intended class size for each school, and observed_classize is the average of actual class sizes for each school. Note: this output does not appear in the text.
proc means data=angrist;
vars read size intended_classize observed_classize;
run;
The MEANS Procedure
Variable Label N Mean Std Dev
read verbal score (class average) 2019 74.3791713 7.6784598
size size of september enrollment cohort 2019 77.7419515 38.8107308
intended_classize maimonides rule 2019 30.9559353 6.1079239
observed_classize spring class size 2019 29.9351164 6.5458852
Variable Label Minimum Maximum
read verbal score (class average) 34.7999992 93.8600006
size size of september enrollment cohort 8.0000000 226.0000000
intended_classize maimonides rule 8.0000000 40.0000000
observed_classize spring class size 8.0000000 44.0000000
A boxplot showing the distribution of read by size. (Not shown in text.)
proc sgplot data=angrist; vbox read / category=size; where size>=36 AND size<=46; run;![]()
Table 9.1 on page 168.
proc sql ;
select size, mean(intended_classize) as mean_intended,
mean(observed_classize) as mean_observed, mean(read) as mean_read,
std(read) as sd_read
from angrist
where size>=36 AND size<=46
group by size;
quit;
size of
september
enrollment mean_ mean_
cohort intended observed mean_read sd_read
36 36 27.44444 67.30445 12.36389
37 37 26.22222 68.94066 8.497514
38 38 33.1 67.854 14.03826
39 39 31.2 68.87 12.07238
40 40 29.88889 67.92847 7.865053
41 20.5 22.67857 73.6767 8.766867
42 21 23.4 67.5956 9.302938
43 21.5 22.125 77.17644 7.466089
44 22 24.41176 72.1616 7.712399
45 22.5 22.73684 76.91684 8.708218
46 23 22.55 70.30814 9.783913
Difference analysis discussed on page 172.
data angrist;
set angrist;
if size<=40 then small=0;
if size>40 then small=1;
run;
proc ttest data=angrist;
class small;
var read;
where size=40 | size = 41;
run;
The TTEST Procedure
Variable: read (verbal score (class average))
small N Mean Std Dev Std Err Minimum Maximum
0 9 67.9285 7.8651 2.6217 52.7700 77.2900
1 28 73.6767 8.7669 1.6568 55.3200 89.2700
Diff (1-2) -5.7482 8.5691 3.2835
small Method Mean 95% CL Mean Std Dev 95% CL Std Dev
0 67.9285 61.8829 73.9741 7.8651 5.3125 15.0676
1 73.6767 70.2773 77.0761 8.7669 6.9313 11.9329
Diff (1-2) Pooled -5.7482 -12.4141 0.9176 8.5691 6.9502 11.1779
Diff (1-2) Satterthwaite -5.7482 -12.3601 0.8637
Method Variances DF t Value Pr > |t|
Pooled Equal 35 -1.75 0.0888
Satterthwaite Unequal 14.959 -1.85 0.0836
Equality of Variances
Method Num DF Den DF F Value Pr > F
Folded F 27 8 1.24 0.7921
Difference of difference analysis, discussed starting on page 173. (Output not shown in text.)
data angrist2;
set angrist;
where size>=38 & size<=41;
if size=38 then group=1;
if size=39 then group=2;
if size=40 then group=3;
if size=41 then group=4;
run;
proc sort data=angrist2;
by group;
run;
proc means data=angrist2 mean var;
class group;
var read;
run;
The MEANS Procedure
Analysis Variable : read verbal score (class average)
N
group Obs Mean Variance
1 10 67.8540001 197.0727013
2 10 68.8699997 145.7422671
3 9 67.9284655 61.8590551
4 28 73.6766950 76.8579504
proc sgplot data=angrist2;
vbox read / category=group;
run;
* "larger" distinguishes the group with the larger enrollment in any pair;
* "first" distinguishes the groups that participate in the first diff;
data angrist2;
set angrist2;
if group=1 | group=3 then larger=0;
if group=2 | group=4 then larger=1;
if group=1 | group=2 then first=0;
if group=3 | group=4 then first=1;
run;
proc glm data=angrist2;
model read = first larger first*larger;
run;
The GLM Procedure
Number of Observations Read 57
Number of Observations Used 57
The GLM Procedure
Dependent Variable: read verbal score (class average)
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 3 429.340264 143.113421 1.34 0.2709
Error 53 5655.371816 106.705129
Corrected Total 56 6084.712080
R-Square Coeff Var Root MSE read Mean
0.070560 14.56868 10.32982 70.90427
Source DF Type I SS Mean Square F Value Pr > F
first 1 199.1352089 199.1352089 1.87 0.1777
larger 1 165.6365436 165.6365436 1.55 0.2183
first*larger 1 64.5685115 64.5685115 0.61 0.4401
Source DF Type III SS Mean Square F Value Pr > F
first 1 0.02626627 0.02626627 0.00 0.9875
larger 1 5.16127597 5.16127597 0.05 0.8268
first*larger 1 64.56851148 64.56851148 0.61 0.4401
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 67.85400009 3.26657510 20.77 <.0001
first 0.07446543 4.74622358 0.02 0.9875
larger 1.01599960 4.61963480 0.22 0.8268
first*larger 4.73222988 6.08342408 0.78 0.4401
Figure 9.1 on page 176. The dataset has changed so you will need to open the original dataset again.
proc sql; create table angrist3 as select *, mean(read) as mread from angrist group by size; quit; data angrist3; set angrist3; if size=41 then mread41=mread; if size>=36 & size<=40 then mread3640 = mread; run; proc sgplot data=angrist3 noautolegend; series y=mread3640 x=size; scatter y=mread41 x=size ; reg y=mread3640 x=size ; where size>=36 & size<=41; yaxis label='Average Reading Achievement'; xaxis label='Size of Enrollment Cohort'; run;![]()
Table 9.3 on page 180.
data angrist;
set angrist;
small=0 ;
if size>=41 then small=1;
csize = size-41;
run;
proc glm data=angrist;
model read = csize small;
where size>=36 & size<=41;
run;
quit;
The GLM Procedure
Number of Observations Read 75
Number of Observations Used 75
The SAS System 11:04 Wednesday, January 19, 2011 122
The GLM Procedure
Dependent Variable: read verbal score (class average)
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 530.144127 265.072063 2.55 0.0848
Error 72 7473.058357 103.792477
Corrected Total 74 8003.202483
R-Square Coeff Var Root MSE read Mean
0.066241 14.50505 10.18786 70.23666
Source DF Type I SS Mean Square F Value Pr > F
csize 1 360.4818362 360.4818362 3.47 0.0665
small 1 169.6622905 169.6622905 1.63 0.2052
Source DF Type III SS Mean Square F Value Pr > F
csize 1 1.3983446 1.3983446 0.01 0.9079
small 1 169.6622905 169.6622905 1.63 0.2052
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 68.55656883 3.51152647 19.52 <.0001
csize 0.12397588 1.06810272 0.12 0.9079
small 5.12012617 4.00470877 1.28 0.2052
1044 quit;
Boxplot of read by school size, for classes size 36-46. (Not shown in text.)
proc sgplot data=angrist; vbox read / category=size; where size>=36 & size<=46; run;![]()
Table 9.4 on page 183.
proc glm data=angrist;
model read = csize small;
where size>=36 & size<=46;
run;
quit;
The GLM Procedure
Number of Observations Read 180
Number of Observations Used 180
The GLM Procedure
Dependent Variable: read verbal score (class average)
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 794.77415 397.38708 4.25 0.0158
Error 177 16564.03364 93.58211
Corrected Total 179 17358.80779
R-Square Coeff Var Root MSE read Mean
0.045785 13.49391 9.673785 71.69002
Source DF Type I SS Mean Square F Value Pr > F
csize 1 619.5174098 619.5174098 6.62 0.0109
small 1 175.2567410 175.2567410 1.87 0.1729
Source DF Type III SS Mean Square F Value Pr > F
csize 1 14.3411041 14.3411041 0.15 0.6959
small 1 175.2567410 175.2567410 1.87 0.1729
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 68.69568689 1.91775834 35.82 <.0001
csize 0.17067980 0.43600075 0.39 0.6959
small 3.84715660 2.81124643 1.37 0.1729
Syntax to produce the coefficients from small and csize shown in Table 9.5 on page 184.
proc glm data=angrist;
model read = csize small;
where size>=36 & size<=41;
run;
The GLM Procedure
Number of Observations Read 75
Number of Observations Used 75
The GLM Procedure
Dependent Variable: read verbal score (class average)
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 530.144127 265.072063 2.55 0.0848
Error 72 7473.058357 103.792477
Corrected Total 74 8003.202483
R-Square Coeff Var Root MSE read Mean
0.066241 14.50505 10.18786 70.23666
Source DF Type I SS Mean Square F Value Pr > F
csize 1 360.4818362 360.4818362 3.47 0.0665
small 1 169.6622905 169.6622905 1.63 0.2052
Source DF Type III SS Mean Square F Value Pr > F
csize 1 1.3983446 1.3983446 0.01 0.9079
small 1 169.6622905 169.6622905 1.63 0.2052
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 68.55656883 3.51152647 19.52 <.0001
csize 0.12397588 1.06810272 0.12 0.9079
small 5.12012617 4.00470877 1.28 0.2052
proc glm data=angrist;
model read = csize small;
where size>=36 & size<=46;
run;
The GLM Procedure
Number of Observations Read 180
Number of Observations Used 180
The GLM Procedure
Dependent Variable: read verbal score (class average)
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 794.77415 397.38708 4.25 0.0158
Error 177 16564.03364 93.58211
Corrected Total 179 17358.80779
R-Square Coeff Var Root MSE read Mean
0.045785 13.49391 9.673785 71.69002
Source DF Type I SS Mean Square F Value Pr > F
csize 1 619.5174098 619.5174098 6.62 0.0109
small 1 175.2567410 175.2567410 1.87 0.1729
Source DF Type III SS Mean Square F Value Pr > F
csize 1 14.3411041 14.3411041 0.15 0.6959
small 1 175.2567410 175.2567410 1.87 0.1729
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 68.69568689 1.91775834 35.82 <.0001
csize 0.17067980 0.43600075 0.39 0.6959
small 3.84715660 2.81124643 1.37 0.1729
proc glm data=angrist;
model read = csize small;
where size>=35 & size<=47;
run;
The GLM Procedure
Number of Observations Read 221
Number of Observations Used 221
The GLM Procedure
Dependent Variable: read verbal score (class average)
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 766.65527 383.32764 4.24 0.0156
Error 218 19694.93698 90.34375
Corrected Total 220 20461.59226
R-Square Coeff Var Root MSE read Mean
0.037468 13.23299 9.504933 71.82755
Source DF Type I SS Mean Square F Value Pr > F
csize 1 521.8926837 521.8926837 5.78 0.0171
small 1 244.7625898 244.7625898 2.71 0.1012
Source DF Type III SS Mean Square F Value Pr > F
csize 1 0.2672968 0.2672968 0.00 0.9567
small 1 244.7625898 244.7625898 2.71 0.1012
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 68.76637004 1.67352147 41.09 <.0001
csize 0.01707451 0.31390657 0.05 0.9567
small 4.12172842 2.50412442 1.65 0.1012
proc glm data=angrist;
model read = csize small;
where size>=34 & size<=48;
run;
The GLM Procedure
Number of Observations Read 259
Number of Observations Used 259
The GLM Procedure
Dependent Variable: read verbal score (class average)
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 827.84469 413.92234 4.76 0.0093
Error 256 22239.42185 86.87274
Corrected Total 258 23067.26654
R-Square Coeff Var Root MSE read Mean
0.035888 12.96111 9.320555 71.91172
Source DF Type I SS Mean Square F Value Pr > F
csize 1 564.9314706 564.9314706 6.50 0.0114
small 1 262.9132181 262.9132181 3.03 0.0831
Source DF Type III SS Mean Square F Value Pr > F
csize 1 0.0046601 0.0046601 0.00 0.9942
small 1 262.9132181 262.9132181 3.03 0.0831
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 68.98157843 1.51761499 45.45 <.0001
csize 0.00182198 0.24876314 0.01 0.9942
small 4.01178643 2.30607463 1.74 0.0831
proc glm data=angrist;
model read = csize small;
where size>=33 & size<=49;
run;
The GLM Procedure
Number of Observations Read 288
Number of Observations Used 288
The GLM Procedure
Dependent Variable: read verbal score (class average)
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 727.97326 363.98663 4.36 0.0136
Error 285 23778.04705 83.43174
Corrected Total 287 24506.02031
R-Square Coeff Var Root MSE read Mean
0.029706 12.65253 9.134098 72.19189
Source DF Type I SS Mean Square F Value Pr > F
csize 1 486.5361321 486.5361321 5.83 0.0164
small 1 241.4371313 241.4371313 2.89 0.0900
Source DF Type III SS Mean Square F Value Pr > F
csize 1 0.3105589 0.3105589 0.00 0.9514
small 1 241.4371313 241.4371313 2.89 0.0900
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 69.54771739 1.40848952 49.38 <.0001
csize -0.01282446 0.21020003 -0.06 0.9514
small 3.67186597 2.15849203 1.70 0.0900
proc glm data=angrist;
model read = csize small;
where size>=32 & size<=50;
run;
The GLM Procedure
Number of Observations Read 315
Number of Observations Used 315
The GLM Procedure
Dependent Variable: read verbal score (class average)
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 662.42455 331.21227 4.08 0.0178
Error 312 25337.79969 81.21090
Corrected Total 314 26000.22424
R-Square Coeff Var Root MSE read Mean
0.025478 12.41907 9.011709 72.56346
Source DF Type I SS Mean Square F Value Pr > F
csize 1 490.8500261 490.8500261 6.04 0.0145
small 1 171.5745236 171.5745236 2.11 0.1471
Source DF Type III SS Mean Square F Value Pr > F
csize 1 1.9174557 1.9174557 0.02 0.8780
small 1 171.5745236 171.5745236 2.11 0.1471
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 70.37797062 1.32498335 53.12 <.0001
csize 0.02785393 0.18127211 0.15 0.8780
small 2.96634260 2.04080759 1.45 0.1471
proc glm data=angrist;
model read = csize small;
where size>=31 & size<=51;
run;
The GLM Procedure
Number of Observations Read 352
Number of Observations Used 352
The GLM Procedure
Dependent Variable: read verbal score (class average)
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 760.14017 380.07008 4.58 0.0109
Error 349 28979.74327 83.03651
Corrected Total 351 29739.88343
R-Square Coeff Var Root MSE read Mean
0.025560 12.55109 9.112437 72.60273
Source DF Type I SS Mean Square F Value Pr > F
csize 1 571.6505728 571.6505728 6.88 0.0091
small 1 188.4895936 188.4895936 2.27 0.1328
Source DF Type III SS Mean Square F Value Pr > F
csize 1 4.4935683 4.4935683 0.05 0.8162
small 1 188.4895936 188.4895936 2.27 0.1328
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 70.38685823 1.25279621 56.18 <.0001
csize 0.03592477 0.15443043 0.23 0.8162
small 2.92719390 1.94286379 1.51 0.1328
proc glm data=angrist;
model read = csize small;
where size>=30 & size<=52;
run;
The GLM Procedure
Number of Observations Read 385
Number of Observations Used 385
The GLM Procedure
Dependent Variable: read verbal score (class average)
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 632.40608 316.20304 3.91 0.0209
Error 382 30914.61803 80.92832
Corrected Total 384 31547.02411
R-Square Coeff Var Root MSE read Mean
0.020046 12.38679 8.996017 72.62590
Source DF Type I SS Mean Square F Value Pr > F
csize 1 362.7692594 362.7692594 4.48 0.0349
small 1 269.6368231 269.6368231 3.33 0.0687
Source DF Type III SS Mean Square F Value Pr > F
csize 1 8.8270914 8.8270914 0.11 0.7414
small 1 269.6368231 269.6368231 3.33 0.0687
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 70.28562412 1.18328290 59.40 <.0001
csize -0.04415984 0.13371156 -0.33 0.7414
small 3.36202761 1.84188256 1.83 0.0687
proc glm data=angrist;
model read = csize small;
where size>=29 & size<=53;
run;
The GLM Procedure
Number of Observations Read 423
Number of Observations Used 423
The GLM Procedure
Dependent Variable: read verbal score (class average)
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 523.19971 261.59986 3.10 0.0459
Error 420 35406.02649 84.30006
Corrected Total 422 35929.22620
R-Square Coeff Var Root MSE read Mean
0.014562 12.64227 9.181507 72.62545
Source DF Type I SS Mean Square F Value Pr > F
csize 1 116.4220236 116.4220236 1.38 0.2406
small 1 406.7776872 406.7776872 4.83 0.0286
Source DF Type III SS Mean Square F Value Pr > F
csize 1 115.0706088 115.0706088 1.37 0.2433
small 1 406.7776872 406.7776872 4.83 0.0286
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 70.11879928 1.15121736 60.91 <.0001
csize -0.13916580 0.11911440 -1.17 0.2433
small 3.95314634 1.79960951 2.20 0.0286
