Section 13.1 Detecting Collinearity
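Table 13.1 and Table 13.2, using the data file ericksen. PROC REG fits the regression of undcount on all eight predictors, and PROC CORR gives the pairwise correlations among those predictors.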
proc reg data=ericksen;
model undcount=perc_min crimrate poverty diffeng hsgrad housing city countprc ;
run;
quit;
proc corr data=ericksen;
var perc_min crimrate poverty diffeng hsgrad housing city countprc ;
run;
The REG Procedure
Model: MODEL1
Dependent Variable: undcount

Analysis of Variance

                            Sum of        Mean
Source             DF      Squares      Square   F Value   Pr > F
Model               8    280.79543    35.09943     17.25   <.0001
Error              57    115.98480     2.03482
Corrected Total    65    396.78023

Root MSE           1.42647    R-Square    0.7077
Dependent Mean     1.92106    Adj R-Sq    0.6667
Coeff Var         74.25437

Parameter Estimates

                 Parameter    Standard
Variable   DF     Estimate       Error   t Value   Pr > |t|
Intercept    1     -1.77139     1.38218     -1.28     0.2052
perc_min     1      0.07983     0.02261      3.53     0.0008
crimrate     1      0.03012     0.01300      2.32     0.0241
poverty      1     -0.17837     0.08492     -2.10     0.0401
diffeng      1      0.21512     0.09221      2.33     0.0232
hsgrad       1      0.06129     0.04477      1.37     0.1764
housing      1     -0.03496     0.02463     -1.42     0.1613
city         1      1.15998     0.77064      1.51     0.1378
countprc     1      0.03699     0.00925      4.00     0.0002
The CORR Procedure

8 Variables: perc_min crimrate poverty diffeng hsgrad housing city countprc

Simple Statistics

Variable      N        Mean     Std Dev         Sum     Minimum     Maximum
perc_min     66    19.43636    17.51441        1283     0.70000    72.60000
crimrate     66    63.06061    24.89107        4162    25.00000   143.00000
poverty      66    13.46818     4.48108   888.90000     6.80000    23.90000
diffeng      66     1.92576     2.45396   127.10000     0.20000    12.70000
hsgrad       66    33.64697     8.49286        2221    17.50000    51.80000
housing      66    15.66515     9.82810        1034     7.00000    52.10000
city         66     0.24242     0.43183    16.00000           0     1.00000
countprc     66    11.72727    24.86737   774.00000           0   100.00000

Pearson Correlation Coefficients, N = 66
Prob > |r| under H0: Rho=0

           perc_min  crimrate   poverty   diffeng    hsgrad   housing      city  countprc
perc_min    1.00000   0.65490   0.73842   0.39545   0.53516   0.35679   0.75774  -0.33444
                       <.0001    <.0001    0.0010    <.0001    0.0033    <.0001    0.0061
crimrate    0.65490   1.00000   0.36911   0.51165   0.06656   0.53172   0.72857  -0.23309
             <.0001              0.0023    <.0001    0.5954    <.0001    <.0001    0.0596
poverty     0.73842   0.36911   1.00000   0.15157   0.75064   0.33522   0.53752  -0.15704
             <.0001    0.0023              0.2244    <.0001    0.0059    <.0001    0.2079
diffeng     0.39545   0.51165   0.15157   1.00000  -0.11640   0.34021   0.48036  -0.10819
             0.0010    <.0001    0.2244              0.3520    0.0052    <.0001    0.3872
hsgrad      0.53516   0.06656   0.75064  -0.11640   1.00000   0.23485   0.31482  -0.41422
             <.0001    0.5954    <.0001    0.3520              0.0577    0.0100    0.0005
housing     0.35679   0.53172   0.33522   0.34021   0.23485   1.00000   0.56570  -0.08629
             0.0033    <.0001    0.0059    0.0052    0.0577              <.0001    0.4909
city        0.75774   0.72857   0.53752   0.48036   0.31482   0.56570   1.00000  -0.26882
             <.0001    <.0001    <.0001    <.0001    0.0100    <.0001              0.0291
countprc   -0.33444  -0.23309  -0.15704  -0.10819  -0.41422  -0.08629  -0.26882   1.00000
             0.0061    0.0596    0.2079    0.3872    0.0005    0.4909    0.0291
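Several of the pairwise correlations above are large (for example, 0.75774 between perc_min and city), but pairwise correlations alone can miss near-dependencies that involve three or more predictors. The following is a minimal sketch, not part of the original program, that assumes the same ericksen data set and adds the VIF, TOL, and COLLIN options of the MODEL statement to request variance inflation factors, tolerances, and condition indices.

```sas
proc reg data=ericksen;
  /* vif:    variance inflation factor for each predictor            */
  /* tol:    tolerance, the reciprocal of the VIF                    */
  /* collin: condition indices and variance-decomposition proportions */
  model undcount = perc_min crimrate poverty diffeng hsgrad housing city countprc
        / vif tol collin;
run;
quit;
```

The parameter estimates are unchanged by these options; they only add diagnostic columns and a collinearity table to the listing.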
Section 13.2 Coping With Collinearity: No Quick Fix
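Figure 13.6 on page 358, using the dataset ericksen. The predictors are first copied to single-letter variables so that each model can be labeled on the plot with a short string. This example also shows how to create an annotate data set and how to use the compress function to remove the blanks from a string variable.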
data temp;
set ericksen;
M=perc_min;
C=crimrate;
P=poverty;
L=diffeng;
H=hsgrad;
O=housing;
I=city;
N=countprc;
run;
proc reg data=temp;
model undcount= M C P L H O I N / selection=cp ;
ods output SubsetSelSummary=cperick;
run;
quit;
data subcperick;
set cperick;
np=NumInModel+1;
diff=cp-np;
if diff <= 10;
var=compress(VarsInModel);/*to take away the spaces*/
output;
drop Model Dependent Control RSquare;
run;
data labels; /*Annotate set created for labeling*/
length function style text $ 8;
retain function 'label' xsys ysys '2' style 'swiss'
size 1 when 'a' color 'black';
set subcperick end=lastob;
/* place each label at the plotted point: */
x=np; y=diff; /* x and y are the annotate coordinates */
text=left(put(var, $8.));
position='B';
run;
axis1 order=(4 to 9 by 1) offset=(3, 5);
axis2 order=(0 to 10 by 2) label=(r=0 a=90);
symbol c=black i=none v=none;
proc gplot data=subcperick;
plot diff*np=1 / annotate=labels vminor=0 hminor=0 haxis=axis1 vaxis=axis2 ;
label np='p';
label diff='Cp-p';
run;
quit;
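The two derived quantities in the program above, the label string built with compress and the plotted difference Cp - p, can be checked in isolation. The sketch below is an illustration only: the string 'M C L N' is a made-up input in the single-letter coding, and the values NumInModel=4 and Cp=8.5149 are taken from the four-variable row of the C(p) listing shown further down this page.

```sas
data _null_;
  length VarsInModel $ 20 var $ 8;
  VarsInModel = 'M C L N';       /* illustrative value in the single-letter coding */
  var = compress(VarsInModel);   /* with one argument, compress removes all blanks */
  put var=;                      /* writes var=MCLN to the log */

  NumInModel = 4;                /* four predictors in the model */
  cp = 8.5149;                   /* C(p) for that model, from the listing */
  np = NumInModel + 1;           /* p counts the intercept as well: 4 + 1 = 5 */
  diff = cp - np;                /* 8.5149 - 5 = 3.5149 */
  put np= diff=;
run;
```

Because text in the annotate set has length 8, the compressed label still fits when all eight single-letter predictors are in the model.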

To create Table 13.4 on page 359 from the data file ericksen, we first build a table that lists, for each number of predictors, the names of the variables in the "best" model (the one with the highest R-square for that size) together with its C(p) and R-square. Table 13.4 can then be produced from this information by fitting the regressions one model at a time.
proc reg data=ericksen;
model undcount= perc_min crimrate poverty diffeng hsgrad housing city countprc
/ selection=cp ;
ods output SubsetSelSummary=cperick;
run;
quit;
proc sort data=cperick;
by NumInModel RSquare;
run;
data short;
set cperick;
by NumInModel;
if last.NumInModel then output;
drop Model Dependent Control;
run;
proc print data=short;
run;
The REG Procedure
Model: MODEL1
Dependent Variable: undcount

C(p) Selection Method

Number in
    Model      C(p)   R-Square   Variables in Model
        5    7.3196     0.6855   perc_min crimrate poverty diffeng countprc
        6    7.9829     0.6924   perc_min crimrate poverty diffeng city countprc
.....
        6    8.2075     0.6912   perc_min crimrate poverty diffeng hsgrad countprc
        4    8.5149     0.6691   perc_min crimrate diffeng countprc
        5    8.8253     0.6778   perc_min poverty diffeng city countprc
Obs   NumInModel        Cp    RSquare   VarsInModel
  1            1   36.7625     0.4935   perc_min
  2            2   23.9334     0.5696   perc_min hsgrad
  3            3   12.6764     0.6375   perc_min crimrate countprc
  4            4    8.5149     0.6691   perc_min crimrate diffeng countprc
  5            5    7.3196     0.6855   perc_min crimrate poverty diffeng countprc
  6            6    7.9829     0.6924   perc_min crimrate poverty diffeng city countprc
  7            7    8.8738     0.6981   perc_min crimrate poverty diffeng housing city countprc
  8            8    9.0000     0.7077   perc_min crimrate poverty diffeng hsgrad housing city countprc
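The last step, fitting the regressions one model at a time, might look like the sketch below. It uses the best one-, two-, and three-variable models from the listing above and relies on PROC REG accepting several MODEL statements in one step. The labels m1 to m3 are arbitrary names chosen for this illustration, and the remaining models would be added in the same way.

```sas
proc reg data=ericksen;
  /* one MODEL statement per "best" subset; the labels are arbitrary */
  m1: model undcount = perc_min;
  m2: model undcount = perc_min hsgrad;
  m3: model undcount = perc_min crimrate countprc;
  /* continue with the four- through eight-variable models from the listing */
run;
quit;
```

Each labeled model produces its own parameter-estimates table, from which the coefficients for the corresponding part of Table 13.4 can be read.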
