When modeling with a multiply imputed dataset, you may be interested in testing hypotheses beyond those reported for the model parameters. However, such tests must be run within each imputation and then combined, so that the final standard error reflects both the sampling error within each imputation and the variation between imputations.
In this example, we look at an imputed dataset, https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb2_mi.sas7bdat, containing 5 imputations of 200 records each. The dataset contains scores from math, reading, and writing assessments. We wish to predict writing scores using math scores, reading scores, and their interaction. Additionally, we wish to know the effect of math scores on writing scores when reading scores are at their mean value.
There were no missing reading scores in the original dataset, so we can find the mean of the reading score using the first imputation. Then, we can run our regression with an added estimate statement to generate this additional value.
    proc means data = hsb2 mean;
      where _imputation_=1;
      var read;
    run;

    The MEANS Procedure

    Analysis Variable : READ reading score

            Mean
    ------------
      52.2300000
    ------------
To test the effect of math scores on writing scores when reading scores are at the mean, we can add an estimate statement to our model. In the estimate statement, we list the model terms involved and the weight applied to each coefficient: 1 for math and 52.23 (the mean reading score) for math*read. The resulting estimate is the math coefficient plus 52.23 times the interaction coefficient. We also add an ods output statement that saves this additional estimate to a dataset.
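    proc glm data = hsb2;
      by _imputation_;  /* fit the model separately in each imputation */
      model write = math read math*read /solution;
      /* save the estimate statement results to the dataset est */
      ods output estimates = est;
      /* math coefficient + 52.23 * interaction coefficient */
      estimate "effect of math when read at its mean" math 1 math*read 52.23;
    run;
    quit;

    < output omitted >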
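The mean reading score does not have to be typed into the estimate statement by hand. The following is a minimal sketch, assuming the same hsb2 dataset as above, that stores the mean in a macro variable and reuses it. The macro variable name mean_read is an illustrative choice and is not part of the original example.

```sas
/* Sketch only: mean_read is a hypothetical macro variable name. */
/* Compute the mean of read in the first imputation and store it as text. */
proc sql noprint;
  select mean(read) format=best32.
    into :mean_read trimmed
    from hsb2
    where _imputation_ = 1;
quit;

/* Write the stored value to the log so it can be checked against proc means. */
%put NOTE: mean of read in imputation 1 = &mean_read;

/* Same model as above, with the macro variable in place of 52.23. */
proc glm data = hsb2;
  by _imputation_;
  model write = math read math*read / solution;
  ods output estimates = est;
  estimate "effect of math when read at its mean" math 1 math*read &mean_read;
run;
quit;
```

With this dataset the log should show the same value that proc means reported, so the est dataset should be unchanged. The benefit is only that the weight follows the data if the dataset is updated.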
We have indicated that the model is to be fit by imputation, so for each imputation, the effect we specified will be estimated. Thus, our output estimates dataset will contain one record for each imputation with the estimate value, standard error, t-statistic, and p-value.
    proc print data = est;
    run;

    Obs  _IMPUTATION_  Dependent  Parameter                               Estimate      StdErr  tValue   Probt
      1       1        WRITE      effect of math when read at its mean  0.39637435  0.07232249    5.48  <.0001
      2       2        WRITE      effect of math when read at its mean  0.40916934  0.07239891    5.65  <.0001
      3       3        WRITE      effect of math when read at its mean  0.35405486  0.07240171    4.89  <.0001
      4       4        WRITE      effect of math when read at its mean  0.32727500  0.06995155    4.68  <.0001
      5       5        WRITE      effect of math when read at its mean  0.36827458  0.07557337    4.87  <.0001
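Next, we can use proc mianalyze to combine the five estimates and their standard errors into a single pooled estimate and test. The modeleffects statement names the variable holding the estimates, and the stderr statement names the variable holding their standard errors.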
    proc mianalyze data = est ;
      modeleffects estimate;
      stderr stderr;
    run;

    The MIANALYZE Procedure

    Model Information
    Data Set                 WORK.EST
    Number of Imputations    5

    Multiple Imputation Variance Information
                  -----------------Variance-----------------
    Parameter      Between      Within       Total        DF
    estimate      0.001077    0.005264    0.006556    102.97

    Multiple Imputation Variance Information
                   Relative      Fraction
                   Increase       Missing      Relative
    Parameter   in Variance   Information    Efficiency
    estimate       0.245483      0.212252      0.959278

    Multiple Imputation Parameter Estimates
    Parameter    Estimate   Std Error   95% Confidence Limits        DF
    estimate     0.371030    0.080969    0.210447    0.531612    102.97

    Multiple Imputation Parameter Estimates
                                                     t for H0:
    Parameter    Minimum     Maximum    Theta0   Parameter=Theta0   Pr > |t|
    estimate    0.327275    0.409169         0               4.58     <.0001
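The pooled values in this output can be reproduced by hand with the standard combining rules for multiple imputation (Rubin's rules). The worked arithmetic below is a sketch using the rounded figures printed above. The symbols (m for the number of imputations, W, B, T, and r) are notation introduced here for illustration and do not appear in the SAS output.

```latex
\begin{align*}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i
         = \frac{0.39637 + 0.40917 + 0.35405 + 0.32728 + 0.36827}{5}
         \approx 0.371030
         && \text{(pooled estimate)}\\
W &= \frac{1}{m}\sum_{i=1}^{m}\mathrm{SE}_i^{2} \approx 0.005264
         && \text{(within-imputation variance)}\\
B &= \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^{2} \approx 0.001077
         && \text{(between-imputation variance)}\\
T &= W + \Bigl(1+\frac{1}{m}\Bigr)B
   = 0.005264 + 1.2 \times 0.001077 \approx 0.006556
         && \text{(total variance)}\\
\mathrm{SE} &= \sqrt{T} \approx 0.080969,
  \qquad t = \frac{\bar{Q}}{\mathrm{SE}} = \frac{0.371030}{0.080969} \approx 4.58\\
r &= \frac{(1+1/m)\,B}{W} \approx 0.2455,
  \qquad \mathrm{DF} = (m-1)\Bigl(1+\frac{1}{r}\Bigr)^{2} \approx 103
\end{align*}
```

The total variance exceeds the average within-imputation variance because the estimates differ from one imputation to the next. Analyzing a single imputation, or stacking all five as if they were one dataset, would leave out this between-imputation term.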
From these results, we can see that our effect is estimated at 0.371030 with a standard error of 0.080969, giving a t-statistic of 4.58 (p < .0001). We could also arrive at these results without using the estimate statement if we first recode our reading variable to be mean-centered. With this change, the model coefficient for math will be the effect of math scores when reading scores are at the mean.
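The equivalence of the two approaches follows from rewriting the regression equation in terms of the centered reading score. In the short derivation below, the coefficient symbols and the name read_c for the centered variable are notation introduced here for illustration.

```latex
\begin{align*}
\text{write} &= \beta_0 + \beta_1\,\text{math} + \beta_2\,\text{read}
               + \beta_3\,\text{math}\cdot\text{read}\\
\intertext{Substituting $\text{read} = \text{read}_c + 52.23$:}
\text{write} &= (\beta_0 + 52.23\,\beta_2)
               + (\beta_1 + 52.23\,\beta_3)\,\text{math}
               + \beta_2\,\text{read}_c
               + \beta_3\,\text{math}\cdot\text{read}_c
\end{align*}
```

The coefficient on math in the centered model is therefore the math coefficient plus 52.23 times the interaction coefficient from the original model, which is the same linear combination requested by the estimate statement. The interaction coefficient itself is unchanged by centering.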
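    data hsb2_recode;
      set hsb2;
      /* center read at its mean of 52.23 */
      read_mean = read - 52.23;
    run;

    proc glm data = hsb2_recode;
      by _imputation_;
      model write = math read_mean math*read_mean /solution;
      /* save the coefficient table for each imputation to the dataset g1 */
      ods output ParameterEstimates=g1;
    run;
    quit;

    < output omitted >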
    proc print data = g1;
    run;

    Obs  _IMPUTATION_  Dependent  Parameter          Estimate      StdErr  tValue   Probt
      1       1        WRITE      Intercept       32.29670800  3.82436298    8.44  <.0001
      2       1        WRITE      MATH             0.39637435  0.07232249    5.48  <.0001
      3       1        WRITE      read_mean        0.67185584  0.29393060    2.29  0.0233
      4       1        WRITE      MATH*read_mean  -0.00641350  0.00540304   -1.19  0.2367
      5       2        WRITE      Intercept       31.85168836  3.80594486    8.37  <.0001
      6       2        WRITE      MATH             0.40916934  0.07239891    5.65  <.0001
      7       2        WRITE      read_mean        0.74334679  0.28652371    2.59  0.0102
      8       2        WRITE      MATH*read_mean  -0.00824171  0.00526009   -1.57  0.1188
      9       3        WRITE      Intercept       34.77399716  3.79528912    9.16  <.0001
     10       3        WRITE      MATH             0.35405486  0.07240171    4.89  <.0001
     11       3        WRITE      read_mean        0.71299214  0.28302721    2.52  0.0126
     12       3        WRITE      MATH*read_mean  -0.00691051  0.00524564   -1.32  0.1892
     13       4        WRITE      Intercept       36.16668789  3.67582533    9.84  <.0001
     14       4        WRITE      MATH             0.32727500  0.06995155    4.68  <.0001
     15       4        WRITE      read_mean        0.76939311  0.28108716    2.74  0.0068
     16       4        WRITE      MATH*read_mean  -0.00762610  0.00518082   -1.47  0.1426
     17       5        WRITE      Intercept       34.00235169  3.96378670    8.58  <.0001
     18       5        WRITE      MATH             0.36827458  0.07557337    4.87  <.0001
     19       5        WRITE      read_mean        0.83315180  0.30430022    2.74  0.0068
     20       5        WRITE      MATH*read_mean  -0.00912756  0.00560804   -1.63  0.1052

    proc mianalyze parms = g1 ;
      modeleffects intercept math read_mean math*read_mean;
    run;

    Model Information
    PARMS Data Set           WORK.G1
    Number of Imputations    5

    Multiple Imputation Variance Information
                      -----------------Variance-----------------
    Parameter              Between        Within         Total        DF
    intercept             3.161240     14.547697     18.341185    93.506
    math                  0.001077      0.005264      0.006556    102.97
    read_mean             0.003684      0.084041      0.088462    1601.6
    math*read_mean     0.000001152   0.000028534   0.000029916    1872.7

    Multiple Imputation Variance Information
                        Relative      Fraction
                        Increase       Missing      Relative
    Parameter        in Variance   Information    Efficiency
    intercept           0.260762      0.223267      0.957255
    math                0.245483      0.212252      0.959278
    read_mean           0.052605      0.051160      0.989872
    math*read_mean      0.048455      0.047233      0.990642

    Multiple Imputation Parameter Estimates
    Parameter          Estimate   Std Error   95% Confidence Limits        DF
    intercept         33.818287    4.282661    25.31438    42.32220    93.506
    math               0.371030    0.080969     0.21045     0.53161    102.97
    read_mean          0.746148    0.297425     0.16276     1.32953    1601.6
    math*read_mean    -0.007664    0.005470    -0.01839     0.00306    1872.7

    Multiple Imputation Parameter Estimates
                                                           t for H0:
    Parameter           Minimum      Maximum    Theta0   Parameter=Theta0   Pr > |t|
    intercept         31.851688    36.166688         0               7.90     <.0001
    math               0.327275     0.409169         0               4.58     <.0001
    read_mean          0.671856     0.833152         0               2.51     0.0122
    math*read_mean    -0.009128    -0.006413         0              -1.40     0.1613
We can see that our estimate (0.371030), t-statistic (4.58), and p-value (<.0001) for math match those obtained using the estimate statement.