How can I test additional estimates in imputed dataset models?

When you fit a model to a multiply imputed dataset, you may want to test hypotheses beyond the parameter estimates the model reports by default. Because the data are imputed, such tests must be carried out carefully: the quantity of interest has to be estimated within each imputation, and the results then combined so that both within-imputation and between-imputation variability are reflected.

In this example, we look at an imputed dataset, https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb2_mi.sas7bdat, containing 5 imputations of 200 records each. The dataset contains scores from math, reading, and writing assessments. We wish to predict writing scores using math scores, reading scores, and their interaction. Additionally, we wish to know the effect of math scores on writing scores when reading scores are at their mean value.

There were no missing reading scores in the original dataset, so the reading scores are the same in every imputation and we can compute their mean from the first imputation alone. Then, we can run our regression with an added estimate statement to generate this additional value.

proc means data = hsb2 mean;
 where _imputation_=1;
 var read;
run;


The MEANS Procedure

Analysis Variable : READ reading score

        Mean
------------
  52.2300000
------------
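
If you would rather not retype the mean by hand, it can be stored in a macro variable. The following is only a sketch: the macro variable name read_mean is our own choice, and it assumes the dataset has been read in as hsb2, as in the proc means step above.

```sas
/* Sketch: save the mean of read from the first imputation */
/* in a macro variable (read_mean is a hypothetical name). */
proc sql noprint;
  select mean(read) into :read_mean trimmed
  from hsb2
  where _imputation_ = 1;
quit;

/* Write the stored value to the log so it can be checked */
/* against the proc means output.                         */
%put NOTE: mean of read = &read_mean;
```

The macro variable could then be written as &read_mean wherever the literal value 52.23 is used in later steps.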

To test the effect of math scores on writing scores when reading scores are at the mean, we can add an estimate line to our model. In the estimate line, we list the model terms involved and the coefficient to apply to each: 1 for math and 52.23 (the mean reading score) for math*read. We also add an ods output line that saves this additional estimate to a dataset named est.
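
To see why these are the right coefficients, it may help to write the model out. The notation below (the betas) is ours, introduced only for illustration; it does not appear in the SAS output.

```latex
% Regression model with an interaction (notation assumed for illustration)
\[
\text{write} = \beta_0 + \beta_1\,\text{math} + \beta_2\,\text{read}
             + \beta_3\,(\text{math}\times\text{read}) + \varepsilon
\]
% Effect of a one-unit change in math at a fixed value of read
\[
\frac{\partial\,\text{write}}{\partial\,\text{math}} = \beta_1 + \beta_3\,\text{read}
\]
% Evaluated at the mean reading score
\[
\left.\frac{\partial\,\text{write}}{\partial\,\text{math}}\right|_{\text{read}=52.23}
  = 1\cdot\beta_1 + 52.23\cdot\beta_3
\]
```

The multipliers 1 and 52.23 in the last line are the values attached to math and math*read in the estimate statement.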

proc glm data = hsb2;
 by _imputation_;
 model write = math read math*read /solution;
 ods output estimates = est;
 estimate "effect of math when read at its mean" math 1 math*read 52.23; 
run;
quit;

< output omitted >

We have indicated that the model is to be fit by imputation, so for each imputation, the effect we specified will be estimated. Thus, our output estimates dataset will contain one record for each imputation with the estimate value, standard error, t-statistic, and p-value.

proc print data = est; run;

Obs    _IMPUTATION_    Dependent                 Parameter                Estimate          StdErr     tValue     Probt

 1           1           WRITE      effect of math when read at its mean 0.39637435      0.07232249       5.48    <.0001
 2           2           WRITE      effect of math when read at its mean 0.40916934      0.07239891       5.65    <.0001
 3           3           WRITE      effect of math when read at its mean 0.35405486      0.07240171       4.89    <.0001
 4           4           WRITE      effect of math when read at its mean 0.32727500      0.06995155       4.68    <.0001
 5           5           WRITE      effect of math when read at its mean 0.36827458      0.07557337       4.87    <.0001

Next, we can use proc mianalyze to combine the per-imputation estimates and standard errors and appropriately test the effect.

proc mianalyze data = est ;
  modeleffects estimate;
  stderr stderr;
run;

The MIANALYZE Procedure

          Model Information

Data Set                  WORK.EST
Number of Imputations     5


            Multiple Imputation Variance Information
             -----------------Variance-----------------
Parameter         Between         Within          Total       DF
estimate         0.001077       0.005264       0.006556   102.97

       Multiple Imputation Variance Information
                 Relative       Fraction
                 Increase        Missing       Relative
Parameter     in Variance    Information     Efficiency
estimate         0.245483       0.212252       0.959278


                  Multiple Imputation Parameter Estimates
Parameter        Estimate      Std Error    95% Confidence Limits        DF
estimate         0.371030       0.080969     0.210447     0.531612   102.97

                       Multiple Imputation Parameter Estimates
                                                                 t for H0:
Parameter         Minimum        Maximum         Theta0   Parameter=Theta0   Pr > |t|
estimate         0.327275       0.409169              0               4.58     <.0001
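
The pooled values in this output can be reproduced by hand with the standard combining rules for multiple imputation (Rubin's rules). The symbols below are our own notation; the numbers are taken from the output above, with m = 5 imputations.

```latex
% Combining rules (notation assumed): \hat{Q}_i and SE_i are the estimate
% and standard error from imputation i, with i = 1, ..., m.
\[
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
W = \frac{1}{m}\sum_{i=1}^{m} SE_i^{2}, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^{2}
\]
\[
T = W + \Bigl(1+\frac{1}{m}\Bigr)B, \qquad
t = \frac{\bar{Q}}{\sqrt{T}}, \qquad
\nu = (m-1)\Bigl[1+\frac{W}{(1+1/m)\,B}\Bigr]^{2}
\]
% Worked with the values reported above
\[
T \approx 0.005264 + 1.2 \times 0.001077 \approx 0.006556, \qquad
\sqrt{T} \approx 0.0810, \qquad
t \approx \frac{0.371030}{0.080969} \approx 4.58
\]
```

Here W is the within-imputation variance, B the between-imputation variance, and T the total variance shown in the variance information table.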

From these results, we can see that our effect is estimated at 0.371030, with a t-statistic of 4.58. We could also arrive at these results without using the estimate statement if we first recode our reading variable to be mean-centered. With this change, our model estimate for math will be the effect of math scores when reading scores are at the mean.
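
A short piece of algebra shows why centering gives the same quantity. The betas (uncentered model) and gammas (centered model) are our own notation, used only for illustration.

```latex
% Centered model (notation assumed for illustration)
\[
\text{write} = \gamma_0 + \gamma_1\,\text{math} + \gamma_2\,(\text{read}-52.23)
             + \gamma_3\,\text{math}\,(\text{read}-52.23) + \varepsilon
\]
% Expanding and collecting terms gives the uncentered form
\[
\text{write} = (\gamma_0 - 52.23\,\gamma_2) + (\gamma_1 - 52.23\,\gamma_3)\,\text{math}
             + \gamma_2\,\text{read} + \gamma_3\,(\text{math}\times\text{read}) + \varepsilon
\]
% Matching coefficients with the uncentered model
% write = \beta_0 + \beta_1 math + \beta_2 read + \beta_3 (math x read) + \varepsilon
\[
\beta_1 = \gamma_1 - 52.23\,\gamma_3, \quad \beta_3 = \gamma_3
\quad\Longrightarrow\quad
\gamma_1 = \beta_1 + 52.23\,\beta_3
\]
```

So the coefficient for math in the centered model is the same linear combination that the estimate statement evaluates.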


data hsb2_recode;
  set hsb2;
  read_mean = read - 52.23;
run;

proc glm data = hsb2_recode;
  by _imputation_;
  model write = math read_mean math*read_mean /solution;
  ods output ParameterEstimates=g1;
run;
quit;

< output omitted >

proc print data = g1;
run;

Obs   _IMPUTATION_   Dependent   Parameter            Estimate         StdErr    tValue    Probt

  1         1          WRITE     Intercept         32.29670800     3.82436298      8.44   <.0001
  2         1          WRITE     MATH               0.39637435     0.07232249      5.48   <.0001
  3         1          WRITE     read_mean          0.67185584     0.29393060      2.29   0.0233
  4         1          WRITE     MATH*read_mean    -0.00641350     0.00540304     -1.19   0.2367
  5         2          WRITE     Intercept         31.85168836     3.80594486      8.37   <.0001
  6         2          WRITE     MATH               0.40916934     0.07239891      5.65   <.0001
  7         2          WRITE     read_mean          0.74334679     0.28652371      2.59   0.0102
  8         2          WRITE     MATH*read_mean    -0.00824171     0.00526009     -1.57   0.1188
  9         3          WRITE     Intercept         34.77399716     3.79528912      9.16   <.0001
 10         3          WRITE     MATH               0.35405486     0.07240171      4.89   <.0001
 11         3          WRITE     read_mean          0.71299214     0.28302721      2.52   0.0126
 12         3          WRITE     MATH*read_mean    -0.00691051     0.00524564     -1.32   0.1892
 13         4          WRITE     Intercept         36.16668789     3.67582533      9.84   <.0001
 14         4          WRITE     MATH               0.32727500     0.06995155      4.68   <.0001
 15         4          WRITE     read_mean          0.76939311     0.28108716      2.74   0.0068
 16         4          WRITE     MATH*read_mean    -0.00762610     0.00518082     -1.47   0.1426
 17         5          WRITE     Intercept         34.00235169     3.96378670      8.58   <.0001
 18         5          WRITE     MATH               0.36827458     0.07557337      4.87   <.0001
 19         5          WRITE     read_mean          0.83315180     0.30430022      2.74   0.0068
 20         5          WRITE     MATH*read_mean    -0.00912756     0.00560804     -1.63   0.1052


proc mianalyze parms = g1;
  modeleffects intercept math read_mean math*read_mean;
run;

          Model Information

PARMS Data Set            WORK.G1
Number of Imputations     5


              Multiple Imputation Variance Information

                  -----------------Variance-----------------
Parameter              Between         Within          Total       DF

intercept             3.161240      14.547697      18.341185   93.506
math                  0.001077       0.005264       0.006556   102.97
read_mean             0.003684       0.084041       0.088462   1601.6
math*read_mean     0.000001152    0.000028534    0.000029916   1872.7

          Multiple Imputation Variance Information

                      Relative       Fraction
                      Increase        Missing       Relative
Parameter          in Variance    Information     Efficiency

intercept             0.260762       0.223267       0.957255
math                  0.245483       0.212252       0.959278
read_mean             0.052605       0.051160       0.989872
math*read_mean        0.048455       0.047233       0.990642


                    Multiple Imputation Parameter Estimates

Parameter             Estimate      Std Error    95% Confidence Limits        DF

intercept            33.818287       4.282661     25.31438     42.32220   93.506
math                  0.371030       0.080969      0.21045      0.53161   102.97
read_mean             0.746148       0.297425      0.16276      1.32953   1601.6
math*read_mean       -0.007664       0.005470     -0.01839      0.00306   1872.7

                         Multiple Imputation Parameter Estimates

                                                                      t for H0:
Parameter              Minimum        Maximum         Theta0   Parameter=Theta0   Pr > |t|

intercept            31.851688      36.166688              0               7.90     <.0001
math                  0.327275       0.409169              0               4.58     <.0001
read_mean             0.671856       0.833152              0               2.51     0.0122
math*read_mean       -0.009128      -0.006413              0              -1.40     0.1613

We can see that our estimate, standard error, t-statistic, and p-value for math match those obtained using the estimate statement.
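
The agreement can also be checked imputation by imputation. The step below is a sketch under the assumption that the datasets est and g1 created above are still in the work library; the names check, est_estimate, est_stderr, diff_estimate, and diff_stderr are hypothetical.

```sas
/* Sketch: line up the per-imputation results for math from the two approaches. */
/* est holds the estimate-statement results; g1 holds the centered-model        */
/* parameter estimates. All new names here are hypothetical.                    */
data check;
  merge est (keep = _imputation_ estimate stderr
             rename = (estimate = est_estimate stderr = est_stderr))
        g1  (keep = _imputation_ parameter estimate stderr
             where = (upcase(parameter) = "MATH"));
  by _imputation_;
  /* Differences should be zero up to rounding error */
  diff_estimate = est_estimate - estimate;
  diff_stderr   = est_stderr - stderr;
run;

proc print data = check;
run;
```

Both datasets are already ordered by _imputation_ because they were produced with a by statement, so no sorting step is included.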