Multiple Imputation for Continuous Variables with Monotone Missing Data

Different types of missing data call for different imputation procedures, many of which can be performed with PROC MI. The appropriate procedure depends on the type of the variables (categorical, continuous, or binary) and on the pattern of missingness in the data (discussed below). This page covers the procedure for imputing missing data when the variables to be imputed are all continuous and have a monotone missing data pattern.

Examples

This example uses data from the 200-subject version of the High School and Beyond dataset. The dataset contains high school students’ scores on tests in different academic areas, and it has been modified so that some of the cases have missing values. You can download the dataset here: https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb2_w_missing.sas7bdat . For this example, let’s assume that a researcher wants to use these data to test the theory that science aptitude is predicted by students’ aptitude in reading, writing, and math. Unfortunately, some students are missing data on the four variables in our analysis.

Patterns of Missingness

Dataset 1: Monotone Missing Data

id  V1  V2  V3  V4
 1   2   5   9   3
 2   3   1   2   .
 3   2   6   5   .
 4   1   4   .   .
 5   3   .   .   .

Dataset 2: Non-Monotone Missing Data

id  V1  V2  V3  V4
 1   2   5   9   3
 2   3   7   .   .
 3   2   .   5   9
 4   1   4   .   2
 5   3   .   5   .

proc means data=mi.hsb2_w_missing;
    var read science math write;
run;

The MEANS Procedure

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
READ        200      52.2300000      10.2529368      28.0000000      76.0000000
SCIENCE     193      52.3367876       9.7004910      26.0000000      74.0000000
MATH        185      53.5567568       9.1001834      35.0000000      75.0000000
WRITE       175      54.1542857       8.8261739      31.0000000      67.0000000
-------------------------------------------------------------------------------

Examining distributions of missing values in SAS

To examine the patterns of missing values, we can recode each variable into a dummy variable that equals 1 if the value is missing and 0 if it is not. We then run PROC FREQ with a TABLES statement and the LIST option to compute the frequency of each pattern of missing data.

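/* Recode each test score into a missingness indicator: 1 = missing, 0 = observed */
data mi.hsb2_w_missing2 (drop=i);
    set mi.hsb2_w_missing;
    array test1{*} read science math write;
    do i = 1 to dim(test1);
        if test1{i} = . then test1{i} = 1;
        else test1{i} = 0;
    end;
run;

/* The LIST option prints one row per combination of the four indicators */
proc freq data=mi.hsb2_w_missing2;
    tables read*science*math*write / list;
run;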

                                                 Cumulative Cumulative
READ  SCIENCE  MATH  WRITE  Frequency   Percent   Frequency    Percent
-----------------------------------------------------------------------
   0        0     0      0        175     87.50         175      87.50
   0        0     0      1         10      5.00         185      92.50
   0        0     1      1          8      4.00         193      96.50
   0        1     1      1          7      3.50         200     100.00

This table shows that 175 cases have no missing data, 10 cases are missing values on just the WRITE variable, 8 cases are missing values on both MATH and WRITE, and 7 cases are missing values on SCIENCE, MATH, and WRITE. Every case that is missing SCIENCE is also missing both MATH and WRITE, and every case that is missing MATH is also missing WRITE, so we say that the pattern of missingness is monotone. Whenever the variables are, or can be, ordered so that the missing values form the triangle of 1’s seen in the table above, the pattern is said to be monotone.
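
The same patterns can also be listed without recoding the data. The sketch below assumes the same mi library used above and asks PROC MI to skip the imputation step (NIMPUTE=0), so that only its descriptive tables, including the missing data patterns, are printed. The variables are listed in the order that makes the pattern monotone here: the most complete variable first and the least complete last.

```sas
/* Sketch: print the missing data patterns without imputing anything.  */
/* NIMPUTE=0 requests zero imputations, so only descriptive output is  */
/* produced. Variable order follows the monotone ordering found above. */
proc mi data=mi.hsb2_w_missing nimpute=0;
    var read science math write;
run;
```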

Imputing the Missing Values
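
As one possible sketch of this step, the code below uses the MONOTONE statement of PROC MI with the regression method, which applies to continuous variables with a monotone missing data pattern. The number of imputations (5), the seed value, and the output dataset name mi.hsb2_imputed are illustrative assumptions, not values taken from this page.

```sas
/* Sketch under stated assumptions: NIMPUTE=5, SEED=12345, and the      */
/* output name mi.hsb2_imputed are hypothetical choices.                */
/* MONOTONE REG imputes each variable from the variables that precede   */
/* it in the VAR statement, so the order must match the monotone        */
/* pattern: read (complete), then science, math, and write.             */
proc mi data=mi.hsb2_w_missing nimpute=5 seed=12345 out=mi.hsb2_imputed;
    monotone reg;
    var read science math write;
run;
```

The output dataset stacks the imputed copies of the data and identifies each copy with the variable _Imputation_.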

Analyzing Multiply Imputed Datasets

There are two steps in analyzing a multiply imputed dataset. First, we run an analysis procedure separately on each of the imputed datasets, and then we use PROC MIANALYZE to combine the results into a single set of estimates.
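
The two steps might look like the following sketch for the regression of science on reading, writing, and math described in the example. It assumes the imputed data are stored in a dataset named mi.hsb2_imputed (a hypothetical name) that was written by PROC MI and therefore contains the _Imputation_ variable. The dataset name work.regest is also an illustrative choice.

```sas
/* Step 1: fit the regression once per imputed dataset.                 */
/* mi.hsb2_imputed is an assumed name for the PROC MI output dataset.   */
/* OUTEST= with COVOUT saves the estimates and their covariance         */
/* matrices for each value of _Imputation_.                             */
proc reg data=mi.hsb2_imputed outest=work.regest covout noprint;
    model science = read write math;
    by _Imputation_;
run;

/* Step 2: combine the per-imputation estimates into one set of results. */
proc mianalyze data=work.regest;
    modeleffects Intercept read write math;
run;
```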
