When a data file has missing values, we may want to distinguish between different types of missing values. For example, a value can be missing because of non-response or because of an invalid data entry. The examples here address this issue.
Example 1: Specifying types of missing values in a data set
In SAS, we can use the letters A-Z and the underscore "_" to indicate the type of a missing value. In code, these special missing values are written with a leading dot, such as .a or ._.
In the example below, the variable female has the value -999 when the subject refused to answer the question and the value -99 when there was a data entry error. The same coding applies to the variable ses. The first code fragment hard codes the changes; the second performs the same recoding with an array.
data test1;
  input score female ses;
datalines;
56 1 1
62 1 2
73 0 3
67 -999 1
57 0 1
56 -99 2
57 1 -999
;
run;

*hard code;
data test1a;
  set test1;
  if female = -999 then female = .a;
  if female = -99 then female = .b;
  if ses = -999 then ses = .a;
run;

proc print data = test1a;
run;

Obs    score    female    ses
 1       56       1        1
 2       62       1        2
 3       73       0        3
 4       67       A        1
 5       57       0        1
 6       56       B        2
 7       57       1        A

*using the array;
data test1b;
  set test1;
  array miss(2) female ses;
  do i = 1 to 2;
    if miss(i) = -999 then miss(i) = .a;
    if miss(i) = -99 then miss(i) = .b;
  end;
  drop i;
run;

proc print data = test1b;
run;

Obs    score    female    ses
 1       56       1        1
 2       62       1        2
 3       73       0        3
 4       67       A        1
 5       57       0        1
 6       56       B        2
 7       57       1        A
Notice that when SAS prints a special missing value, it prints only the letter or underscore, not the leading dot ".".
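Special missing values are still missing values, so it can help to check how many of each type the recoding produced. The following is a minimal sketch, assuming the data set test1a from the hard-coded fragment above. The data set name test1c and the variable any_miss are hypothetical names used only for illustration.

```sas
* tabulate each type of missing value as its own category;
proc freq data = test1a;
  tables female ses / missing;
run;

* missing() returns 1 for ., ._ and .a through .z, otherwise 0;
data test1c;
  set test1a;
  any_miss = missing(female) or missing(ses);  * hypothetical flag variable;
run;

proc print data = test1c;
run;
```

With the data shown above, any_miss should equal 1 for observations 4, 6 and 7, since those are the rows where female or ses holds .A or .B.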
Example 2: Specifying types of missing values in a raw data file
We have a tiny example raw data file called tiny.txt, shown below, with three variables: score, female and ses. These three variables are meant to be numeric, except that special characters stand in for missing values. In this example, "a" means that the subject refused to give the information and "b" means a data entry error. The valid characters here are the 26 letters a-z and the underscore "_".
56 1 1
62 1 2
73 0 3
67 a 1
57 0 1
56 1 2
57 1 b
We want to read the variables as numeric and we also want to keep the information on the nature of missing values. In SAS, we can read these variables as numeric from this file by using the missing statement in the data step. Here is how we can do it:
data test0;
  missing a b;
  infile 'd:\temp\missing.txt';
  input score female ses;
run;

proc print data = test0;
run;
Obs    score    female    ses
 1       56       1        1
 2       62       1        2
 3       73       0        3
 4       67       A        1
 5       57       0        1
 6       56       1        2
 7       57       1        B
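To try the same technique without an external file, the lines of the raw data can be placed in-stream after a datalines statement. This is a minimal sketch rather than a replacement for the step above. The data set name test0_inline is a hypothetical name chosen for this illustration.

```sas
* same records as the raw data file, read in-stream instead of from disk;
data test0_inline;   * hypothetical data set name;
  missing a b;       * treat the characters a and b as special missing values;
  input score female ses;
datalines;
56 1 1
62 1 2
73 0 3
67 a 1
57 0 1
56 1 2
57 1 b
;
run;

proc print data = test0_inline;
run;
```

The printed result should match the listing for test0, with A in female for observation 4 and B in ses for observation 7.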
There are then two types of missing values in the data set test0: .A and .B. For example, to refer to the 4th observation, where the value of the variable female is missing, we can use a where statement such as "where female=.a;" as shown in the following example:
proc print data = test0;
  where female=.a;
run;

Obs    score    female    ses
 4       67       A        1
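Once the missing types are stored as .A and .B, a user-defined format can attach readable labels to them, and the missing() function can select every missing value regardless of type. The following is a sketch under stated assumptions: it uses the data set test0 from above, the format name misstype is hypothetical, and the labels follow the meanings given earlier for "a" and "b".

```sas
* label the special missing values; other values print as usual;
proc format;
  value misstype             /* hypothetical format name */
    .a = 'Refused'
    .b = 'Data entry error';
run;

* show the labeled missing categories in the frequency tables;
proc freq data = test0;
  tables female ses / missing;
  format female ses misstype.;
run;

* select rows with any kind of missing value in female or ses;
proc print data = test0;
  where missing(female) or missing(ses);
  format female ses misstype.;
run;
```

With the data shown above, the final proc print should list observations 4 and 7.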