How can I recode missing values into different categories?

Stata allows us to code different types of numeric missing values. There are 27 numeric missing value codes: the system missing value “.” and the extended missing values “.a” through “.z”. On this page we show how to recode missing values into different categories.
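
As a quick illustration of how these codes behave, the sketch below relies on Stata's sort order for numeric values, in which every non-missing number is smaller than “.”, and “.” is smaller than “.a”, which in turn is smaller than “.b”, and so on up to “.z”. Each line should display 1 (true).

```stata
* System missing sorts below the extended missing codes
display (. < .a)
* Extended missing codes are ordered alphabetically
display (.a < .z)
* Any non-missing number, however large, sorts below system missing
display (1e300 < .)
* missing() is true for all 27 missing codes
display missing(.b)
```

This ordering is why a condition such as female < 0 is never true for a value that has already been set to a missing code.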

First we create a data set for the purpose of illustration. In this data set, all the variables are numeric, and the variables female and ses have missing values. The non-missing values of female are 0 (male) and 1 (female). The non-missing values of ses are 0 (low), 1 (med) and 2 (high). The negative values (-999, -99, -2 and -1) are codes for missing values.

clear

input score female ses 
56    1     1 
62    1     2 
73    0     3
67 -999     1
57    0     1
56  -99     2
57    1  -999
78   -2     1
67    1    -1
92    1     1
57   -1     2
57    0    -1
58    0     3
78    1     0
end

Let’s say that, for all the variables, we want to code -999 into one category, -99 into another, and the rest of the missing values into a third category.

Method 1: Using command replace

We can manually replace -999 with “.a”, -99 with “.b” and the rest of the missing values with “.c”. For example, for variable female, we can do the following:

replace female = .a if female == -999
replace female = .b if female == -99
replace female = .c if female >= -3 & female < 0
list, clean

       score   female    ses  
  1.      56        1      1  
  2.      62        1      2  
  3.      73        0      3  
  4.      67       .a      1  
  5.      57        0      1  
  6.      56       .b      2  
  7.      57        1   -999  
  8.      78       .c      1  
  9.      67        1     -1  
 10.      92        1      1  
 11.      57       .c      2  
 12.      57        0     -1  
 13.      58        0      3  
 14.      78        1      0
 
codebook female
---------------------------------------------------------------------------------------------------
female                                                                                  (unlabeled)
---------------------------------------------------------------------------------------------------
                  type:  numeric (float)
                 range:  [0,1]                        units:  1
         unique values:  2                        missing .:  0/14
       unique mv codes:  3                       missing .*:  4/14
            tabulation:  Freq.  Value
                             4  0
                             6  1
                             1  .a
                             1  .b
                             2  .c

The codebook output above shows that variable female now has three distinct missing value codes and four missing observations in total.
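
The same three replace statements can also be wrapped in a loop so that they run once for each variable. This is only a minimal sketch, and it assumes that the example data set above is in memory. The inrange() function is used here in place of the two-sided comparison; it returns 0 for values that are already missing.

```stata
* Apply the three recoding rules to each variable in turn
foreach v of varlist female ses {
    replace `v' = .a if `v' == -999
    replace `v' = .b if `v' == -99
    * inrange() is false for missing values, so .a and .b are left alone
    replace `v' = .c if inrange(`v', -3, -1)
}
list, clean
```

The loop still issues three replace commands per variable, so the list of rules has to be kept in sync by hand.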

Method 2: Using command mvdecode

Method 1 may not be the best way to recode missing values into different categories. For one thing, we have to do it one variable at a time. Stata’s mvdecode command comes in handy here. Starting again from the original data, it recodes both variables in one step:

mvdecode female ses, mv(-999 = .a \ -99 = .b \ -3/-1 = .c)
      female: 4 missing values generated
         ses: 3 missing values generated

list, clean
       score   female   ses  
  1.      56        1     1  
  2.      62        1     2  
  3.      73        0     3  
  4.      67       .a     1  
  5.      57        0     1  
  6.      56       .b     2  
  7.      57        1    .a  
  8.      78       .c     1  
  9.      67        1    .c  
 10.      92        1     1  
 11.      57       .c     2  
 12.      57        0    .c  
 13.      58        0     3  
 14.      78        1     0

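Better yet, again starting from the original data, we can use the keyword _all to refer to all the variables in the data set. No line is printed for score because it contains none of the listed values.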

mvdecode _all, mv(-999 = .a \ -99 = .b \ -3/-1 = .c)
      female: 4 missing values generated
         ses: 3 missing values generated
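
Once the values have been recoded, each missing category can be referred to by its own code. The sketch below assumes that one of the mvdecode commands above has just been run on the original data; the counts for female should agree with the codebook output shown under Method 1.

```stata
* Count the observations in each missing category of female
count if female == .a
count if female == .b
count if female == .c
* missing() is true for every missing code, so this counts all of them
count if missing(female)
* Show the missing codes as separate rows in a frequency table
tabulate ses, missing
```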

Going from missing value codes to numeric values

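The other issue that we cover here is how to change missing value codes back to numeric values. The command mvencode is the counterpart of mvdecode and is the one to use here. Note that all values from -3 to -1 were mapped to “.c”, so the original values cannot be recovered; in the example below we assign -50 to “.c” instead.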

mvencode female, mv(.a = -999 \ .b = -99 \ .c = -50)
      female: 4 missing values recoded

list, clean

       score   female   ses  
  1.      56        1     1  
  2.      62        1     2  
  3.      73        0     3  
  4.      67     -999     1  
  5.      57        0     1  
  6.      56      -99     2  
  7.      57        1    .a  
  8.      78      -50     1  
  9.      67        1    .c  
 10.      92        1     1  
 11.      57      -50     2  
 12.      57        0    .c  
 13.      58        0     3  
 14.      78        1     0
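
The same mapping can be applied to ses, which still holds “.a” and “.c” in the listing above. This is a minimal sketch that assumes the data are in the state just listed; as before, -50 is simply the stand-in value chosen for “.c” in this example.

```stata
* Recode the missing codes in ses back to numeric values
mvencode ses, mv(.a = -999 \ .b = -99 \ .c = -50)
* After recoding, ses should contain no missing values
assert !missing(ses)
list, clean
```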