Stata allows us to code different types of numeric missing values. It has 27 numeric missing value codes: “.” (the system missing value) and “.a” through “.z” (the extended missing values). On this page we show how to code missing values into different categories.
First we create a data set for the purpose of illustration. In this data set, all the variables are numeric, and the variables female and ses have missing values. The non-missing values for variable female are 0 (for male) and 1 (for female). The non-missing values for variable ses are 0 (low), 1 (med) and 2 (high). The rest of the values are considered to be missing values.
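The ordering of these codes matters when writing if conditions. The following is a minimal sketch, assuming a hypothetical toy variable x that is not part of the data set used on this page. Stata sorts every non-missing number below “.”, and “.” below “.a” through “.z”, so x >= . or missing(x) catches all of the missing codes, while x == . catches only the system missing value.

```stata
* Toy data: x is a hypothetical variable used only for illustration
clear
input x
1
.
.a
.z
end

* Only the system missing value matches an equality test against "."
count if x == .
assert r(N) == 1

* Every missing code is greater than or equal to "."
count if x >= .
assert r(N) == 3

* missing() is an equivalent and more readable test
count if missing(x)
assert r(N) == 3

* A single extended code can be tested directly
count if x == .a
assert r(N) == 1
```

For this reason, a condition such as x > 0 also selects missing values unless it is combined with !missing(x).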
clear
input score female ses
56    1    1
62    1    2
73    0    3
67 -999    1
57    0    1
56  -99    2
57    1 -999
78   -2    1
67    1   -1
92    1    1
57   -1    2
57    0   -1
58    0    3
78    1    0
end
Let’s say that, for all the variables, we want to code -999 into one category, -99 into another, and the rest of the missing values into a third category.
Method 1: Using command replace
We can manually replace missing values with “.a” for -999, “.b” for -99 and “.c” for the rest of the missing values. For example, for variable female, we can do the following:
replace female = .a if female == -999
replace female = .b if female == -99
replace female = .c if female >= -3 & female < 0
list, clean

       score   female    ses
  1.      56        1      1
  2.      62        1      2
  3.      73        0      3
  4.      67       .a      1
  5.      57        0      1
  6.      56       .b      2
  7.      57        1   -999
  8.      78       .c      1
  9.      67        1     -1
 10.      92        1      1
 11.      57       .c      2
 12.      57        0     -1
 13.      58        0      3
 14.      78        1      0

codebook female

---------------------------------------------------------------------------------------------------
female                                                                                  (unlabeled)
---------------------------------------------------------------------------------------------------

                  type:  numeric (float)

                 range:  [0,1]                        units:  1
         unique values:  2                        missing .:  0/14
       unique mv codes:  3                       missing .*:  4/14

            tabulation:  Freq.  Value
                             4  0
                             6  1
                             1  .a
                             1  .b
                             2  .c

The codebook output above shows that variable female now has three distinct missing value codes (.a, .b and .c) and four missing values in total. Variable ses is untouched at this point, so it still contains -999 and -1.
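If we want to stay with replace, the same three rules can be wrapped in a foreach loop so that they are written only once. This is a sketch, not part of the original example: it uses a five-row subset of the data above so that it runs on its own, and it reuses the conditions shown for female.

```stata
* Subset of the illustration data, re-entered so the sketch is self-contained
clear
input score female ses
67 -999    1
56  -99    2
57    1 -999
78   -2    1
67    1   -1
end

* Apply the same three recoding rules to each listed variable in turn
foreach v of varlist female ses {
    replace `v' = .a if `v' == -999
    replace `v' = .b if `v' == -99
    replace `v' = .c if `v' >= -3 & `v' < 0
}

list, clean
```

The loop still issues three replace commands per variable, so it saves typing rather than work.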
Method 2: Using command mvdecode
Method 1 may not be the best way of recoding missing values into different categories. For one thing, we have to repeat the replace commands for each variable. Stata’s mvdecode command comes in handy here: it applies one set of recoding rules to a whole list of variables.
mvdecode female ses, mv(-999 = .a \ -99 = .b \ -3/-1 = .c)

      female: 4 missing values generated
         ses: 3 missing values generated

list, clean

       score   female   ses
  1.      56        1     1
  2.      62        1     2
  3.      73        0     3
  4.      67       .a     1
  5.      57        0     1
  6.      56       .b     2
  7.      57        1    .a
  8.      78       .c     1
  9.      67        1    .c
 10.      92        1     1
 11.      57       .c     2
 12.      57        0    .c
 13.      58        0     3
 14.      78        1     0

Better yet, we can use the keyword _all to refer to all the variables in the data set.
mvdecode _all, mv(-999 = .a \ -99 = .b \ -3/-1 = .c)

      female: 4 missing values generated
         ses: 3 missing values generated
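After recoding, it is worth checking that the codes landed where we expect. The sketch below is an illustration rather than part of the original example: it uses a six-row subset of the data so that it runs on its own, then tabulates each variable with the missing option and counts the observations that have a missing value in female or ses.

```stata
* Subset of the illustration data, re-entered so the sketch is self-contained
clear
input score female ses
56    1    1
67 -999    1
56  -99    2
57    1 -999
78   -2    1
67    1   -1
end

mvdecode _all, mv(-999 = .a \ -99 = .b \ -3/-1 = .c)

* The missing option shows each missing code as its own row
tabulate female, missing
tabulate ses, missing

* Observations with a missing value in at least one of the two variables
count if missing(female, ses)
assert r(N) == 5

* score contains none of the recoded values, so it is unchanged
count if missing(score)
assert r(N) == 0
```

Note that _all applies the rules to every numeric variable, so it is only safe when none of the listed numbers is a legitimate value in any of them.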
Going from missing value codes to numeric values
The other issue that we cover here is how to change missing value codes back to numeric values. The command mvencode is the counterpart of the mvdecode command covered above and is the one to use here. In the example below we recode only variable female, so ses keeps its missing value codes.
mvencode female, mv(.a = -999 \ .b = -99 \ .c = -50)

      female: 4 missing values recoded

list, clean

       score   female   ses
  1.      56        1     1
  2.      62        1     2
  3.      73        0     3
  4.      67     -999     1
  5.      57        0     1
  6.      56      -99     2
  7.      57        1    .a
  8.      78      -50     1
  9.      67        1    .c
 10.      92        1     1
 11.      57      -50     2
 12.      57        0    .c
 13.      58        0     3
 14.      78        1     0
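A round trip through mvdecode and mvencode does not necessarily restore the original numbers. The sketch below illustrates this with a hypothetical five-row version of female and a helper variable named original, which is introduced here only for comparison. Because -3/-1 was collapsed into the single code .c, mvencode can map .c back to only one number.

```stata
* Hypothetical five-row version of female, used only for illustration
clear
input female
1
-999
-99
-2
0
end

* Keep a copy of the raw values for comparison (helper variable)
generate original = female

mvdecode female, mv(-999 = .a \ -99 = .b \ -3/-1 = .c)
mvencode female, mv(.a = -999 \ .b = -99 \ .c = -50)

list, clean

* -999 and -99 come back unchanged, but -2 has become -50
assert female == original if original != -2
assert female == -50 if original == -2
```

By default mvencode refuses to make the change if one of the target numbers already occurs in the variable; the override option turns this protection off.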