You can compare SPSS datasets using the compare datasets command. This command was introduced in SPSS version 21. Let’s see some examples that use this command.
Example 1: Two raters
Let us suppose that two raters have entered data into different SPSS datasets. We want to compare these two datasets to see if the values for all of the variables are the same. First, we will enter the data for the two raters, and then we will use the compare datasets command to compare each variable in the two datasets.
data list list /id test1 test2.
begin data.
1 11 80
2 55 88
3 44 77
4 66 33
end data.
dataset name rater1.

data list list /id test1 test2.
begin data.
1 12 80
2 55 88
3 44 78
4 66 33
end data.
dataset name rater2.

dataset activate rater1.
The dataset listed on the compdataset subcommand will be compared to the active dataset.
compare datasets /compdataset rater2 /variables all.
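To make the idea of a case-by-case comparison concrete, here is a minimal sketch in plain Python. It is not SPSS syntax and not how SPSS implements the command; it only mimics the kind of check being made, assuming that cases are paired by their order in the file. The function name find_mismatches is hypothetical.

```python
# Hypothetical illustration in plain Python; this is not SPSS syntax
# and not how SPSS implements the compare datasets command.
rater1 = [
    {"id": 1, "test1": 11, "test2": 80},
    {"id": 2, "test1": 55, "test2": 88},
    {"id": 3, "test1": 44, "test2": 77},
    {"id": 4, "test1": 66, "test2": 33},
]
rater2 = [
    {"id": 1, "test1": 12, "test2": 80},
    {"id": 2, "test1": 55, "test2": 88},
    {"id": 3, "test1": 44, "test2": 78},
    {"id": 4, "test1": 66, "test2": 33},
]

def find_mismatches(active, comparison):
    """Pair cases in file order (an assumption) and list differing values."""
    mismatches = []
    for case_number, (a, b) in enumerate(zip(active, comparison), start=1):
        for variable in a:
            if a[variable] != b[variable]:
                mismatches.append((case_number, variable, a[variable], b[variable]))
    return mismatches

mismatches = find_mismatches(rater1, rater2)
for case_number, variable, v1, v2 in mismatches:
    print(f"case {case_number}: {variable} is {v1} in rater1 but {v2} in rater2")
```

With the data entered above, the sketch reports two differences: test1 in the first case and test2 in the third case.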
Example 2: String and numeric variables
The example below illustrates that the variables to be compared must be of the same type, either string or numeric. A string variable cannot be compared to a numeric variable, even if the string variable contains numbers.
data list list /id * test1 (A2) test2.
begin data.
1 11 80
2 55 88
3 44 77
4 66 33
end data.
dataset name rater3.

data list list /id test1 test2.
begin data.
1 11 80
2 55 88
3 44 78
4 66 33
5 77 22
end data.
dataset name rater4.

dataset activate rater3.

compare datasets
  /compdataset rater4
  /variables all
  /output varproperties = all.
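The type rule can be pictured with a short sketch in plain Python. This is only an analogy, not SPSS syntax: a Python str stands in for an SPSS string variable and a Python number for a numeric variable. The function name comparable_variables is hypothetical.

```python
# Hypothetical illustration in plain Python; this is not SPSS syntax.
# First case of each dataset: test1 is a string in rater3, numeric in rater4.
rater3_case = {"id": 1, "test1": "11", "test2": 80}
rater4_case = {"id": 1, "test1": 11, "test2": 80}

def comparable_variables(a, b):
    """Split shared variables into same-type and different-type lists."""
    same_type, different_type = [], []
    for variable in a:
        if variable in b:
            if type(a[variable]) is type(b[variable]):
                same_type.append(variable)
            else:
                different_type.append(variable)
    return same_type, different_type

same_type, different_type = comparable_variables(rater3_case, rater4_case)
print("can be compared:", same_type)
print("cannot be compared:", different_type)
```

Here id and test2 can be compared, while test1 cannot, even though the string "11" looks like the number 11.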
Example 3: User-defined missing values
You can define a specific value, such as 88, as missing. (This is called a user-defined missing value in SPSS.) If this is done in one dataset but not the other, such values will be flagged as mismatches, even when they appear in the same variable and the same case.
In this example, we also show that the length of a string variable or the format of a numeric variable does not hinder the matching process. In the first dataset (named rater5), the variable test1 is a string variable that has a length of 2, and the numeric variable test2 has a format of f2.0 (which means that it has a length of 2 and no decimals). In the second dataset (called rater6), the string variable test1 has a length of 3, and the numeric variable test2 has a length of 3 and one place for the decimal value.
data list list /id * test1 (a2) test2 (f2.0).
begin data.
1 11 80
2 55 88
3 44 77
4 66 33
end data.
missing values test2 (88).
dataset name rater5.

data list list /id * test1 (a3) test2 (f3.1).
begin data.
1 11 80
2 55 88
3 44 78
4 66 33
5 77 22
end data.
dataset name rater6.

dataset activate rater5.

compare datasets
  /compdataset rater6
  /variables all
  /output varproperties = all.
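The two points of this example can be modeled with a small sketch in plain Python. It is an assumption-laden model rather than SPSS syntax or SPSS internals: we assume that trailing padding in a longer string is ignored, that a numeric display format does not change the stored number, and that a value declared user-missing no longer equals the same ordinary value. The function name effective_value is hypothetical.

```python
# Hypothetical model in plain Python; this is not SPSS syntax.
# Second case of each dataset from the example above.
rater5_case = {"test1": "55", "test2": 88}      # a2 string, f2.0 numeric
rater6_case = {"test1": "55 ", "test2": 88.0}   # a3 string (padded), f3.1 numeric

# Values declared user-missing in each dataset (only rater5 declares 88).
rater5_missing = {"test2": {88}}
rater6_missing = {}

def effective_value(value, variable, user_missing):
    """Return the value as this sketch assumes it is compared."""
    if isinstance(value, str):
        return value.rstrip()      # ignore padding from a longer string width
    if value in user_missing.get(variable, set()):
        return None                # user-missing is not an ordinary value
    return float(value)            # display format does not change the number

result = {}
for variable in rater5_case:
    left = effective_value(rater5_case[variable], variable, rater5_missing)
    right = effective_value(rater6_case[variable], variable, rater6_missing)
    result[variable] = "match" if left == right else "mismatch"

print(result)
```

In this model test1 matches despite the different string lengths, while test2 is a mismatch because 88 is user-missing in only one of the two datasets.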
If you need to update one dataset with the values in another dataset, you may want to use the update command. Please also see our SPSS FAQs: How can I compare two data sets in SPSS? and How do I check that the same data input by two people are consistently entered?