Suppose that you have a group of people, and each person is given his or her own unique id number. Next, one by one, you ask each person to pick a partner. For each person, you write down his or her id number in one column and the id number of the partner in a second column. So the first person picks a partner, and then both people rejoin the group. Then the next person selects a partner. Some people will select the person who had selected them as a partner, while others will select a new person to be their partner. A small example data set might look like the one below.

| Person id | Partner id |
|-----------|------------|
| 110       | 210        |
| 514       | 856        |
| 210       | 110        |
| 210       | 111        |
| 693       | 246        |

Now suppose that you want to find all of the unique pairs of people. Notice that the third case contains the same two id numbers as the first case, just in reverse order. Hence, the third case should be flagged as a duplicate. How can we do this in SPSS?

First, we input the data as string variables (**sid1** and **sid2**) and
then make numeric copies of them (**nid1** and **nid2**). (Of course, you
could instead input the data as numeric variables and then make string copies
of them.) Next, we concatenate the two string variables into a single pair
id; because we are concatenating, the variables need to be strings. The key
is to concatenate them so that the smaller id number always comes first: if
the smaller id is in **sid1**, **sid1** goes first, and if it is in **sid2**,
**sid2** goes first. That way, the pairs 110/210 and 210/110 both produce the
same pair id, 110210. We need the numeric versions of the variables to
determine which of the two ids is smaller. Once we have done this, we can sort
the cases by the new pair id and create the flag variable: any case whose pair
id matches that of the case just before it is a duplicate. Below is the syntax
for these steps.

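```
* Read each pair as two 3-character string ids.
data list list / sid1 (A3) sid2 (A3).
begin data.
110 210
514 856
210 110
210 111
693 246
end data.
* Make numeric copies so the two ids can be compared by value.
recode sid1 (convert) into nid1.
recode sid2 (convert) into nid2.
* Build a pair id with the smaller id first.
* GE (rather than GT) also covers the case where both ids are equal.
string pairid (A6).
if (nid1 lt nid2) pairid = concat(sid1, sid2).
if (nid1 ge nid2) pairid = concat(sid2, sid1).
* Sort so that identical pair ids are adjacent, then flag repeats.
sort cases by pairid.
compute flag = 0.
if (pairid = lag(pairid)) flag = 1.
execute.
```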
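If you would rather not build a string key, one possible variant (a sketch, not the method described above) is to compute the smaller and larger id with the MIN and MAX functions, sort on both, and let the FIRST subcommand of MATCH FILES mark the first case in each pair. It assumes **nid1** and **nid2** already exist; the names **lo**, **hi** and **primary** are hypothetical. Because it compares numeric values directly, it does not depend on the ids fitting in the A3 string format.

```spss
* Sketch: numeric alternative to the string pair id.
* Assumes nid1 and nid2 were created by the RECODE commands.
* lo, hi and primary are hypothetical variable names.
compute lo = min(nid1, nid2).
compute hi = max(nid1, nid2).
* MATCH FILES with BY requires the file to be sorted on the BY variables.
sort cases by lo hi.
* FIRST= sets primary to 1 for the first case of each lo/hi group, else 0.
match files /file=* /by lo hi /first=primary.
compute flag = 1 - primary.
execute.
```

Both approaches rely on the same idea: put the two ids in a fixed order, then sort so that matching pairs are adjacent. They differ only in how the pair is keyed and how the repeat is detected.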

The final data set is shown below.

Once you have identified the duplicates, you can easily filter or delete them from your data set.
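For example, assuming the **flag** variable created above (the name **keep** is hypothetical), either of the following should work. FILTER only hides the flagged cases, so they can be restored with FILTER OFF; SELECT IF removes them from the active dataset.

```spss
* Option 1: temporarily exclude the duplicates.
* FILTER BY skips cases whose filter value is 0 or missing.
compute keep = (flag = 0).
filter by keep.
* Run analyses on the unique pairs here.
filter off.

* Option 2: permanently delete the duplicates.
select if (flag = 0).
execute.
```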