This module illustrates how to reshape data files in SPSS without using the casestovars command. These examples take long data files and reshape them into wide form. These show common examples of reshaping data but do not exhaustively demonstrate the different kinds of data reshaping that you could encounter.
Example #1: One variable
Let’s use the kidslw file for our first example.
get file 'c:kidslw.sav'. list famid birth age. FAMID BIRTH AGE 1.00 1.00 9.00 1.00 2.00 6.00 1.00 3.00 3.00 2.00 1.00 8.00 2.00 2.00 6.00 2.00 3.00 2.00 3.00 1.00 6.00 3.00 2.00 4.00 3.00 3.00 2.00 Number of cases read: 9 Number of cases listed: 9
Let’s make age in this file wide, making one record per family which would contain age1, age2 and age3, the ages of the kids in the family (age2 would be missing if there is only one kid, and age3 would be missing if there are only two kids). As our first step, we will create a vector with three elements, calling it age. Next, we will create a variable also called age and insert the values from the vector. We then list the data in the file to show how they have changed. In this and all of the following examples, we will list the data at intermediate steps of the process of making the file wide. This is useful in case the process does not go correctly, because it is easier to see where the error occurred.
vector age(3). compute age(birth) = age. list id famid age age1 age2 age3. ID FAMID AGE AGE1 AGE2 AGE3 1.00 1.00 9.00 9.00 . . 2.00 1.00 6.00 . 6.00 . 3.00 1.00 3.00 . . 3.00 4.00 2.00 8.00 8.00 . . 5.00 2.00 6.00 . 6.00 . 6.00 2.00 2.00 . . 2.00 7.00 3.00 6.00 6.00 . . 8.00 3.00 4.00 . 4.00 . 9.00 3.00 2.00 . . 2.00 Number of cases read: 9 Number of cases listed: 9
The next step is to aggregate the data. We use the max function to capture the largest value in the variables age1 age2 and age3. Note that while we use the max function, you could also use the min function. Both functions work because SPSS does not consider a missing value to be either infinitely large nor infinitely small (as SAS and Stata do). Note that the variables not directly referenced in the aggregate command are dropped (i.e., all variables aside from famid age1, age2 and age3 are dropped).
aggregate outfile 'c:kids1.sav' /break famid /age1 to age3 = max(age1 to age3). get file 'c:kids1.sav'. list. FAMID AGE1 AGE2 AGE3 1.00 9.00 6.00 3.00 2.00 8.00 6.00 2.00 3.00 6.00 4.00 2.00 Number of cases read: 3 Number of cases listed: 3
Example #2: Two variables
In the example above, we reshaped only one variable, age. In the example below, we reshape the variables age and wt. Note that for each variable to be reshaped, you need to add a vector and a compute command. You also need to get the max (or min) of the variable in the aggregate command. Hence, you can reshape as many variables as needed.
get file 'c:kidslw.sav'. list famid birth age. ID FAMID BIRTH AGE 1.00 1.00 1.00 9.00 2.00 1.00 2.00 6.00 3.00 1.00 3.00 3.00 4.00 2.00 1.00 8.00 5.00 2.00 2.00 6.00 6.00 2.00 3.00 2.00 7.00 3.00 1.00 6.00 8.00 3.00 2.00 4.00 9.00 3.00 3.00 2.00 Number of cases read: 9 Number of cases listed: 9 vector age(3). vector wt(3). compute age(birth) = age. compute wt(birth) = wt. list. The variables are listed in the following order: LINE 1: ID FAMID KIDNAME BIRTH AGE WT SEX AGE1 LINE 2: AGE2 AGE3 WT1 WT2 WT3 ID: 1.00 1.00 Beth 1.00 9.00 60.00 f 9.00 AGE2: . . 60.00 . . ID: 2.00 1.00 Bob 2.00 6.00 40.00 m . AGE2: 6.00 . . 40.00 . ID: 3.00 1.00 Barb 3.00 3.00 20.00 f . AGE2: . 3.00 . . 20.00 ID: 4.00 2.00 Andy 1.00 8.00 80.00 m 8.00 AGE2: . . 80.00 . . ID: 5.00 2.00 Al 2.00 6.00 50.00 m . AGE2: 6.00 . . 50.00 . ID: 6.00 2.00 Ann 3.00 2.00 20.00 f . AGE2: . 2.00 . . 20.00 ID: 7.00 3.00 Pete 1.00 6.00 60.00 m 6.00 AGE2: . . 60.00 . . ID: 8.00 3.00 Pam 2.00 4.00 40.00 f . AGE2: 4.00 . . 40.00 . ID: 9.00 3.00 Phil 3.00 2.00 20.00 m . AGE2: . 2.00 . . 20.00 Number of cases read: 9 Number of cases listed: 9 aggregate outfile 'c:kids2.sav' /break famid /age1 to age3 = max(age1 to age3) /wt1 to wt3 = max(wt1 to wt3). get file 'c:kids2.sav'. list. FAMID AGE1 AGE2 AGE3 WT1 WT2 WT3 1.00 9.00 6.00 3.00 60.00 40.00 20.00 2.00 8.00 6.00 2.00 80.00 50.00 20.00 3.00 6.00 4.00 2.00 60.00 40.00 20.00 Number of cases read: 3 Number of cases listed: 3
Example #3: Both numeric and character variables
The examples above showed how to reshape data using numeric variables, but sometimes you will need to reshape string variables as well. In this example, we reshape two numeric variables, age and wt, and two string variables, sex and kidname. We need to create the two string variables before creating the vectors, and we use the string command to do this. Note that while you can use the compute command to create a numeric variable, you can only use the compute command to modify a string variable.
Consider the kidslw data file shown below.
get file 'c:kidslw.sav'. list. ID FAMID KIDNAME BIRTH AGE WT SEX 1.00 1.00 Beth 1.00 9.00 60.00 f 2.00 1.00 Bob 2.00 6.00 40.00 m 3.00 1.00 Barb 3.00 3.00 20.00 f 4.00 2.00 Andy 1.00 8.00 80.00 m 5.00 2.00 Al 2.00 6.00 50.00 m 6.00 2.00 Ann 3.00 2.00 20.00 f 7.00 3.00 Pete 1.00 6.00 60.00 m 8.00 3.00 Pam 2.00 4.00 40.00 f 9.00 3.00 Phil 3.00 2.00 20.00 m Number of cases read: 9 Number of cases listed: 9 vector age(3). vector wt(3). string sex1 to sex3 (A4). vector sex= sex1 to sex3. string kidname1 to kidname3 (A4). vector kidname= kidname1 to kidname3. compute kidname(birth) = kidname. compute wt(birth) = wt. compute sex(birth) = sex. compute age(birth) = age. list. The variables are listed in the following order: LINE 1: ID FAMID KIDNAME BIRTH AGE WT SEX AGE1 LINE 2: AGE2 AGE3 WT1 WT2 WT3 SEX1 SEX2 SEX3 KIDNAME1 KIDNAME2 LINE 3: KIDNAME3 ID: 1.00 1.00 Beth 1.00 9.00 60.00 f 9.00 AGE2: . . 60.00 . . f Beth KIDNAME3: ID: 2.00 1.00 Bob 2.00 6.00 40.00 m . AGE2: 6.00 . . 40.00 . m Bob KIDNAME3: ID: 3.00 1.00 Barb 3.00 3.00 20.00 f . AGE2: . 3.00 . . 20.00 f KIDNAME3: Barb ID: 4.00 2.00 Andy 1.00 8.00 80.00 m 8.00 AGE2: . . 80.00 . . m Andy KIDNAME3: ID: 5.00 2.00 Al 2.00 6.00 50.00 m . AGE2: 6.00 . . 50.00 . m Al KIDNAME3: ID: 6.00 2.00 Ann 3.00 2.00 20.00 f . AGE2: . 2.00 . . 20.00 f KIDNAME3: Ann ID: 7.00 3.00 Pete 1.00 6.00 60.00 m 6.00 AGE2: . . 60.00 . . m Pete KIDNAME3: ID: 8.00 3.00 Pam 2.00 4.00 40.00 f . AGE2: 4.00 . . 40.00 . f Pam KIDNAME3: ID: 9.00 3.00 Phil 3.00 2.00 20.00 m . AGE2: . 2.00 . . 20.00 m KIDNAME3: Phil Number of cases read: 9 Number of cases listed: 9 aggregate outfile 'c:kids3.sav' /break famid /age1 to age3 = max(age1 to age3) /wt1 to wt3 = max(wt1 to wt3) /sex1 to sex3 = max(sex1 to sex3) /kidname1 to kidname3 = max(kidname1 to kidname3). get file 'c:kids3.sav'. list. The variables are listed in the following order: LINE 1: FAMID AGE1 AGE2 AGE3 WT1 WT2 WT3 SEX1 LINE 2: SEX2 SEX3 KIDNAME1 KIDNAME2 KIDNAME3 FAMID: 1.00 9.00 6.00 3.00 60.00 40.00 20.00 f SEX2: m f Beth Bob Barb FAMID: 2.00 8.00 6.00 2.00 80.00 50.00 20.00 m SEX2: m f Andy Al Ann FAMID: 3.00 6.00 4.00 2.00 60.00 40.00 20.00 m SEX2: f m Pete Pam Phil Number of cases read: 3 Number of cases listed: 3
Example #4: Character suffixes
You cannot use character suffixes in SPSS when reshaping data because you cannot aggregate the data to form the wide data file. Hence, in this example, we use a numeric suffix while reshaping and rename the variables with the character suffix as the last step.
get file 'c:dadmoml.sav'. list.ID FAMID NAME INC DADMOM 1.00 2.00 Art 22000.00 dad 2.00 1.00 Bill 30000.00 dad 3.00 3.00 Paul 25000.00 dad 4.00 1.00 Bess 15000.00 mom 5.00 3.00 Pat 50000.00 mom 6.00 2.00 Amy 18000.00 mom Number of cases read: 6 Number of cases listed: 6if dadmom = "mom" dadmomN = 1. if dadmom = "dad" dadmomN = 2. list.ID FAMID NAME INC DADMOM DADMOMN 1.00 2.00 Art 22000.00 dad 2.00 2.00 1.00 Bill 30000.00 dad 2.00 3.00 3.00 Paul 25000.00 dad 2.00 4.00 1.00 Bess 15000.00 mom 1.00 5.00 3.00 Pat 50000.00 mom 1.00 6.00 2.00 Amy 18000.00 mom 1.00 Number of cases read: 6 Number of cases listed: 6string name1 to name2 (A4). vector name = name1 to name2. compute name(dadmomN) = name. vector inc(2). compute inc(dadmomN) = inc. list.ID FAMID NAME INC DADMOM DADMOMN NAME1 NAME2 INC1 INC2 1.00 2.00 Art 22000.00 dad 2.00 Art . 22000.00 2.00 1.00 Bill 30000.00 dad 2.00 Bill . 30000.00 3.00 3.00 Paul 25000.00 dad 2.00 Paul . 25000.00 4.00 1.00 Bess 15000.00 mom 1.00 Bess 15000.00 . 5.00 3.00 Pat 50000.00 mom 1.00 Pat 50000.00 . 6.00 2.00 Amy 18000.00 mom 1.00 Amy 18000.00 . Number of cases read: 6 Number of cases listed: 6aggregate outfile 'c:kids4.sav' /break famid /inc1 to inc2 = max(inc1 to inc2) /name1 to name2 = max(name1 to name2). get file 'c:kids4.sav'. list.FAMID INC1 INC2 NAME1 NAME2 1.00 15000.00 30000.00 Bess Bill 2.00 18000.00 22000.00 Amy Art 3.00 50000.00 25000.00 Pat Paulrename variables (inc1 to inc2 = incmom incdad). rename variables (name1 to name2 = namemom namedad). list. Number of cases read: 3 Number of cases listed: 3FAMID INCMOM INCDAD NAMEMOM NAMEDAD 1.00 15000.00 30000.00 Bess Bill 2.00 18000.00 22000.00 Amy Art 3.00 50000.00 25000.00 Pat Paul Number of cases read: 3 Number of cases listed: 3
Example #5: No index variable
There may be times when your data are ordered but an index variable is not present. In this situation, you will need to create an index variable before you can reshape the data. This example illustrates how to create an index variable and then use that index variable in the reshaping. We will use the data from Example 1. In that example, birth is the index variable. Hence, we will create an index variable which will have the same values as the variable birth. We will call our index variable index.
get file 'c:kidslw.sav'. list famid birth age.FAMID BIRTH AGE 1.00 1.00 9.00 1.00 2.00 6.00 1.00 3.00 3.00 2.00 1.00 8.00 2.00 2.00 6.00 2.00 3.00 2.00 3.00 1.00 6.00 3.00 2.00 4.00 3.00 3.00 2.00 Number of cases read: 9 Number of cases listed: 9compute index = 1. if famid = lag(famid) index = lag(index) + 1. vector age(3). compute age(index) = age. list id famid age age1 age2 age3 birth index.ID FAMID AGE AGE1 AGE2 AGE3 BIRTH INDEX 1.00 1.00 9.00 9.00 . . 1.00 1.00 2.00 1.00 6.00 . 6.00 . 2.00 2.00 3.00 1.00 3.00 . . 3.00 3.00 3.00 4.00 2.00 8.00 8.00 . . 1.00 1.00 5.00 2.00 6.00 . 6.00 . 2.00 2.00 6.00 2.00 2.00 . . 2.00 3.00 3.00 7.00 3.00 6.00 6.00 . . 1.00 1.00 8.00 3.00 4.00 . 4.00 . 2.00 2.00 9.00 3.00 2.00 . . 2.00 3.00 3.00 Number of cases read: 9 Number of cases listed: 9aggregate outfile 'c:kids5.sav' /break famid /age1 to age3 = max(age1 to age3). get file 'c:kids5.sav'. list.FAMID AGE1 AGE2 AGE3 1.00 9.00 6.00 3.00 2.00 8.00 6.00 2.00 3.00 6.00 4.00 2.00 Number of cases read: 3 Number of cases listed: 3