This module illustrates how to reshape data files in SPSS without using the varstocases command. These examples take wide data files and reshape them into long form. These show common examples of reshaping data but do not exhaustively demonstrate the different kinds of data reshaping that you could encounter.
Example #1: One variable
Consider the file containing the kids and their heights at one year of age (ht1) and at two years of age (ht2).
    get file 'c:kidshtwt.sav'.
    list famid birth ht1 ht2.

       FAMID    BIRTH      HT1      HT2
        1.00     1.00     2.80     3.40
        1.00     2.00     2.90     3.80
        1.00     3.00     2.20     2.90
        2.00     1.00     2.00     3.20
        2.00     2.00     1.80     2.80
        2.00     3.00     1.90     2.40
        3.00     1.00     2.20     3.30
        3.00     2.00     2.30     3.40
        3.00     3.00     2.10     2.90

    Number of cases read:  9    Number of cases listed:  9
This is called a wide format because each child's heights are stored side by side as separate variables on a single row. We may want the data to be long, where each height is in a separate observation. First, we create a vector of the variables to be reshaped. Then we create a loop using the index variable and compute a new variable using the vector, looping as many times as needed. Inside the loop we write each new case to a new data file with xsave, and then we end the loop. Finally, we get the new data file and list it to make sure that all went as planned. Note that we use xsave here instead of save. This is because xsave is a transformation rather than a procedure: it is not executed until the data are read for the next procedure. Hence, it reduces processing time by consolidating two data passes into one.
    vector Aht = ht1 to ht2.
    loop age = 1 to 2.
    compute ht = Aht(age).
    xsave outfile 'c:longex1.sav' /drop ht1 ht2 wt1 wt2.
    end loop.
    execute.
    get file 'c:longex1.sav'.
    list.

       FAMID    BIRTH      AGE       HT
        1.00     1.00     1.00     2.80
        1.00     1.00     2.00     3.40
        1.00     2.00     1.00     2.90
        1.00     2.00     2.00     3.80
        1.00     3.00     1.00     2.20
        1.00     3.00     2.00     2.90
        2.00     1.00     1.00     2.00
        2.00     1.00     2.00     3.20
        2.00     2.00     1.00     1.80
        2.00     2.00     2.00     2.80
        2.00     3.00     1.00     1.90
        2.00     3.00     2.00     2.40
        3.00     1.00     1.00     2.20
        3.00     1.00     2.00     3.30
        3.00     2.00     1.00     2.30
        3.00     2.00     2.00     3.40
        3.00     3.00     1.00     2.10
        3.00     3.00     2.00     2.90

    Number of cases read:  18    Number of cases listed:  18
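Beyond listing the file, one way to check that the reshape went as planned is to count the rows that each child contributes to the long file. The following is a minimal sketch and not part of the original example. It assumes that famid and birth together identify a child, and nrows is a made-up name for the count. Each child should show a count of 2, one row per age.

```spss
* Sketch only: nrows is a hypothetical name for the per-child row count.
get file 'c:longex1.sav'.
aggregate outfile = *
  /break = famid birth
  /nrows = n.
list.
```

Note that aggregate with outfile = * replaces the working data file with the aggregated one, so the long file should be read again with get file before any further work.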
Example #2: Two variables
Let’s use the same data file, but with all of the variables. In this example, we show how to reshape two variables at a time. Note that you can reshape as many variables as you need by adding a vector and a compute command for each variable to be reshaped. You may also want to add those variables to the /drop subcommand of the xsave command.
    get file 'c:kidshtwt.sav'.
    list.

       FAMID    BIRTH      HT1      HT2   WT1   WT2
        1.00     1.00     2.80     3.40    19    28
        1.00     2.00     2.90     3.80    21    28
        1.00     3.00     2.20     2.90    20    23
        2.00     1.00     2.00     3.20    25    30
        2.00     2.00     1.80     2.80    20    33
        2.00     3.00     1.90     2.40    22    33
        3.00     1.00     2.20     3.30    22    28
        3.00     2.00     2.30     3.40    20    30
        3.00     3.00     2.10     2.90    22    31

    Number of cases read:  9    Number of cases listed:  9

    vector Aht = ht1 to ht2.
    vector Awt = wt1 to wt2.
    loop age = 1 to 2.
    compute ht = Aht(age).
    compute wt = Awt(age).
    xsave outfile 'c:longex2.sav' /drop ht1 ht2 wt1 wt2.
    end loop.
    execute.
    get file 'c:longex2.sav'.
    list.

       FAMID    BIRTH      AGE       HT       WT
        1.00     1.00     1.00     2.80    19.00
        1.00     1.00     2.00     3.40    28.00
        1.00     2.00     1.00     2.90    21.00
        1.00     2.00     2.00     3.80    28.00
        1.00     3.00     1.00     2.20    20.00
        1.00     3.00     2.00     2.90    23.00
        2.00     1.00     1.00     2.00    25.00
        2.00     1.00     2.00     3.20    30.00
        2.00     2.00     1.00     1.80    20.00
        2.00     2.00     2.00     2.80    33.00
        2.00     3.00     1.00     1.90    22.00
        2.00     3.00     2.00     2.40    33.00
        3.00     1.00     1.00     2.20    22.00
        3.00     1.00     2.00     3.30    28.00
        3.00     2.00     1.00     2.30    20.00
        3.00     2.00     2.00     3.40    30.00
        3.00     3.00     1.00     2.10    22.00
        3.00     3.00     2.00     2.90    31.00

    Number of cases read:  18    Number of cases listed:  18
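To illustrate how the pattern extends to a third variable, here is a sketch under stated assumptions. The variables hc1 and hc2 (say, head circumference at ages one and two) and the output file name are hypothetical; they do not exist in kidshtwt.sav and are used only to show where the extra vector, the extra compute, and the extra /drop entries would go.

```spss
* Hypothetical: hc1 and hc2 are NOT in kidshtwt.sav and are assumed to sit side by side.
* The output file name longex2b.sav is also made up for this sketch.
vector Aht = ht1 to ht2.
vector Awt = wt1 to wt2.
vector Ahc = hc1 to hc2.
loop age = 1 to 2.
compute ht = Aht(age).
compute wt = Awt(age).
compute hc = Ahc(age).
xsave outfile 'c:longex2b.sav' /drop ht1 ht2 wt1 wt2 hc1 hc2.
end loop.
execute.
```

Each additional variable costs one vector command, one compute command, and one more pair of names on /drop; the loop itself does not change.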
Example #3: Modifying numeric suffixes
This example is like the first example in that we are reshaping only one variable. However, here we want the new variable year to run from 96 to 98 rather than from 1 to 3. An extra step is necessary to accomplish this, because the elements of an SPSS vector are numbered starting from one, so the loop index i that picks out each element runs from 1 to 3. Therefore, we compute year by adding 95 to i.
    get file 'c:faminc.sav'.
    list.

       FAMID   FAMINC96   FAMINC97   FAMINC98
        3.00   75000.00   76000.00   77000.00
        1.00   40000.00   40500.00   41000.00
        2.00   45000.00   45400.00   45800.00

    Number of cases read:  3    Number of cases listed:  3

    vector Ainc=faminc96 to faminc98.
    loop i = 1 to 3.
    compute income=Ainc(i).
    compute year = 95+i.
    xsave outfile 'c:widefaminc' /keep famid year income.
    end loop.
    execute.
    get file 'c:widefaminc'.
    list.

       FAMID     YEAR     INCOME
        3.00    96.00   75000.00
        3.00    97.00   76000.00
        3.00    98.00   77000.00
        1.00    96.00   40000.00
        1.00    97.00   40500.00
        1.00    98.00   41000.00
        2.00    96.00   45000.00
        2.00    97.00   45400.00
        2.00    98.00   45800.00

    Number of cases read:  9    Number of cases listed:  9
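The same result can probably be reached with the offset applied in the other direction. The sketch below is a variation, not the module's method: it assumes that the loop index may itself carry the year, since what has to run from 1 to 3 is the vector subscript, and it subtracts 95 inside the vector reference instead of adding 95 afterwards.

```spss
* Sketch only: the loop index is the year and year - 95 gives subscripts 1 to 3.
get file 'c:faminc.sav'.
vector Ainc = faminc96 to faminc98.
loop year = 96 to 98.
compute income = Ainc(year - 95).
xsave outfile 'c:widefaminc' /keep famid year income.
end loop.
execute.
```

This drops the helper variable i at the cost of a slightly less obvious subscript; either form should write the same three rows per family.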
Example #4: String variables and character suffixes
It is also possible to reshape a wide data file to be long when there are character suffixes. Look at the dmorder file below. Note that we want our long data set to contain a new string variable called name. To create and/or modify a numeric variable, you could use the compute command. However, you can only MODIFY a string variable with the compute command. To CREATE a string variable, you need to use the string command. The syntax for this command is straightforward: STRING varname (A_), where the _ is the length of the variable in characters. An example of the use of this command is presented below.
    get file 'c:dmorder.sav'.
    list.

       FAMID  NAMED  NAMEM      INCD      INCM
        1.00  Bill   Bess   30000.00  15000.00
        2.00  Art    Amy    22000.00  18000.00
        3.00  Paul   Pat    25000.00  50000.00

    Number of cases read:  3    Number of cases listed:  3

    vector Aname = named to namem.
    vector Ainc = incd to incm.
    string name (A4).
    loop dadmom = 1 to 2.
    compute name = Aname(dadmom).
    compute inc = Ainc(dadmom).
    xsave outfile 'c:dm1.sav' /keep famid dadmom name inc.
    end loop.
    execute.
    get file 'c:dm1.sav'.
    list.

       FAMID   DADMOM  NAME       INC
        1.00     1.00  Bill  30000.00
        1.00     2.00  Bess  15000.00
        2.00     1.00  Art   22000.00
        2.00     2.00  Amy   18000.00
        3.00     1.00  Paul  25000.00
        3.00     2.00  Pat   50000.00

    Number of cases read:  6    Number of cases listed:  6
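In the long file the character suffixes d and m have become the numeric codes 1 and 2 in dadmom, because named and incd are the first elements of their vectors. A small optional sketch for keeping that meaning visible is to attach value labels. The label text is an assumption based on the suffixes and is not part of the original example.

```spss
* Sketch only: the label text is assumed from the d and m suffixes.
get file 'c:dm1.sav'.
value labels dadmom 1 'Dad' 2 'Mom'.
frequencies variables = dadmom.
```

The frequencies command is there only to display the labels; the data values themselves remain 1 and 2.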
Example #5: Non-contiguous variables
SPSS assumes that the variables to be reshaped are contiguous (side-by-side) in your data file. If they are not, you will get an error message when SPSS encounters the vector command. To address this problem, use a save command with the /keep subcommand, listing the variables in the correct order. The data will be saved with the variables in the order that they were listed in the /keep subcommand. Notice in the file shown below that the variables named and namem are not listed next to each other, nor are incd and incm. Also note that if there were variables with numeric suffixes, they would have to be contiguous too. Finally, the variables have to be listed on the vector command in the order that they appear in the data file. This applies to both numeric and string variables.
    get file 'c:dadmomw.sav'.
    list.

       FAMID  NAMED      INCD  NAMEM      INCM
        1.00  Bill   30000.00  Bess   15000.00
        2.00  Art    22000.00  Amy    18000.00
        3.00  Paul   25000.00  Pat    50000.00

    Number of cases read:  3    Number of cases listed:  3

    save outfile = "c:dmorder.sav" /keep=famid named namem incd incm.
    execute.
    get file 'c:dmorder.sav'.
    list.

       FAMID  NAMED  NAMEM      INCD      INCM
        1.00  Bill   Bess   30000.00  15000.00
        2.00  Art    Amy    22000.00  18000.00
        3.00  Paul   Pat    25000.00  50000.00

    Number of cases read:  3    Number of cases listed:  3
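If writing an intermediate file is not wanted, the working data file can also be reordered in place. The sketch below is an alternative to the save and get steps, not the approach this module uses: it relies on add files with a single input (the working file, written as *) and a /keep subcommand that lists the variables in the desired order.

```spss
* Sketch only: reorders the working data file without saving and reopening it.
get file 'c:dadmomw.sav'.
add files file = *
  /keep = famid named namem incd incm.
execute.
list.
```

Any variable left off /keep would be dropped from the working file, so the list should name every variable that is still needed.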
Now that the variables are in the correct order, we can proceed with the reshaping as before.
    vector Aname = named to namem.
    vector Ainc = incd to incm.
    string name (A4).
    loop dadmom = 1 to 2.
    compute name = Aname(dadmom).
    compute inc = Ainc(dadmom).
    xsave outfile 'c:dm.sav' /keep famid dadmom name inc.
    end loop.
    execute.
    get file 'c:dm.sav'.
    list.

       FAMID   DADMOM  NAME       INC
        1.00     1.00  Bill  30000.00
        1.00     2.00  Bess  15000.00
        2.00     1.00  Art   22000.00
        2.00     2.00  Amy   18000.00
        3.00     1.00  Paul  25000.00
        3.00     2.00  Pat   50000.00

    Number of cases read:  6    Number of cases listed:  6