This learning module illustrates how to reshape data files in SPSS. These examples take wide data files and reshape them into long form. These show common examples of reshaping data but do not exhaustively demonstrate the different kinds of data reshaping that you could encounter.
Example #1: One variable
Consider the file containing the kids and their heights at one year of age (ht1) and at two years of age (ht2).
get file 'c:https://stats.idre.ucla.edu/wp-content/uploads/2016/02/kidshtwt.sav'. list famid birth ht1 ht2.
FAMID BIRTH HT1 HT2 1.00 1.00 2.80 3.40 1.00 2.00 2.90 3.80 1.00 3.00 2.20 2.90 2.00 1.00 2.00 3.20 2.00 2.00 1.80 2.80 2.00 3.00 1.90 2.40 3.00 1.00 2.20 3.30 3.00 2.00 2.30 3.40 3.00 3.00 2.10 2.90 Number of cases read: 9 Number of cases listed: 9
This is called a wide format since the heights are contained in different variables (which makes the data set "wide"). We may want the data to be long, where each height is in a separate observation (making the data set "long"), but there is only one variable for height. We can use the SPSS command varstocases to reshape the data from wide to long format. The /make subcommand is used to create the new variables in the long data set from the old variables in the wide data set. The first variable name given on that subcommand, in this case, ht, is the name of the new variable. You can call this new variable anything that you like. The keyword from is then used, followed by the names of the variables from the wide data set that will make up the new variable in the long data set. The /index subcommand is then used the create the variable age. This subcommand is optional; the reshaping will work without it. The index variable indicates from which variable from the wide data set used on the /make subcommand the observations comes. Hence, when the value of age is one, the value of ht was taken from ht1. When the value of age is two, the value of ht was taken from ht2.
varstocases /make ht from ht1 ht2 /index = age. list famid age birth ht.
FAMID BIRTH AGE HT 1.00 1.00 1.00 2.80 1.00 1.00 2.00 3.40 1.00 2.00 1.00 2.90 1.00 2.00 2.00 3.80 1.00 3.00 1.00 2.20 1.00 3.00 2.00 2.90 2.00 1.00 1.00 2.00 2.00 1.00 2.00 3.20 2.00 2.00 1.00 1.80 2.00 2.00 2.00 2.80 2.00 3.00 1.00 1.90 2.00 3.00 2.00 2.40 3.00 1.00 1.00 2.20 3.00 1.00 2.00 3.30 3.00 2.00 1.00 2.30 3.00 2.00 2.00 3.40 3.00 3.00 1.00 2.10 3.00 3.00 2.00 2.90 Number of cases read: 18 Number of cases listed: 18
Example #2: Two variables
Let’s use the same data file, but with all of the variables. In this example, we show how to reshape two variables at a time. Note that you can reshape as many variables as you need by adding a /make subcommand for each set of variables to be reshaped.
get file 'c:https://stats.idre.ucla.edu/wp-content/uploads/2016/02/kidshtwt.sav'. list.
FAMID BIRTH HT1 HT2 WT1 WT2 1.00 1.00 2.80 3.40 19 28 1.00 2.00 2.90 3.80 21 28 1.00 3.00 2.20 2.90 20 23 2.00 1.00 2.00 3.20 25 30 2.00 2.00 1.80 2.80 20 33 2.00 3.00 1.90 2.40 22 33 3.00 1.00 2.20 3.30 22 28 3.00 2.00 2.30 3.40 20 30 3.00 3.00 2.10 2.90 22 31 Number of cases read: 9 Number of cases listed: 9
varstocases /make ht from ht1 ht2 /make wt from wt1 wt2 /index = age. list.
FAMID BIRTH AGE HT WT 1.00 1.00 1.00 2.80 19.00 1.00 1.00 2.00 3.40 28.00 1.00 2.00 1.00 2.90 21.00 1.00 2.00 2.00 3.80 28.00 1.00 3.00 1.00 2.20 20.00 1.00 3.00 2.00 2.90 23.00 2.00 1.00 1.00 2.00 25.00 2.00 1.00 2.00 3.20 30.00 2.00 2.00 1.00 1.80 20.00 2.00 2.00 2.00 2.80 33.00 2.00 3.00 1.00 1.90 22.00 2.00 3.00 2.00 2.40 33.00 3.00 1.00 1.00 2.20 22.00 3.00 1.00 2.00 3.30 28.00 3.00 2.00 1.00 2.30 20.00 3.00 2.00 2.00 3.40 30.00 3.00 3.00 1.00 2.10 22.00 3.00 3.00 2.00 2.90 31.00 Number of cases read: 18 Number of cases listed: 18
Example #3: Modifying numeric suffixes
This example is like the first example in that we are reshaping only one variable. The difference is that we will use the keyword to on the /make subcommand so that we do not have to list all of the variables in the wide data set that are to be included in the new variable in the long data set. Another difference is that on the /index subcommand we have listed the variable created on the /make subcommand, income, in parentheses. This tells SPSS to put the values of year (96, 97 and 98) into the variable year. If we had written the subcommand as /index year the values given in the variable year would have been 1, 2 and 3.
get file 'c:https://stats.idre.ucla.edu/wp-content/uploads/2016/02/faminc.sav'. list.
FAMID FAMINC96 FAMINC97 FAMINC98 3.00 75000.00 76000.00 77000.00 1.00 40000.00 40500.00 41000.00 2.00 45000.00 45400.00 45800.00 Number of cases read: 3 Number of cases listed: 3
varstocases /make income from faminc96 to faminc98 /index year (income). list.
FAMID YEAR INCOME 3.00 96.00 75000.00 3.00 97.00 76000.00 3.00 98.00 77000.00 1.00 96.00 40000.00 1.00 97.00 40500.00 1.00 98.00 41000.00 2.00 96.00 45000.00 2.00 97.00 45400.00 2.00 98.00 45800.00 Number of cases read: 9 Number of cases listed: 9
Example #4: String variables and character suffixes
It also is possible to reshape a wide data file to be long when there are character suffixes on the names of the variables in the wide data set. Look at the dmorder file below. Note that we want our long data set to contain a new string variable called name.
get file 'c:https://stats.idre.ucla.edu/wp-content/uploads/2016/02/dmorder.sav'. list.
FAMID NAMED NAMEM INCD INCM 1.00 Bill Bess 30000.00 15000.00 2.00 Art Amy 22000.00 18000.00 3.00 Paul Pat 25000.00 50000.00 Number of cases read: 3 Number of cases listed: 3
varstocases /make name from named namem /make income from incd incm /index dadmom. list.
FAMID DADMOM NAME INCOME 1.00 1 Bill 30000.00 1.00 2 Bess 15000.00 2.00 1 Art 22000.00 2.00 2 Amy 18000.00 3.00 1 Paul 25000.00 3.00 2 Pat 50000.00 Number of cases read: 6 Number of cases listed: 6
Example #5: Non-contiguous variables
The only time that variables being non-contiguous is a problem is when you want to use the keyword to to specify a range of variables. If the variables in the wide data set are non-contiguous, you have two options when reshaping to long. One option is just to list all of the variables on the /make subcommand. The other option is to rearrange the variables in the wide data set and then use the keyword to on the /make subcommand. To rearrange the variables in the wide data set, you can use the save command with the /keep subcommand, as shown below. The variables will be in the data set in the order in which you list them on the /keep subcommand.
get file 'c:https://stats.idre.ucla.edu/wp-content/uploads/2016/02/dadmomw.sav'. list.
FAMID NAMED INCD NAMEM INCM 1.00 Bill 30000.00 Bess 15000.00 2.00 Art 22000.00 Amy 18000.00 3.00 Paul 25000.00 Pat 50000.00 Number of cases read: 3 Number of cases listed: 3
save outfile = "c:dmorder.sav" /keep=famid named namem incd incm. execute. get file 'c:dmorder.sav'. list.
FAMID NAMED NAMEM INCD INCM 1.00 Bill Bess 30000.00 15000.00 2.00 Art Amy 22000.00 18000.00 3.00 Paul Pat 25000.00 50000.00 Number of cases read: 3 Number of cases listed: 3
varstocases /make name from named namem /make income from incd incm /index dadmom. list.
FAMID DADMOM NAME INC 1.00 1.00 Bill 30000.00 1.00 2.00 Bess 15000.00 2.00 1.00 Art 22000.00 2.00 2.00 Amy 18000.00 3.00 1.00 Paul 25000.00 3.00 2.00 Pat 50000.00 Number of cases read: 6 Number of cases listed: 6
Example #6: Creating an ID variable
Sometimes you want to know from which row of the wide data set the values in the long data set came. You can create a variable in the long data set to tell you this by using the /id subcommand, as shown below. You need to provide a name for the new variable on the /id subcommand after the = sign. In our example, we called the new variable id1. Notice the difference between the /index and /id subcommands. The /index subcommand creates a variable that tells you from what variable the values of the reshaped variable are obtained, whereas the /id subcommand creates a variable that tells you from what row of data in the wide data set the observation is obtained.
get file "c:https://stats.idre.ucla.edu/wp-content/uploads/2016/02/dadmomw.sav". varstocases /make name from named namem /make income from incd incm /index dadmom /id=id1. list.
ID1 FAMID DADMOM NAME INCOME 1 1.00 1 Bill 30000.00 1 1.00 2 Bess 15000.00 2 2.00 1 Art 22000.00 2 2.00 2 Amy 18000.00 3 3.00 1 Paul 25000.00 3 3.00 2 Pat 50000.00 Number of cases read: 6 Number of cases listed: 6