Reshaping data wide to long in versions 11 and up

This learning module illustrates how to reshape data files in SPSS. These examples take wide data files and reshape them into long form. These show common examples of reshaping data but do not exhaustively demonstrate the different kinds of data reshaping that you could encounter.

Example #1: One variable

Consider the file containing the kids and their heights at one year of age (ht1) and at two years of age (ht2).

get file 'c:https://stats.idre.ucla.edu/wp-content/uploads/2016/02/kidshtwt.sav'.
list famid birth ht1 ht2.

    FAMID     BIRTH       HT1       HT2

     1.00      1.00      2.80      3.40
     1.00      2.00      2.90      3.80
     1.00      3.00      2.20      2.90
     2.00      1.00      2.00      3.20
     2.00      2.00      1.80      2.80
     2.00      3.00      1.90      2.40
     3.00      1.00      2.20      3.30
     3.00      2.00      2.30      3.40
     3.00      3.00      2.10      2.90

 Number of cases read:  9    Number of cases listed:  9

This is called a wide format since the heights are contained in different variables (which makes the data set "wide"). We may want the data to be long, where each height is in a separate observation (making the data set "long"), but there is only one variable for height. We can use the SPSS command varstocases to reshape the data from wide to long format. The /make subcommand is used to create the new variables in the long data set from the old variables in the wide data set. The first variable name given on that subcommand, in this case, ht, is the name of the new variable. You can call this new variable anything that you like. The keyword from is then used, followed by the names of the variables from the wide data set that will make up the new variable in the long data set. The /index subcommand is then used the create the variable age. This subcommand is optional; the reshaping will work without it. The index variable indicates from which variable from the wide data set used on the /make subcommand the observations comes. Hence, when the value of age is one, the value of ht was taken from ht1. When the value of age is two, the value of ht was taken from ht2.

varstocases
 /make ht from ht1 ht2
 /index = age.
list famid age birth ht.

    FAMID     BIRTH      AGE       HT

     1.00      1.00       1.00     2.80
     1.00      1.00       2.00     3.40
     1.00      2.00       1.00     2.90
     1.00      2.00       2.00     3.80
     1.00      3.00       1.00     2.20
     1.00      3.00       2.00     2.90
     2.00      1.00       1.00     2.00
     2.00      1.00       2.00     3.20
     2.00      2.00       1.00     1.80
     2.00      2.00       2.00     2.80
     2.00      3.00       1.00     1.90
     2.00      3.00       2.00     2.40
     3.00      1.00       1.00     2.20
     3.00      1.00       2.00     3.30
     3.00      2.00       1.00     2.30
     3.00      2.00       2.00     3.40
     3.00      3.00       1.00     2.10
     3.00      3.00       2.00     2.90

Number of cases read:  18    Number of cases listed:  18

Example #2: Two variables

Let’s use the same data file, but with all of the variables. In this example, we show how to reshape two variables at a time. Note that you can reshape as many variables as you need by adding a /make subcommand for each set of variables to be reshaped.

get file 'c:https://stats.idre.ucla.edu/wp-content/uploads/2016/02/kidshtwt.sav'.
list.

    FAMID     BIRTH       HT1       HT2      WT1      WT2

     1.00      1.00      2.80      3.40       19       28
     1.00      2.00      2.90      3.80       21       28
     1.00      3.00      2.20      2.90       20       23
     2.00      1.00      2.00      3.20       25       30
     2.00      2.00      1.80      2.80       20       33
     2.00      3.00      1.90      2.40       22       33
     3.00      1.00      2.20      3.30       22       28
     3.00      2.00      2.30      3.40       20       30
     3.00      3.00      2.10      2.90       22       31

Number of cases read:  9    Number of cases listed:  9

varstocases
 /make ht from ht1 ht2
 /make wt from wt1 wt2
 /index = age.
list.

    FAMID     BIRTH      AGE       HT       WT

     1.00      1.00     1.00     2.80    19.00
     1.00      1.00     2.00     3.40    28.00
     1.00      2.00     1.00     2.90    21.00
     1.00      2.00     2.00     3.80    28.00
     1.00      3.00     1.00     2.20    20.00
     1.00      3.00     2.00     2.90    23.00
     2.00      1.00     1.00     2.00    25.00
     2.00      1.00     2.00     3.20    30.00
     2.00      2.00     1.00     1.80    20.00
     2.00      2.00     2.00     2.80    33.00
     2.00      3.00     1.00     1.90    22.00
     2.00      3.00     2.00     2.40    33.00
     3.00      1.00     1.00     2.20    22.00
     3.00      1.00     2.00     3.30    28.00
     3.00      2.00     1.00     2.30    20.00
     3.00      2.00     2.00     3.40    30.00
     3.00      3.00     1.00     2.10    22.00
     3.00      3.00     2.00     2.90    31.00

Number of cases read:  18    Number of cases listed:  18

Example #3: Modifying numeric suffixes

This example is like the first example in that we are reshaping only one variable. The difference is that we will use the keyword to on the /make subcommand so that we do not have to list all of the variables in the wide data set that are to be included in the new variable in the long data set. Another difference is that on the /index subcommand we have listed the variable created on the /make subcommand, income, in parentheses. This tells SPSS to put the values of year (96, 97 and 98) into the variable year. If we had written the subcommand as /index year the values given in the variable year would have been 1, 2 and 3.

get file 'c:https://stats.idre.ucla.edu/wp-content/uploads/2016/02/faminc.sav'.
list.

    FAMID  FAMINC96  FAMINC97  FAMINC98

     3.00  75000.00  76000.00  77000.00
     1.00  40000.00  40500.00  41000.00
     2.00  45000.00  45400.00  45800.00

Number of cases read:  3    Number of cases listed:  3

varstocases
 /make income from faminc96 to faminc98
 /index year (income).
list.

    FAMID     YEAR   INCOME

     3.00    96.00 75000.00
     3.00    97.00 76000.00
     3.00    98.00 77000.00
     1.00    96.00 40000.00
     1.00    97.00 40500.00
     1.00    98.00 41000.00
     2.00    96.00 45000.00
     2.00    97.00 45400.00
     2.00    98.00 45800.00

Number of cases read:  9    Number of cases listed:  9

Example #4: String variables and character suffixes

It also is possible to reshape a wide data file to be long when there are character suffixes on the names of the variables in the wide data set. Look at the dmorder file below. Note that we want our long data set to contain a new string variable called name.

get file 'c:https://stats.idre.ucla.edu/wp-content/uploads/2016/02/dmorder.sav'.
list.

    FAMID NAMED NAMEM      INCD      INCM

     1.00 Bill  Bess   30000.00  15000.00
     2.00 Art   Amy    22000.00  18000.00
     3.00 Paul  Pat    25000.00  50000.00

Number of cases read:  3    Number of cases listed:  3

varstocases 
 /make name from named namem
 /make income from incd incm
 /index dadmom.
list.

    FAMID DADMOM NAME    INCOME

     1.00     1  Bill  30000.00
     1.00     2  Bess  15000.00
     2.00     1  Art   22000.00
     2.00     2  Amy   18000.00
     3.00     1  Paul  25000.00
     3.00     2  Pat   50000.00

Number of cases read:  6    Number of cases listed:  6

Example #5: Non-contiguous variables

The only time that variables being non-contiguous is a problem is when you want to use the keyword to to specify a range of variables. If the variables in the wide data set are non-contiguous, you have two options when reshaping to long. One option is just to list all of the variables on the /make subcommand. The other option is to rearrange the variables in the wide data set and then use the keyword to on the /make subcommand. To rearrange the variables in the wide data set, you can use the save command with the /keep subcommand, as shown below. The variables will be in the data set in the order in which you list them on the /keep subcommand.

get file 'c:https://stats.idre.ucla.edu/wp-content/uploads/2016/02/dadmomw.sav'.
list.

    FAMID NAMED      INCD NAMEM      INCM

     1.00 Bill   30000.00 Bess   15000.00
     2.00 Art    22000.00 Amy    18000.00
     3.00 Paul   25000.00 Pat    50000.00

Number of cases read:  3    Number of cases listed:  3

save outfile = "c:dmorder.sav" 
 /keep=famid named namem incd incm.
execute.
get file 'c:dmorder.sav'.
list.

    FAMID NAMED NAMEM      INCD      INCM

     1.00 Bill  Bess   30000.00  15000.00
     2.00 Art   Amy    22000.00  18000.00
     3.00 Paul  Pat    25000.00  50000.00

Number of cases read:  3    Number of cases listed:  3

varstocases 
 /make name from named namem
 /make income from incd incm
 /index dadmom.
list.

    FAMID   DADMOM NAME      INC

     1.00     1.00 Bill 30000.00
     1.00     2.00 Bess 15000.00
     2.00     1.00 Art  22000.00
     2.00     2.00 Amy  18000.00
     3.00     1.00 Paul 25000.00
     3.00     2.00 Pat  50000.00

Number of cases read:  6    Number of cases listed:  6

Example #6: Creating an ID variable

Sometimes you want to know from which row of the wide data set the values in the long data set came. You can create a variable in the long data set to tell you this by using the /id subcommand, as shown below. You need to provide a name for the new variable on the /id subcommand after the = sign. In our example, we called the new variable id1. Notice the difference between the /index and /id subcommands. The /index subcommand creates a variable that tells you from what variable the values of the reshaped variable are obtained, whereas the /id subcommand creates a variable that tells you from what row of data in the wide data set the observation is obtained.

get file "c:https://stats.idre.ucla.edu/wp-content/uploads/2016/02/dadmomw.sav".

varstocases 
 /make name from named namem
 /make income from incd incm
 /index dadmom
 /id=id1.
list.

     ID1     FAMID DADMOM NAME    INCOME

       1      1.00     1  Bill  30000.00
       1      1.00     2  Bess  15000.00
       2      2.00     1  Art   22000.00
       2      2.00     2  Amy   18000.00
       3      3.00     1  Paul  25000.00
       3      3.00     2  Pat   50000.00


Number of cases read:  6    Number of cases listed:  6