Reshaping data long to wide | SPSS Code Fragments

This module illustrates how to reshape data files in SPSS without using the casestovars command. These examples take long data files and reshape them into wide form. These show common examples of reshaping data but do not exhaustively demonstrate the different kinds of data reshaping that you could encounter.

Example #1: One variable

Let’s use the kidslw file for our first example.

get file 'c:kidslw.sav'.
list famid birth age.

    FAMID     BIRTH       AGE

     1.00      1.00      9.00
     1.00      2.00      6.00
     1.00      3.00      3.00
     2.00      1.00      8.00
     2.00      2.00      6.00
     2.00      3.00      2.00
     3.00      1.00      6.00
     3.00      2.00      4.00
     3.00      3.00      2.00

Number of cases read:  9    Number of cases listed:  9

Let’s make age in this file wide, making one record per family which would contain age1, age2 and age3, the ages of the kids in the family (age2 would be missing if there is only one kid, and age3 would be missing if there are only two kids). As our first step, we will create a vector with three elements, calling it age. Next, we will create a variable also called age and insert the values from the vector. We then list the data in the file to show how they have changed. In this and all of the following examples, we will list the data at intermediate steps of the process of making the file wide. This is useful in case the process does not go correctly, because it is easier to see where the error occurred.

vector age(3).
compute age(birth) = age.
list id famid age age1 age2 age3. 

       ID     FAMID       AGE     AGE1     AGE2     AGE3

     1.00      1.00      9.00     9.00      .        .
     2.00      1.00      6.00      .       6.00      .
     3.00      1.00      3.00      .        .       3.00
     4.00      2.00      8.00     8.00      .        .
     5.00      2.00      6.00      .       6.00      .
     6.00      2.00      2.00      .        .       2.00
     7.00      3.00      6.00     6.00      .        .
     8.00      3.00      4.00      .       4.00      .
     9.00      3.00      2.00      .        .       2.00

Number of cases read:  9    Number of cases listed:  9

The next step is to aggregate the data. We use the max function to capture the largest value in the variables age1 age2 and age3. Note that while we use the max function, you could also use the min function. Both functions work because SPSS does not consider a missing value to be either infinitely large nor infinitely small (as SAS and Stata do). Note that the variables not directly referenced in the aggregate command are dropped (i.e., all variables aside from famid age1, age2 and age3 are dropped).

aggregate outfile 'c:kids1.sav'
 /break famid
 /age1 to age3 = max(age1 to age3).
get file 'c:kids1.sav'.
list.

    FAMID     AGE1     AGE2     AGE3

     1.00     9.00     6.00     3.00
     2.00     8.00     6.00     2.00
     3.00     6.00     4.00     2.00

Number of cases read:  3    Number of cases listed:  3

Example #2: Two variables

In the example above, we reshaped only one variable, age. In the example below, we reshape the variables age and wt. Note that for each variable to be reshaped, you need to add a vector and a compute command. You also need to get the max (or min) of the variable in the aggregate command. Hence, you can reshape as many variables as needed.

get file 'c:kidslw.sav'.
list famid birth age.

       ID     FAMID BIRTH     AGE

     1.00      1.00 1.00      9.00
     2.00      1.00 2.00      6.00
     3.00      1.00 3.00      3.00
     4.00      2.00 1.00      8.00
     5.00      2.00 2.00      6.00
     6.00      2.00 3.00      2.00
     7.00      3.00 1.00      6.00
     8.00      3.00 2.00      4.00
     9.00      3.00 3.00      2.00

Number of cases read:  9    Number of cases listed:  9

vector age(3).
vector wt(3).
compute age(birth) = age.
compute wt(birth) = wt.
list.

The variables are listed in the following order:

LINE   1: ID FAMID KIDNAME BIRTH AGE WT SEX AGE1

LINE   2: AGE2 AGE3 WT1 WT2 WT3

      ID:      1.00      1.00 Beth      1.00      9.00     60.00 f        9.00
    AGE2:      .        .      60.00      .        .

      ID:      2.00      1.00 Bob       2.00      6.00     40.00 m         .
    AGE2:     6.00      .        .      40.00      .

      ID:      3.00      1.00 Barb      3.00      3.00     20.00 f         .
    AGE2:      .       3.00      .        .      20.00

      ID:      4.00      2.00 Andy      1.00      8.00     80.00 m        8.00
    AGE2:      .        .      80.00      .        .

      ID:      5.00      2.00 Al        2.00      6.00     50.00 m         .
    AGE2:     6.00      .        .      50.00      .

      ID:      6.00      2.00 Ann       3.00      2.00     20.00 f         .
    AGE2:      .       2.00      .        .      20.00

      ID:      7.00      3.00 Pete      1.00      6.00     60.00 m        6.00
    AGE2:      .        .      60.00      .        .

      ID:      8.00      3.00 Pam       2.00      4.00     40.00 f         .
    AGE2:     4.00      .        .      40.00      .

      ID:      9.00      3.00 Phil      3.00      2.00     20.00 m         .
    AGE2:      .       2.00      .        .      20.00

Number of cases read:  9    Number of cases listed:  9

aggregate outfile 'c:kids2.sav'
 /break famid
 /age1 to age3 = max(age1 to age3)
 /wt1 to wt3 = max(wt1 to wt3).
get file 'c:kids2.sav'.
list.

    FAMID     AGE1     AGE2     AGE3      WT1      WT2      WT3

     1.00     9.00     6.00     3.00    60.00    40.00    20.00
     2.00     8.00     6.00     2.00    80.00    50.00    20.00
     3.00     6.00     4.00     2.00    60.00    40.00    20.00

Number of cases read:  3    Number of cases listed:  3

Example #3: Both numeric and character variables

The examples above showed how to reshape data using numeric variables, but sometimes you will need to reshape string variables as well. In this example, we reshape two numeric variables, age and wt, and two string variables, sex and kidname. We need to create the two string variables before creating the vectors, and we use the string command to do this. Note that while you can use the compute command to create a numeric variable, you can only use the compute command to modify a string variable.

Consider the kidslw data file shown below.

get file 'c:kidslw.sav'.
list.

       ID     FAMID KIDNAME     BIRTH       AGE        WT SEX

     1.00      1.00 Beth         1.00      9.00     60.00 f
     2.00      1.00 Bob          2.00      6.00     40.00 m
     3.00      1.00 Barb         3.00      3.00     20.00 f
     4.00      2.00 Andy         1.00      8.00     80.00 m
     5.00      2.00 Al           2.00      6.00     50.00 m
     6.00      2.00 Ann          3.00      2.00     20.00 f
     7.00      3.00 Pete         1.00      6.00     60.00 m
     8.00      3.00 Pam          2.00      4.00     40.00 f
     9.00      3.00 Phil         3.00      2.00     20.00 m

Number of cases read:  9    Number of cases listed:  9


vector age(3).
vector wt(3).
string sex1 to sex3 (A4).
vector sex= sex1 to sex3.
string kidname1 to kidname3 (A4).
vector kidname= kidname1 to kidname3.
compute kidname(birth) = kidname.
compute wt(birth) = wt.
compute sex(birth) = sex.
compute age(birth) = age.
list.

The variables are listed in the following order:

LINE   1: ID FAMID KIDNAME BIRTH AGE WT SEX AGE1

LINE   2: AGE2 AGE3 WT1 WT2 WT3 SEX1 SEX2 SEX3 KIDNAME1 KIDNAME2

LINE   3: KIDNAME3

      ID:      1.00      1.00 Beth      1.00      9.00     60.00 f        9.00
    AGE2:      .        .      60.00      .        .   f              Beth
KIDNAME3:

      ID:      2.00      1.00 Bob       2.00      6.00     40.00 m         .
    AGE2:     6.00      .        .      40.00      .        m              Bob
KIDNAME3:

      ID:      3.00      1.00 Barb      3.00      3.00     20.00 f         .
    AGE2:      .       3.00      .        .      20.00           f
KIDNAME3: Barb

      ID:      4.00      2.00 Andy      1.00      8.00     80.00 m        8.00
    AGE2:      .        .      80.00      .        .   m              Andy
KIDNAME3:

      ID:      5.00      2.00 Al        2.00      6.00     50.00 m         .
    AGE2:     6.00      .        .      50.00      .        m              Al
KIDNAME3:

      ID:      6.00      2.00 Ann       3.00      2.00     20.00 f         .
    AGE2:      .       2.00      .        .      20.00           f
KIDNAME3: Ann

      ID:      7.00      3.00 Pete      1.00      6.00     60.00 m        6.00
    AGE2:      .        .      60.00      .        .   m              Pete
KIDNAME3:

      ID:      8.00      3.00 Pam       2.00      4.00     40.00 f         .
    AGE2:     4.00      .        .      40.00      .        f              Pam
KIDNAME3:

      ID:      9.00      3.00 Phil      3.00      2.00     20.00 m         .
    AGE2:      .       2.00      .        .      20.00           m
KIDNAME3: Phil

Number of cases read:  9    Number of cases listed:  9

aggregate outfile 'c:kids3.sav'
 /break famid
 /age1 to age3 = max(age1 to age3)
 /wt1 to wt3 = max(wt1 to wt3)
 /sex1 to sex3 = max(sex1 to sex3)
 /kidname1 to kidname3 = max(kidname1 to kidname3).
get file 'c:kids3.sav'.
list.

The variables are listed in the following order:

LINE   1: FAMID AGE1 AGE2 AGE3 WT1 WT2 WT3 SEX1

LINE   2: SEX2 SEX3 KIDNAME1 KIDNAME2 KIDNAME3

   FAMID:      1.00     9.00     6.00     3.00    60.00    40.00    20.00 f
    SEX2: m    f    Beth Bob  Barb

   FAMID:      2.00     8.00     6.00     2.00    80.00    50.00    20.00 m
    SEX2: m    f    Andy Al   Ann

   FAMID:      3.00     6.00     4.00     2.00    60.00    40.00    20.00 m
    SEX2: f    m    Pete Pam  Phil

Number of cases read:  3    Number of cases listed:  3

Example #4: Character suffixes

You cannot use character suffixes in SPSS when reshaping data because you cannot aggregate the data to form the wide data file. Hence, in this example, we use a numeric suffix while reshaping and rename the variables with the character suffix as the last step.

get file 'c:dadmoml.sav'.
list.

       ID     FAMID NAME       INC DADMOM

     1.00      2.00 Art   22000.00 dad
     2.00      1.00 Bill  30000.00 dad
     3.00      3.00 Paul  25000.00 dad
     4.00      1.00 Bess  15000.00 mom
     5.00      3.00 Pat   50000.00 mom
     6.00      2.00 Amy   18000.00 mom

Number of cases read:  6    Number of cases listed:  6

if dadmom = "mom" dadmomN = 1.
if dadmom = "dad" dadmomN = 2.
list.

       ID     FAMID NAME       INC DADMOM  DADMOMN

     1.00      2.00 Art   22000.00 dad        2.00
     2.00      1.00 Bill  30000.00 dad        2.00
     3.00      3.00 Paul  25000.00 dad        2.00
     4.00      1.00 Bess  15000.00 mom        1.00
     5.00      3.00 Pat   50000.00 mom        1.00
     6.00      2.00 Amy   18000.00 mom        1.00

Number of cases read:  6    Number of cases listed:  6

string name1 to name2 (A4).
vector name = name1 to name2.
compute name(dadmomN) = name.
vector inc(2).
compute inc(dadmomN) = inc.
list.

       ID     FAMID NAME       INC DADMOM  DADMOMN NAME1 NAME2     INC1     INC2

     1.00      2.00 Art   22000.00 dad        2.00       Art        .   22000.00
     2.00      1.00 Bill  30000.00 dad        2.00       Bill       .   30000.00
     3.00      3.00 Paul  25000.00 dad        2.00       Paul       .   25000.00
     4.00      1.00 Bess  15000.00 mom        1.00 Bess        15000.00      .
     5.00      3.00 Pat   50000.00 mom        1.00 Pat         50000.00      .
     6.00      2.00 Amy   18000.00 mom        1.00 Amy         18000.00      .

Number of cases read:  6    Number of cases listed:  6

aggregate outfile 'c:kids4.sav'
 /break famid
 /inc1 to inc2 = max(inc1 to inc2)
 /name1 to name2 = max(name1 to name2).
get file 'c:kids4.sav'.
list.

    FAMID     INC1     INC2 NAME1 NAME2

     1.00 15000.00 30000.00 Bess  Bill
     2.00 18000.00 22000.00 Amy   Art
     3.00 50000.00 25000.00 Pat   Paul

rename variables (inc1 to inc2 = incmom incdad).
rename variables (name1 to name2 = namemom namedad).
list.
 
 Number of cases read:  3    Number of cases listed:  3

    FAMID   INCMOM   INCDAD NAMEMOM NAMEDAD

     1.00 15000.00 30000.00 Bess    Bill
     2.00 18000.00 22000.00 Amy     Art
     3.00 50000.00 25000.00 Pat     Paul

Number of cases read:  3    Number of cases listed:  3

Example #5: No index variable

There may be times when your data are ordered but an index variable is not present. In this situation, you will need to create an index variable before you can reshape the data. This example illustrates how to create an index variable and then use that index variable in the reshaping. We will use the data from Example 1. In that example, birth is the index variable. Hence, we will create an index variable which will have the same values as the variable birth. We will call our index variable index.

get file 'c:kidslw.sav'.
list famid birth age.

    FAMID     BIRTH       AGE

     1.00      1.00      9.00
     1.00      2.00      6.00
     1.00      3.00      3.00
     2.00      1.00      8.00
     2.00      2.00      6.00
     2.00      3.00      2.00
     3.00      1.00      6.00
     3.00      2.00      4.00
     3.00      3.00      2.00

Number of cases read:  9    Number of cases listed:  9

compute index = 1.
if famid = lag(famid) index = lag(index) + 1.
vector age(3).
compute age(index) = age.
list id famid age age1 age2 age3 birth index.

       ID     FAMID       AGE     AGE1     AGE2     AGE3     BIRTH    INDEX

     1.00      1.00      9.00     9.00      .        .        1.00     1.00
     2.00      1.00      6.00      .       6.00      .        2.00     2.00
     3.00      1.00      3.00      .        .       3.00      3.00     3.00
     4.00      2.00      8.00     8.00      .        .        1.00     1.00
     5.00      2.00      6.00      .       6.00      .        2.00     2.00
     6.00      2.00      2.00      .        .       2.00      3.00     3.00
     7.00      3.00      6.00     6.00      .        .        1.00     1.00
     8.00      3.00      4.00      .       4.00      .        2.00     2.00
     9.00      3.00      2.00      .        .       2.00      3.00     3.00

Number of cases read:  9    Number of cases listed:  9

aggregate outfile 'c:kids5.sav'
 /break famid
 /age1 to age3 = max(age1 to age3).
get file 'c:kids5.sav'.
list.

    FAMID     AGE1     AGE2     AGE3

     1.00     9.00     6.00     3.00
     2.00     8.00     6.00     2.00
     3.00     6.00     4.00     2.00

Number of cases read:  3    Number of cases listed:  3