Reshaping data long to wide | Stata Learning Modules

Stata Learning Module
Reshaping data long to wide

This module illustrates the power and simplicity of Stata in its ability to reshape data files. These examples take long data files and reshape them into wide form. These show common examples of reshaping data but do not exhaustively demonstrate the different kinds of data reshaping that you could encounter.

Example #1: Reshaping data long to wide

The reshape command can be used to make data from a long format to a wide format. Consider the kids file. (To make things simple at first, we will drop the variables kidname sex and wt.)

use https://stats.idre.ucla.edu/stat/stata/modules/kids, clear 

drop kidname sex wt 

list 

          famid      birth        age 
  1.         1          1          9  
  2.         1          2          6  
  3.         1          3          3  
  4.         2          1          8  
  5.         2          2          6  
  6.         2          3          2  
  7.         3          1          6  
  8.         3          2          4  
  9.         3          3          2

Let’s make age in this file wide, making one record per family which would contain age1 age2 age3, the ages of the kids in the family (age2 would be missing if there is only 1 kid, and age3 would be missing if there are only 2 kids).

reshape wide age, i(famid)  j(birth)  

 (note:  j = 1 2 3)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                        9   ->       3
Number of variables                   3   ->       4
j variable (3 values)             birth   ->   (dropped)
xij variables:
                                    age   ->   age1 age2 age3
----------------------------------------------------------------------------- 

list 

          famid       age1       age2       age3 
  1.         1          9          6          3  
  2.         2          8          6          2  
  3.         3          6          4          2

Let’s look at the pieces of the reshape command.
. reshape wide age, j(birth) i(famid)
wide tells reshape that we want to go from long to wide
age tells Stata that the variable to be converted from long to wide is age
i(famid) tells reshape that famid uniquely identifies observations in the wide form
j(birth) tells reshape that the suffix of age (1 2 3) should be taken from the variable birth

Example #2: Reshaping data long to wide with more than one variable

The reshape command can work on more than one variable at a time. In the example above, we just reshaped the age variable. In the example below, we reshape the variables age, wt and sex like this.

use https://stats.idre.ucla.edu/stat/stata/modules/kids, clear 

list 

          famid    kidname      birth        age         wt        sex 
  1.         1       Beth          1          9         60          f  
  2.         1        Bob          2          6         40          m  
  3.         1       Barb          3          3         20          f  
  4.         2       Andy          1          8         80          m  
  5.         2         Al          2          6         50          m  
  6.         2        Ann          3          2         20          f  
  7.         3       Pete          1          6         60          m  
  8.         3        Pam          2          4         40          f  
  9.         3       Phil          3          2         20          m   

reshape wide kidname age wt sex, i(famid) j(birth) 

 (note:  j = 1 2 3)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                        9   ->       3
Number of variables                   6   ->      13
j variable (3 values)             birth   ->   (dropped)
xij variables:
                                kidname   ->   kidname1 kidname2 kidname3
                                    age   ->   age1 age2 age3
                                     wt   ->   wt1 wt2 wt3
                                    sex   ->   sex1 sex2 sex3
----------------------------------------------------------------------------- 

list 

 Observation 1

       famid            1    kidname1         Beth        age1            9
         wt1           60        sex1            f    kidname2          Bob
        age2            6         wt2           40        sex2            m
    kidname3         Barb        age3            3         wt3           20
        sex3            f


Observation 2

       famid            2    kidname1         Andy        age1            8
         wt1           80        sex1            m    kidname2           Al
        age2            6         wt2           50        sex2            m
    kidname3          Ann        age3            2         wt3           20
        sex3            f


Observation 3

       famid            3    kidname1         Pete        age1            6
         wt1           60        sex1            m    kidname2          Pam
        age2            4         wt2           40        sex2            f
    kidname3         Phil        age3            2         wt3           20
        sex3            m

Example #3: Reshaping wide with character suffixes

The examples above showed how to reshape data using numeric suffixes, but reshape can handle character suffixes as well.

Consider the dadmoml data file shown below.

use https://stats.idre.ucla.edu/stat/stata/modules/dadmoml, clear 

list 

          famid       name        inc     dadmom 
  1.         2        Art      22000        dad  
  2.         1       Bill      30000        dad  
  3.         3       Paul      25000        dad  
  4.         1       Bess      15000        mom  
  5.         3        Pat      50000        mom  
  6.         2        Amy      18000        mom

Let’s reshape this to be in a wide format, containing one record per family. The reshape command below uses the string option to tell Stata that the suffix is character.

reshape wide name inc, i(famid) j(dadmom) string 

 (note:  j = dad mom)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                        6   ->       3
Number of variables                   4   ->       5
j variable (2 values)            dadmom   ->   (dropped)
xij variables:
                                   name   ->   namedad namemom
                                    inc   ->   incdad incmom
----------------------------------------------------------------------------- 

list 

          famid    namedad     incdad    namemom     incmom 
  1.         1       Bill      30000       Bess      15000  
  2.         2        Art      22000        Amy      18000  
  3.         3       Paul      25000        Pat      50000

Summary

Reshaping data long to wide

 Long format
        famid      birth         age 
  1.         1          1          9 
  2.         1          2          6 
  3.         1          3          3 
  4.         2          1          8 
  5.         2          2          6 
  6.         2          3          2 
  7.         3          1          6 
  8.         3          2          4 
  9.         3          3          2 

reshape wide age, j(birth) i(famid) Wide format
         famid       age1       age2       age3 
  1.         1          9          6          3 
  2.         2          8          6          2 
  3.         3          6          4          2

The general syntax of reshape wide can be expressed as:

. reshape wide long-var(s),  i( wide-id-var ) j( var-with-suffix )

where
long-var(s)      is the name of the long variable(s) to be made wide e.g. age
wide-id-var      is the variable that uniquely identifies wide 
                 observations, e.g. famid
var-with-suffix  is the variable from the long file that contains 
                 the suffix for the wide variables, e.g. age

Stata Learning ModuleReshaping data long to wide

Example #1: Reshaping data long to wide

Example #2: Reshaping data long to wide with more than one variable

Example #3: Reshaping wide with character suffixes

Summary

Stata Learning Module
Reshaping data long to wide