Stata Learning Module
Reshaping data long to wide
This module illustrates the power and simplicity of Stata in its ability to reshape data files. These examples take long data files and reshape them into wide form. These show common examples of reshaping data but do not exhaustively demonstrate the different kinds of data reshaping that you could encounter.
Example #1: Reshaping data long to wide
The reshape command can be used to make data from a long format to a wide format. Consider the kids file. (To make things simple at first, we will drop the variables kidname sex and wt.)
use https://stats.idre.ucla.edu/stat/stata/modules/kids, clear drop kidname sex wt list famid birth age 1. 1 1 9 2. 1 2 6 3. 1 3 3 4. 2 1 8 5. 2 2 6 6. 2 3 2 7. 3 1 6 8. 3 2 4 9. 3 3 2
Let’s make age in this file wide, making one record per family which would contain age1 age2 age3, the ages of the kids in the family (age2 would be missing if there is only 1 kid, and age3 would be missing if there are only 2 kids).
reshape wide age, i(famid) j(birth) (note: j = 1 2 3) Data long -> wide ----------------------------------------------------------------------------- Number of obs. 9 -> 3 Number of variables 3 -> 4 j variable (3 values) birth -> (dropped) xij variables: age -> age1 age2 age3 ----------------------------------------------------------------------------- list famid age1 age2 age3 1. 1 9 6 3 2. 2 8 6 2 3. 3 6 4 2
Let’s look at the pieces of the reshape command.
. reshape wide age, j(birth) i(famid)wide tells reshape that we want to go from long to wide
age tells Stata that the variable to be converted from long to wide is age
i(famid) tells reshape that famid uniquely identifies observations in the wide form
j(birth) tells reshape that the suffix of age (1 2 3) should be taken from the variable birth
Example #2: Reshaping data long to wide with more than one variable
The reshape command can work on more than one variable at a time. In the example above, we just reshaped the age variable. In the example below, we reshape the variables age, wt and sex like this.
use https://stats.idre.ucla.edu/stat/stata/modules/kids, clear list famid kidname birth age wt sex 1. 1 Beth 1 9 60 f 2. 1 Bob 2 6 40 m 3. 1 Barb 3 3 20 f 4. 2 Andy 1 8 80 m 5. 2 Al 2 6 50 m 6. 2 Ann 3 2 20 f 7. 3 Pete 1 6 60 m 8. 3 Pam 2 4 40 f 9. 3 Phil 3 2 20 m reshape wide kidname age wt sex, i(famid) j(birth) (note: j = 1 2 3) Data long -> wide ----------------------------------------------------------------------------- Number of obs. 9 -> 3 Number of variables 6 -> 13 j variable (3 values) birth -> (dropped) xij variables: kidname -> kidname1 kidname2 kidname3 age -> age1 age2 age3 wt -> wt1 wt2 wt3 sex -> sex1 sex2 sex3 ----------------------------------------------------------------------------- list Observation 1 famid 1 kidname1 Beth age1 9 wt1 60 sex1 f kidname2 Bob age2 6 wt2 40 sex2 m kidname3 Barb age3 3 wt3 20 sex3 f Observation 2 famid 2 kidname1 Andy age1 8 wt1 80 sex1 m kidname2 Al age2 6 wt2 50 sex2 m kidname3 Ann age3 2 wt3 20 sex3 f Observation 3 famid 3 kidname1 Pete age1 6 wt1 60 sex1 m kidname2 Pam age2 4 wt2 40 sex2 f kidname3 Phil age3 2 wt3 20 sex3 m
Example #3: Reshaping wide with character suffixes
The examples above showed how to reshape data using numeric suffixes, but reshape can handle character suffixes as well.
Consider the dadmoml data file shown below.
use https://stats.idre.ucla.edu/stat/stata/modules/dadmoml, clear list famid name inc dadmom 1. 2 Art 22000 dad 2. 1 Bill 30000 dad 3. 3 Paul 25000 dad 4. 1 Bess 15000 mom 5. 3 Pat 50000 mom 6. 2 Amy 18000 mom
Let’s reshape this to be in a wide format, containing one record per family. The reshape command below uses the string option to tell Stata that the suffix is character.
reshape wide name inc, i(famid) j(dadmom) string (note: j = dad mom) Data long -> wide ----------------------------------------------------------------------------- Number of obs. 6 -> 3 Number of variables 4 -> 5 j variable (2 values) dadmom -> (dropped) xij variables: name -> namedad namemom inc -> incdad incmom ----------------------------------------------------------------------------- list famid namedad incdad namemom incmom 1. 1 Bill 30000 Bess 15000 2. 2 Art 22000 Amy 18000 3. 3 Paul 25000 Pat 50000
Summary
Reshaping data long to wide
Long format famid birth age 1. 1 1 9 2. 1 2 6 3. 1 3 3 4. 2 1 8 5. 2 2 6 6. 2 3 2 7. 3 1 6 8. 3 2 4 9. 3 3 2 reshape wide age, j(birth) i(famid) Wide format famid age1 age2 age3 1. 1 9 6 3 2. 2 8 6 2 3. 3 6 4 2
The general syntax of reshape wide can be expressed as:
. reshape wide long-var(s), i( wide-id-var ) j( var-with-suffix )where long-var(s) is the name of the long variable(s) to be made wide e.g. age wide-id-var is the variable that uniquely identifies wide observations, e.g. famid var-with-suffix is the variable from the long file that contains the suffix for the wide variables, e.g. age