Data are often collected on pairs of subjects (twins, couples, parent and child) or on the same subject twice (with and without treatment, before and after). Depending on the analysis, you may wish to have one observation represent one subject (or subject-time)–"long" form. Or you may wish to have one observation represent one pair–"wide" form.
Reshaping from long to wide in SAS can often be done using arrays or proc transpose. See our SAS arrays seminar and our reshaping in SAS learning module for details on these. However, reshaping pairs data from long to wide often means creating a second set of an unknown number of variables that are likely not named in any consistent or numbered manner. These issues make arrays and proc transpose difficult to use (though not impossible!).
This page provides a set of steps that can be used to reshape pairs data. We are assuming that the pairs are identified with a pair ID and that, within a pair, the subjects are differentiated with another variable.
This small sample dataset contains four variables (a, b, c, d) for three pairs where the subjects within each pair are designated with the variable t:
data example; input id a b c d t; datalines; 1 0.76 0.21 0.11 0.86 1 1 0.39 0.39 0.94 0.17 2 2 0.18 0.50 0.54 0.42 1 2 0.74 0.88 0.74 0.81 2 3 0.97 0.82 0.03 0.40 1 3 0.06 0.42 0.47 0.69 2 ;
First, using the with-pair ID variable, we create two datasets with one observation each per pair.
data out1 out2; set example; if t = 1 then output out1; if t = 2 then output out2; run;
Next, we use one of the macros found in Code Fragment: Renaming variables dynamically to rename the variables in both new datasets.
%macro rename2(oldvarlist, suffix); %let k=1; %let old = %scan(&oldvarlist, &k); %do %while("&old" NE ""); rename &old = &old.&suffix; %let k = %eval(&k + 1); %let old = %scan(&oldvarlist, &k); %end; %mend; %let varlist = a b c d; data out1; set out1; %rename2(&varlist, _1); run; data out2; set out2; %rename2(&varlist, _2); run;
We have now created two datasets where the variables of interest (those in the macro variable varlist) have been renamed appropriately. Now we can merge these two datasets. This method will work smoothly even if there are pair IDs that do not have two observations.
data all; merge out1 (drop=t) out2 (drop=t); by id; run; proc print data = all; run; Obs id a_1 b_1 c_1 d_1 a_2 b_2 c_2 d_2 1 1 0.76 0.21 0.11 0.86 0.39 0.39 0.94 0.17 2 2 0.18 0.50 0.54 0.42 0.74 0.88 0.74 0.81 3 3 0.97 0.82 0.03 0.40 0.06 0.42 0.47 0.69