Sometimes, two variables in a dataset may convey the same information, except one being numeric variable and the other being a string variable. For example, in the data set below, we have a numeric variable a coded 1/0 for gender and a string variable b also for gender but with more explicit information. It is easy to use the numeric variable, but we may also want to keep the information given from the string variable. This is a case where we want to create value labels for the numeric variable based on the string variable. In SAS, we will create a format from the string variable and apply the format to the numeric variable.
Example 1: A simple example
We have a tiny data set containing the two variables a and b and two observations.
data test; input a b $; datalines; 1 female 0 male ; run;
Apparently we want to create a format for variable a so that 1 = female and 0 = male. It is easy to create a format simply using the procedure format. For example, we can do the following.
proc format; value gender 1 = "female" 0 = "male"; run; proc format; select gender; run;---------------------------------------------------------------------------- | FORMAT NAME: GENDER LENGTH: 6 NUMBER OF VALUES: 2 | | MIN LENGTH: 1 MAX LENGTH: 40 DEFAULT LENGTH 6 FUZZ: STD | |--------------------------------------------------------------------------| |START |END |LABEL (VER. V7|V8 20MAY2004:14:25:17)| |----------------+----------------+----------------------------------------| | 0| 0|male | | 1| 1|female | ----------------------------------------------------------------------------
We can also do the following using the a data step. This approach does not depend on the number of categories of the string variable. The code will be exactly the same. This is definitely easier when the number of categories is large.
data fmt_dataset; retain fmtname "lgender"; set test ; start = a; label = b; run; proc format cntlin = fmt_dataset fmtlib; select lgender; run;---------------------------------------------------------------------------- | FORMAT NAME: LGENDER LENGTH: 6 NUMBER OF VALUES: 2 | | MIN LENGTH: 1 MAX LENGTH: 40 DEFAULT LENGTH 6 FUZZ: STD | |--------------------------------------------------------------------------| |START |END |LABEL (VER. V7|V8 20MAY2004:14:01:06)| |----------------+----------------+----------------------------------------| | 0| 0|male | | 1| 1|female | ----------------------------------------------------------------------------
Example 2: Another simple (but not so simple) example
We have a dataset called test2 and it looks like the following. There are many repeated rows in the dataset. If we apply the same approach from the previous example, SAS will yield an error message saying that the range is repeated, or values overlap. So we need extract a smaller dataset with no repeats in it.
data test2; input group variable $; datalines; 0 group1 0 group1 0 group1 0 group1 1 group2 1 group2 1 group2 1 group2 2 group3 2 group3 2 group3 2 group3 3 group4 3 group4 3 group4 3 group4 ; run;
The easiest way of creating a dataset without repeats is to use proc sql.
proc sql; create table tofmt as select distinct group, variable from test2; quit;proc print data = tofmt; run;Obs group variable 1 0 group1 2 1 group2 3 2 group3 4 3 group4
Now we are ready to create the format out of the dataset tofmt.
data fmt_dataset; retain fmtname "cvar"; set tofmt ; start = group; label = variable; run; proc format cntlin = fmt_dataset fmtlib; select cvar; run;---------------------------------------------------------------------------- | FORMAT NAME: CVAR LENGTH: 6 NUMBER OF VALUES: 4 | | MIN LENGTH: 1 MAX LENGTH: 40 DEFAULT LENGTH 6 FUZZ: STD | |--------------------------------------------------------------------------| |START |END |LABEL (VER. V7|V8 09JUN2008:16:23:21)| |----------------+----------------+----------------------------------------| | 0| 0|group1 | | 1| 1|group2 | | 2| 2|group3 | | 3| 3|group4 | ---------------------------------------------------------------------------- proc print data = test2; format group cvar.; run;Obs group variable 1 group1 group1 2 group1 group1 3 group1 group1 4 group1 group1 5 group2 group2 6 group2 group2 7 group2 group2 8 group2 group2 9 group3 group3 10 group3 group3 11 group3 group3 12 group3 group3 13 group4 group4 14 group4 group4 15 group4 group4 16 group4 group4