Converting a categorical variable to dummy variables can be a tedious process when done using a series of if then statements. Consider the following example data file.
DATA auto ;
  LENGTH make $ 20 ;
  INPUT make $ 1-17 price mpg rep78 ;
CARDS;
AMC Concord       4099 22 3
AMC Pacer         4749 17 3
Audi 5000         9690 17 5
Audi Fox          6295 23 3
BMW 320i          9735 25 4
Buick Century     4816 20 3
Buick Electra     7827 15 4
Buick LeSabre     5788 18 3
Cad. Eldorado     14500 14 2
Olds Starfire     4195 24 1
Olds Toronado     10371 16 3
Plym. Volare      4060 18 2
Pont. Catalina    5798 18 4
Pont. Firebird    4934 18 1
Pont. Grand Prix  5222 19 3
Pont. Le Mans     4723 19 3
;
RUN;
Method 1
The variable rep78 is coded with values from 1 – 5 representing various repair histories. We can create dummy variables for rep78 by writing separate assignment statements for each value as follows:
DATA auto1 ;
  SET auto ;
  IF rep78 = 1 THEN rep78_1 = 1; ELSE rep78_1 = 0;
  IF rep78 = 2 THEN rep78_2 = 1; ELSE rep78_2 = 0;
  IF rep78 = 3 THEN rep78_3 = 1; ELSE rep78_3 = 0;
  IF rep78 = 4 THEN rep78_4 = 1; ELSE rep78_4 = 0;
  IF rep78 = 5 THEN rep78_5 = 1; ELSE rep78_5 = 0;
RUN;

PROC FREQ DATA=auto1;
  TABLES rep78*rep78_1*rep78_2*rep78_3*rep78_4*rep78_5 / list ;
RUN;
As you can see from the proc freq output below, the dummy variables were created correctly, but doing so required a separate if then else statement for every value of rep78.
[Output below edited for readability]

REP78  REP78_1  REP78_2  REP78_3  REP78_4  REP78_5  Freq  Percent
-----------------------------------------------------------------
    1        1        0        0        0        0     2     12.5
    2        0        1        0        0        0     2     12.5
    3        0        0        1        0        0     8     50.0
    4        0        0        0        1        0     3     18.8
    5        0        0        0        0        1     1      6.3
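A shorter way to write Method 1 is sketched below. It is not one of the methods described in this article; it relies on the fact that a SAS comparison evaluates to 1 when true and 0 when false, so each if then else pair can be collapsed into a single assignment. The dataset name auto1b is a hypothetical name chosen for illustration.

```sas
/* Sketch only: auto1b is a hypothetical dataset name. */
DATA auto1b;
  SET auto;
  /* Each comparison in parentheses evaluates to 1 (true) or 0 (false) */
  rep78_1 = (rep78 = 1);
  rep78_2 = (rep78 = 2);
  rep78_3 = (rep78 = 3);
  rep78_4 = (rep78 = 4);
  rep78_5 = (rep78 = 5);
RUN;

PROC FREQ DATA=auto1b;
  TABLES rep78*rep78_1*rep78_2*rep78_3*rep78_4*rep78_5 / list;
RUN;
```

As with the if then else version, an observation with a missing rep78 would get 0 in all five dummy variables, because a comparison with a missing value is false. This still requires one statement per value, so it saves typing but does not scale any better.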
Method 2
Had rep78 ranged from 1 to 10 or 1 to 20, that would be a lot of typing (and prone to error). Here is one possible shortcut you could use when you need to create dummy variables.
DATA auto2;
  set auto;
  ARRAY dummys {*} 3. rep78_1 - rep78_5;
  DO i=1 TO 5;
    dummys(i) = 0;
  END;
  dummys( rep78 ) = 1;
RUN;

PROC FREQ DATA=auto2;
  TABLES rep78*rep78_1*rep78_2*rep78_3*rep78_4*rep78_5 / list ;
RUN;
As you see below, the dummy variables were created successfully.
[Output below edited for readability]

REP78  REP78_1  REP78_2  REP78_3  REP78_4  REP78_5  Freq  Percent
-----------------------------------------------------------------
    1        1        0        0        0        0     2     12.5
    2        0        1        0        0        0     2     12.5
    3        0        0        1        0        0     8     50.0
    4        0        0        0        1        0     3     18.8
    5        0        0        0        0        1     1      6.3
Let’s look at each statement in some detail.
ARRAY dummys {*} 3. rep78_1 - rep78_5;
This statement defines an array called dummys and creates the five dummy variables rep78_1 to rep78_5, giving each the minimum storage length required, i.e., 3 bytes. You would replace rep78_1 - rep78_5 with the names you want for your dummy variables. The asterisk in the brackets tells SAS to determine the size of the array automatically by counting the variables listed at the end of the statement.
DO i=1 TO 5; dummys(i) = 0; END;
This initializes each dummy variable to 0. You would change 5 to be the number of values your variable could have.
dummys(rep78) = 1;
This sets the appropriate dummy variable to 1. For example, if rep78 = 3, then dummys(rep78) = 1 will assign a value of 1 to the third element in the array, i.e., assign 1 to rep78_3. You would change rep78 to the name of the variable for which you want to create dummy variables.
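The statement dummys(rep78) = 1 assumes that rep78 is always a valid subscript. A slightly more defensive variant of Method 2 is sketched below, under the assumption that missing or out-of-range values should simply leave all dummy variables at 0. The dataset name auto2b is hypothetical, and the DIM function is used in place of the hard-coded 5 so that the loop follows the size of the array.

```sas
/* Sketch only: auto2b is a hypothetical dataset name. */
DATA auto2b;
  SET auto;
  ARRAY dummys {*} 3 rep78_1 - rep78_5;
  /* DIM returns the number of elements in the array */
  DO i = 1 TO DIM(dummys);
    dummys(i) = 0;
  END;
  /* Index the array only when rep78 is a valid subscript;  */
  /* a missing value fails this test, so no error is raised */
  IF 1 <= rep78 <= DIM(dummys) THEN dummys(rep78) = 1;
  DROP i;  /* the loop counter is not needed in the output */
RUN;
```

If you would rather have the dummy variables be missing when rep78 is missing, you would need to assign missing values explicitly in that case; which behavior is appropriate depends on how the dummy variables will be used.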
Method 2a
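Above, we used a loop to set each element of the array to 0 before we started assigning values. Below, we instead use (5*0) on the array statement to initialize the five array elements to 0. We thank Mike Zdeb for this suggestion. Because variables that are given initial values on an array statement are retained from one observation to the next rather than reset on each pass through the data step, we write out the observation with an explicit output statement and then set the dummy variable back to 0 so that it does not carry over to the next observation. Also in this example, we have added some if then statements to account for missing data. They are not needed here because our small example dataset has no missing data. However, if the rep78 variable did contain missing data, the if then statements would be necessary, because a missing value is not a valid array subscript.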
data auto3;
  array dummies(5) rep78_1 - rep78_5 (5*0);
  set auto;
  if rep78 ne . then dummies(rep78) = 1;
  output;
  if rep78 ne . then dummies(rep78) = 0;
run;

PROC FREQ DATA=auto3;
  TABLES rep78*rep78_1*rep78_2*rep78_3*rep78_4*rep78_5 / list ;
RUN;

[Output below edited for readability]

REP78  REP78_1  REP78_2  REP78_3  REP78_4  REP78_5  Freq  Percent
-----------------------------------------------------------------
    1        1        0        0        0        0     2     12.5
    2        0        1        0        0        0     2     12.5
    3        0        0        1        0        0     8     50.0
    4        0        0        0        1        0     3     18.8
    5        0        0        0        0        1     1      6.3
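Because Method 2a depends on resetting retained values, it can be worth confirming that exactly one dummy variable is set on each observation. The following is a minimal sketch of such a check; the dataset name check and the variable n_set are hypothetical names introduced here for illustration.

```sas
/* Sketch only: check and n_set are hypothetical names. */
DATA check;
  SET auto3;
  /* Count how many of the five dummy variables are set to 1 */
  n_set = SUM(OF rep78_1-rep78_5);
RUN;

PROC FREQ DATA=check;
  /* MISSING includes observations with a missing rep78 in the table */
  TABLES rep78*n_set / LIST MISSING;
RUN;
```

If the reset after the output statement is working as intended, n_set should be 1 for every observation with a valid rep78. A value of 2 or more would suggest that a dummy variable was carried over from an earlier observation.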