There are occasions, especially with survey data, when you need to create an enumeration (also called a counting or identification) variable that starts at one for each group in your data. For example, suppose that you have test scores for students in a class. You may need to create a variable that counts all of the males in the class, and then starts over at one and counts all of the females in the class. Let’s look at a small data set and see how this can be easily done.
    data students;
      input gender score;
      cards;
    1 48
    1 45
    2 50
    2 42
    1 41
    2 51
    1 52
    1 43
    2 52
    ;
    run;
First, we need to sort the data on the grouping variable, in this case, gender.
    proc sort data = students;
      by gender;
    run;
Next, we will create a new variable called count that will count the number of males and the number of females.
    data students1;
      set students;
      count + 1;
      by gender;
      if first.gender then count = 1;
    run;
Let’s consider some of the code above and explain what it does and why. The third statement, count + 1, is a sum statement: it creates the variable count and adds one to it for each observation that SAS processes in the data step. A sum statement carries an implicit retain, which is why SAS does not reset the value of count to missing before processing the next observation in the data set.

The next statement, by gender, tells SAS the grouping variable. The data set must be sorted by this variable before running this data step. When a by statement is present, SAS creates two temporary indicator variables for each by variable: first. and last. (pronounced "first-dot" and "last-dot"). Note that the period is part of the name, and the name of the grouping variable follows it, as in first.gender. Each indicator equals 1 when the current observation is the first (or last) one in its group and 0 otherwise, and neither is written to the output data set. If we wanted SAS to do something when it came to the last observation in the group, we would use last.gender instead.

The if statement tells SAS when to reset the counter and to what value. The last part of the statement is straightforward: after the keyword then, we name the variable and set it equal to the value that we want assigned to the first observation in the group. In this example, we wanted to start counting at one, but you could put any number there that meets your needs. Now let’s see what our new data set looks like.
    proc print data = students1;
    run;

    Obs    gender    score    count
      1       1        48        1
      2       1        45        2
      3       1        41        3
      4       1        52        4
      5       1        43        5
      6       2        50        1
      7       2        42        2
      8       2        51        3
      9       2        52        4
As you can see, the process worked as we desired.
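The last. indicator mentioned above can be put to work in the same way. The following is a minimal sketch, not part of the original example: it assumes the sorted students data set from above, and the output data set name group_sizes and the variable n are made up for illustration. It keeps only the last observation of each group, so n holds the size of that group.

```sas
/* Sketch: one row per gender, holding the number of students in the group. */
/* Assumes students is already sorted by gender (see the proc sort above).  */
/* group_sizes and n are illustrative names, not from the original example. */
data group_sizes;
  set students;
  by gender;
  if first.gender then n = 0;   /* reset the counter at the start of a group */
  n + 1;                        /* sum statement: n is retained across rows  */
  if last.gender then output;   /* write only the final row of each group    */
  keep gender n;
run;

proc print data = group_sizes;
run;
```

With the nine observations above, this should yield two rows: n = 5 for gender 1 and n = 4 for gender 2, matching the largest values of count in each group.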
Now let’s look at a slightly more complicated example. Suppose that we had two grouping variables, class and gender.
    data two;
      input class gender score;
      cards;
    1 1 48
    1 1 45
    2 2 50
    1 2 42
    2 1 41
    2 2 51
    2 1 52
    1 1 43
    1 2 52
    ;
    run;

    proc sort data = two;
      by class gender;
    run;

    data two1;
      set two;
      count + 1;
      by class gender;
      if first.class or first.gender then count = 1;
    run;

    proc print data = two1;
    run;

    Obs    class    gender    score    count
      1      1         1        48        1
      2      1         1        45        2
      3      1         1        43        3
      4      1         2        42        1
      5      1         2        52        2
      6      2         1        41        1
      7      2         1        52        2
      8      2         2        50        1
      9      2         2        51        2
As you can see, expanding the code to handle multiple grouping variables is simple. Also, although each of our grouping variables has only two levels, the number of levels within any of the grouping variables does not matter: the same code works unchanged.
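One possible simplification, offered as a sketch rather than as a correction: when a by statement lists several variables, a change in an earlier variable also sets the first. indicator of every variable to its right. Under that rule, first.gender is already 1 whenever first.class is 1, so testing first.gender alone should give the same counts. The data set name two2 is made up for illustration.

```sas
/* Sketch: same counter as two1, testing only the innermost by variable. */
/* Assumes two has been sorted by class gender, as above.                */
/* two2 is an illustrative name, not from the original example.          */
data two2;
  set two;
  count + 1;
  by class gender;
  if first.gender then count = 1;   /* also fires when class changes */
run;

/* Optional check that the two versions agree. */
proc compare base = two1 compare = two2;
run;
```

Listing both indicators, as in the original code, is harmless and arguably makes the intent more explicit; the shorter form is mainly useful once the by statement grows to three or more variables.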