Sometimes your dataset includes an identifying variable that is unnecessarily long and uninformative. For example, your ID variable may be a string of length 12 containing both letters and numbers (e.g., "77A34987BG34"). You may wish to create a new identifying variable that maps each value of the complicated ID variable onto an integer, starting at 1 and going up to the number of unique IDs in your dataset. The code below provides an example of how to do this.
data test;
  input id a b;
  cards;
9385793487 0 0
3598437987 1 0
5987398759 1 0
9593859853 0 1
5987398759 0 0
9385793487 0 0
3598437987 0 1
7892343344 1 1
;

proc print data = test;
run;

Obs        id        a    b

 1     9385793487    0    0
 2     3598437987    1    0
 3     5987398759    1    0
 4     9593859853    0    1
 5     5987398759    0    0
 6     9385793487    0    0
 7     3598437987    0    1
 8     7892343344    1    1

proc sort data = test;
  by id;
run;

data test2;
  set test;
  by id;
  retain newid 0;
  if first.id then newid = newid + 1;
run;

proc print data = test2;
run;

Obs        id        a    b    newid

 1     3598437987    1    0      1
 2     3598437987    0    1      1
 3     5987398759    1    0      2
 4     5987398759    0    0      2
 5     7892343344    1    1      3
 6     9385793487    0    0      4
 7     9385793487    0    0      4
 8     9593859853    0    1      5
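Now our dataset has a short identifying variable, newid, that takes the values 1 through 5, one for each unique value of id. Because the data were sorted by id first, the new values follow the sorted order of the original IDs.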
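
The example above uses numeric IDs, but the same steps should work when the ID is a character string like the one described at the start. The sketch below is a minimal illustration under that assumption. The dataset names test_char and test_char2 and the ID values are hypothetical, invented only for this example. The one change is in the input statement, where the :$12. informat reads id as a character variable of length 12 (a plain $ in list input would keep only the first 8 characters).

```sas
/* Hypothetical data: character IDs of length 12, invented for illustration */
data test_char;
  input id :$12. a b;
  cards;
77A34987BG34 0 0
12B00451ZX90 1 0
77A34987BG34 1 1
;

/* Sort so that all records sharing an ID are adjacent */
proc sort data = test_char;
  by id;
run;

/* Increment the counter at the first record of each BY group */
data test_char2;
  set test_char;
  by id;
  retain newid 0;
  if first.id then newid = newid + 1;
run;

proc print data = test_char2;
run;
```

As before, newid is assigned in the sorted order of id, so it does not reflect the order in which the IDs first appeared in the raw data.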