How can I recode my ID variable to be short and numeric?

Sometimes your dataset includes an identifying variable that is unnecessarily long and uninformative. For example, your ID variable may be a string of length 12 with both letters and numbers (e.g., "77A34987BG34"). You may wish to create a new identifying variable that maps each value of the complicated ID variable onto an integer, starting at 1 and going up to the number of unique IDs in your dataset. The code below provides an example of how to do this using a 10-digit numeric ID; the same steps apply to a character ID.

data test;
  input id a b;
  cards;
9385793487 0 0
3598437987 1 0
5987398759 1 0
9593859853 0 1
5987398759 0 0
9385793487 0 0
3598437987 0 1
7892343344 1 1
;

proc print data = test;
run;

Obs        id        a    b

 1     9385793487    0    0
 2     3598437987    1    0
 3     5987398759    1    0
 4     9593859853    0    1
 5     5987398759    0    0
 6     9385793487    0    0
 7     3598437987    0    1
 8     7892343344    1    1


proc sort data = test;
  by id;
run;

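data test2;
  set test;
  by id;
  /* keep the value of newid across observations, starting at 0 */
  retain newid 0;
  /* add 1 at the first observation of each id group */
  if first.id then newid = newid + 1;
run;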

proc print data = test2; 
run;

Obs        id        a    b    newid

 1     3598437987    1    0      1
 2     3598437987    0    1      1
 3     5987398759    1    0      2
 4     5987398759    0    0      2
 5     7892343344    1    1      3
 6     9385793487    0    0      4
 7     9385793487    0    0      4
 8     9593859853    0    1      5
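
As a quick sanity check, the number of distinct values of id should match the largest value of newid. The step below is a minimal sketch using proc sql on the test2 dataset created above; the column names n_id and max_newid are only illustrative labels. For the example data, both values should be 5.

```sas
proc sql;
  /* n_id and max_newid are illustrative names, not part of the example above */
  select count(distinct id) as n_id,
         max(newid)         as max_newid
  from test2;
quit;
```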

Now our dataset has a short, numeric identifying variable, newid, that takes the same value for every observation sharing the same original id.
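
The same approach should carry over to a character ID like the 12-character string mentioned at the top. The sketch below assumes a made-up dataset (test_char) with invented ID values; only the input statement changes, reading id as a character variable of length 12. It also uses a sum statement (newid + 1), which retains newid and starts it at 0 automatically, as an alternative to the explicit retain statement.

```sas
/* hypothetical data: the dataset name and ID values are invented for illustration */
data test_char;
  input id :$12. a b;   /* read id as a character variable of length 12 */
  cards;
77A34987BG34 0 0
12B99871CD05 1 0
77A34987BG34 0 1
;

proc sort data = test_char;
  by id;
run;

data test_char2;
  set test_char;
  by id;
  /* sum statement: newid is retained automatically and initialized to 0 */
  if first.id then newid + 1;
run;

proc print data = test_char2;
run;
```

Because character values sort in collating order, "12B99871CD05" is assigned newid 1 and both observations with "77A34987BG34" are assigned newid 2.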