Sometimes two variables in a dataset convey the same information, except that one is numeric and the other is a string. For example, in the dataset below, the numeric variable gender is coded 1/0 and the string variable female holds the same information in a more explicit form. The numeric variable is easier to work with, but we may also want to keep the information in the string variable. In this case we want to create value labels for the numeric variable based on the string variable. In Stata, the command labmask does this. labmask is one of the commands in a suite called labutil written by Nicholas J. Cox. You can download it by typing search labutil (see How can I use the search command to search for programs and get additional help? for more information about using search) and then following the link to it.
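If you prefer to install from the command line, the sketch below may help. It assumes that labutil is also distributed through the SSC archive under that package name; if it is not found there, use search labutil instead.

```stata
* Assumption: the labutil package is available from SSC under this name
ssc install labutil

* Confirm that Stata can now find the labmask command
which labmask
```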
Example 1: A simple example
clear
input gender str8 female
1 female
0 male
end

list

     +-----------------+
     | gender   female |
     |-----------------|
  1. |      1   female |
  2. |      0     male |
     +-----------------+

labmask gender, values(female)
describe

Contains data
  obs:             2
 vars:             2
 size:            32 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
gender          float  %9.0g       gender
female          str8   %9s
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

list

     +-----------------+
     | gender   female |
     |-----------------|
  1. | female   female |
  2. |   male     male |
     +-----------------+

label list

gender:
           0 male
           1 female
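For comparison, the same value label can be built by hand with Stata's built-in label commands. This is a minimal sketch of what labmask automates in this example, not a description of how labmask works internally. The label name genderlbl is a hypothetical choice made here for illustration.

```stata
clear
input gender str8 female
1 female
0 male
end

* Define the value label by hand (genderlbl is a hypothetical name)
label define genderlbl 0 "male" 1 "female"

* Attach the label to the numeric variable
label values gender genderlbl

* Inspect the mapping from numeric codes to text
label list genderlbl
```

With only two categories this is easy to type, but with many categories the manual approach gets tedious and error-prone, which is where labmask is convenient.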
Example 2: Another example
How is labmask different from encode? Both commands produce a labeled numeric counterpart to a string variable. However, encode creates a new numeric variable and assigns its codes by the alphabetical order of the string values, not by the values of an existing numeric variable in the dataset that we want to match. For example, the new variable cnum created by encode below has the value 1 for boston, since boston comes first alphabetically, whereas labmask keeps the original codes in cityn and only attaches labels to them.
clear
input cityn str8 cityc
0 la
0 la
2 boston
2 boston
5 chicago
5 chicago
5 chicago
3 ny
3 ny
end

encode cityc, gen(cnum)
labmask cityn, values(cityc)
list

     +-----------------------------+
     |   cityn     cityc      cnum |
     |-----------------------------|
  1. |      la        la        la |
  2. |      la        la        la |
  3. |  boston    boston    boston |
  4. |  boston    boston    boston |
  5. | chicago   chicago   chicago |
     |-----------------------------|
  6. | chicago   chicago   chicago |
  7. | chicago   chicago   chicago |
  8. |      ny        ny        ny |
  9. |      ny        ny        ny |
     +-----------------------------+

list, nolab

     +------------------------+
     | cityn     cityc   cnum |
     |------------------------|
  1. |     0        la      3 |
  2. |     0        la      3 |
  3. |     2    boston      1 |
  4. |     2    boston      1 |
  5. |     5   chicago      2 |
     |------------------------|
  6. |     5   chicago      2 |
  7. |     5   chicago      2 |
  8. |     3        ny      4 |
  9. |     3        ny      4 |
     +------------------------+
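To see the two coding schemes side by side, one option is to list the value labels themselves. This sketch assumes labmask is already installed and rebuilds the Example 2 data so that it can be run on its own.

```stata
clear
input cityn str8 cityc
0 la
0 la
2 boston
2 boston
5 chicago
5 chicago
5 chicago
3 ny
3 ny
end

* encode creates a new variable and a value label, both named cnum
encode cityc, gen(cnum)

* labmask attaches a value label named cityn to the existing variable
labmask cityn, values(cityc)

* Show both value labels to compare the code-to-text mappings
label list cityn cnum
```

Consistent with the list, nolab output above, the cnum label runs from 1 to 4 in alphabetical order, while the cityn label keeps the original codes 0, 2, 3 and 5.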