1. Introduction
This module illustrates (1) how to create and recode variables manually and (2) how to use foreach to ease the process of creating and recoding variables.
Consider the sample program below, which reads in income data for twelve months.
input famid inc1-inc12 1 3281 3413 3114 2500 2700 3500 3114 3319 3514 1282 2434 2818 2 4042 3084 3108 3150 3800 3100 1531 2914 3819 4124 4274 4471 3 6015 6123 6113 6100 6100 6200 6186 6132 3123 4231 6039 6215 endlist
The output is shown below
list famid inc1-inc12, clean famid inc1 inc2 inc3 inc4 inc5 inc6 inc7 inc8 inc9 inc10 inc11 inc12 1 3281 3413 3114 2500 2700 3500 3114 3319 3514 1282 2434 2818 2 4042 3084 3108 3150 3800 3100 1531 2914 3819 4124 4274 4471 3 6015 6123 6113 6100 6100 6200 6186 6132 3123 4231 6039 6215
2. Computing variables (manually)
Say that we wanted to compute the amount of tax (10%) paid for each month, the simplest way to do this is to compute 12 variables (taxinc1-taxinc12) by multiplying each of the (inc1-inc12) by .10 as illustrated below. As you see, this requires entering a command computing the tax for each month of data (for months 1 to 12) via the generate command.
generate taxinc1 = inc1 * .10 generate taxinc2 = inc2 * .10 generate taxinc3 = inc3 * .10 generate taxinc4 = inc4 * .10 generate taxinc5 = inc5 * .10 generate taxinc6 = inc6 * .10 generate taxinc7 = inc7 * .10 generate taxinc8 = inc8 * .10 generate taxinc9 = inc9 * .10 generate taxinc10= inc10 * .10 generate taxinc11= inc11 * .10 generate taxinc12= inc12 * .10
The output is shown below.
+----------------------------------------------------------------------------------------------+ 1. | famid | inc1 | inc2 | inc3 | inc4 | inc5 | inc6 | inc7 | inc8 | inc9 | inc10 | inc11 | inc12 | | 1 | 3281 | 3413 | 3114 | 2500 | 2700 | 3500 | 3114 | 3319 | 3514 | 1282 | 2434 | 2818 | |----------------------------------------------------------------------------------------------| | taxinc1 | taxinc2 | taxinc3 | taxinc4 | taxinc5 | taxinc6 | taxinc7 | taxinc8 | taxinc9 | | 328.1 | 341.3 | 311.4 | 250 | 270 | 350 | 311.4 | 331.9 | 351.4 | |----------------------------------------------------------------------------------------------| | taxinc10 | taxinc11 | taxinc12 | | 128.2 | 243.4 | 281.8 | +----------------------------------------------------------------------------------------------+ +----------------------------------------------------------------------------------------------+ 2. | famid | inc1 | inc2 | inc3 | inc4 | inc5 | inc6 | inc7 | inc8 | inc9 | inc10 | inc11 | inc12 | | 2 | 4042 | 3084 | 3108 | 3150 | 3800 | 3100 | 1531 | 2914 | 3819 | 4124 | 4274 | 4471 | |----------------------------------------------------------------------------------------------| | taxinc1 | taxinc2 | taxinc3 | taxinc4 | taxinc5 | taxinc6 | taxinc7 | taxinc8 | taxinc9 | | 404.2 | 308.4 | 310.8 | 315 | 380 | 310 | 153.1 | 291.4 | 381.9 | |----------------------------------------------------------------------------------------------| | taxinc10 | taxinc11 | taxinc12 | | 412.4 | 427.4 | 447.1 | +----------------------------------------------------------------------------------------------+ +----------------------------------------------------------------------------------------------+ 3. | famid | inc1 | inc2 | inc3 | inc4 | inc5 | inc6 | inc7 | inc8 | inc9 | inc10 | inc11 | inc12 | | 3 | 6015 | 6123 | 6113 | 6100 | 6100 | 6200 | 6186 | 6132 | 3123 | 4231 | 6039 | 6215 | |----------------------------------------------------------------------------------------------| | taxinc1 | taxinc2 | taxinc3 | taxinc4 | taxinc5 | taxinc6 | taxinc7 | taxinc8 | taxinc9 | | 601.5 | 612.3 | 611.3 | 610 | 610 | 620 | 618.6 | 613.2 | 312.3 | |----------------------------------------------------------------------------------------------| | taxinc10 | taxinc11 | taxinc12 | | 423.1 | 603.9 | 621.5 | +----------------------------------------------------------------------------------------------+
3. Computing variables (using the foreach command)
Another way to compute 12 variables representing the amount of tax paid (10%) for each month is to use the foreach command. In the example below we use the foreach command to cycle through the variables inc1 to inc12 and compute the taxable income as taxinc1 – taxinc12.
foreach var of varlist inc1-inc12 { generate tax`var' = `var' * .10 }
The initial foreach statement tells Stata that we want to cycle through the variables inc1 to inc12 using the statements that are surrounded by the curly braces. The first time we cycle through the statements, the value of var will be inc1 and the second time the value of var will be inc2 and so on until the final iteration where the value of var will be inc12. Each statement within the loop (in this case, just the one generate statement) is evaluated and executed. When we are inside the foreach loop, we can access the value of var by surrounding it with the funny quotation marks like this `var’ . The ` is the quote right below the ~ on your keyboard and the ‘ is the quote below the ” on your keyboard. The first time through the loop, `var’ is replaced with inc1, so the statement
generate tax`var' = `var' * .10
becomes
generate taxinc1 = inc1 * .10
This is repeated for inc2 and then inc3 and so on until inc12. So, this foreach loop is the equivalent of executing the 12 generate statements manually, but much easier and less error prone.
4. Collapsing across variables (manually)
Often one needs to sum across variables (also known as collapsing across variables). For example, let’s say the quarterly income for each observation is desired. In order to get this information, four quarterly variables incqtr1-incqtr4 need to be computed. Again, this can be achieved manually or by using the foreach command. Below is an example of how to compute 4 quarterly income variables incqtr1-incqtr4 by simply adding together the months that comprise a quarter.
generate incqtr1 = inc1 + inc2 + inc3 generate incqtr2 = inc4 + inc5 + inc6 generate incqtr3 = inc7 + inc8 + inc9 generate incqtr4 = inc10+ inc11+ inc12list incqtr1 – incqtr4
The output is shown below.
+---------------------------------------+ | incqtr1 incqtr2 incqtr3 incqtr4 | |---------------------------------------| 1. | 9808 8700 9947 6534 | 2. | 10234 10050 8264 12869 | 3. | 18251 18400 15441 16485 | +---------------------------------------+
5. Collapsing across variables (using the foreach command)
This same result as above can be achieved using the foreach command. The example below illustrates how to compute the quarterly income variables incqtr1-incqtr4 using the foreach command.
foreach qtr of numlist 1/4 { local m3 = `qtr'*3 local m2 = (`qtr'*3)-1 local m1 = (`qtr'*3)-2 generate incqtr`qtr' = inc`m1' + inc`m2' + inc`m3' } list incqtr1 - incqtr4
The output is shown below.
+---------------------------------------+ | incqtr1 incqtr2 incqtr3 incqtr4 | |---------------------------------------| 1. | 9808 8700 9947 6534 | 2. | 10234 10050 8264 12869 | 3. | 18251 18400 15441 16485 | +---------------------------------------+
In this example, instead of cycling across variables, the foreach command is cycling across numbers, 1, 2, 3 then 4 which we refer to as qtr which represent the 4 quarters of variables that we wish to create. The trick is the relationship between the quarter and the month numbers that compose the quarter and to create a kind of formula that relates the quarters to the months. For example, quarter 1 of data corresponds to months 3, 2 and 1, so we can say that when the quarter (qtr) is 1 we want the months represented by qtr*3, (qtr*3)-1 and (qtr*3)-2, yielding 3, 2, and 1. This is what the statements below from the foreach loop are doing. They are relating the quarter to the months.
local m3 = `qtr'*3 local m2 = (`qtr'*3)-1 local m1 = (`qtr'*3)-2
So, when qtr is 1, the value for m3 is 1*3, the value for m2 is (1*3)-1 and the value for m1 is (1*3)-2. Then, imagine all of those values being substituted into the following statement from the foreach loop.
generate incqtr`qtr' = inc`m1' + inc`m2' + inc`m3'
This then becomes
generate incqtr1 = inc3 + inc2 + inc1
and for the next quarter (when qtr becomes 2) the statement would become
generate incqtr2 = inc6 + inc5 + inc4
In this example, with only 4 quarters of data, it would probably be easier to simply write out the 4 generate statements manually, however if you had 40 quarters of data, then the foreach loop can save you considerable time, effort and mistakes.
6. Identifying patterns across variables (using the foreach command)
The foreach command can also be used to identify patterns across variables of a dataset. Let’s say, for example, that one needs to know which months had income that was less than the income of the previous month. To obtain this information, dummy indicators can be created to indicate in which months this occurred. Note that only 11 dummy indicators are needed for a 12 month period because the interest is in the change from one month to the next. When a month has income that is less than the income of the previous month, the dummy indicators lowinc2-lowinc12 get assigned a “1”. When this is not the case, they are assigned a “0”. This program is illustrated below (note for simplicity we assume no missing data on income).
foreach curmon of numlist 2/12 { local lastmon = `curmon' - 1 generate lowinc`curmon' = 1 if ( inc`curmon' < inc`lastmon' ) replace lowinc`curmon' = 0 if ( inc`curmon' >= inc`lastmon' ) }
We can list out the original values of inc and lowinc and verify that this worked properly
list famid inc1-inc12, clean noobs famid inc1 inc2 inc3 inc4 inc5 inc6 inc7 inc8 inc9 inc10 inc11 inc12 1 3281 3413 3114 2500 2700 3500 3114 3319 3514 1282 2434 2818 2 4042 3084 3108 3150 3800 3100 1531 2914 3819 4124 4274 4471 3 6015 6123 6113 6100 6100 6200 6186 6132 3123 4231 6039 6215list famid lowinc2-lowinc12, clean noobs famid lowinc2 lowinc3 lowinc4 lowinc5 lowinc6 lowinc7 lowinc8 lowinc9 lowinc10 lowinc11 lowinc12 1 0 1 1 0 0 1 0 0 1 0 0 2 1 0 0 0 1 1 0 0 0 0 0 3 0 1 1 0 0 1 1 1 0 0 0
This time we used the foreach loop to compare the current month, represented by curmon, and the prior month, computed as `curmon’-1 creating lastmon. So, for the first pass through the foreach loop the value for curmon is 2 and the value for lastmon is 1, so the generate and replace statements become
generate lowinc2 = 1 if ( inc2 < inc1 ) replace lowinc2 = 0 if ( inc2 >= inc1 )
The process is repeated until curmon is 12, and then the generate and replace statements become
generate lowinc12 = 1 if ( inc12 < inc11 ) replace lowinc12 = 0 if ( inc12 >= inc11 )
If you were using foreach to span a large range of values (say 1/1000) then it is more effcient to use forvalues since it is designed to quickly increment through a sequential list, for example
forvalues curmon = 2/12 { local lastmon = `curmon' - 1 generate lowinc`curmon' = 1 if ( inc`curmon' < inc`lastmon' ) replace lowinc`curmon' = 0 if ( inc`curmon' >= inc`lastmon' ) }