This module shows the use of if with common Stata commands.
Let’s use the auto data file.
sysuse auto
For this module, we will focus on the variables make, rep78, foreign, mpg, and price. We can use the keep command to keep just these five variables.
keep make rep78 foreign mpg price
Let’s make a table of rep78 by foreign to look at the repair histories of the foreign and domestic cars.
tabulate rep78 foreign| foreign rep78 | 0 1 | Total -----------+----------------------+---------- 1 | 2 0 | 2 2 | 8 0 | 8 3 | 27 3 | 30 4 | 9 9 | 18 5 | 2 9 | 11 -----------+----------------------+---------- Total | 48 21 | 69
Suppose we wanted to focus on just the cars with repair histories of four or better. We can use if suffix to do this.
tabulate rep78 foreign if rep78 >=4| foreign rep78 | 0 1 | Total -----------+----------------------+---------- 4 | 9 9 | 18 5 | 2 9 | 11 -----------+----------------------+---------- Total | 11 18 | 29
Let’s make the above table using the column and nofreq options. The command column requests column percentages while the command nofreq suppresses cell frequencies. Note that column and nofreq come after the comma. These are options on the tabulate command and options need to be placed after a comma.
tabulate rep78 foreign if rep78 >=4, column nofreq| foreign rep78 | 0 1 | Total -----------+----------------------+---------- 4 | 81.82 50.00 | 62.07 5 | 18.18 50.00 | 37.93 -----------+----------------------+---------- Total | 100.00 100.00 | 100.00
The use of if is not limited to the tabulate command. Here, we use it with the list command.
list if rep78 >= 4make price mpg rep78 foreign 3. AMC Spirit 3799 22 . 0 5. Buick Electra 7827 15 4 0 7. Buick Opel 4453 26 . 0 15. Chev. Impala 5705 16 4 0 20. Dodge Colt 3984 30 5 0 24. Ford Fiesta 4389 28 4 0 29. Merc. Bobcat 3829 22 4 0 30. Merc. Cougar 5379 14 4 0 33. Merc. XR-7 6303 14 4 0 35. Olds 98 8814 21 4 0 38. Olds Delta 88 4890 18 4 0 43. Plym. Champ 4425 34 5 0 45. Plym. Sapporo 6486 26 . 0 47. Pont. Catalina 5798 18 4 0 51. Pont. Phoenix 4424 19 . 0 53. Audi 5000 9690 17 5 1 55. BMW 320i 9735 25 4 1 56. Datsun 200 6229 23 4 1 57. Datsun 210 4589 35 5 1 58. Datsun 510 5079 24 4 1 59. Datsun 810 8129 21 4 1 61. Honda Accord 5799 25 5 1 62. Honda Civic 4499 28 4 1 63. Mazda GLC 3995 30 4 1 64. Peugeot 604 12990 14 . 1 66. Subaru 3798 35 5 1 67. Toyota Celica 5899 18 5 1 68. Toyota Corolla 3748 31 5 1 69. Toyota Corona 5719 18 5 1 70. VW Dasher 7140 23 4 1 71. VW Diesel 5397 41 5 1 72. VW Rabbit 4697 25 4 1 73. VW Scirocco 6850 25 4 1 74. Volvo 260 11995 17 5 1
Did you see that some of the observations had a value of '.' for rep78? These are missing values. For example, the value of rep78 for the AMC Spirit is missing. Stata treats a missing value as positive infinity, the highest number possible. So, when we said list if rep78 >= 4, Stata included the observations where rep78 was ‘.' as well.
If we wanted to include just the valid (non-missing) observations that are greater than or equal to 4, we can do the following to tell Stata we want only observations where rep78 >= 4 and rep78 is not missing.
list if rep78 >= 4 & !missing(rep78)make price mpg rep78 foreign 5. Buick Electra 7827 15 4 0 15. Chev. Impala 5705 16 4 0 20. Dodge Colt 3984 30 5 0 24. Ford Fiesta 4389 28 4 0 29. Merc. Bobcat 3829 22 4 0 30. Merc. Cougar 5379 14 4 0 33. Merc. XR-7 6303 14 4 0 35. Olds 98 8814 21 4 0 38. Olds Delta 88 4890 18 4 0 43. Plym. Champ 4425 34 5 0 47. Pont. Catalina 5798 18 4 0 53. Audi 5000 9690 17 5 1 55. BMW 320i 9735 25 4 1 56. Datsun 200 6229 23 4 1 57. Datsun 210 4589 35 5 1 58. Datsun 510 5079 24 4 1 59. Datsun 810 8129 21 4 1 61. Honda Accord 5799 25 5 1 62. Honda Civic 4499 28 4 1 63. Mazda GLC 3995 30 4 1 66. Subaru 3798 35 5 1 67. Toyota Celica 5899 18 5 1 68. Toyota Corolla 3748 31 5 1 69. Toyota Corona 5719 18 5 1 70. VW Dasher 7140 23 4 1 71. VW Diesel 5397 41 5 1 72. VW Rabbit 4697 25 4 1 73. VW Scirocco 6850 25 4 1 74. Volvo 260 11995 17 5 1
This code will also yield the same output as above.
list if rep78 >= 4 & rep78 != .
We can use if with most Stata commands. Here, we get summary statistics for price for cars with repair histories of 1 or 2. Note the double equal (==) represents IS EQUAL TO and the pipe ( | ) represents OR.
summarize price if rep78 == 1 | rep78 == 2Variable | Obs Mean Std. Dev. Min Max ---------+----------------------------------------------------- price | 10 5687 3216.375 3667 14500
A simpler way to say this would be…
summarize price if rep78 <= 2Variable | Obs Mean Std. Dev. Min Max ---------+----------------------------------------------------- price | 10 5687 3216.375 3667 14500
Likewise, we can do this for cars with repair history of 3, 4 or 5.
summarize price if rep78 == 3 | rep78 == 4 | rep78 == 5Variable | Obs Mean Std. Dev. Min Max ---------+----------------------------------------------------- price | 59 6223.847 2880.454 3291 15906
Additionally, we can use this code to designate a range of values. Here is a summary of price for the values 3 through 5 in rep78.
summarize price if inrange(rep78,3,5)Variable | Obs Mean Std. Dev. Min Max ----------+-------------------------------------------------------- price | 59 6223.847 2880.454 3291 15906
Let’s simplify this by saying rep78 >= 3.
summarize price if rep78 >= 3Variable | Obs Mean Std. Dev. Min Max ---------+----------------------------------------------------- price | 64 6239.984 2925.843 3291 15906
Did you see the mistake we made? We accidentally included the missing values because we forgot to exclude them. We really needed to say.
summarize price if rep78 >= 3 & !missing(rep78)Variable | Obs Mean Std. Dev. Min Max ---------+----------------------------------------------------- price | 59 6223.847 2880.454 3291 15906
Taking a random sample
It is also possible to take a simple random sample of your data using the sample command. This information can be found on our STATA FAQ page: How can I draw a random sample of my data?
Summary
Most Stata commands can be followed by if, for example
Summarize if rep78 equals 2
summarize if rep78 == 2
Summarize if rep78 is greater than or equal to 2
summarize if rep78 >= 2
Summarize if rep78 greater than 2
summarize if rep78 > 2
Summarize if rep78 less than or equal to 2
summarize if rep78 <= 2
Summarize if rep78 less than 2
summarize if rep78 <2
Summarize if rep78 not equal to 2
summarize if rep78 != 2
If expressions can be connected with
|
for OR
& for AND
Missing Values
Missing values are represented as '.' and are the highest value possible. Therefore, when values are missing, be careful with commands like
summarize if rep78 > 3summarize if rep78 >= 3summarize if rep78 != 3
to omit missing values, use
summarize if rep78 > 3 & !missing(rep78)summarize if rep78 >= 3 & !missing(rep78)summarize if rep78 != 3 & !missing(rep78)