This module shows the general structure of Stata commands. We will demonstrate this using summarize as an example, although this general structure applies to most Stata commands.
Note: This code was tested in Stata 12.
Let’s first use the auto data file.
sysuse auto
As you have seen, we can type summarize and it will give us summary statistics for all of the variables in the data file.
summarize

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
    make |       0
   price |      74    6165.257    2949.496      3291      15906
     mpg |      74     21.2973    5.785503        12         41
   rep78 |      69    3.405797    .9899323         1          5
  hdroom |      74    2.993243    .8459948       1.5          5
   trunk |      74    13.75676    4.277404         5         23
  weight |      74    3019.459    777.1936      1760       4840
  length |      74    187.9324    22.26634       142        233
    turn |      74    39.64865    4.399354        31         51
   displ |      74    197.2973    91.83722        79        425
  gratio |      74    3.014865    .4562871      2.19       3.89
 foreign |      74    .2972973    .4601885         0          1
It is also possible to obtain means for specific variables. For example, below we get summary statistics just for mpg and price.
summarize mpg price

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     mpg |      74     21.2973    5.785503        12         41
   price |      74    6165.257    2949.496      3291      15906
We could further tell Stata to limit the summary statistics to just foreign cars by adding an if qualifier.
summarize mpg price if (foreign == 1)

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     mpg |      22    24.77273    6.611187        14         41
   price |      22    6384.682    2621.915      3748      12990
The if qualifier can contain more than one condition. Here, we ask for summary statistics for the foreign cars which get less than 30 miles per gallon.
summarize mpg price if foreign == 1 & mpg < 30

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     mpg |      17    21.94118    3.896643        14         28
   price |      17    6996.235    2674.552      3895      12990
We can use the detail option to ask Stata to give us more detail in the summary statistics. Notice that the detail option goes after the comma. If the comma were omitted, Stata would give an error.
summarize mpg price if foreign == 1 & mpg < 30 , detail

                             mpg
-------------------------------------------------------------
      Percentiles      Smallest
 1%           14             14
 5%           14             17
10%           17             17       Obs                  17
25%           18             18       Sum of Wgt.          17

50%           23                      Mean           21.94118
                        Largest       Std. Dev.      3.896643
75%           25             25
90%           26             25       Variance       15.18382
95%           28             26       Skewness      -.4901235
99%           28             28       Kurtosis       2.201759

                            price
-------------------------------------------------------------
      Percentiles      Smallest
 1%         3895           3895
 5%         3895           4296
10%         4296           4499       Obs                  17
25%         5079           4697       Sum of Wgt.          17

50%         6229                      Mean           6996.235
                        Largest       Std. Dev.      2674.552
75%         8129           9690
90%        11995           9735       Variance        7153229
95%        12990          11995       Skewness       .9818272
99%        12990          12990       Kurtosis       2.930843
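To see what the comma does, here is a minimal sketch that runs the command with and without it. It assumes the auto data shipped with Stata; the capture noisily prefix is used only so that the error from the second command is displayed without halting a do-file.

```stata
* Load the example data (clear discards any data already in memory)
sysuse auto, clear

* With the comma: detail is read as an option
summarize mpg price if foreign == 1 & mpg < 30, detail

* Without the comma: detail is read as part of the if expression,
* so Stata should report an error instead of running the command
capture noisily summarize mpg price if foreign == 1 & mpg < 30 detail

* A nonzero return code confirms that the second command failed
display _rc
```

Everything before the comma says what to summarize and for which observations; everything after it changes how the command behaves.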
Note that even though we built these parts up one at a time, they don’t all have to be used together. Let’s look at some other forms of the summarize command.
You can tell Stata which observation numbers you want using the in qualifier. Here we ask for summaries of observations 1 to 10. This is useful if you have a big data file and want to try out a command on a subset of observations.
summarize in 1/10

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
    make |       0
   price |      10      5517.4    2063.518      3799      10372
     mpg |      10        19.5     3.27448        15         26
   rep78 |       8       3.125    .3535534         3          4
  hdroom |      10         3.3    .7527727         2        4.5
   trunk |      10        14.7     3.88873        10         21
  weight |      10        3271    558.3796      2230       4080
  length |      10         194    19.32759       168        222
    turn |      10        40.2    3.259175        34         43
   displ |      10       223.9    71.77503       121        350
  gratio |      10       2.907    .3225264      2.41       3.58
 foreign |      10           0           0         0          0
Also, recall that you can ask Stata to perform summaries for foreign and domestic cars separately using by, as shown below.
sort foreign
by foreign: summarize

-> foreign= 0

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
    make |       0
   price |      52    6072.423    3097.104      3291      15906
     mpg |      52    19.82692    4.743297        12         34
   rep78 |      48    3.020833     .837666         1          5
  hdroom |      52    3.153846    .9157578       1.5          5
   trunk |      52       14.75    4.306288         7         23
  weight |      52    3317.115    695.3637      1800       4840
  length |      52    196.1346    20.04605       147        233
    turn |      52    41.44231    3.967582        31         51
   displ |      52    233.7115    85.26299        86        425
  gratio |      52    2.806538    .3359556      2.19       3.58
 foreign |      52           0           0         0          0

-> foreign= 1

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
    make |       0
   price |      22    6384.682    2621.915      3748      12990
     mpg |      22    24.77273    6.611187        14         41
   rep78 |      21    4.285714    .7171372         3          5
  hdroom |      22    2.613636    .4862837       1.5        3.5
   trunk |      22    11.40909    3.216906         5         16
  weight |      22    2315.909    433.0035      1760       3420
  length |      22    168.5455    13.68255       142        193
    turn |      22    35.40909    1.501082        32         38
   displ |      22    111.2273    24.88054        79        163
  gratio |      22    3.507273    .2969076      2.98       3.89
 foreign |      22           1           0         1          1
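The by prefix expects the data to be sorted by the grouping variable, which is why sort foreign comes first. As a small sketch, assuming a Stata version that provides bysort, the two steps can be combined into one command:

```stata
* Load the example data
sysuse auto, clear

* bysort sorts the data by foreign, then runs summarize within each group
bysort foreign: summarize mpg price
```

This should give the same per-group summaries as sort followed by by, restricted here to mpg and price to keep the output short.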
Let’s review all those pieces.
A command can be preceded with a by prefix, as shown below.
by foreign: summarize
There are many parts that can come after a command. They are each presented separately below. For example, summarize followed by the names of variables.
summarize mpg price
summarize with in specifying a range of records to be summarized.
summarize in 1/10
summarize with simple if specifying records to summarize.
summarize if foreign == 1
summarize with complex if specifying records to summarize.
summarize if foreign == 1 & mpg > 30
summarize followed by option(s).
summarize , detail
So, putting it all together, the general syntax of the summarize command can be described as:
[by varlist:] summarize [varlist] [in range] [if exp] [, options]
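As a rough sketch of how these pieces combine in practice, the commands below mix a variable list, qualifiers and an option. The particular cutoffs (20 observations, mpg above 20, price below 10000) are arbitrary values chosen for illustration. As far as we are aware, Stata does not accept the in qualifier together with the by prefix, so the sketch keeps those two apart.

```stata
* Load the example data
sysuse auto, clear

* Variable list, if qualifier and an option
summarize mpg price if foreign == 1, detail

* Variable list with both an in range and an if condition
summarize mpg price in 1/20 if mpg > 20

* by prefix (via bysort), variable list, if qualifier and an option
bysort foreign: summarize mpg if price < 10000, detail
```

Each bracketed part of the syntax is optional, so any of these commands can be shortened by dropping the pieces you do not need.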
Understanding the overall syntax of Stata commands helps you remember them and use them more effectively, and it also helps you understand the help files in Stata. Without this background, the parts of a help file about by, if and in could be confusing. Let’s have a look at the help file for summarize. It makes more sense once you know what the by, if and in parts mean.
help summarize

-------------------------------------------------------------------------------
help for summarize                                      (manual: [R] summarize)
-------------------------------------------------------------------------------

Summary statistics
------------------

    [by varlist:] summarize [varlist] [weight] [if exp] [in range]
        [, { detail | meanonly } format ]