Version info: Code for this page was tested in Stata 12.1.
Researchers are sometimes interested in studying not just the effects of ethnicity, but the effects of the heterogeneity in ethnic composition. For example, suppose a researcher looked at four ethnicities and had data like this for three different schools.
ID   Asian   Black   Other   White
1       30      30      30      30
2      120       0       0       0
3       50      50      10      10
The first school is the most diverse. That is, it has the most spread or dispersion of ethnicities. The second school has no racial dispersion because everyone belongs to a single race. The third school falls somewhere between no dispersion and complete dispersion. One way to quantify dispersion is given by a formula on page 413 of Brewster (1994):

\[ D = \frac{k \left( N^{2} - \sum f_{i}^{2} \right)}{N^{2} (k - 1)} \]

where k is the number of categories, N is the total count across all categories, and f_i is the count in category i. Here we implement this formula in Stata using its matrix language, Mata.
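Before the implementation, it may help to check the formula by hand on the three schools in the example table. Each school has k = 4 groups and N = 120 students, so the denominator is 120^2 (4 - 1) = 43200 in every case. The subscripts on D below are only labels for the school ID and are not part of Brewster's notation.

```latex
\[ D_{1} = \frac{4 \left( 14400 - 4 \cdot 30^{2} \right)}{43200} = \frac{4 \cdot 10800}{43200} = 1 \]
\[ D_{2} = \frac{4 \left( 14400 - 120^{2} \right)}{43200} = 0 \]
\[ D_{3} = \frac{4 \left( 14400 - (50^{2} + 50^{2} + 10^{2} + 10^{2}) \right)}{43200} = \frac{4 \cdot 9200}{43200} \approx 0.852 \]
```

So the index runs from 0 (everyone in one group) to 1 (equal counts in every group), and the third school lands in between. The Mata function that follows carries out the same calculation for every row of the dataset.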
    /* Brewster (1994) page 413 */
    /* this is the Mata function to calculate dispersion */
    /* it requires a list of the variables, and the variable to store results */
    mata
    void dispersion(string scalar varlist, string scalar outvar) {
        x = st_data(., varlist)               /* one column per category */
        k = cols(x)                           /* number of categories */
        N = rowsum(x)                         /* total count in each row */
        N2 = N :* N                           /* N squared */
        ss = rowsum(x :* x)                   /* sum of squared counts */
        d = (k * (N2 - ss)) :/ (N2 * (k - 1))
        st_store(., outvar, d)                /* write results to outvar */
    }
    end
Now the function dispersion exists in Mata and we are ready to use it. For demonstration purposes we will use a convenient built-in dataset, even though the example is a little silly.
    /* use silly built-in dataset */
    sysuse auto
    /* generate a variable to store results */
    gen results = .
    /* calculate dispersion of 4 variables and store in 'results' */
    mata dispersion("weight length turn displacement", "results")
The results are automatically stored back into the dataset in the variable specified by the second argument of the function, in this case results. Be careful, because the function will overwrite whatever values that variable already holds. We can look at the results by listing a few rows.
    list results in 1/10

         +----------+
         |  results |
         |----------|
      1. |   .26111 |
      2. | .2994921 |
      3. | .2688744 |
      4. | .2868054 |
      5. | .3159271 |
         |----------|
      6. | .2886932 |
      7. | .4270839 |
      8. | .2879096 |
      9. | .2710272 |
     10. | .2973311 |
         +----------+
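As a sanity check on the formula, the three-school example from the top of the page can be typed in by hand and the index computed with ordinary generate commands. This is only a minimal sketch and not part of the tested code above: the lowercase variable names and the helper variables n_total, ss, and d_check are assumptions chosen for illustration, and k = 4 is hard-coded.

```stata
/* enter the three example schools by hand (this replaces the data in memory) */
clear
input id asian black other white
1  30 30 30 30
2 120  0  0  0
3  50 50 10 10
end

/* N: total count per school */
gen n_total = asian + black + other + white
/* sum of squared category counts */
gen ss = asian^2 + black^2 + other^2 + white^2
/* Brewster (1994) dispersion with k = 4 categories */
gen d_check = 4 * (n_total^2 - ss) / (n_total^2 * (4 - 1))

list id d_check
```

By the formula, d_check should be 1 for school 1, 0 for school 2, and about 0.852 for school 3. If dispersion is already defined in Mata, running it on the same four variables should give matching values.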
References
- Brewster, Karin. (1994). Race Differences in Sexual Activity Among Adolescent Women: The Role of Neighborhood Characteristics. American Sociological Review, 59(3), pp. 408-424.