1. Use the crime data file that was used in chapter 2 (use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/crime ) and look at a regression model predicting murder from pctmetro, poverty, pcths and single using OLS and make a avplots and a lvr2plot following the regression. Are there any states that look worrisome? Repeat this analysis using regression with robust standard errors and show avplots for the analysis. Repeat the analysis using robust regression and make a manually created lvr2plot. Also run the results using qreg. Compare the results of the different analyses. Look at the weights from the robust regression and comment on the weights.
Answer 1.
First, consider the OLS regression predicting murder from pctmetro,
poverty, pcths and single.
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/crime , clear
(crime data from agresti & finlay - 1997)
regress murder pctmetro poverty pcths single
      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(  4,    46) =   37.90
       Model |  4406.42207     4  1101.60552           Prob > F      =  0.0000
    Residual |  1336.89947    46  29.0630319           R-squared     =  0.7672
-------------+------------------------------           Adj R-squared =  0.7470
       Total |  5743.32154    50  114.866431           Root MSE      =   5.391

------------------------------------------------------------------------------
      murder |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    pctmetro |   .0682218   .0380637     1.79   0.080    -.0083964      .14484
     poverty |   .4380115   .3259862     1.34   0.186    -.2181648    1.094188
       pcths |   .0243003   .2220237     0.11   0.913    -.4226102    .4712109
      single |   3.650532   .4982054     7.33   0.000     2.647697    4.653367
       _cons |  -45.31188   19.39747    -2.34   0.024    -84.35697   -6.266792
------------------------------------------------------------------------------
These results suggest that single is the only predictor significantly related to the number of murders in a state. Let’s look at the lvr2plot for this analysis. Washington DC looks like it has both very high leverage and a very high residual.
. lvr2plot, mlabel(state)
. avplots
Let’s consider the same analysis using robust standard errors. The results are largely the same, except that the p value for pctmetro fell from 0.08 to 0.049, which would make it a significant predictor; however, we would be somewhat skeptical of this particular result without further investigation.
regress murder pctmetro poverty pcths single, robust
Regression with robust standard errors                 Number of obs =      51
                                                       F(  4,    46) =    7.20
                                                       Prob > F      =  0.0001
                                                       R-squared     =  0.7672
                                                       Root MSE      =   5.391

------------------------------------------------------------------------------
             |               Robust
      murder |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    pctmetro |   .0682218   .0337517     2.02   0.049     .0002832    .1361604
     poverty |   .4380115   .2568971     1.71   0.095    -.0790955    .9551185
       pcths |   .0243003   .1841403     0.13   0.896    -.3463549    .3949556
      single |   3.650532   1.152474     3.17   0.003     1.330723    5.970341
       _cons |  -45.31188   25.39531    -1.78   0.081    -96.42999    5.806231
------------------------------------------------------------------------------
Stata allows us to compute the residual for this analysis but will not allow us to compute the leverage (hat) value. So instead of showing a lvr2plot let’s look at the avplots for this analysis.
. avplots , mlabel(state)
As you can see, we still have an observation that sticks out from the rest, and this is Washington DC. This is especially pronounced for the lower right graph for single where DC would seem to have very strong leverage to influence the coefficient for single.
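One way to quantify how much pull DC has on the coefficient for single is to compute DFBETA values after the OLS regression. This is a sketch using Stata's dfbeta command; the generated variable name (_dfbeta_1 here) can vary across Stata versions.

```stata
regress murder pctmetro poverty pcths single
dfbeta single                    // generates _dfbeta_1 (name may vary)
gsort -abs(_dfbeta_1)            // or: gsort -_dfbeta_1
list state _dfbeta_1 in 1/5      // dc should stand out at the top
```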
Now, let’s look at the analysis using robust regression and save the weights, calling them rrwt.
rreg murder pctmetro poverty pcths single, genwt(rrwt)
   Huber iteration 1:  maximum difference in weights = .44857261
   Huber iteration 2:  maximum difference in weights = .0399983
Biweight iteration 3:  maximum difference in weights = .15321379
Biweight iteration 4:  maximum difference in weights = .00973214

Robust regression estimates                            Number of obs =      50
                                                       F(  4,    45) =   35.25
                                                       Prob > F      =  0.0000

------------------------------------------------------------------------------
      murder |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    pctmetro |   .0535439   .0146555     3.65   0.001     .0240262    .0830615
     poverty |    .182561   .1259505     1.45   0.154    -.0711163    .4362383
       pcths |  -.2245853   .0863452    -2.60   0.013    -.3984936   -.0506771
      single |   1.392942   .2355845     5.91   0.000     .9184503    1.867434
       _cons |   2.888033   7.945302     0.36   0.718    -13.11463    18.89069
------------------------------------------------------------------------------
The avplots command is not available after rreg, and neither is lvr2plot. But we can manually create the residual and hat values and make an lvr2plot of our own, see below.
predict r, r
predict h, hat
generate r2 = r^2
sum r2
  <output omitted>
replace r2 = r2/r(sum)
summarize r2
  <output omitted>
local rm = r(mean)
summarize h
  <output omitted>
local hm = r(mean)
graph twoway scatter h r2 if state ~= "dc", yline(`hm') xline(`rm') mlabel(state) xlabel(0(.005).025)
As you see above, under the robust regression none of the observations are high on both leverage and squared residual. Let’s recap the regress results and the rreg results below and compare them.
regress murder pctmetro poverty pcths single
      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(  4,    46) =   37.90
       Model |  4406.42207     4  1101.60552           Prob > F      =  0.0000
    Residual |  1336.89947    46  29.0630319           R-squared     =  0.7672
-------------+------------------------------           Adj R-squared =  0.7470
       Total |  5743.32154    50  114.866431           Root MSE      =   5.391

------------------------------------------------------------------------------
      murder |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    pctmetro |   .0682218   .0380637     1.79   0.080    -.0083964      .14484
     poverty |   .4380115   .3259862     1.34   0.186    -.2181648    1.094188
       pcths |   .0243003   .2220237     0.11   0.913    -.4226102    .4712109
      single |   3.650532   .4982054     7.33   0.000     2.647697    4.653367
       _cons |  -45.31188   19.39747    -2.34   0.024    -84.35697   -6.266792
------------------------------------------------------------------------------
rreg murder pctmetro poverty pcths single
   Huber iteration 1:  maximum difference in weights = .44857261
   Huber iteration 2:  maximum difference in weights = .0399983
Biweight iteration 3:  maximum difference in weights = .15321379
Biweight iteration 4:  maximum difference in weights = .00973214

Robust regression estimates                            Number of obs =      50
                                                       F(  4,    45) =   35.25
                                                       Prob > F      =  0.0000

------------------------------------------------------------------------------
      murder |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    pctmetro |   .0535439   .0146555     3.65   0.001     .0240262    .0830615
     poverty |    .182561   .1259505     1.45   0.154    -.0711163    .4362383
       pcths |  -.2245853   .0863452    -2.60   0.013    -.3984936   -.0506771
      single |   1.392942   .2355845     5.91   0.000     .9184503    1.867434
       _cons |   2.888033   7.945302     0.36   0.718    -13.11463    18.89069
------------------------------------------------------------------------------
The results are consistent for poverty and single: poverty was not significant in either analysis, and single was significant in both. However, pctmetro and pcths were not significant in the OLS analysis but were significant in the robust regression analysis.
Let’s look at the weights used in the robust regression to further understand why the results were so different. Note that the weight for dc is missing (.), meaning that it was eliminated from the analysis entirely (because it had such a high residual). Also, ri received a weight of less than one half.
hilo rrwt state
10 lowest and highest observations on rrwt

       rrwt   state
  .46982663   ri
  .62949383   md
    .716977   nm
  .73472243   ma
  .74565543   mo
  .75750112   la
  .79708217   ky
  .82324958   ks
  .82552144   de
  .82728266   il

       rrwt   state
  .99592844   sd
  .99639177   pa
  .99799356   fl
  .99811845   vt
  .99838103   ga
  .99863411   nh
  .99981867   wy
  .99986937   nd
  .99991851   ok
          .   dc
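Since hilo is a user-written command, the same information can be pulled out with standard Stata commands; a sketch:

```stata
count if rrwt == .               // dc, dropped from the rreg fit entirely
list state rrwt if rrwt < .8     // the most heavily downweighted states
summarize rrwt, detail
```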
In our analyses in chapter 2 (involving different variables) we found dc to be a very serious outlier and decided that it should be excluded because it is not a state. If we investigated these variables further, we might reach the same conclusion and decide that dc should be excluded. If we did, we could try an OLS regression like the one below. These results are quite similar to the rreg results. The benefit of rreg is that it deals not only with serious problems (like dc being an extreme outlier) but with minor ones as well.
regress murder pctmetro poverty pcths single if state != "dc"
      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  4,    45) =   39.88
       Model |  606.611746     4  151.652936           Prob > F      =  0.0000
    Residual |  171.137027    45  3.80304505           R-squared     =  0.7800
-------------+------------------------------           Adj R-squared =  0.7604
       Total |  777.748773    49  15.8724239           Root MSE      =  1.9501

------------------------------------------------------------------------------
      murder |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    pctmetro |   .0534333    .013795     3.87   0.000     .0256488    .0812178
     poverty |   .2237151   .1185554     1.89   0.066    -.0150679     .462498
       pcths |  -.1938711   .0812756    -2.39   0.021    -.3575685   -.0301737
      single |   1.388337   .2217525     6.26   0.000     .9417051     1.83497
       _cons |  -.0044014   7.478803    -0.00   1.000    -15.06748    15.05868
------------------------------------------------------------------------------
Let’s try running the results using qreg and compare them with rreg.
qreg murder pctmetro poverty pcths single
Iteration  1:  WLS sum of weighted deviations =  187.90652

Iteration  1: sum of abs. weighted deviations =  177.16784
Iteration  2: sum of abs. weighted deviations =  167.01302
Iteration  3: sum of abs. weighted deviations =  128.40282
Iteration  4: sum of abs. weighted deviations =  125.28249
Iteration  5: sum of abs. weighted deviations =    124.226
Iteration  6: sum of abs. weighted deviations =  122.93248
Iteration  7: sum of abs. weighted deviations =   122.6427
Iteration  8: sum of abs. weighted deviations =  122.40488
Iteration  9: sum of abs. weighted deviations =  122.03476
Iteration 10: sum of abs. weighted deviations =  122.03096

Median regression                                      Number of obs =      51
  Raw sum of deviations     235.3 (about 6.8000002)
  Min sum of deviations   122.031                      Pseudo R2     =  0.4814

------------------------------------------------------------------------------
      murder |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    pctmetro |   .0527879   .0226177     2.33   0.024     .0072608     .098315
     poverty |   .0908506   .1831176     0.50   0.622    -.2777461    .4594473
       pcths |  -.2686652   .1284197    -2.09   0.042    -.5271606   -.0101697
      single |   1.796151   .2859057     6.28   0.000     1.220652    2.371649
       _cons |   3.524669   11.34322     0.31   0.757    -19.30806    26.35739
------------------------------------------------------------------------------
rreg murder pctmetro poverty pcths single
   Huber iteration 1:  maximum difference in weights = .44857261
   Huber iteration 2:  maximum difference in weights = .0399983
Biweight iteration 3:  maximum difference in weights = .15321379
Biweight iteration 4:  maximum difference in weights = .00973214

Robust regression estimates                            Number of obs =      50
                                                       F(  4,    45) =   35.25
                                                       Prob > F      =  0.0000

------------------------------------------------------------------------------
      murder |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    pctmetro |   .0535439   .0146555     3.65   0.001     .0240262    .0830615
     poverty |    .182561   .1259505     1.45   0.154    -.0711163    .4362383
       pcths |  -.2245853   .0863452    -2.60   0.013    -.3984936   -.0506771
      single |   1.392942   .2355845     5.91   0.000     .9184503    1.867434
       _cons |   2.888033   7.945302     0.36   0.718    -13.11463    18.89069
------------------------------------------------------------------------------
While the coefficients do not always match up, the variables that were significant in the qreg analysis are also significant in the rreg analysis, and likewise for the non-significant variables. Even though these techniques use different strategies for resisting the influence of very deviant observations, they both arrive at the same conclusions about which variables are significantly related to murder, although they do not always agree on the strength of the relationship, i.e., the size of the coefficients.
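To see the qreg and rreg coefficients side by side, one option (assuming Stata 8 or later, where estimates store is available) is to refit both models and tabulate them:

```stata
qreg murder pctmetro poverty pcths single
estimates store qr
rreg murder pctmetro poverty pcths single
estimates store rr
estimates table qr rr, se
```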
2. Using the elemapi2 data file (use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2 ) pretend that 550 is the lowest score that a school could achieve on api00, i.e., create a new variable with the api00 score and recode it such that any score of 550 or below becomes 550. Use meals, ell and emer to predict api scores using 1) OLS to predict the original api score (before recoding) 2) OLS to predict the recoded score where 550 was the lowest value, and 3) using tobit to predict the recoded api score indicating the lowest value is 550. Compare the results of these analyses.
Answer 2.
First, we will use the elemapi2 data file and create the recoded version
of the api score where the lowest value is 550. We will call this value api00x.
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2 , clear
gen api00x = api00
replace api00x = 550 if api00 <= 550
(122 real changes made)
Analysis 1. Now, we will run an OLS regression on the un-recoded version of api.
regress api00 meals ell emer
      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F(  3,   396) =  673.00
       Model |  6749782.75     3  2249927.58           Prob > F      =  0.0000
    Residual |  1323889.25   396  3343.15467           R-squared     =  0.8360
-------------+------------------------------           Adj R-squared =  0.8348
       Total |  8073672.00   399  20234.7669           Root MSE      =   57.82

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       meals |  -3.159189   .1497371   -21.10   0.000    -3.453568   -2.864809
         ell |  -.9098732   .1846442    -4.93   0.000    -1.272878   -.5468678
        emer |  -1.573496    .293112    -5.37   0.000    -2.149746   -.9972456
       _cons |   886.7033    6.25976   141.65   0.000     874.3967    899.0098
------------------------------------------------------------------------------
Analysis 2. Now, we run an OLS regression on the recoded version of api.
regress api00x meals ell emer

      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F(  3,   396) =  682.88
       Model |  4567355.46     3  1522451.82           Prob > F      =  0.0000
    Residual |  882862.941   396  2229.45187           R-squared     =  0.8380
-------------+------------------------------           Adj R-squared =  0.8368
       Total |  5450218.40   399  13659.6952           Root MSE      =  47.217

------------------------------------------------------------------------------
      api00x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       meals |  -3.010788   .1222786   -24.62   0.000    -3.251184   -2.770392
         ell |  -.3034092   .1507844    -2.01   0.045    -.5998472   -.0069713
        emer |  -.7484733   .2393616    -3.13   0.002    -1.219052    -.277895
       _cons |     869.31   5.111854   170.06   0.000     859.2602    879.3597
------------------------------------------------------------------------------
Analysis 3. And we use tobit to perform the analysis indicating that the lowest value possible was 550.
tobit api00x meals ell emer , ll(550)
Tobit estimates                                        Number of obs =     400
                                                       LR chi2(3)    =  660.74
                                                       Prob > chi2   =  0.0000
Log likelihood = -1581.8117                            Pseudo R2     =  0.1728

------------------------------------------------------------------------------
      api00x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       meals |  -3.145065   .1595799   -19.71   0.000    -3.458792   -2.831337
         ell |  -.8633529    .212474    -4.06   0.000    -1.281068   -.4456381
        emer |  -1.470878   .3361215    -4.38   0.000    -2.131678   -.8100772
       _cons |   885.2395   6.372871   138.91   0.000     872.7107    897.7683
-------------+----------------------------------------------------------------
         _se |   57.12718   2.473494           (Ancillary parameter)
------------------------------------------------------------------------------

  Obs. summary:        122  left-censored observations at api00x <= 550
                       278  uncensored observations
First, let’s compare analyses 1 and 2. When the range of api was restricted in analysis 2, the size of the coefficients dropped due to the restriction in range of the api scores. For example, the coefficient for ell dropped from -.9 to -.3, and its p value changed to 0.045 (from highly significant to only marginally significant). Let’s see how well the tobit analysis compensated for the restriction in range by comparing analyses 1 and 3. The coefficients are quite similar in these two analyses. The standard errors are slightly larger in the tobit analysis, leading to somewhat smaller t values. Nevertheless, the tobit estimates are much closer to the original OLS estimates than those from the second OLS analysis on the recoded data.
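To see what the tobit model implies for the latent (uncensored) api score, you can obtain the linear prediction after the tobit fit; xbhat is just a variable name chosen here:

```stata
tobit api00x meals ell emer , ll(550)
predict xbhat, xb                // linear prediction for the latent score
summarize api00 api00x xbhat
```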
3. Using the elemapi2 data file (use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2 ) pretend that only schools with api scores of 550 or higher were included in the sample. Use meals ell and emer to predict api scores using 1) OLS to predict api from the full set of observations, 2) OLS to predict api using just the observations with api scores of 550 or higher, and 3) using truncreg to predict api using just the observations where api is 550 or higher. Compare the results of these analyses.
Answer 3.
First, we use the elemapi2 data file and run the analysis on the complete
data.
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear
Analysis 1 using all of the data.
regress api00 meals ell emer
      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F(  3,   396) =  673.00
       Model |  6749782.75     3  2249927.58           Prob > F      =  0.0000
    Residual |  1323889.25   396  3343.15467           R-squared     =  0.8360
-------------+------------------------------           Adj R-squared =  0.8348
       Total |  8073672.00   399  20234.7669           Root MSE      =   57.82

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       meals |  -3.159189   .1497371   -21.10   0.000    -3.453568   -2.864809
         ell |  -.9098732   .1846442    -4.93   0.000    -1.272878   -.5468678
        emer |  -1.573496    .293112    -5.37   0.000    -2.149746   -.9972456
       _cons |   886.7033    6.25976   141.65   0.000     874.3967    899.0098
------------------------------------------------------------------------------
Now let’s keep just the schools with api scores of 550 or higher for the next 2 analyses.
keep if api00 >= 550
(122 observations deleted)
Analysis 2 using OLS on just the schools with api scores of 550 or higher.
regress api00 meals ell emer
      Source |       SS       df       MS              Number of obs =     278
-------------+------------------------------           F(  3,   274) =  292.55
       Model |  2268727.43     3  756242.478           Prob > F      =  0.0000
    Residual |  708297.044   274  2585.02571           R-squared     =  0.7621
-------------+------------------------------           Adj R-squared =  0.7595
       Total |  2977024.48   277  10747.3808           Root MSE      =  50.843

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       meals |  -2.798288   .1600331   -17.49   0.000    -3.113339   -2.483238
         ell |  -.3584496   .2315111    -1.55   0.123    -.8142161    .0973169
        emer |  -.9417814   .3547208    -2.65   0.008    -1.640106   -.2434569
       _cons |    868.222   5.880858   147.64   0.000     856.6446    879.7994
------------------------------------------------------------------------------
Analysis 3 using truncreg on just the schools with api scores of 550 or higher.
truncreg api00 meals ell emer , ll(550)
(note: 0 obs. truncated)

Fitting full model:

Iteration 0:   log likelihood = -1467.4296
Iteration 1:   log likelihood = -1460.6163
Iteration 2:   log likelihood = -1460.3638
Iteration 3:   log likelihood = -1460.3636
Iteration 4:   log likelihood = -1460.3636

Truncated regression
Limit:   lower =        550                            Number of obs =     278
         upper =       +inf                            Wald chi2(3)  =  634.48
Log likelihood = -1460.3636                            Prob > chi2   =  0.0000

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |
       meals |   -2.90758   .1872438   -15.53   0.000    -3.274571   -2.540589
         ell |  -.8212468   .2983573    -2.75   0.006    -1.406016   -.2364771
        emer |  -1.446235   .4549632    -3.18   0.001    -2.337946   -.5545233
       _cons |   879.4212   6.595712   133.33   0.000     866.4939    892.3486
-------------+----------------------------------------------------------------
sigma        |
       _cons |   53.34897   2.545858    20.96   0.000     48.35918    58.33876
------------------------------------------------------------------------------
Let’s first compare the results of analysis 1 with analysis 2. When the schools with api scores of less than 550 are omitted, the coefficient for ell drops from -.9 to -.36 and is no longer statistically significant. The coefficients for meals and emer remain significant, although they both drop as well.
Now, let’s compare analysis 3 using truncreg with the original OLS analysis of the complete data. In both of these analyses, all of the variables are significant and the coefficients are quite similar, although the standard errors are larger in the truncreg. The truncreg did a pretty good job of showing us what the coefficients were in the complete sample based just on the restricted sample.
4. Using the hsb2 data file (use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/hsb2 ) predict read from science, socst, math and write. Use the testparm and test commands to test the equality of the coefficients for science, socst and math. Use cnsreg to estimate a model where these three parameters are equal.
Answer 4.
We start by using the hsb2 data file.
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/hsb2 , clear
(highschool and beyond (200 cases))
We first run an ordinary regression predicting read from science, socst, math and write.
regress read science socst math write
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   195) =   69.74
       Model |  12312.7853     4  3078.19634           Prob > F      =  0.0000
    Residual |  8606.63466   195   44.136588           R-squared     =  0.5886
-------------+------------------------------           Adj R-squared =  0.5801
       Total |    20919.42   199  105.122714           Root MSE      =  6.6435

------------------------------------------------------------------------------
        read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     science |   .2736751    .064369     4.25   0.000     .1467263    .4006238
       socst |    .273267   .0574246     4.76   0.000      .160014      .38652
        math |   .3028976    .072581     4.17   0.000     .1597532     .446042
       write |   .1104172   .0713398     1.55   0.123    -.0302795    .2511139
       _cons |   1.946078   3.087346     0.63   0.529    -4.142797    8.034954
------------------------------------------------------------------------------
We use the testparm command to test that the coefficients for science, socst and math are equal.
testparm science socst math, equal
 ( 1) - science + socst = 0.0
 ( 2) - science + math = 0.0

       F(  2,   195) =    0.05
            Prob > F =    0.9554
We can also use the test command to test that the coefficients for science, socst and math are equal.
test science=socst
 ( 1)  science - socst = 0.0

       F(  1,   195) =    0.00
            Prob > F =    0.9964
test socst=math, accum
 ( 1)  science - socst = 0.0
 ( 2)  socst - math = 0.0

       F(  2,   195) =    0.05
            Prob > F =    0.9554
We now constrain these three coefficients to be equal.
constraint define 1 science = socst
constraint define 2 socst = math
And we use cnsreg to estimate the model with these constraints in place.
cnsreg read science socst math write, c(1 2)

Constrained linear regression                          Number of obs =     200
                                                       F(  2,   197) =  140.80
                                                       Prob > F      =  0.0000
                                                       Root MSE      =  6.6113

 ( 1)  science - socst = 0.0
 ( 2)  socst - math = 0.0
------------------------------------------------------------------------------
        read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     science |   .2828596   .0268291    10.54   0.000     .2299505    .3357687
       socst |   .2828596   .0268291    10.54   0.000     .2299505    .3357687
        math |   .2828596   .0268291    10.54   0.000     .2299505    .3357687
       write |   .1106022   .0708452     1.56   0.120      -.02911    .2503145
       _cons |   2.012299   3.061703     0.66   0.512    -4.025622     8.05022
------------------------------------------------------------------------------
5. Using the elemapi2 data file (use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2 ) consider the following 2 regression equations.
api00 = meals ell emer
api99 = meals ell emer
Estimate the coefficients for these predictors in predicting api00 and api99 taking into account the non-independence of the schools. Test the overall contribution of each of the predictors in jointly predicting api scores in these two years. Test whether the contribution of emer is the same for api00 and api99.
Answer 5.
First, let’s use the elemapi2 data file.
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear
Next, let’s analyze these equations separately.
regress api00 meals ell emer
      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F(  3,   396) =  673.00
       Model |  6749782.75     3  2249927.58           Prob > F      =  0.0000
    Residual |  1323889.25   396  3343.15467           R-squared     =  0.8360
-------------+------------------------------           Adj R-squared =  0.8348
       Total |  8073672.00   399  20234.7669           Root MSE      =   57.82

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       meals |  -3.159189   .1497371   -21.10   0.000    -3.453568   -2.864809
         ell |  -.9098732   .1846442    -4.93   0.000    -1.272878   -.5468678
        emer |  -1.573496    .293112    -5.37   0.000    -2.149746   -.9972456
       _cons |   886.7033    6.25976   141.65   0.000     874.3967    899.0098
------------------------------------------------------------------------------
regress api99 meals ell emer
      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F(  3,   396) =  716.31
       Model |  7293890.24     3  2431296.75           Prob > F      =  0.0000
    Residual |  1344092.70   396  3394.17349           R-squared     =  0.8444
-------------+------------------------------           Adj R-squared =  0.8432
       Total |  8637982.94   399    21649.08           Root MSE      =   58.26

------------------------------------------------------------------------------
       api99 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       meals |  -3.412388   .1508754   -22.62   0.000    -3.709004   -3.115771
         ell |   -.793822   .1860477    -4.27   0.000    -1.159587   -.4280573
        emer |  -1.516305   .2953401    -5.13   0.000    -2.096936   -.9356748
       _cons |    860.191   6.307343   136.38   0.000     847.7909     872.591
------------------------------------------------------------------------------
Now, let’s analyze them using sureg, which takes into account the non-independence of these equations.
sureg (api00 api99 = meals ell emer)
Seemingly unrelated regression

----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"       chi2        P
----------------------------------------------------------------------
api00             400      3    57.53019    0.8360    2039.38   0.0000
api99             400      3    57.96751    0.8444   2170.651   0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
api00        |
       meals |  -3.159189   .1489866   -21.20   0.000    -3.451197    -2.86718
         ell |  -.9098732   .1837186    -4.95   0.000    -1.269955   -.5497913
        emer |  -1.573496   .2916428    -5.40   0.000    -2.145105   -1.001886
       _cons |   886.7033   6.228382   142.36   0.000     874.4959    898.9107
-------------+----------------------------------------------------------------
api99        |
       meals |  -3.412388   .1501191   -22.73   0.000    -3.706616    -3.11816
         ell |   -.793822   .1851151    -4.29   0.000    -1.156641    -.431003
        emer |  -1.516305   .2938597    -5.16   0.000     -2.09226   -.9403509
       _cons |    860.191   6.275727   137.07   0.000     847.8908    872.4912
------------------------------------------------------------------------------
We can test the contribution of meals ell and emer as shown below.
test meals
 ( 1)  [api00]meals = 0.0
 ( 2)  [api99]meals = 0.0

           chi2(  2) =  518.30
         Prob > chi2 =   0.0000

test ell

 ( 1)  [api00]ell = 0.0
 ( 2)  [api99]ell = 0.0

           chi2(  2) =   24.80
         Prob > chi2 =   0.0000

test emer

 ( 1)  [api00]emer = 0.0
 ( 2)  [api99]emer = 0.0

           chi2(  2) =   29.48
         Prob > chi2 =   0.0000
We can test whether the coefficients for emer were the same in predicting api00 and api99 as shown below.
test [api00]emer = [api99]emer

 ( 1)  [api00]emer - [api99]emer = 0.0

           chi2(  1) =    0.21
         Prob > chi2 =   0.6456
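The same cross-equation syntax extends to a joint test that all three coefficients are equal for api00 and api99, accumulating the restrictions (a sketch following the single test above):

```stata
test [api00]meals = [api99]meals
test [api00]ell = [api99]ell, accumulate
test [api00]emer = [api99]emer, accumulate
```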
We can also test the contribution of meals ell and emer using more traditional multivariate tests using the mvreg and mvtest commands as shown below.
mvreg api00 api99 = meals ell emer

----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"          F        P
----------------------------------------------------------------------
api00             400      4    57.82002    0.8360   672.9954   0.0000
api99             400      4    58.25954    0.8444   716.3148   0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
api00        |
       meals |  -3.159189   .1497371   -21.10   0.000    -3.453568   -2.864809
         ell |  -.9098732   .1846442    -4.93   0.000    -1.272878   -.5468678
        emer |  -1.573496    .293112    -5.37   0.000    -2.149746   -.9972456
       _cons |   886.7033    6.25976   141.65   0.000     874.3967    899.0098
-------------+----------------------------------------------------------------
api99        |
       meals |  -3.412388   .1508754   -22.62   0.000    -3.709004   -3.115771
         ell |   -.793822   .1860477    -4.27   0.000    -1.159587   -.4280573
        emer |  -1.516305   .2953401    -5.13   0.000    -2.096936   -.9356748
       _cons |    860.191   6.307343   136.38   0.000     847.7909     872.591
------------------------------------------------------------------------------
Below we show the multivariate tests for meals ell and for emer.
mvtest meals

MULTIVARIATE TESTS OF SIGNIFICANCE

Multivariate Test Criteria and Exact F Statistics for
the Hypothesis of no Overall "meals" Effect(s)
S=1    M=0    N=196.5

Test                        Value           F    Num DF    Den DF    Pr > F
Wilks' Lambda               0.43558762  255.9105      2   395.0000   0.0000
Pillai's Trace              0.56441238  255.9105      2   395.0000   0.0000
Hotelling-Lawley Trace      1.29574936  255.9105      2   395.0000   0.0000

mvtest ell

MULTIVARIATE TESTS OF SIGNIFICANCE

Multivariate Test Criteria and Exact F Statistics for
the Hypothesis of no Overall "ell" Effect(s)
S=1    M=0    N=196.5

Test                        Value           F    Num DF    Den DF    Pr > F
Wilks' Lambda               0.94161436   12.2462      2   395.0000   0.0000
Pillai's Trace              0.05838564   12.2462      2   395.0000   0.0000
Hotelling-Lawley Trace      0.06200590   12.2462      2   395.0000   0.0000

mvtest emer

MULTIVARIATE TESTS OF SIGNIFICANCE

Multivariate Test Criteria and Exact F Statistics for
the Hypothesis of no Overall "emer" Effect(s)
S=1    M=0    N=196.5

Test                        Value           F    Num DF    Den DF    Pr > F
Wilks' Lambda               0.93136794   14.5537      2   395.0000   0.0000
Pillai's Trace              0.06863206   14.5537      2   395.0000   0.0000
Hotelling-Lawley Trace      0.07368952   14.5537      2   395.0000   0.0000