Figure 12.1, Figure 12.2 and Table 12.1, pages 297-299, using the data file ornstein. We skip the section on confidence envelopes, so the normal quantile plots below are shown without them.
use https://stats.idre.ucla.edu/stat/stata/examples/ara/ornstein, clear
tab sector, gen(sect)   /*generate dummy variables*/
(Output omitted here)
tab nation, gen(nat)    /*generate dummy variables*/
(Output omitted here)
gen asset1=sqrt(assets) /*generate a new regressor*/
regress intrlcks asset1 nat2-nat4 sect1-sect5 sect7-sect10

  Source |       SS       df       MS                  Number of obs =     248
---------+------------------------------               F( 13,   234) =   34.11
   Model |  41816.5529    13  3216.65791               Prob > F      =  0.0000
Residual |  22069.8342   234  94.3155309               R-squared     =  0.6545
---------+------------------------------               Adj R-squared =  0.6354
   Total |  63886.3871   247   258.64934               Root MSE      =  9.7116

------------------------------------------------------------------------------
intrlcks |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  asset1 |   .2517892   .0185222     13.594   0.000       .2152976    .2882808
    nat2 |  -1.158915   2.664005     -0.435   0.664      -6.407413    4.089584
    nat3 |  -4.444009   2.649276     -1.677   0.095       -9.66349    .7754719
    nat4 |  -8.089053   1.481003     -5.462   0.000      -11.00686   -5.171248
   sect1 |  -1.199539    2.04038     -0.588   0.557      -5.219402    2.820323
   sect2 |  -14.37594   5.576992     -2.578   0.011      -25.36347   -3.388413
   sect3 |   -5.12563   4.698778     -1.091   0.276      -14.38294    4.131683
   sect4 |  -5.698507   2.925712     -1.948   0.053      -11.46261     .065596
   sect5 |  -2.430237   4.014109     -0.605   0.545      -10.33865    5.478175
   sect7 |  -.8668517    2.63433     -0.329   0.742      -6.056886    4.323183
   sect8 |   .3422809   2.012105      0.170   0.865      -3.621874    4.306436
   sect9 |  -.3810376   2.819743     -0.135   0.893      -5.936365     5.17429
  sect10 |   5.151303   2.682082      1.921   0.056      -.1328104    10.43542
   _cons |   4.190453   1.846039      2.270   0.024       .5534732    7.827433
------------------------------------------------------------------------------

predict student, rstudent
qnorm student
Figure 12.1 (b) on page 297.
kdensity student, ylabel(0(.25).75)
gen inlck=sqrt(intrlcks+1)
regress inlck asset1 nat2-nat4 sect1-sect5 sect7-sect10

  Source |       SS       df       MS                  Number of obs =     248
---------+------------------------------               F( 13,   234) =   24.89
   Model |  477.296638    13   36.715126               Prob > F      =  0.0000
Residual |  345.199855   234  1.47521306               R-squared     =  0.5803
---------+------------------------------               Adj R-squared =  0.5570
   Total |  822.496493   247  3.32994531               Root MSE      =  1.2146

------------------------------------------------------------------------------
   inlck |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  asset1 |   .0260108   .0023165     11.229   0.000        .021447    .0305747
    nat2 |  -.1139589   .3331737     -0.342   0.733      -.7703624    .5424445
    nat3 |  -.5266014   .3313317     -1.589   0.113      -1.179376    .1261729
    nat4 |  -1.105112   .1852217     -5.966   0.000      -1.470027    -.740197
   sect1 |  -.0567192   .2551801     -0.222   0.824      -.5594633    .4460249
   sect2 |  -2.250759   .6974864     -3.227   0.001      -3.624915   -.8766039
   sect3 |  -.7399749   .5876526     -1.259   0.209      -1.897741     .417791
   sect4 |  -.0880438   .3659042     -0.241   0.810      -.8089313    .6328437
   sect5 |  -.2453168   .5020245     -0.489   0.626      -1.234382    .7437487
   sect7 |   .1479077   .3294624      0.449   0.654      -.5011839    .7969993
   sect8 |   .3562041   .2516439      1.416   0.158       -.139573    .8519811
   sect9 |   .3540146   .3526512      1.004   0.316      -.3407624    1.048792
  sect10 |   .7860363   .3354345      2.343   0.020       .1251788    1.446894
   _cons |   2.329308   .2308748     10.089   0.000       1.874449    2.784167
------------------------------------------------------------------------------

predict student1, rstudent
qnorm student1, ylabel(-5(2.5)5) xlabel(-3(1)3)
Figure 12.2 (b), page 299.
kdensity student1, xlabel(-3(1)3) ylabel(0 .2 .4)
Remark: Combining the results from the two regression procedures, we get the result for Table 12.1, page 298.
Figure 12.3 (a) on page 303 on data file ornstein.
regress intrlcks asset1 nat2-nat4 sect1-sect5 sect7-sect10
(Output omitted here.)
predict fitted1, xb
/*use student residuals from the first regression in previous section*/
graph twoway scatter student fitted1, yline(0) ylabel(-4 0 4) xlabel(-20(20)100)
Figure 12.3 (b) on page 303.
gen logfit=log10(2+fitted1)
gen logres = log10(abs(student))
graph twoway (scatter logres logfit)(lfit logres logfit), ylabel(-2(1)1)

regress inlck asset1 nat2-nat4 sect1-sect5 sect7-sect10
(Output omitted here)
predict fitted2, xb
gen logfit2=log10(fitted2)
gen logres2=log10(abs(student1))
graph twoway (scatter logres2 logfit2)(lfit logres2 logfit2), ylabel(-3(1)1) xlabel(0(.2)1.2)
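The fitted lines in these plots can be read as a guide to a variance-stabilizing power transformation. As a sketch, assuming the usual spread-level convention (the symbols a, b and p are introduced here for illustration and do not appear in the commands above), the line fitted by lfit and the suggested power are:

```latex
\log_{10}\lvert E_i^{*}\rvert \approx a + b\,\log_{10}\hat{Y}_i ,
\qquad
p = 1 - b
```

Here E*_i is the studentized residual (student or student1), Ŷ_i is the fitted value (with a start of 2 added in the first plot to keep it positive), and b is the slope of the fitted line. A slope near zero corresponds to p near 1, that is, no further transformation suggested.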
Figure 12.6 (a), (b) and (c) on page 312. Note that the cprplot command produces a component-plus-residual (partial residual) plot; we use its lowess option to get the nonparametric-regression smooth shown in the book.
use https://stats.idre.ucla.edu/stat/stata/examples/ara/prestige, clear
(From Fox, Applied Regression Analysis. Use 'notes' command for source of data)
regress prestige percwomn educat income
(Output omitted here.)
cprplot educat, lowess ylabel(0(25)100) xlabel(5(2.5)17.5)
cprplot income, lowess ylabel(-25(25)50) xlabel(0(10000)30000)
cprplot percwomn, lowess ylabel(-20 0 20) xlabel(0(25)100)
The formula in the middle of page 313, still using the data file prestige.
gen loginc=log10(income)/log10(2)
gen edu2=educat*educat
gen edu3=edu2*educat
gen w2=percwomn*percwomn
regress prestige loginc percwomn w2 educat edu2 edu3

  Source |       SS       df       MS                  Number of obs =     102
---------+------------------------------               F(  6,    95) =   94.46
   Model |  25603.6492     6  4267.27487               Prob > F      =  0.0000
Residual |  4291.77689    95  45.1765988               R-squared     =  0.8564
---------+------------------------------               Adj R-squared =  0.8474
   Total |  29895.4261   101  295.994318               Root MSE      =  6.7214

------------------------------------------------------------------------------
prestige |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  loginc |   8.783331   1.272748      6.901   0.000       6.256608    11.31005
percwomn |   -.179323    .085088     -2.108   0.038      -.3482439    -.010402
      w2 |   .0025001   .0009245      2.704   0.008       .0006648    .0043354
  educat |  -29.92002   15.25152     -1.962   0.053      -60.19811    .3580668
    edu2 |   2.915935   1.414413      2.062   0.042       .1079704    5.723901
    edu3 |  -.0806755   .0422082     -1.911   0.059      -.1644694    .0031185
   _cons |   20.83852   56.89955      0.366   0.715      -92.12136    133.7984
------------------------------------------------------------------------------

predict res2, r
summarize income percwomn /*get mean for income and percwomn*/

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
  income |     102    6797.902   4245.922        611      25879
percwomn |     102    28.97902   31.72493          0      97.51

gen pm=20.8+8.78*log10(6797.902)/log10(2)-0.179*28.979+0.0025*28.979*28.979-29.9*educat+2.91*edu2-.0807*edu3
gen pmr=res2 + pm
graph twoway scatter pmr pm educat, connect(i l) sort msymbol(O i O)
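The loginc and pm definitions both rely on the change-of-base identity log2(x) = log10(x)/log10(2). As a cross-check outside Stata, the following Python sketch evaluates that identity at the mean income reported by summarize and computes the part of pm that does not depend on educat. The variable names are hypothetical, and the coefficients are the rounded ones typed in the gen pm command, not the full-precision estimates.

```python
import math

mean_income = 6797.902    # mean of income from the summarize output
mean_percwomn = 28.979    # mean of percwomn, rounded as in the gen pm command

# Change of base: log10(x)/log10(2) is the base-2 logarithm of x.
log2_income = math.log10(mean_income) / math.log10(2)

# Part of the fitted value that does not depend on educat,
# using the rounded coefficients typed in the gen pm command.
constant_part = (20.8 + 8.78 * log2_income
                 - 0.179 * mean_percwomn
                 + 0.0025 * mean_percwomn ** 2)

print(round(log2_income, 4), round(constant_part, 2))
```

Holding income and percwomn at their means in this way is what lets pm trace out the fitted cubic in educat alone.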
Table 12.2 on page 319.
use https://stats.idre.ucla.edu/stat/stata/examples/ara/vocab, clear
(From Fox, Applied Regression Analysis. Use 'notes' command for source of data)
regress vocab educ

  Source |       SS       df       MS                  Number of obs =     968
---------+------------------------------               F(  1,   966) =  318.92
   Model |  1175.11129     1  1175.11129               Prob > F      =  0.0000
Residual |  3559.41351   966  3.68469307               R-squared     =  0.2482
---------+------------------------------               Adj R-squared =  0.2474
   Total |  4734.52479   967  4.89609596               Root MSE      =  1.9196
(Further output omitted.)

xi: regress vocab i.educ
i.educ            Ieduc_0-20   (naturally coded; Ieduc_0 omitted)

  Source |       SS       df       MS                  Number of obs =     968
---------+------------------------------               F( 19,   948) =   18.13
   Model |  1261.69388    19  66.4049413               Prob > F      =  0.0000
Residual |  3472.83091   948  3.66332374               R-squared     =  0.2665
---------+------------------------------               Adj R-squared =  0.2518
   Total |  4734.52479   967  4.89609596               Root MSE      =   1.914
(Further output omitted)
With the results from these two regressions, we can calculate the nonlinear effect as follows.
display 1261.69399-1175.11129 /*Sum of Squares for nonlinear effect*/
86.5827
display (86.57/18)/(3472.8/948) /*F-value*/
1.3128753
display fprob(18, 948, 1.31) /*p-value*/
.17255134
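The calculation above is an incremental F-test: the extra model sum of squares gained by replacing the linear term in educ with a full set of dummies, divided by its degrees of freedom (19 - 1 = 18), over the residual mean square of the dummy-variable model. A minimal Python sketch of the same arithmetic, using the unrounded values from the two regression tables (the variable names are hypothetical), is given here as a cross-check. Note that the first display command types 1261.69399 where the table reports 1261.69388, and the second uses rounded inputs, so the results agree only to about three decimals.

```python
# Values copied from the two regression tables above.
model_ss_linear = 1175.11129    # regress vocab educ, Model SS (1 df)
model_ss_dummies = 1261.69388   # xi: regress vocab i.educ, Model SS (19 df)
resid_ss_dummies = 3472.83091   # Residual SS of the dummy-variable model
resid_df = 948                  # Residual df of the dummy-variable model

# Degrees of freedom and sum of squares for the nonlinear component.
df_nonlinear = 19 - 1
ss_nonlinear = model_ss_dummies - model_ss_linear

# Incremental F statistic on (df_nonlinear, resid_df) degrees of freedom.
f_value = (ss_nonlinear / df_nonlinear) / (resid_ss_dummies / resid_df)

print(df_nonlinear, round(ss_nonlinear, 4), round(f_value, 4))
```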
The first part covers the Box-Cox transformation of the dependent variable, using the data file ornstein; the calculations are on pages 323 and 324.
use https://stats.idre.ucla.edu/stat/stata/examples/ara/ornstein, clear
gen dep=intrlcks+1       /*This is the dependent variable used through the example.*/
gen asset1=sqrt(assets)  /*regressor obtained from independent variable assets. */
gen logy=ln(dep)         /*To get geometric mean suggested in the footnote.*/
egen c1=mean(logy)       /*Continue on geometric mean*/
gen c=exp(c1)
display c                /* showing geometric mean of the dependent variable*/
8.2501268
gen cv=dep*(ln(dep/c)-1) /*generate constructed variable*/
xi: regress dep asset1 i.nation i.sector cv
i.nation          Inatio_1-4   (Inatio_1 for nation==CAN omitted)
i.sector          Isect_1-10   (Isect_1 for sector==AGR omitted)

  Source |       SS       df       MS                  Number of obs =     248
---------+------------------------------               F( 14,   233) =  103.67
   Model |  55048.7339    14  3932.05242               Prob > F      =  0.0000
Residual |  8837.65321   233  37.9298421               R-squared     =  0.8617
---------+------------------------------               Adj R-squared =  0.8534
   Total |  63886.3871   247   258.64934               Root MSE      =  6.1587

------------------------------------------------------------------------------
     dep |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  asset1 |   .0699101   .0152576      4.582   0.000       .0398497    .0999705
Inatio_2 |  -.1373967   1.690291     -0.081   0.935      -3.467603     3.19281
.....(continued)
      cv |   .5850363   .0313226     18.678   0.000       .5233246     .646748
   _cons |   11.38911   1.132516     10.056   0.000        9.15783    13.62039
------------------------------------------------------------------------------

test cv=0

 ( 1)  cv = 0.0

       F(  1,   233) =  348.86
            Prob > F =    0.0000

display 1-_coef[cv]
.41496366
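The steps above build the geometric mean of dep as exp(mean(ln(dep))), form the constructed variable dep*(ln(dep/c)-1), and take one minus its coefficient as the suggested power. The following Python sketch mirrors those three steps on a tiny made-up sample; the function names and the toy values are hypothetical and are not the ornstein data. Only the final line uses a number from the output above.

```python
import math

def geometric_mean(values):
    """exp of the mean log, as in the gen logy / egen c1 / gen c steps."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def constructed_variable(values):
    """y * (ln(y / gm) - 1) for each y, mirroring the gen cv command."""
    gm = geometric_mean(values)
    return [y * (math.log(y / gm) - 1) for y in values]

toy_dep = [1.0, 2.0, 4.0]   # hypothetical values, not the ornstein data
gm = geometric_mean(toy_dep)
cv = constructed_variable(toy_dep)

# Suggested power: one minus the coefficient on cv from the table above.
lambda_hat = 1 - 0.5850363

print(gm, cv, lambda_hat)
```

The large F statistic from test cv=0 is what indicates that a transformation is needed; the value of roughly 0.41 only suggests where to look, lying between the log (0) and the square root (0.5).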
Two newer Stata commands handle the Box-Cox and Box-Tidwell transformations, and we show how to use them here. The first is boxcox, which is available in Stata 7. It finds the maximum likelihood estimate of the parameter(s) of the Box-Cox transformation.
xi: boxcox dep asset1 i.nation i.sector, model(lhsonly)
i.nation          _Ination_1-4        (_Ination_1 for nation==CAN omitted)
i.sector          _Isector_1-10       (_Isector_1 for sector==AGR omitted)

Estimating comparison model

Iteration 0:   log likelihood = -1040.2744
Iteration 1:   log likelihood = -917.82824
Iteration 2:   log likelihood =  -909.0709
Iteration 3:   log likelihood = -909.05843
Iteration 4:   log likelihood = -909.05843

Estimating full model

Iteration 0:   log likelihood =  -908.4755
Iteration 1:   log likelihood = -820.15739
Iteration 2:   log likelihood = -819.63494
Iteration 3:   log likelihood =  -819.6349

                                                  Number of obs   =        248
                                                  LR chi2(13)     =     178.85
Log likelihood = -819.6349                        Prob > chi2     =      0.000

------------------------------------------------------------------------------
         dep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      /theta |   .3063002   .0528239     5.80   0.000     .2027673    .4098331
------------------------------------------------------------------------------

Estimates of scale-variant parameters
----------------------------
             |      Coef.
-------------+--------------
Notrans      |
      asset1 |   .0299857
  _Ination_2 |  -.1459837
  _Ination_3 |  -.6117602
  _Ination_4 |  -1.407129
  _Isector_2 |  -2.923013
  _Isector_3 |  -.9114291
  _Isector_4 |   .1482129
  _Isector_5 |  -.2716637
  _Isector_6 |   .0036382
  _Isector_7 |   .3405276
  _Isector_8 |   .5982639
  _Isector_9 |   .6550884
 _Isector_10 |   1.048606
       _cons |   2.115848
-------------+--------------
      /sigma |   1.525277
----------------------------

---------------------------------------------------------
   Test         Restricted     LR statistic      P-Value
    H0:       log likelihood       chi2       Prob > chi2
---------------------------------------------------------
theta = -1      -1063.8257        488.38          0.000
theta =  0      -835.82415         32.38          0.000
theta =  1       -908.4755        177.68          0.000
---------------------------------------------------------
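Each likelihood-ratio statistic in the output is twice the gap between the full-model log likelihood and the corresponding restricted log likelihood, and the interval for theta is the estimate plus or minus a normal quantile times its standard error. The Python sketch below reproduces those figures from the printed values as a cross-check; the variable names are hypothetical and 1.959964 is the usual 97.5% standard normal quantile.

```python
ll_full = -819.6349            # log likelihood at the estimated theta
ll_comparison = -909.05843     # final log likelihood of the comparison model
ll_restricted = {-1: -1063.8257, 0: -835.82415, 1: -908.4755}

# LR chi2(13) reported in the header of the output.
lr_model = 2 * (ll_full - ll_comparison)

# LR statistics for H0: theta = -1, 0, 1.
lr_theta = {t: 2 * (ll_full - ll) for t, ll in ll_restricted.items()}

# Wald-type 95% interval for theta.
theta_hat, se = 0.3063002, 0.0528239
z = 1.959964
ci = (theta_hat - z * se, theta_hat + z * se)

print(round(lr_model, 2), {t: round(v, 2) for t, v in lr_theta.items()}, ci)
```

All three hypothesized values (inverse, log, and no transformation) are rejected, and the interval for theta excludes both 0 and 1, which agrees with the constructed-variable result earlier in this section.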
The Box-Tidwell transformation on page 325 using data file prestige.
use https://stats.idre.ucla.edu/stat/stata/examples/ara/prestige, clear
gen w2=percwomn*percwomn
gen linc =income*ln(income)
gen ledu =educat*ln(educat)
regress prestige percwomn w2 educat income
(Output omitted.)
gen b1=_coef[educat]
gen b2=_coef[income]
regress prestige percwomn w2 educat income linc ledu
(Output omitted.)
gen d1=_coef[ledu]
gen d2=_coef[linc]
display 1+d1/b1 /*first-step approximation on educat*/
2.2435437
display 1+d2/b2 /*first-step approximation on income*/
-.91030351
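The two display commands apply the first-step Box-Tidwell approximation. Written out as a sketch, with symbols chosen here for illustration (they are assumptions about notation, not taken from the commands), the estimate for regressor X_j is:

```latex
\tilde{\gamma}_j = 1 + \frac{\hat{\delta}_j}{\hat{\beta}_j}
```

Here the denominator is the coefficient of X_j in the initial regression without constructed variables (b1 for educat, b2 for income), and the numerator is the coefficient of the constructed variable X_j ln X_j in the augmented regression (d1 for ledu, d2 for linc). The displayed values, about 2.24 for educat and -0.91 for income, are starting points only; they are not the fully iterated maximum likelihood estimates.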
We now use the boxtid command, shown below, to get the fully iterated maximum likelihood estimates of the transformation parameters for educat and income. boxtid can be downloaded from within Stata by typing search boxtid (see "How can I use the search command to search for programs and get additional help?" for more information about using search).
boxtid regress prestige percwomn w2 educat income, df(percwomn w2:1)
Iteration 0:  Deviance =  675.2403
Iteration 1:  Deviance =  674.8334 (change = -.4068906)
Iteration 2:  Deviance =  674.8169 (change = -.0164992)
Iteration 3:  Deviance =  674.8155 (change = -.0014157)
Iteration 4:  Deviance =  674.8154 (change = -.0001051)
-> gen double Ieduc__1 = educat^2.1939-182.7 if e(sample)
-> gen double Ieduc__2 = educat^2.1939*ln(educat)-433.7 if e(sample)
-> gen double Iinco__1 = X^-0.0375-1.015 if e(sample)
-> gen double Iinco__2 = X^-0.0375*ln(X)+.3916 if e(sample)
   (where: X = income/10000)
-> gen double Iperc__1 = percwomn-28.98 if e(sample)
-> gen double Iw2__1 = w2-1836 if e(sample)
[Total iterations: 8]

Box-Tidwell regression model

  Source |       SS       df       MS                  Number of obs =     102
---------+------------------------------               F(  6,    95) =   90.29
   Model |  25435.2987     6  4239.21644               Prob > F      =  0.0000
Residual |  4460.12743    95  46.9487098               R-squared     =  0.8508
---------+------------------------------               Adj R-squared =  0.8414
   Total |  29895.4261   101  295.994318               Root MSE      =  6.8519

------------------------------------------------------------------------------
prestige |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
Ieduc__1 |   .0984751   .1767093      0.557   0.579      -.2523372    .4492873
Ieduc_p1 |  -.0001244    .060728     -0.002   0.998      -.1206846    .1204359
Iinco__1 |  -333.2435   1640.086     -0.203   0.839      -3589.226    2922.739
Iinco_p1 |   .1065035   59.25056      0.002   0.999      -117.5207    117.7337
Iperc__1 |  -.1658225   .0903862     -1.835   0.070      -.3452618    .0136168
  Iw2__1 |   .0024899    .000972      2.562   0.012       .0005603    .0044195
   _cons |   47.51562   1.063237     44.690   0.000       45.40483    49.62641
------------------------------------------------------------------------------
educat   |   3.806089   .3444647     11.049   Nonlin. dev. 5.019    (P = 0.031)
      p1 |   2.193927   .6218453      3.528
------------------------------------------------------------------------------
income   |   .0011613     .00028      4.147   Nonlin. dev. 26.178   (P = 0.000)
      p1 |  -.0375134   .1638984     -0.229
------------------------------------------------------------------------------
Deviance: 674.815.
Figure 12.10 (a) and (b) on page 326 using the data file prestige. The constructed-variable plot in this case is the added variable plot for the constructed variable X*log(X).
regress prestige percwomn w2 educat income
(Output omitted)
avplot linc, ylabel(-40(20)20) xlabel(-4000(4000)8000)
avplot ledu, ylabel(-40(20)20) xlabel(-.8(.4)1.2)
Score test for nonconstant error variance, using the data file ornstein.
use https://stats.idre.ucla.edu/stat/stata/examples/ara/ornstein, clear
gen asset1=sqrt(assets)
gen y=intrlcks+1
xi: regress y asset1 i.nation i.sector /*generate residuals*/
(Output omitted)
predict res, r
predict fitted, xb
gen res2=res*res
egen m= mean(res2)
gen u=res2/m /*generate U_i**/
regress u fitted

  Source |       SS       df       MS                  Number of obs =     248
---------+------------------------------               F(  1,   246) =   45.73
   Model |   147.64882     1   147.64882               Prob > F      =  0.0000
Residual |  794.296284   246  3.22884668               R-squared     =  0.1567
---------+------------------------------               Adj R-squared =  0.1533
   Total |  941.945104   247  3.81354293               Root MSE      =  1.7969

------------------------------------------------------------------------------
       u |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  fitted |   .0594211   .0087872      6.762   0.000       .0421134    .0767288
   _cons |   .1336017   .1715663      0.779   0.437      -.2043246    .4715279
------------------------------------------------------------------------------

display 147.6488/2
73.8244
egen ybar=mean(fitted)
display 1-0.5*0.0594*ybar
.56695483
xi: regress u asset1 i.nation i.sector
i.nation          Inatio_1-4   (Inatio_1 for nation==CAN omitted)
i.sector          Isect_1-10   (Isect_1 for sector==AGR omitted)

  Source |       SS       df       MS                  Number of obs =     248
---------+------------------------------               F( 13,   234) =    4.04
   Model |  172.590223    13   13.276171               Prob > F      =  0.0000
Residual |   769.35488   234  3.28784137               R-squared     =  0.1832
---------+------------------------------               Adj R-squared =  0.1379
   Total |  941.945104   247  3.81354293               Root MSE      =  1.8132
(More output omitted)
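The display 147.6488/2 step takes half the model sum of squares from the auxiliary regression of u, which is the form of the Breusch-Pagan score statistic assumed here. The Python sketch below repeats that arithmetic for both auxiliary regressions as a cross-check; the variable names are hypothetical. For the one-degree-of-freedom case it also computes an upper-tail chi-square probability with the standard-library identity P(chi2 with 1 df > x) = erfc(sqrt(x/2)). Treating the statistic as chi-square with 1 df (or 13 df for the second regression) is the usual large-sample reference and is stated here as an assumption, since the page does not print a p-value.

```python
import math

reg_ss_fitted = 147.64882      # Model SS from regress u fitted (1 df)
reg_ss_all = 172.590223        # Model SS from regress u on all regressors (13 df)

# Score statistics: half the regression sum of squares.
score_fitted = reg_ss_fitted / 2
score_all = reg_ss_all / 2

# Upper-tail chi-square(1) probability via the identity
# P(chi2_1 > x) = erfc(sqrt(x / 2)).
p_fitted = math.erfc(math.sqrt(score_fitted / 2))

print(round(score_fitted, 4), round(score_all, 4), p_fitted)
```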