The data files used for the examples in this text can be downloaded as a zip file from the Stata Web site. You can then extract the data files with any unzip utility.
Example 15.1 on page 455 using mroz.dta. The robust standard errors in the output below differ slightly from those in the book because Stata applies a finite-sample correction to the robust standard errors.
use mroz, clear
reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6
Source | SS df MS Number of obs = 753
-------------+------------------------------ F( 7, 745) = 38.22
Model | 48.8080578 7 6.97257969 Prob > F = 0.0000
Residual | 135.919698 745 .182442547 R-squared = 0.2642
-------------+------------------------------ Adj R-squared = 0.2573
Total | 184.727756 752 .245648611 Root MSE = .42713
------------------------------------------------------------------------------
inlf | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nwifeinc | -.0034052 .0014485 -2.35 0.019 -.0062488 -.0005616
educ | .0379953 .007376 5.15 0.000 .023515 .0524756
exper | .0394924 .0056727 6.96 0.000 .0283561 .0506287
expersq | -.0005963 .0001848 -3.23 0.001 -.0009591 -.0002335
age | -.0160908 .0024847 -6.48 0.000 -.0209686 -.011213
kidslt6 | -.2618105 .0335058 -7.81 0.000 -.3275875 -.1960335
kidsge6 | .0130122 .013196 0.99 0.324 -.0128935 .0389179
_cons | .5855192 .154178 3.80 0.000 .2828442 .8881943
------------------------------------------------------------------------------
reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6, robust
Regression with robust standard errors Number of obs = 753
F( 7, 745) = 62.48
Prob > F = 0.0000
R-squared = 0.2642
Root MSE = .42713
------------------------------------------------------------------------------
| Robust
inlf | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nwifeinc | -.0034052 .0015249 -2.23 0.026 -.0063988 -.0004115
educ | .0379953 .007266 5.23 0.000 .023731 .0522596
exper | .0394924 .00581 6.80 0.000 .0280864 .0508983
expersq | -.0005963 .00019 -3.14 0.002 -.0009693 -.0002233
age | -.0160908 .002399 -6.71 0.000 -.0208004 -.0113812
kidslt6 | -.2618105 .0317832 -8.24 0.000 -.3242058 -.1994152
kidsge6 | .0130122 .0135329 0.96 0.337 -.013555 .0395795
_cons | .5855192 .1522599 3.85 0.000 .2866098 .8844287
------------------------------------------------------------------------------
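The size of the finite-sample correction mentioned above can be checked directly. The following is a minimal sketch, assuming that the robust option scales the variance estimate by N/(N-k), so that multiplying a reported robust standard error by sqrt((N-k)/N) removes the correction. The scalar name adj is chosen here only for illustration.

```stata
use mroz, clear
quietly reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6, robust
* e(df_r) holds N-k (745 here) and e(N) holds N (753 here)
scalar adj = sqrt(e(df_r)/e(N))
* robust standard error of nwifeinc with the assumed N/(N-k) scaling removed
di _se[nwifeinc]*adj
```

With N = 753 and k = 8 the factor is sqrt(745/753), so under this assumption the corrected and uncorrected standard errors differ by only about half a percent.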
Example 15.2 on page 468 using mroz.dta.
reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6
Source | SS df MS Number of obs = 753
-------------+------------------------------ F( 7, 745) = 38.22
Model | 48.8080578 7 6.97257969 Prob > F = 0.0000
Residual | 135.919698 745 .182442547 R-squared = 0.2642
-------------+------------------------------ Adj R-squared = 0.2573
Total | 184.727756 752 .245648611 Root MSE = .42713
------------------------------------------------------------------------------
inlf | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nwifeinc | -.0034052 .0014485 -2.35 0.019 -.0062488 -.0005616
educ | .0379953 .007376 5.15 0.000 .023515 .0524756
exper | .0394924 .0056727 6.96 0.000 .0283561 .0506287
expersq | -.0005963 .0001848 -3.23 0.001 -.0009591 -.0002335
age | -.0160908 .0024847 -6.48 0.000 -.0209686 -.011213
kidslt6 | -.2618105 .0335058 -7.81 0.000 -.3275875 -.1960335
kidsge6 | .0130122 .013196 0.99 0.324 -.0128935 .0389179
_cons | .5855192 .154178 3.80 0.000 .2828442 .8881943
------------------------------------------------------------------------------
predict lpm, xb
gen lpm_c = (lpm>=.5)
tab inlf lpm_c
  =1 if in |
 lab frce, |         lpm_c
      1975 |         0          1 |     Total
-----------+----------------------+----------
         0 |       203        122 |       325
         1 |        78        350 |       428
-----------+----------------------+----------
     Total |       281        472 |       753
di (203+350)/753
.73439575
logit inlf nwifeinc educ exper expersq age kidslt6 kidsge6
Logit estimates Number of obs = 753
LR chi2(7) = 226.22
Prob > chi2 = 0.0000
Log likelihood = -401.76515 Pseudo R2 = 0.2197
------------------------------------------------------------------------------
inlf | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nwifeinc | -.0213452 .0084214 -2.53 0.011 -.0378509 -.0048394
educ | .2211704 .0434396 5.09 0.000 .1360303 .3063105
exper | .2058695 .0320569 6.42 0.000 .1430391 .2686999
expersq | -.0031541 .0010161 -3.10 0.002 -.0051456 -.0011626
age | -.0880244 .014573 -6.04 0.000 -.116587 -.0594618
kidslt6 | -1.443354 .2035849 -7.09 0.000 -1.842373 -1.044335
kidsge6 | .0601122 .0747897 0.80 0.422 -.086473 .2066974
_cons | .4254524 .8603696 0.49 0.621 -1.260841 2.111746
------------------------------------------------------------------------------
* Stata 8 code.
lstat
* Stata 9 code and output.
estat classification
Logistic model for inlf
-------- True --------
Classified | D ~D | Total
-----------+--------------------------+-----------
+ | 347 118 | 465
- | 81 207 | 288
-----------+--------------------------+-----------
Total | 428 325 | 753
Classified + if predicted Pr(D) >= .5
True D defined as inlf != 0
--------------------------------------------------
Sensitivity Pr( +| D) 81.07%
Specificity Pr( -|~D) 63.69%
Positive predictive value Pr( D| +) 74.62%
Negative predictive value Pr(~D| -) 71.88%
--------------------------------------------------
False + rate for true ~D Pr( +|~D) 36.31%
False - rate for true D Pr( -| D) 18.93%
False + rate for classified + Pr(~D| +) 25.38%
False - rate for classified - Pr( D| -) 28.13%
--------------------------------------------------
Correctly classified 73.57%
--------------------------------------------------
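The percent correctly classified reported above can also be computed by hand, in the same way as for the linear probability model. This is only a sketch; the variable names plogit and logit_c are made up for the illustration.

```stata
use mroz, clear
quietly logit inlf nwifeinc educ exper expersq age kidslt6 kidsge6
* predicted probability that inlf equals 1
predict plogit if e(sample), p
* classify as 1 when the predicted probability is at least .5
gen logit_c = (plogit >= .5) if e(sample)
tab inlf logit_c
* share of observations whose predicted and actual outcomes agree
count if inlf == logit_c & e(sample)
di r(N)/e(N)
```

The count should equal the sum of the two diagonal cells of the classification table (347 + 207), which gives the 73.57% shown above.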
probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6
Probit estimates Number of obs = 753
LR chi2(7) = 227.14
Prob > chi2 = 0.0000
Log likelihood = -401.30219 Pseudo R2 = 0.2206
------------------------------------------------------------------------------
inlf | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nwifeinc | -.0120237 .0048398 -2.48 0.013 -.0215096 -.0025378
educ | .1309047 .0252542 5.18 0.000 .0814074 .180402
exper | .1233476 .0187164 6.59 0.000 .0866641 .1600311
expersq | -.0018871 .0006 -3.15 0.002 -.003063 -.0007111
age | -.0528527 .0084772 -6.23 0.000 -.0694678 -.0362376
kidslt6 | -.8683285 .1185223 -7.33 0.000 -1.100628 -.636029
kidsge6 | .036005 .0434768 0.83 0.408 -.049208 .1212179
_cons | .2700768 .508593 0.53 0.595 -.7267473 1.266901
------------------------------------------------------------------------------
* Stata 8 code.
lstat
* Stata 9 code and output.
estat classification
Probit model for inlf
-------- True --------
Classified | D ~D | Total
-----------+--------------------------+-----------
+ | 348 120 | 468
- | 80 205 | 285
-----------+--------------------------+-----------
Total | 428 325 | 753
Classified + if predicted Pr(D) >= .5
True D defined as inlf != 0
--------------------------------------------------
Sensitivity Pr( +| D) 81.31%
Specificity Pr( -|~D) 63.08%
Positive predictive value Pr( D| +) 74.36%
Negative predictive value Pr(~D| -) 71.93%
--------------------------------------------------
False + rate for true ~D Pr( +|~D) 36.92%
False - rate for true D Pr( -| D) 18.69%
False + rate for classified + Pr(~D| +) 25.64%
False - rate for classified - Pr( D| -) 28.07%
--------------------------------------------------
Correctly classified 73.44%
--------------------------------------------------
Example 15.3 on page 474 using mroz.dta: testing whether education is exogenous.
reg educ nwifeinc exper expersq age kidslt6 kidsge6 motheduc fatheduc huseduc
Source | SS df MS Number of obs = 753
-------------+------------------------------ F( 9, 743) = 74.07
Model | 1849.07781 9 205.45309 Prob > F = 0.0000
Residual | 2060.96203 743 2.77383853 R-squared = 0.4729
-------------+------------------------------ Adj R-squared = 0.4665
Total | 3910.03984 752 5.19952106 Root MSE = 1.6655
------------------------------------------------------------------------------
educ | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nwifeinc | .0156893 .0058267 2.69 0.007 .0042506 .027128
exper | .0577544 .0220604 2.62 0.009 .0144462 .1010625
expersq | -.000784 .000721 -1.09 0.277 -.0021994 .0006314
age | -.0059011 .0098709 -0.60 0.550 -.0252792 .013477
kidslt6 | .1195954 .1307071 0.91 0.360 -.1370038 .3761945
kidsge6 | -.0731404 .0515299 -1.42 0.156 -.174302 .0280212
motheduc | .1300347 .0225669 5.76 0.000 .0857322 .1743373
fatheduc | .0950702 .0214618 4.43 0.000 .0529373 .1372032
huseduc | .3475092 .0235063 14.78 0.000 .3013626 .3936558
_cons | 5.43695 .5873755 9.26 0.000 4.283837 6.590064
------------------------------------------------------------------------------
predict u, res
probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6 u
Probit estimates Number of obs = 753
LR chi2(8) = 227.90
Prob > chi2 = 0.0000
Log likelihood = -400.92551 Pseudo R2 = 0.2213
------------------------------------------------------------------------------
inlf | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nwifeinc | -.0102851 .0052347 -1.96 0.049 -.020545 -.0000253
educ | .1035752 .0403061 2.57 0.010 .0245767 .1825737
exper | .1262477 .0190256 6.64 0.000 .0889582 .1635373
expersq | -.0019432 .0006032 -3.22 0.001 -.0031254 -.0007609
age | -.0543808 .0086633 -6.28 0.000 -.0713605 -.0374012
kidslt6 | -.8630859 .1187394 -7.27 0.000 -1.095811 -.630361
kidsge6 | .0313802 .0437901 0.72 0.474 -.0544468 .1172071
u | .0433658 .050021 0.87 0.386 -.0546736 .1414051
_cons | .6209105 .6497413 0.96 0.339 -.652559 1.89438
------------------------------------------------------------------------------
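The exogeneity test comes down to the significance of the first-stage residual in the probit. A minimal sketch of the whole procedure follows, assuming mroz.dta is in the working directory; the residual name uhat is hypothetical and simply stands in for u.

```stata
use mroz, clear
* first stage: reduced form for educ, including motheduc, fatheduc and huseduc
quietly reg educ nwifeinc exper expersq age kidslt6 kidsge6 motheduc fatheduc huseduc
predict uhat, res
* second stage: probit with the first-stage residual added as a regressor
quietly probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6 uhat
* Wald test of H0: the coefficient on uhat is zero (educ is exogenous)
test uhat
```

The p-value should agree with the 0.386 reported for u in the table above, so exogeneity of educ is not rejected at conventional levels.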
Example 15.4 on page 498 using keane.dta.
use keane, clear
* Stata 8 code.
mlogit status educ exper expersq black if year ==87, basecategory(1)
* Stata 9 code and output.
mlogit status educ exper expersq black if year ==87, baseoutcome(1)
Multinomial logistic regression Number of obs = 1717
LR chi2(8) = 583.72
Prob > chi2 = 0.0000
Log likelihood = -907.85723 Pseudo R2 = 0.2433
------------------------------------------------------------------------------
status | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
2 |
educ | -.6736313 .0698999 -9.64 0.000 -.8106325 -.53663
exper | -.1062149 .173282 -0.61 0.540 -.4458414 .2334116
expersq | -.0125152 .0252291 -0.50 0.620 -.0619633 .036933
black | .8130166 .3027231 2.69 0.007 .2196902 1.406343
_cons | 10.27787 1.133336 9.07 0.000 8.056578 12.49917
-------------+----------------------------------------------------------------
3 |
educ | -.3146573 .0651096 -4.83 0.000 -.4422699 -.1870448
exper | .8487367 .1569856 5.41 0.000 .5410507 1.156423
expersq | -.0773003 .0229217 -3.37 0.001 -.1222261 -.0323746
black | .3113612 .2815339 1.11 0.269 -.240435 .8631574
_cons | 5.543798 1.086409 5.10 0.000 3.414475 7.673121
------------------------------------------------------------------------------
(Outcome status==1 is the comparison group)
predict p1 p2 p3 if e(sample), p
(11006 missing values generated)
gen pstatus = .
(12723 missing values generated)
egen atest = rmax(p1 p2 p3) if e(sample)
(11006 missing values generated)
foreach n of numlist 1/3 {
replace pstatus = `n' if p`n'==atest & year ==87
}
(36 real changes made)
(214 real changes made)
(1530 real changes made)
gen pdiff = (status == pstatus)
tab pdiff if year==87 & exper~=. & black ~=. & status ~=.
      pdiff |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        351       20.44       20.44
          1 |      1,366       79.56      100.00
------------+-----------------------------------
      Total |      1,717      100.00
di 1366/1717
.79557368
test [2]: exper expersq
( 1) [2]exper = 0
( 2) [2]expersq = 0
chi2( 2) = 6.12
Prob > chi2 = 0.0468
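The percent correctly predicted for the multinomial logit can be computed a little more compactly by restricting every step to the estimation sample. The sketch below assumes Stata 9 or later, where the egen function rowmax() replaces rmax(); the variable names q1 to q3, qmax, and qstatus are hypothetical.

```stata
use keane, clear
quietly mlogit status educ exper expersq black if year == 87, baseoutcome(1)
* predicted probabilities for the three outcomes
predict q1 q2 q3 if e(sample), p
* largest of the three predicted probabilities
egen qmax = rowmax(q1 q2 q3) if e(sample)
gen qstatus = .
forvalues n = 1/3 {
    * predicted outcome is the one with the highest probability
    quietly replace qstatus = `n' if q`n' == qmax & e(sample)
}
* number and share of observations predicted correctly
count if qstatus == status & e(sample)
di r(N)/e(N)
```

Because every step is limited to e(sample), observations with missing predictions never enter the comparison, and the result should match the 1,366 of 1,717 found above.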
Example 15.5 on page 507 using pension.dta.
use pension, clear
reg pctstck choice age educ female black married finc25 finc35 finc50 finc75 finc100 ///
finc101 wealth89 prftshr
Source | SS df MS Number of obs = 194
-------------+------------------------------ F( 14, 179) = 1.42
Model | 30402.0516 14 2171.57511 Prob > F = 0.1486
Residual | 274134.031 179 1531.47503 R-squared = 0.0998
-------------+------------------------------ Adj R-squared = 0.0294
Total | 304536.082 193 1577.90716 Root MSE = 39.134
------------------------------------------------------------------------------
pctstck | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
choice | 12.04773 6.298171 1.91 0.057 -.3804881 24.47594
age | -1.625967 .7748246 -2.10 0.037 -3.154932 -.0970012
educ | .7538685 1.207392 0.62 0.533 -1.628684 3.136421
female | 1.302856 7.163775 0.18 0.856 -12.83346 15.43917
black | 3.967391 9.782799 0.41 0.686 -15.33706 23.27184
married | 3.303436 7.997618 0.41 0.680 -12.47831 19.08518
finc25 | -18.18567 14.12026 -1.29 0.199 -46.04924 9.677906
finc35 | -3.925374 14.48565 -0.27 0.787 -32.50999 24.65924
finc50 | -8.128784 14.34191 -0.57 0.572 -36.42976 20.17219
finc75 | -17.57921 16.07766 -1.09 0.276 -49.30534 14.14693
finc100 | -6.74559 15.79116 -0.43 0.670 -37.90637 24.41519
finc101 | -28.34407 17.9049 -1.58 0.115 -63.67591 6.987774
wealth89 | -.0026918 .0124603 -0.22 0.829 -.0272797 .0218961
prftshr | 15.80791 7.332677 2.16 0.032 1.338299 30.27752
_cons | 134.1161 55.70525 2.41 0.017 24.1926 244.0395
------------------------------------------------------------------------------
oprobit pctstck choice age educ female black married finc25 finc35 finc50 finc75 finc100 finc101 ///
wealth89 prftshr
Ordered probit estimates Number of obs = 194
LR chi2(14) = 20.77
Prob > chi2 = 0.1077
Log likelihood = -201.9865 Pseudo R2 = 0.0489
------------------------------------------------------------------------------
pctstck | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
choice | .371171 .1841121 2.02 0.044 .010318 .7320241
age | -.0500516 .0226063 -2.21 0.027 -.0943591 -.005744
educ | .0261382 .0352561 0.74 0.458 -.0429626 .0952389
female | .0455642 .206004 0.22 0.825 -.3581963 .4493246
black | .0933923 .2820403 0.33 0.741 -.4593965 .6461811
married | .0935981 .2332114 0.40 0.688 -.3634878 .550684
finc25 | -.5784299 .423162 -1.37 0.172 -1.407812 .2509524
finc35 | -.1346721 .4305242 -0.31 0.754 -.9784841 .7091399
finc50 | -.2620401 .4265936 -0.61 0.539 -1.098148 .5740681
finc75 | -.5662312 .4780035 -1.18 0.236 -1.503101 .3706385
finc100 | -.2278963 .4685942 -0.49 0.627 -1.146324 .6905316
finc101 | -.8641109 .5291111 -1.63 0.102 -1.90115 .1729279
wealth89 | -.0000956 .0003737 -0.26 0.798 -.0008279 .0006368
prftshr | .4817182 .2161233 2.23 0.026 .0581243 .905312
-------------+----------------------------------------------------------------
_cut1 | -3.087373 1.623765 (Ancillary parameters)
_cut2 | -2.053553 1.618611
------------------------------------------------------------------------------
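The predicted probabilities computed next come from the two estimated cut points. As a reminder, the standard ordered probit formulas are written out below, with \hat\alpha_1 and \hat\alpha_2 standing for _cut1 and _cut2 (a notation chosen for this note, not taken from the output).

```latex
\begin{aligned}
P(\text{pctstck}=0 \mid x)   &= \Phi(\hat\alpha_1 - x\hat\beta) \\
P(\text{pctstck}=50 \mid x)  &= \Phi(\hat\alpha_2 - x\hat\beta) - \Phi(\hat\alpha_1 - x\hat\beta) \\
P(\text{pctstck}=100 \mid x) &= 1 - \Phi(\hat\alpha_2 - x\hat\beta)
\end{aligned}
\qquad \hat\alpha_1 = -3.087, \quad \hat\alpha_2 = -2.054
```

The three probabilities sum to one for every observation, and the predicted class is the outcome with the largest of the three.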
gen pclass=.
(226 missing values generated)
predict c1 if e(sample), outcome(0)
(option p assumed; predicted probability)
(32 missing values generated)
predict c2 if e(sample), outcome(50)
(option p assumed; predicted probability)
(32 missing values generated)
predict c3 if e(sample), outcome(100)
(option p assumed; predicted probability)
(32 missing values generated)
egen atest = rmax(c1 c2 c3)
(32 missing values generated)
foreach n of numlist 1/3 {
replace pclass = `n' if c`n' ==atest
}
(97 real changes made)
(113 real changes made)
(80 real changes made)
tab pclass pctstck if e(sample)
           | 0=mstbnds,50=mixed,100=mststcks
    pclass |         0         50        100 |     Total
-----------+---------------------------------+----------
         1 |        33         21         11 |        65
         2 |        25         31         25 |        81
         3 |         6         20         22 |        48
-----------+---------------------------------+----------
     Total |        64         72         58 |       194
di (33+31+22)/194
.44329897
di 33/64
.515625
di 31/72
.43055556
di 22/58
.37931034
