Econometric Analysis of Cross Section and Panel Data by Jeffrey M. Wooldridge Chapter 15: Discrete Response Models

The data files used for the examples in this text can be downloaded in a zip file from the Stata Web site. You can then extract the data files with any unzip utility.

Example 15.1 on page 455 using mroz.dta. Notice that the robust standard errors in the output below differ slightly from those in the book. This is because Stata applies a finite-sample correction, multiplying the robust variance estimate by n/(n-k).

use mroz, clear

reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6

      Source |       SS       df       MS              Number of obs =     753
-------------+------------------------------           F(  7,   745) =   38.22
       Model |  48.8080578     7  6.97257969           Prob > F      =  0.0000
    Residual |  135.919698   745  .182442547           R-squared     =  0.2642
-------------+------------------------------           Adj R-squared =  0.2573
       Total |  184.727756   752  .245648611           Root MSE      =  .42713

------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0034052   .0014485    -2.35   0.019    -.0062488   -.0005616
        educ |   .0379953    .007376     5.15   0.000      .023515    .0524756
       exper |   .0394924   .0056727     6.96   0.000     .0283561    .0506287
     expersq |  -.0005963   .0001848    -3.23   0.001    -.0009591   -.0002335
         age |  -.0160908   .0024847    -6.48   0.000    -.0209686    -.011213
     kidslt6 |  -.2618105   .0335058    -7.81   0.000    -.3275875   -.1960335
     kidsge6 |   .0130122    .013196     0.99   0.324    -.0128935    .0389179
       _cons |   .5855192    .154178     3.80   0.000     .2828442    .8881943
------------------------------------------------------------------------------

reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6, robust

Regression with robust standard errors                 Number of obs =     753
                                                       F(  7,   745) =   62.48
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2642
                                                       Root MSE      =  .42713

------------------------------------------------------------------------------
             |               Robust
        inlf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0034052   .0015249    -2.23   0.026    -.0063988   -.0004115
        educ |   .0379953    .007266     5.23   0.000      .023731    .0522596
       exper |   .0394924     .00581     6.80   0.000     .0280864    .0508983
     expersq |  -.0005963     .00019    -3.14   0.002    -.0009693   -.0002233
         age |  -.0160908    .002399    -6.71   0.000    -.0208004   -.0113812
     kidslt6 |  -.2618105   .0317832    -8.24   0.000    -.3242058   -.1994152
     kidsge6 |   .0130122   .0135329     0.96   0.337     -.013555    .0395795
       _cons |   .5855192   .1522599     3.85   0.000     .2866098    .8844287
------------------------------------------------------------------------------
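
The size of the finite-sample correction can be checked directly. The following is a minimal sketch, assuming mroz.dta is still in memory and that the book reports the unadjusted robust standard errors; the scalar name adj is ours, not part of the original example. Because Stata scales the robust variance by n/(n-k), multiplying a reported standard error by sqrt((n-k)/n) removes the adjustment.

```stata
* Sketch: remove the n/(n-k) adjustment from Stata's robust standard errors.
* Assumes mroz.dta is in memory; "adj" is a hypothetical scalar name.
quietly regress inlf nwifeinc educ exper expersq age kidslt6 kidsge6, robust
scalar adj = sqrt(e(df_r)/e(N))      // sqrt((n-k)/n) = sqrt(745/753)
display "adjustment factor       = " adj
display "unadjusted se(nwifeinc) = " _se[nwifeinc]*adj
display "unadjusted se(kidslt6)  = " _se[kidslt6]*adj
```

With n = 753 and k = 8 the factor is very close to 1, so the adjusted and unadjusted standard errors differ by only about half a percent.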

Example 15.2 on page 468 using mroz.dta.

reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6

      Source |       SS       df       MS              Number of obs =     753
-------------+------------------------------           F(  7,   745) =   38.22
       Model |  48.8080578     7  6.97257969           Prob > F      =  0.0000
    Residual |  135.919698   745  .182442547           R-squared     =  0.2642
-------------+------------------------------           Adj R-squared =  0.2573
       Total |  184.727756   752  .245648611           Root MSE      =  .42713

------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0034052   .0014485    -2.35   0.019    -.0062488   -.0005616
        educ |   .0379953    .007376     5.15   0.000      .023515    .0524756
       exper |   .0394924   .0056727     6.96   0.000     .0283561    .0506287
     expersq |  -.0005963   .0001848    -3.23   0.001    -.0009591   -.0002335
         age |  -.0160908   .0024847    -6.48   0.000    -.0209686    -.011213
     kidslt6 |  -.2618105   .0335058    -7.81   0.000    -.3275875   -.1960335
     kidsge6 |   .0130122    .013196     0.99   0.324    -.0128935    .0389179
       _cons |   .5855192    .154178     3.80   0.000     .2828442    .8881943
------------------------------------------------------------------------------

predict lpm, xb
gen lpm_c = (lpm>=.5)
tab inlf lpm_c

  =1 if in |
 lab frce, |         lpm_c
      1975 |         0          1 |     Total
-----------+----------------------+----------
         0 |       203        122 |       325 
         1 |        78        350 |       428 
-----------+----------------------+----------
     Total |       281        472 |       753 

di (203+350)/753
.73439575
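
The percent correctly predicted can also be computed without reading counts off the table. This is a minimal sketch, assuming mroz.dta is in memory; phat_lpm and hit_lpm are hypothetical variable names. An observation counts as correct when the fitted probability is at least .5 and inlf is 1, or when it is below .5 and inlf is 0.

```stata
* Sketch: percent correctly predicted for the linear probability model.
* phat_lpm and hit_lpm are hypothetical names, not from the original example.
quietly regress inlf nwifeinc educ exper expersq age kidslt6 kidsge6
predict double phat_lpm, xb                  // LPM fitted values
gen byte hit_lpm = (inlf == (phat_lpm >= .5)) if !missing(phat_lpm)
* LPM fitted values are not constrained to [0,1]; count any that fall outside.
count if phat_lpm < 0 | (phat_lpm > 1 & !missing(phat_lpm))
quietly summarize hit_lpm
display "LPM percent correctly predicted = " %5.2f 100*r(mean)
```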

logit inlf nwifeinc educ exper expersq age kidslt6 kidsge6

Logit estimates                                   Number of obs   =        753
                                                  LR chi2(7)      =     226.22
                                                  Prob > chi2     =     0.0000
Log likelihood = -401.76515                       Pseudo R2       =     0.2197
------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0213452   .0084214    -2.53   0.011    -.0378509   -.0048394
        educ |   .2211704   .0434396     5.09   0.000     .1360303    .3063105
       exper |   .2058695   .0320569     6.42   0.000     .1430391    .2686999
     expersq |  -.0031541   .0010161    -3.10   0.002    -.0051456   -.0011626
         age |  -.0880244    .014573    -6.04   0.000     -.116587   -.0594618
     kidslt6 |  -1.443354   .2035849    -7.09   0.000    -1.842373   -1.044335
     kidsge6 |   .0601122   .0747897     0.80   0.422     -.086473    .2066974
       _cons |   .4254524   .8603696     0.49   0.621    -1.260841    2.111746
------------------------------------------------------------------------------

* Stata 8 code.
lstat

* Stata 9 code and output.
estat classification

Logistic model for inlf

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |       347           118  |        465
     -     |        81           207  |        288
-----------+--------------------------+-----------
   Total   |       428           325  |        753

Classified + if predicted Pr(D) >= .5
True D defined as inlf != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   81.07%
Specificity                     Pr( -|~D)   63.69%
Positive predictive value       Pr( D| +)   74.62%
Negative predictive value       Pr(~D| -)   71.88%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   36.31%
False - rate for true D         Pr( -| D)   18.93%
False + rate for classified +   Pr(~D| +)   25.38%
False - rate for classified -   Pr( D| -)   28.13%
--------------------------------------------------
Correctly classified                        73.57%
--------------------------------------------------

probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6

Probit estimates                                  Number of obs   =        753
                                                  LR chi2(7)      =     227.14
                                                  Prob > chi2     =     0.0000
Log likelihood = -401.30219                       Pseudo R2       =     0.2206
------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0120237   .0048398    -2.48   0.013    -.0215096   -.0025378
        educ |   .1309047   .0252542     5.18   0.000     .0814074     .180402
       exper |   .1233476   .0187164     6.59   0.000     .0866641    .1600311
     expersq |  -.0018871      .0006    -3.15   0.002     -.003063   -.0007111
         age |  -.0528527   .0084772    -6.23   0.000    -.0694678   -.0362376
     kidslt6 |  -.8683285   .1185223    -7.33   0.000    -1.100628    -.636029
     kidsge6 |    .036005   .0434768     0.83   0.408     -.049208    .1212179
       _cons |   .2700768    .508593     0.53   0.595    -.7267473    1.266901
------------------------------------------------------------------------------

* Stata 8 code.
lstat

* Stata 9 code and output.
estat classification

Probit model for inlf

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |       348           120  |        468
     -     |        80           205  |        285
-----------+--------------------------+-----------
   Total   |       428           325  |        753

Classified + if predicted Pr(D) >= .5
True D defined as inlf != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   81.31%
Specificity                     Pr( -|~D)   63.08%
Positive predictive value       Pr( D| +)   74.36%
Negative predictive value       Pr(~D| -)   71.93%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   36.92%
False - rate for true D         Pr( -| D)   18.69%
False + rate for classified +   Pr(~D| +)   25.64%
False - rate for classified -   Pr( D| -)   28.07%
--------------------------------------------------
Correctly classified                        73.44%
--------------------------------------------------

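Example 15.3 on page 474 using mroz.dta, testing for exogeneity of education. The first regression is the reduced form for educ on the exogenous variables and the instruments motheduc, fatheduc and huseduc. Its residuals u are then added to the probit, and the z statistic on u tests the null hypothesis that educ is exogenous.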

reg educ nwifeinc exper expersq age kidslt6 kidsge6 motheduc fatheduc huseduc

      Source |       SS       df       MS              Number of obs =     753
-------------+------------------------------           F(  9,   743) =   74.07
       Model |  1849.07781     9   205.45309           Prob > F      =  0.0000
    Residual |  2060.96203   743  2.77383853           R-squared     =  0.4729
-------------+------------------------------           Adj R-squared =  0.4665
       Total |  3910.03984   752  5.19952106           Root MSE      =  1.6655

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |   .0156893   .0058267     2.69   0.007     .0042506     .027128
       exper |   .0577544   .0220604     2.62   0.009     .0144462    .1010625
     expersq |   -.000784    .000721    -1.09   0.277    -.0021994    .0006314
         age |  -.0059011   .0098709    -0.60   0.550    -.0252792     .013477
     kidslt6 |   .1195954   .1307071     0.91   0.360    -.1370038    .3761945
     kidsge6 |  -.0731404   .0515299    -1.42   0.156     -.174302    .0280212
    motheduc |   .1300347   .0225669     5.76   0.000     .0857322    .1743373
    fatheduc |   .0950702   .0214618     4.43   0.000     .0529373    .1372032
     huseduc |   .3475092   .0235063    14.78   0.000     .3013626    .3936558
       _cons |    5.43695   .5873755     9.26   0.000     4.283837    6.590064
------------------------------------------------------------------------------

predict u, res
probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6 u

Probit estimates                                  Number of obs   =        753
                                                  LR chi2(8)      =     227.90
                                                  Prob > chi2     =     0.0000
Log likelihood = -400.92551                       Pseudo R2       =     0.2213

------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0102851   .0052347    -1.96   0.049     -.020545   -.0000253
        educ |   .1035752   .0403061     2.57   0.010     .0245767    .1825737
       exper |   .1262477   .0190256     6.64   0.000     .0889582    .1635373
     expersq |  -.0019432   .0006032    -3.22   0.001    -.0031254   -.0007609
         age |  -.0543808   .0086633    -6.28   0.000    -.0713605   -.0374012
     kidslt6 |  -.8630859   .1187394    -7.27   0.000    -1.095811    -.630361
     kidsge6 |   .0313802   .0437901     0.72   0.474    -.0544468    .1172071
           u |   .0433658    .050021     0.87   0.386    -.0546736    .1414051
       _cons |   .6209105   .6497413     0.96   0.339     -.652559     1.89438
------------------------------------------------------------------------------
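
The z statistic on u above (0.87, p-value 0.386) does not reject exogeneity of educ. As a cross-check, the sketch below repeats the two-step procedure with an explicit Wald test and then fits the model with ivprobit (available from Stata 9), which reports its own Wald test of exogeneity. It assumes mroz.dta is in memory; v2hat is a hypothetical variable name. The ivprobit estimates need not match the two-step ones exactly, because ivprobit estimates both equations jointly by maximum likelihood.

```stata
* Sketch: two ways to look at exogeneity of educ. v2hat is a hypothetical name.
* First stage: reduced form for educ, keep the residuals.
quietly regress educ nwifeinc exper expersq age kidslt6 kidsge6 motheduc fatheduc huseduc
predict double v2hat, residuals
* Second stage: probit including the first-stage residual.
quietly probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6 v2hat
test v2hat            // Wald chi2(1), the square of the z statistic on v2hat
* Joint ML estimation with educ treated as endogenous.
ivprobit inlf nwifeinc exper expersq age kidslt6 kidsge6 (educ = motheduc fatheduc huseduc)
```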

Example 15.4 on page 498 using keane.dta.

use keane, clear

* Stata 8 code.
mlogit status educ exper expersq black if year ==87, basecategory(1)

* Stata 9 code and output.
mlogit status educ exper expersq black if year ==87, baseoutcome(1)

Multinomial logistic regression                   Number of obs   =       1717
                                                  LR chi2(8)      =     583.72
                                                  Prob > chi2     =     0.0000
Log likelihood = -907.85723                       Pseudo R2       =     0.2433

------------------------------------------------------------------------------
      status |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
2            |
        educ |  -.6736313   .0698999    -9.64   0.000    -.8106325     -.53663
       exper |  -.1062149    .173282    -0.61   0.540    -.4458414    .2334116
     expersq |  -.0125152   .0252291    -0.50   0.620    -.0619633     .036933
       black |   .8130166   .3027231     2.69   0.007     .2196902    1.406343
       _cons |   10.27787   1.133336     9.07   0.000     8.056578    12.49917
-------------+----------------------------------------------------------------
3            |
        educ |  -.3146573   .0651096    -4.83   0.000    -.4422699   -.1870448
       exper |   .8487367   .1569856     5.41   0.000     .5410507    1.156423
     expersq |  -.0773003   .0229217    -3.37   0.001    -.1222261   -.0323746
       black |   .3113612   .2815339     1.11   0.269     -.240435    .8631574
       _cons |   5.543798   1.086409     5.10   0.000     3.414475    7.673121
------------------------------------------------------------------------------
(Outcome status==1 is the comparison group)

predict p1 p2 p3 if e(sample), p
(11006 missing values generated)

gen pstatus = .
(12723 missing values generated)

* Stata 8 code.
egen atest = rmax(p1 p2 p3)  if e(sample)

* Stata 9 code and output.
egen atest = rowmax(p1 p2 p3)  if e(sample)
(11006 missing values generated)

foreach n of numlist 1/3 {
	replace pstatus = `n' if p`n'==atest & year ==87
}
(36 real changes made)
(214 real changes made)
(1530 real changes made)

gen pdiff = (status == pstatus)
tab pdiff if year==87 & exper~=. & black ~=. & status ~=.

      pdiff |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        351       20.44       20.44
          1 |      1,366       79.56      100.00
------------+-----------------------------------
      Total |      1,717      100.00

di 1366/1717
.79557368
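
The change counts reported by the loop sum to 1,780 rather than 1,717. This appears to be because the 1987 observations outside the estimation sample have p1, p2, p3 and atest all missing, and in Stata missing equals missing, so they are reassigned on every pass. The tabulation is unaffected because it restricts to the estimation sample. The sketch below is one way to assign the predicted outcome within the estimation sample only. It assumes the mlogit results are still active and that p1, p2 and p3 exist from the predict command above; phat_status and hit_mlogit are hypothetical variable names.

```stata
* Sketch: predicted outcome = category with the largest predicted probability,
* restricted to the estimation sample. Ties go to the lower-numbered category.
gen byte phat_status = 1 if e(sample)
replace phat_status = 2 if e(sample) & p2 > p1
replace phat_status = 3 if e(sample) & p3 > max(p1, p2)
gen byte hit_mlogit = (status == phat_status) if e(sample)
summarize hit_mlogit                        // mean = share correctly predicted
tab status phat_status if e(sample), row    // row percentages by actual status
```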

test [2]: exper expersq

 ( 1)  [2]exper = 0
 ( 2)  [2]expersq = 0

           chi2(  2) =    6.12
         Prob > chi2 =    0.0468

Example 15.5 on page 507 using pension.dta.

use pension, clear

reg pctstck choice age educ female black married finc25 finc35 finc50 finc75 finc100 ///
	finc101 wealth89 prftshr
	
      Source |       SS       df       MS              Number of obs =     194
-------------+------------------------------           F( 14,   179) =    1.42
       Model |  30402.0516    14  2171.57511           Prob > F      =  0.1486
    Residual |  274134.031   179  1531.47503           R-squared     =  0.0998
-------------+------------------------------           Adj R-squared =  0.0294
       Total |  304536.082   193  1577.90716           Root MSE      =  39.134

------------------------------------------------------------------------------
     pctstck |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      choice |   12.04773   6.298171     1.91   0.057    -.3804881    24.47594
         age |  -1.625967   .7748246    -2.10   0.037    -3.154932   -.0970012
        educ |   .7538685   1.207392     0.62   0.533    -1.628684    3.136421
      female |   1.302856   7.163775     0.18   0.856    -12.83346    15.43917
       black |   3.967391   9.782799     0.41   0.686    -15.33706    23.27184
     married |   3.303436   7.997618     0.41   0.680    -12.47831    19.08518
      finc25 |  -18.18567   14.12026    -1.29   0.199    -46.04924    9.677906
      finc35 |  -3.925374   14.48565    -0.27   0.787    -32.50999    24.65924
      finc50 |  -8.128784   14.34191    -0.57   0.572    -36.42976    20.17219
      finc75 |  -17.57921   16.07766    -1.09   0.276    -49.30534    14.14693
     finc100 |   -6.74559   15.79116    -0.43   0.670    -37.90637    24.41519
     finc101 |  -28.34407    17.9049    -1.58   0.115    -63.67591    6.987774
    wealth89 |  -.0026918   .0124603    -0.22   0.829    -.0272797    .0218961
     prftshr |   15.80791   7.332677     2.16   0.032     1.338299    30.27752
       _cons |   134.1161   55.70525     2.41   0.017      24.1926    244.0395
------------------------------------------------------------------------------

oprobit pctstck choice age educ female black married finc25 finc35 finc50 finc75 finc100 finc101 ///
	wealth89 prftshr 

Ordered probit estimates                          Number of obs   =        194
                                                  LR chi2(14)     =      20.77
                                                  Prob > chi2     =     0.1077
Log likelihood =  -201.9865                       Pseudo R2       =     0.0489

------------------------------------------------------------------------------
     pctstck |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      choice |    .371171   .1841121     2.02   0.044      .010318    .7320241
         age |  -.0500516   .0226063    -2.21   0.027    -.0943591    -.005744
        educ |   .0261382   .0352561     0.74   0.458    -.0429626    .0952389
      female |   .0455642    .206004     0.22   0.825    -.3581963    .4493246
       black |   .0933923   .2820403     0.33   0.741    -.4593965    .6461811
     married |   .0935981   .2332114     0.40   0.688    -.3634878     .550684
      finc25 |  -.5784299    .423162    -1.37   0.172    -1.407812    .2509524
      finc35 |  -.1346721   .4305242    -0.31   0.754    -.9784841    .7091399
      finc50 |  -.2620401   .4265936    -0.61   0.539    -1.098148    .5740681
      finc75 |  -.5662312   .4780035    -1.18   0.236    -1.503101    .3706385
     finc100 |  -.2278963   .4685942    -0.49   0.627    -1.146324    .6905316
     finc101 |  -.8641109   .5291111    -1.63   0.102     -1.90115    .1729279
    wealth89 |  -.0000956   .0003737    -0.26   0.798    -.0008279    .0006368
     prftshr |   .4817182   .2161233     2.23   0.026     .0581243     .905312
-------------+----------------------------------------------------------------
       _cut1 |  -3.087373   1.623765          (Ancillary parameters)
       _cut2 |  -2.053553   1.618611 
------------------------------------------------------------------------------

gen pclass=.
(226 missing values generated)

predict c1 if e(sample), outcome(0)
(option p assumed; predicted probability)
(32 missing values generated)

predict c2 if e(sample), outcome(50)
(option p assumed; predicted probability)
(32 missing values generated)

predict c3 if e(sample), outcome(100)
(option p assumed; predicted probability)
(32 missing values generated)

* Stata 8 code.
egen atest = rmax(c1 c2 c3)

* Stata 9 code and output.
egen atest = rowmax(c1 c2 c3)
(32 missing values generated)

foreach n of numlist 1/3 {
	replace pclass = `n' if c`n' ==atest
}
(97 real changes made)
(113 real changes made)
(80 real changes made)

tab pclass pctstck if e(sample)

           | 0=mstbnds,50=mixed,100=mststcks
    pclass |         0         50        100 |     Total
-----------+---------------------------------+----------
         1 |        33         21         11 |        65 
         2 |        25         31         25 |        81 
         3 |         6         20         22 |        48 
-----------+---------------------------------+----------
     Total |        64         72         58 |       194 


di (33+31+22)/194
.44329897

di 33/64
.515625

di 31/72
.43055556

di 22/58
.37931034
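
The four display commands above give the overall percent correctly predicted, followed by the percent correctly predicted within each observed outcome (0, 50 and 100). The same figures can be read from one table and one summary. This is a sketch, assuming the oprobit results are still active and pclass exists as constructed above; obsclass and hit_oprobit are hypothetical variable names.

```stata
* Column percentages: the diagonal cells give the share of each observed
* outcome (0, 50, 100) that is assigned to the matching predicted class.
tab pclass pctstck if e(sample), col nofreq
* Overall rate: map pctstck (0, 50, 100) to classes 1, 2, 3 and compare.
gen byte obsclass = 1 + pctstck/50 if e(sample)
gen byte hit_oprobit = (pclass == obsclass) if e(sample)
summarize hit_oprobit                       // mean = overall share correct
```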