Applied Regression Analysis by John Fox Chapter 15: Beyond Linear Least Squares

Figure 15.1 on page 440 on the data file chile.

use https://stats.idre.ucla.edu/stat/stata/examples/ara/chile, clear

gen voting=1 if (intvote==1)
replace voting = 0 if (intvote==2)
logistic voting statquo
predict pred, p
graph twoway (scatter voting statquo, jitter(1)) (lfit voting statquo) ///
	(lowess voting statquo, bwidth(.4)) (scatter pred statquo, connect(l) msymbol(i) sort), ///
	ylabel(0 .5 1)

Table 15.1 and 15.2 on page 452 using the data file womenlf.

use https://stats.idre.ucla.edu/stat/stata/examples/ara/womenlf, clear

* Generating a new work status variable and an interaction variable

gen ws = 1 if( workstat==1 |  workstat==2)
replace ws=0 if ( workstat==0)
gen ik = husbinc*chilpres

* Contrast between model 1 and 0 is the overall likelihood ratio of 
* model 1 and the p-value is the overall Prob > chi2

xi: logistic ws husbinc i.chilpres i.region ik /*model 1 */ 

(Some output omitted here.)

Logit estimates                                   Number of obs   =        263
                                                  LR chi2(7)      =      39.61
                                                  Prob > chi2     =     0.0000
Log likelihood = -158.27076                       Pseudo R2       =     0.1112

(More output is omitted.)

lrtest, saving(m1)/* saving for further contrast test */

display -2*e(ll) /* displaying deviance */
316.54152

logistic ws /* model 0 */
display -2*e(ll)/* displaying deviance */
356.15089

xi: logistic ws husbinc i.chilpres i.region /* model 2 */
display -2*e(ll)
317.30107

lrtest, saving(m2)
lrtest, using(m1) /* contrast 2-1 */

Logistic:  likelihood-ratio test                      chi2(1)     =       0.76
                                                      Prob > chi2 =     0.3835

xi: logistic ws husbinc ik i.chilpres /* model 3 */

display -2*e(ll) 
319.12422

lrtest, using(m1)/* constrast 3-1 */

Logistic:  likelihood-ratio test                      chi2(4)     =       2.58
                                                      Prob > chi2 =     0.6299
                                                      
xi: logistic ws husbinc i.region /* model 4 */
display -2*e(ll)
347.84936

lrtest, using(m2)/* contrast 4-2 */

Logistic:  likelihood-ratio test                      chi2(1)     =      30.55
                                                      Prob > chi2 =     0.0000
                                                      
xi: logistic ws i.chilpres i.region /* model 5 */
display -2*e(ll)
322.4267

lrtest, using(m2)/* contrast 5-2 */

Logistic:  likelihood-ratio test                      chi2(1)     =       5.13
                                                      Prob > chi2 =     0.0236

Figure 15.4 and formula on page 453.

xi: logistic ws i.chilpres husbinc

i.chilpres            Ichilp_0-1   (naturally coded; Ichilp_0 omitted)

Logit estimates                                   Number of obs   =        263
                                                  LR chi2(2)      =      36.42
                                                  Prob > chi2     =     0.0000
Log likelihood = -159.86627                       Pseudo R2       =     0.1023

------------------------------------------------------------------------------
      ws | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
Ichilp_1 |   .2068734   .0604614     -5.391   0.000       .1166622    .3668421
 husbinc |   .9585741   .0189607     -2.139   0.032       .9221229    .9964661
------------------------------------------------------------------------------

logit

Logit estimates                                   Number of obs   =        263
                                                  LR chi2(2)      =      36.42
                                                  Prob > chi2     =     0.0000
Log likelihood = -159.86627                       Pseudo R2       =     0.1023

------------------------------------------------------------------------------
      ws |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
Ichilp_1 |  -1.575648   .2922629     -5.391   0.000      -2.148473   -1.002824
 husbinc |  -.0423084   .0197801     -2.139   0.032      -.0810767   -.0035401
   _cons |    1.33583   .3837632      3.481   0.000       .5836677    2.087992
------------------------------------------------------------------------------

predict pw, p
xi: regress ws i.chilpres husbinc

i.chilpres            Ichilp_0-1   (naturally coded; Ichilp_0 omitted)

  Source |       SS       df       MS                  Number of obs =     263
---------+------------------------------               F(  2,   260) =   20.43
   Model |  8.64321054     2  4.32160527               Prob > F      =  0.0000
Residual |  55.0069796   260  .211565306               R-squared     =  0.1358
---------+------------------------------               Adj R-squared =  0.1291
   Total |  63.6501901   262  .242939657               Root MSE      =  .45996

------------------------------------------------------------------------------
      ws |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
Ichilp_1 |  -.3673736    .061906     -5.934   0.000      -.4892745   -.2454727
 husbinc |  -.0085375   .0039351     -2.170   0.031      -.0162863   -.0007887
   _cons |   .7936535   .0766814     10.350   0.000       .6426578    .9446491
------------------------------------------------------------------------------

predict lw, xb
graph twoway (scatter pw lw husbinc if chilpres ==1, sort connect(l l) msymbol(i i) clcolor(black blue)) ///
	(scatter pw lw husbinc if chilpres ==0, sort connect(l l) msymbol(i i) clcolor(black blue)), ///
	ttext(0.15 10 "Children Present") ttext(0.75 30 "Children Absent") ///
	ylabel(0(.25)1)

Figure 15.5 on page 459. We can’t get the exact lowess smooth as in the book as it needs more than one iterative reweighting steps as there are outliers in the data and Stata’s command lowess does not have an option on that yet. See the corresponding part of SAS for a better lowess smoothing result.

logistic ws chilpres husbinc
predict prob
gen par=(ws-prob)/(prob*(1-prob))-.0423*husbinc /*creating partial residual*/
graph twoway (scatter par husbinc) (lowess par husbinc, bwidth(.9)) (lfit par husbinc), ///
		ylabel(-5 0 5)

Figure 15.6 on page 461. Notice that in Stata all the diagnostic statistics for logistic regression are adjusted for the number of covariate patterns, so called m-asymptotic instead of n-asymptotic, i.e., for the number of observations. So we have to do some calculations here in order to get the results in the text book.

logistic ws chilpres husbinc
predict yhat /*the predicted value*/
gen pr=(ws-yhat)/sqrt(yhat*(1-yhat)) /* generating Pearson residual on a case by case basis */
predict myhat, hat
predict pattern, number
egen count=count(obs), by(pattern) /*number of obs sharing a c.p.*/
gen nhat=myhat/count /*hat diagonal on per observation basis*/
gen sr = pr/sqrt(1-nhat)/*studentized pearson residual*/
graph twoway scatter sr nhat, ylabel(-2(2)4) yline(-2 0 2) xlabel(0(.01).06) xline(0.0228 .034)

Figure 15.7 (a) and (b) on page 462. Stata does not have built-in command for Dfbeta. We use formula [15.21] to create the required statistics for these figures. This example is continuation of the previous one.

gen id=1 
set matsize 300
mkmat chilpres husbinc id, matrix(X)
matrix d=e(V)*X'
matrix dt=d'
svmat dt, name(mydf)
gen md2=mydf2*(ws-yhat)/(1-nhat)/*husbincDfbeta*/
graph twoway scatter md2 obs, ylab(-0.005(0.005)0.01) yline(0) xlabel(0(50)300)

gen md1=mydf1*(ws-yhat)/(1-nhat) /*kidDfbeta*/
graph twoway scatter md1 obs, ylabel(-0.04(0.04)0.08) yline(0) xlab(0(50)300)

Calculation and Figure 15.8 on page 468 and 469.

mlogit workstat husbinc chilpre, base(0)

Iteration 0:   log likelihood = -250.24628
Iteration 1:   log likelihood = -214.24438
Iteration 2:   log likelihood =   -211.495
Iteration 3:   log likelihood = -211.44102
Iteration 4:   log likelihood = -211.44096

Multinomial regression                            Number of obs   =        263
                                                  LR chi2(4)      =      77.61
                                                  Prob > chi2     =     0.0000
Log likelihood = -211.44096                       Pseudo R2       =     0.1551

------------------------------------------------------------------------------
workstat |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
parttime |
 husbinc |   .0068921   .0234548      0.294   0.769      -.0390784    .0528627
chilpres |   .0214911   .4690366      0.046   0.963      -.8978037     .940786
   _cons |  -1.432307   .5924623     -2.418   0.016      -2.593512   -.2711023
---------+--------------------------------------------------------------------
fulltime |
 husbinc |  -.0972307   .0280958     -3.461   0.001      -.1522975   -.0421639
chilpres |  -2.558595    .362199     -7.064   0.000      -3.268492   -1.848698
   _cons |   1.982822   .4841771      4.095   0.000       1.033853    2.931792
------------------------------------------------------------------------------
(Outcome workstat==not work is the comparison group)

predict p0 p1 p2
graph twoway (line p0 husbinc if chilpres==1, sort) (line p1 husbinc if chilpres==1, sort) ///
	       (line p2 husbinc if chilpres==1, sort) , ylabel(0(.25).75) ///
		ttext(0.75 5 "Children Present") ttext(0.70 30 "Not Working") ///
		ttext(0.25 8.5 "Full-Time") ttext(0.20 35 "Part-Time")

graph twoway (line p0 husbinc if chilpres==0, sort) (line p1 husbinc if chilpres==0, sort) ///
	       (line p2 husbinc if chilpres==0, sort) , ylabel(0(.25)1) ///
		ttext(1 5 "Children Absent") ttext(0.50 30 "Not Working") ///
		ttext(0.80 8.5 "Full-Time") ttext(0.05 17.5 "Part-Time")

Calculation on page 473.

gen wk=(workstat==0)
logistic wk husbinc chilpres

Logit estimates                                   Number of obs   =        263
                                                  LR chi2(2)      =      36.42
                                                  Prob > chi2     =     0.0000
Log likelihood = -159.86627                       Pseudo R2       =     0.1023

------------------------------------------------------------------------------
      wk | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 husbinc |   1.043216   .0206349      2.139   0.032       1.003546    1.084454
chilpres |   4.833875   1.412762      5.391   0.000       2.725968     8.57176
------------------------------------------------------------------------------

logit

Logit estimates                                   Number of obs   =        263
                                                  LR chi2(2)      =      36.42
                                                  Prob > chi2     =     0.0000
Log likelihood = -159.86627                       Pseudo R2       =     0.1023

------------------------------------------------------------------------------
      wk |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 husbinc |   .0423084   .0197801      2.139   0.032       .0035401    .0810767
chilpres |   1.575648   .2922629      5.391   0.000       1.002824    2.148473
   _cons |   -1.33583   .3837632     -3.481   0.000      -2.087992   -.5836677
------------------------------------------------------------------------------

lrtest, saving(0)
logistic wk chilpres

Logit estimates                                   Number of obs   =        263
                                                  LR chi2(1)      =      31.59
                                                  Prob > chi2     =     0.0000
Log likelihood = -162.27945                       Pseudo R2       =     0.0887

------------------------------------------------------------------------------
      wk | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
chilpres |   4.781119   1.379609      5.422   0.000        2.71589    8.416797
------------------------------------------------------------------------------

lrtest /*test on the bottom of page 473*/

Logistic:  likelihood-ratio test                      chi2(1)     =       4.83
                                                      Prob > chi2 =     0.0280

gen ptime=.
replace ptime = 0 if (workstat==1)
replace ptime=1 if(workstat==2)
logistic ptime husbinc chilpres

Logit estimates                                   Number of obs   =        108
                                                  LR chi2(2)      =      39.85
                                                  Prob > chi2     =     0.0000
Log likelihood = -52.247423                       Pseudo R2       =     0.2761

------------------------------------------------------------------------------
   ptime | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 husbinc |    .898285   .0351699     -2.740   0.006       .8319318    .9699305
chilpres |   .0705484   .0381719     -4.900   0.000       .0244301    .2037278
------------------------------------------------------------------------------

logit

Logit estimates                                   Number of obs   =        108
                                                  LR chi2(2)      =      39.85
                                                  Prob > chi2     =     0.0000
Log likelihood = -52.247423                       Pseudo R2       =     0.2761

------------------------------------------------------------------------------
   ptime |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 husbinc |  -.1072679   .0391522     -2.740   0.006      -.1840048   -.0305309
chilpres |  -2.651456   .5410738     -4.900   0.000      -3.711941    -1.59097
   _cons |   3.477773   .7671069      4.534   0.000       1.974272    4.981275
------------------------------------------------------------------------------

lrtest, saving(p1)
logistic ptime chilpres

Logit estimates                                   Number of obs   =        108
                                                  LR chi2(1)      =      30.87
                                                  Prob > chi2     =     0.0000
Log likelihood = -56.738094                       Pseudo R2       =     0.2138

------------------------------------------------------------------------------
   ptime | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
chilpres |   .0869565     .04288     -4.953   0.000       .0330794    .2285846
------------------------------------------------------------------------------

lrtest, using(p1)/*test on the top of page 474*/

Logistic:  likelihood-ratio test                      chi2(1)     =       8.98
                                                      Prob > chi2 =     0.0027

Calculation on page 477.

xi: ologit workstat husbinc i.chilpres 

i.chilpres            Ichilp_0-1   (naturally coded; Ichilp_0 omitted)

Iteration 0:   log likelihood = -250.24628
Iteration 1:   log likelihood = -221.36758
Iteration 2:   log likelihood = -220.83242
Iteration 3:   log likelihood = -220.83148

Ordered logit estimates                           Number of obs   =        263
                                                  LR chi2(2)      =      58.83
                                                  Prob > chi2     =     0.0000
Log likelihood = -220.83148                       Pseudo R2       =     0.1175

------------------------------------------------------------------------------
workstat |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 husbinc |  -.0539007     .01949     -2.766   0.006      -.0921004   -.0157009
Ichilp_1 |  -1.971957   .2869478     -6.872   0.000      -2.534364    -1.40955
---------+--------------------------------------------------------------------
   _cut1 |  -1.852037   .3862995             (Ancillary parameters)
   _cut2 |  -.9409253   .3699303  
------------------------------------------------------------------------------

The analysis in this section is based on table 15.3, which is based on data from an example from The American Voter (Campbell, et al., 1960). The first data set we create here is based on table 15.3 for figure 15.3 (page 480).

input logv1 logvc inten
847  .9   0
904 1.318 1
981 2.084 2
end
label define scale 0 Weak 1 Medium 2 Strong
label values inten scale
label variable logv1 "One-Sided"
label variable logvc "Close"
graph twoway scatter logv1 logvc inten, connect(l l) xlabel(0 1 2)

Now we create another data set below to do the logistic regression and tests over different models. Thus we will have table 15.4 and table 15.5 on page 482.

input perclose inten1 inten2 voted wv
0 0 0 1 91
0 0 0 0 39
0 1 0 1 121
0 1 0 0 49
0 0 1 1 64
0 0 1 0 24
1 0 0 1 214
1 0 0 0 87
1 1 0 1 284
1 1 0 0 76
1 0 1 1 201
1 0 1 0 25
end

gen  clspref1=perclose*inten1
gen  clspref2=perclose*inten2

logistic voted perclose inten1 inten2 clspref1 clspref2 [fweight=wv]
(Output omitted.)

di -2*e(ll) /*deviance for model 1*/
1356.4343

lrtest, saving(t1)
logistic voted perclose inten1 inten2 [fweight=wv]
(Output omitted.)

di -2*e(ll) /*model 2*/
1363.5529

lrtest, saving(t2)
lrtest, using(t1)/*contrast 2-1*/

Logistic:  likelihood-ratio test                      chi2(2)     =       7.12
                                                      Prob > chi2 =     0.0285
                                                      
logistic voted perclose clspref1 clspref2 [fweight=wv]
(Output omitted.)

di -2*e(ll) /*model 3 */
1356.6253
lrtest, saving(t3)
lrtest, using(t1) /*contrast 3-1 */

Logistic:  likelihood-ratio test                      chi2(2)     =       0.19
                                                      Prob > chi2 =     0.9089
                                                      
logistic voted inten1 inten2 clspref1 clspref2 [fweight=wv]
(Output omitted.)

di -2*e(ll) /*model 4*/
1356.4869

lrtest, using(t1)

Logistic:  likelihood-ratio test                      chi2(1)     =       0.05
                                                      Prob > chi2 =     0.8186
                                                      
logistic voted perclose [fweight=wv]
(Output omitted.)

di -2*e(ll) /*model 5 */
1382.6582

lrtest, using(t2)

Logistic:  likelihood-ratio test                      chi2(2)     =      19.11
                                                      Prob > chi2 =     0.0001
                                                      
logistic voted inten1 inten2 [fweight=wv]
(Output omitted.)

di -2*e(ll) /*model 6*/
1371.8382

lrtest, using(t2)

Logistic:  likelihood-ratio test                      chi2(1)     =       8.29
                                                      Prob > chi2 =     0.0040