This chapter considers several extensions and alternatives to the ordinary logistic regression models of the previous chapters: conditional logistic regression for matched data, probit regression, and complementary log-log regression.
5.1 Conditional Logistic Regression
There are two alternative approaches to maximum likelihood estimation in logistic regression: the unconditional estimation approach and the conditional estimation approach. In the previous chapters we have used the unconditional estimation approach. Unconditional estimation works best when the number of degrees of freedom for the model is small relative to the number of observations. When the model degrees of freedom become large relative to the number of cases, the conditional estimation approach is preferable.
Commonly, the model degrees of freedom become large when some type of matching is involved. Matching can include one-to-one (1:1) matching, one-to-k (1:k) matching, and even matching subjects to themselves in a repeated measures design. The Stata command clogit, for conditional logistic regression, can be used in these situations.
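To see why conditioning helps, consider the 1:1 case. If each pair j has its own intercept, then, given that exactly one member of the pair is a case, the pair's contribution to the conditional likelihood is (a standard result, shown here only for intuition):

$$ L_j(\beta) = \frac{\exp(\mathbf{x}_{1j}'\beta)}{\exp(\mathbf{x}_{1j}'\beta) + \exp(\mathbf{x}_{0j}'\beta)} $$

where x_1j and x_0j are the covariate values for the case and the control. The pair-specific intercepts cancel out of this ratio, so they never need to be estimated; this is how conditional estimation avoids the bias that comes from estimating one nuisance parameter per pair.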
To illustrate clogit, we will use a variant of the high school and beyond dataset. In hsbcl, each student in honors composition (honcomp) is randomly matched with a non-honors composition student on gender (female) and program type (prog). The variable pid is the pair ID and is used to indicate which students are matched.
use https://stats.idre.ucla.edu/stat/stata/webbooks/logistic/hsbcl

describe

Contains data from hsbcl.dta
  obs:           106                          highschool & beyond
 vars:            14                          26 Jun 2001 10:57
 size:         6,360 (99.1% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g
female          float  %9.0g       fl
race            float  %12.0g      rl
ses             float  %9.0g       sl
hises           float  %9.0g
prog            float  %9.0g       sel        type of program
academic        float  %9.0g
read            float  %9.0g                  reading score
write           float  %9.0g                  writing score
math            float  %9.0g                  math score
science         float  %9.0g                  science score
socst           float  %9.0g                  social studies score
honcomp         float  %9.0g
pid             float  %9.0g
-------------------------------------------------------------------------------
5.1.1 Ordinary Logistic Regression
We will begin by running an ordinary logistic regression, i.e., one that does not take into account the matching that was done. The covariates (predictors) will be read and math.
logit honcomp read math

Logit estimates                                   Number of obs   =        106
                                                  LR chi2(2)      =      40.72
                                                  Prob > chi2     =     0.0000
Log likelihood = -53.115869                       Pseudo R2       =     0.2771

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0647649   .0332232     1.95   0.051    -.0003515    .1298812
        math |   .1215842   .0357131     3.40   0.001     .0515879    .1915805
       _cons |  -10.45958    2.13368    -4.90   0.000    -14.64151   -6.277639
------------------------------------------------------------------------------
Notice that the likelihood ratio chi-square has two degrees of freedom and that math is highly significant while read is barely over the .05 level.
Let's check out the fit statistics for this model using fitstat.
fitstat

Measures of Fit for logit of honcomp

Log-Lik Intercept Only:      -73.474     Log-Lik Full Model:        -53.116
D(103):                      106.232     LR(2):                      40.715
                                         Prob > LR:                   0.000
McFadden's R2:                 0.277     McFadden's Adj R2:           0.236
Maximum Likelihood R2:         0.319     Cragg & Uhler's R2:          0.425
McKelvey and Zavoina's R2:     0.445     Efron's R2:                  0.326
Variance of y*:                5.924     Variance of error:           3.290
Count R2:                      0.736     Adj Count R2:                0.472
AIC:                           1.059     AIC*n:                     112.232
BIC:                        -374.102     BIC':                      -31.389
We will be comparing these results to the ones obtained in the next model.
One way to try to model the matched nature of the data is to include the pairs as a predictor in the model. We can do this in Stata using xi.
xi: logit honcomp read math i.pid
i.pid             _Ipid_1-53          (naturally coded; _Ipid_1 omitted)

Logit estimates                                   Number of obs   =        106
                                                  LR chi2(54)     =      78.53
                                                  Prob > chi2     =     0.0163
Log likelihood = -34.207961                       Pseudo R2       =     0.5344

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .1316183   .0568454     2.32   0.021     .0202034    .2430332
        math |   .3156888   .0719129     4.39   0.000     .1747421    .4566354
     _Ipid_2 |   2.644259   5.685677     0.47   0.642    -8.499464    13.78798
     _Ipid_3 |   1.683836   4.469618     0.38   0.706    -7.076455    10.44413
     _Ipid_4 |   5.393423   4.898562     1.10   0.271    -4.207581    14.99443
     _Ipid_5 |   4.144025   4.445536     0.93   0.351    -4.569065    12.85711
     _Ipid_6 |   2.657129    7.54067     0.35   0.725    -12.12231    17.43657
     _Ipid_7 |   8.353859   4.923587     1.70   0.090    -1.296195    18.00391
     _Ipid_8 |  -.0262261   4.422338    -0.00   0.995     -8.69385    8.641398
     _Ipid_9 |   .0524522   11.19484     0.00   0.996    -21.88904    21.99394
    _Ipid_10 |   2.512641   4.419353     0.57   0.570    -6.149132    11.17441
    _Ipid_11 |  -2.328571   4.713814    -0.49   0.621    -11.56748    6.910335
    _Ipid_12 |   1.763002   6.856181     0.26   0.797    -11.67487    15.20087
    _Ipid_13 |    3.31522   4.873365     0.68   0.496    -6.236401    12.86684
    _Ipid_14 |   .8946142   4.386668     0.20   0.838    -7.703097    9.492325
    _Ipid_15 |   4.117799   5.281313     0.78   0.436    -6.233384    14.46898
    _Ipid_16 |   -1.89462   4.478988    -0.42   0.672    -10.67328    6.884034
    _Ipid_17 |   4.406774   4.535259     0.97   0.331    -4.482171    13.29572
    _Ipid_18 |   1.815454   10.13918     0.18   0.858    -18.05698    21.68789
    _Ipid_19 |   4.867926   5.604981     0.87   0.385    -6.117635    15.85349
    _Ipid_20 |   4.722463   4.589177     1.03   0.303    -4.272158    13.71708
    _Ipid_21 |   7.182652   6.930279     1.04   0.300    -6.400445    20.76575
    _Ipid_22 |   1.867907   34.54054     0.05   0.957    -65.83031    69.56613
    _Ipid_23 |   3.275149   7.036681     0.47   0.642    -10.51649    17.06679
    _Ipid_24 |   2.735807   5.601202     0.49   0.625    -8.242348    13.71396
    _Ipid_25 |   4.775403   5.583181     0.86   0.392     -6.16743    15.71824
    _Ipid_26 |  -.7239006   8.063399    -0.09   0.928    -16.52787    15.08007
    _Ipid_27 |   1.315207   4.674924     0.28   0.778    -7.847475    10.47789
    _Ipid_28 |   2.986174   9.776721     0.31   0.760    -16.17585    22.14819
    _Ipid_29 |  -.1321061   5.459525    -0.02   0.981    -10.83258    10.56837
    _Ipid_30 |  -1.868394   4.728345    -0.40   0.693    -11.13578    7.398992
    _Ipid_31 |   .5522115   5.293526     0.10   0.917    -9.822909    10.92733
    _Ipid_32 |   1.697193   6.284521     0.27   0.787    -10.62024    14.01463
    _Ipid_33 |  -.5393423   4.470319    -0.12   0.904    -9.301006    8.222322
    _Ipid_34 |    .553187   6.815574     0.08   0.935    -12.80509    13.91147
    _Ipid_35 |   4.195989     6.9446     0.60   0.546    -9.415177    17.80716
    _Ipid_36 |  -.5660562   4.463285    -0.13   0.899    -9.313933    8.181821
    _Ipid_37 |  -.5526993   4.747311    -0.12   0.907    -9.857259     8.75186
    _Ipid_38 |   2.196952   4.396783     0.50   0.617    -6.420583    10.81449
    _Ipid_39 |  -2.079179   5.531016    -0.38   0.707    -12.91977    8.761413
    _Ipid_40 |   .5522115   4.561023     0.12   0.904    -8.387229    9.491652
    _Ipid_41 |   2.986174   6.129965     0.49   0.626    -9.028337    15.00069
    _Ipid_42 |  -.5917945   8.294294    -0.07   0.943    -16.84831    15.66472
    _Ipid_43 |  -.0791661   4.652938    -0.02   0.986    -9.198757    9.040425
    _Ipid_44 |   1.341433   4.602993     0.29   0.771    -7.680268    10.36313
    _Ipid_45 |   1.775384   4.513903     0.39   0.694    -7.071705    10.62247
    _Ipid_46 |  -.2632366   4.443416    -0.06   0.953    -8.972173    8.445699
    _Ipid_47 |   1.920847   4.393384     0.44   0.662    -6.690027    10.53172
    _Ipid_48 |    1.35479   6.284713     0.22   0.829    -10.96302     13.6726
    _Ipid_49 |   7.787803   4.635252     1.68   0.093    -1.297125    16.87273
    _Ipid_50 |  -.3824735   4.679609    -0.08   0.935    -9.554338    8.789391
    _Ipid_51 |  -3.262768   4.439724    -0.73   0.462    -11.96447    5.438932
    _Ipid_52 |   1.736776   4.428688     0.39   0.695    -6.943294    10.41685
    _Ipid_53 |   1.512635   5.285563     0.29   0.775    -8.846879    11.87215
       _cons |  -26.75926   6.473695    -4.13   0.000    -39.44747   -14.07105
------------------------------------------------------------------------------
This time the likelihood ratio chi-square has 54 degrees of freedom, and both read and math are statistically significant. These degrees of freedom are large relative to the number of observations (106). In this situation, there is concern that the estimates of the coefficients may be biased. This is a case for the use of conditional logistic regression.
We will follow this model with the fitstat command.
fitstat

Measures of Fit for logit of honcomp

Log-Lik Intercept Only:      -73.474     Log-Lik Full Model:        -34.208
D(51):                        68.416     LR(54):                     78.531
                                         Prob > LR:                   0.016
McFadden's R2:                 0.534     McFadden's Adj R2:          -0.214
Maximum Likelihood R2:         0.523     Cragg & Uhler's R2:          0.698
McKelvey and Zavoina's R2:     0.737     Efron's R2:                  0.600
Variance of y*:               12.493     Variance of error:           3.290
Count R2:                      0.868     Adj Count R2:                0.736
AIC:                           1.683     AIC*n:                     178.416
BIC:                        -169.419     BIC':                      173.294
Note for future reference that the deviance is 68.416 versus 106.232 for the previous model, the AIC is 1.683 (vs 1.059), and the BIC is -169.419 (vs -374.102). The deviance favors this model, but both the AIC and the BIC were better (smaller) in the first model.
5.1.2 Conditional Logistic Regression
Next, we will estimate the model using Stata’s clogit command for conditional logistic regression.
clogit honcomp read math, group(pid)

Conditional (fixed-effects) logistic regression   Number of obs   =        106
                                                  LR chi2(2)      =      39.27
                                                  Prob > chi2     =     0.0000
Log likelihood = -17.10398                        Pseudo R2       =     0.5344

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0658091   .0401959     1.64   0.102    -.0129735    .1445918
        math |   .1578444   .0508504     3.10   0.002     .0581794    .2575094
------------------------------------------------------------------------------
As you can see, the degrees of freedom are back down to two, and math is significant while read is not. To view the results as odds ratios, we can either run listcoef, in which the column labeled e^b contains the odds ratios, or rerun clogit with the or option.
listcoef

clogit (N=106): Factor Change in Odds

  Odds of: 1 vs 0

--------------------------------------------------
     honcomp |       b         z     P>|z|     e^b
-------------+------------------------------------
        read |   0.06581    1.637   0.102   1.0680
        math |   0.15784    3.104   0.002   1.1710
--------------------------------------------------

clogit, or

Conditional (fixed-effects) logistic regression   Number of obs   =        106
                                                  LR chi2(2)      =      39.27
                                                  Prob > chi2     =     0.0000
Log likelihood = -17.10398                        Pseudo R2       =     0.5344

------------------------------------------------------------------------------
     honcomp | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.068023   .0429302     1.64   0.102     .9871103    1.155568
        math |   1.170984    .059545     3.10   0.002     1.059905    1.293704
------------------------------------------------------------------------------
We can follow up on this estimation with the fitstat command.
fitstat

Measures of Fit for clogit of honcomp

Log-Lik Intercept Only:      -36.737     Log-Lik Full Model:        -17.104
D(51):                        34.208     LR(2):                      39.266
                                         Prob > LR:                   0.000
McFadden's R2:                 0.534     McFadden's Adj R2:           0.480
Maximum Likelihood R2:         0.523     Cragg & Uhler's R2:          0.698
Count R2:                      0.868
AIC:                           0.721     AIC*n:                      38.208
BIC:                        -168.277     BIC':                      -31.325
In comparing these fit statistics to the previous ones, we see that the deviance is smaller (34.208 vs 68.416), the AIC is smaller (0.721 vs 1.683), and the BIC is nearly unchanged (-168.277 vs -169.419), suggesting that the conditional logistic estimation approach is probably better.
The bottom line is that the coefficients, odds ratios and fit statistics can be interpreted in the same way as for ordinary logistic regression.
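For example, since an odds ratio is simply the exponentiated coefficient, the odds ratio for math can be recovered by hand immediately after running clogit; a minimal check:

display exp(_b[math])

This should print 1.170984, matching the odds-ratio table above: a one-point increase in math multiplies the odds of being in honors composition by about 1.17, holding read constant.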
5.1.3 1:k Matching
The previous example used 1:1 matching. We will now work through an example of 1:k matching, specifically 1:2 matching, in which two non-honors composition students are matched with each honors student. We artificially generated the data for the second matched student so that we could show that the analysis proceeds in the same manner as the previous one.
The dataset hsbcl2 contains our made-up example matched on gender (female) and program type (prog).
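Conceptually, each value of pid now identifies a stratum of three records: one case and two controls. A single stratum might look like this (the values shown are invented purely for illustration):

  pid   honcomp   female   prog   read   math
    1         1        0      2     57     55    (honors case)
    1         0        0      2     52     49    (matched control 1)
    1         0        0      2     47     53    (matched control 2)

The group(pid) option of clogit uses these strata directly, so the extra control requires no special handling.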
use https://stats.idre.ucla.edu/stat/stata/webbooks/logistic/hsbcl2

describe

Contains data from hsbcl2.dta
  obs:           159                          highschool & beyond
 vars:             7                          26 Jun 2001 15:08
 size:         5,088 (99.2% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g
female          float  %9.0g       fl
prog            float  %9.0g       sel        type of program
read            float  %9.0g                  reading score
math            float  %9.0g                  math score
honcomp         float  %9.0g
pid             float  %9.0g
-------------------------------------------------------------------------------
We will begin with an ordinary logistic regression, once again, ignoring the matching. We will also obtain the odds ratios and the fit statistics.
logit honcomp read math

Logit estimates                                   Number of obs   =        147
                                                  LR chi2(2)      =      55.69
                                                  Prob > chi2     =     0.0000
Log likelihood = -68.253137                       Pseudo R2       =     0.2898

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |    .077983   .0310376     2.51   0.012     .0171504    .1388156
        math |   .1209197   .0330634     3.66   0.000     .0561166    .1857228
       _cons |  -11.72204   1.983719    -5.91   0.000    -15.61006   -7.834021
------------------------------------------------------------------------------

logit, or

Logit estimates                                   Number of obs   =        147
                                                  LR chi2(2)      =      55.69
                                                  Prob > chi2     =     0.0000
Log likelihood = -68.253137                       Pseudo R2       =     0.2898

------------------------------------------------------------------------------
     honcomp | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.081104   .0335549     2.51   0.012     1.017298    1.148912
        math |   1.128534   .0373132     3.66   0.000     1.057721    1.204088
------------------------------------------------------------------------------

fitstat

Measures of Fit for logit of honcomp

Log-Lik Intercept Only:      -96.098     Log-Lik Full Model:        -68.253
D(144):                      136.506     LR(2):                      55.691
                                         Prob > LR:                   0.000
McFadden's R2:                 0.290     McFadden's Adj R2:           0.259
Maximum Likelihood R2:         0.315     Cragg & Uhler's R2:          0.432
McKelvey and Zavoina's R2:     0.464     Efron's R2:                  0.331
Variance of y*:                6.144     Variance of error:           3.290
Count R2:                      0.741     Adj Count R2:                0.283
AIC:                           0.969     AIC*n:                     142.506
BIC:                        -582.116     BIC':                      -45.710
Now for the conditional logistic regression.
clogit honcomp read math, group(pid)

Conditional (fixed-effects) logistic regression   Number of obs   =        147
                                                  LR chi2(2)      =      59.54
                                                  Prob > chi2     =     0.0000
Log likelihood = -23.590129                       Pseudo R2       =     0.5579

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0917047   .0426049     2.15   0.031     .0082006    .1752088
        math |   .1957582   .0571106     3.43   0.001     .0838235    .3076929
------------------------------------------------------------------------------

clogit, or

Conditional (fixed-effects) logistic regression   Number of obs   =        147
                                                  LR chi2(2)      =      59.54
                                                  Prob > chi2     =     0.0000
Log likelihood = -23.590129                       Pseudo R2       =     0.5579

------------------------------------------------------------------------------
     honcomp | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.096041   .0466967     2.15   0.031     1.008234    1.191495
        math |   1.216233   .0694598     3.43   0.001     1.087437    1.360283
------------------------------------------------------------------------------

fitstat

Measures of Fit for clogit of honcomp

Log-Lik Intercept Only:      -53.361     Log-Lik Full Model:        -23.590
D(51):                        47.180     LR(2):                      59.541
                                         Prob > LR:                   0.000
McFadden's R2:                 0.558     McFadden's Adj R2:           0.520
Maximum Likelihood R2:         0.675     Cragg & Uhler's R2:          0.779
Count R2:                      0.868
AIC:                           0.966     AIC*n:                      51.180
BIC:                        -155.305     BIC':                      -51.601
5.1.4 Conditional Logistic Regression using xtlogit
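The same conditional fixed-effects model can also be estimated with Stata's xtlogit command using the fe option, with pid declared as the group (panel) identifier. A minimal sketch, which should reproduce the clogit estimates from section 5.1.2 (the xtset syntax shown is the modern form; older versions of Stata specified the group with an i(pid) option on xtlogit instead):

xtset pid
xtlogit honcomp read math, fe

The coefficients for read and math should match those from clogit honcomp read math, group(pid), since xtlogit with the fe option maximizes the same conditional likelihood.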
5.3 Probit Analysis
Logistic regression, utilizing the logit transformation, is not the only method for dealing with binary response variables; probit regression provides an alternative. The differences between logistic and probit regression results are generally quite subtle. The graph below displays both the logistic and probit probabilities for a sample with 200 observations.

[Graph: overlaid logistic and probit predicted probabilities, not reproduced here.]
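A similar picture can be drawn directly from the two link functions using Stata's built-in invlogit() and normal() functions; a minimal sketch (dividing the probit index by 1.7 is a conventional rescaling that makes the two curves nearly coincide):

twoway (function y = invlogit(x), range(-6 6)) (function y = normal(x/1.7), range(-6 6)), legend(order(1 "logistic" 2 "probit")) xtitle("linear predictor") ytitle("probability")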
As you can see, the two curves do not differ by very much. Since the two approaches are so similar, how should users decide between them? Some disciplines have historically used probit for their data analyses. However, logistic regression does have two small advantages: 1) the exponentiated coefficient is meaningful and interpretable as an odds ratio, which is not the case for probit coefficients; and 2) at the present time, logistic regression has more tools for model diagnostics and evaluation. With these points in mind, let's work through an example.
5.3.1 Probit Regression Example
We will use the hsblog dataset, in which the response variable honcomp indicates whether or not students are in honors composition. We will begin with a logistic analysis.
use https://stats.idre.ucla.edu/stat/stata/webbooks/logistic/hsblog

logit honcomp read female

Logit estimates                                   Number of obs   =        200
                                                  LR chi2(2)      =      60.40
                                                  Prob > chi2     =     0.0000
Log likelihood = -85.44372                        Pseudo R2       =     0.2612

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .1443657   .0233337     6.19   0.000     .0986325    .1900989
      female |   1.120926   .4081028     2.75   0.006      .321059    1.920793
       _cons |  -9.603365   1.426404    -6.73   0.000    -12.39906   -6.807665
------------------------------------------------------------------------------

predict p1

logit, or

Logit estimates                                   Number of obs   =        200
                                                  LR chi2(2)      =      60.40
                                                  Prob > chi2     =     0.0000
Log likelihood = -85.44372                        Pseudo R2       =     0.2612

------------------------------------------------------------------------------
     honcomp | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.155307   .0269576     6.19   0.000     1.103661    1.209369
      female |   3.067693   1.251934     2.75   0.006     1.378587    6.826368
------------------------------------------------------------------------------

listcoef

logit (N=200): Factor Change in Odds

  Odds of: 1 vs 0

----------------------------------------------------------------------
     honcomp |       b         z     P>|z|     e^b    e^bStdX    SDofX
-------------+--------------------------------------------------------
        read |   0.14437    6.187   0.000   1.1553    4.3937   10.2529
      female |   1.12093    2.747   0.006   3.0677    1.7500    0.4992
----------------------------------------------------------------------
Both read (p-value 0.000) and female (p-value 0.006) are statistically significant. Now let's try the equivalent analysis using the probit approach. Note that there is no probit equivalent of the logit command's or option, because exponentiated probit coefficients cannot be interpreted as odds ratios.
probit honcomp read female

Probit estimates                                  Number of obs   =        200
                                                  LR chi2(2)      =      61.31
                                                  Prob > chi2     =     0.0000
Log likelihood = -84.990569                       Pseudo R2       =     0.2651

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0856048   .0130065     6.58   0.000     .0601126    .1110971
      female |   .6340312   .2300876     2.76   0.006     .1830678    1.084995
       _cons |  -5.672047   .7798022    -7.27   0.000    -7.200431   -4.143663
------------------------------------------------------------------------------

predict p2

listcoef

probit (N=200): Unstandardized and Standardized Estimates

  Observed SD: .4424407
  Latent SD: 1.3568531

-------------------------------------------------------------------------------
     honcomp |       b         z     P>|z|    bStdX    bStdY   bStdXY    SDofX
-------------+-----------------------------------------------------------------
        read |   0.08560    6.582   0.000   0.8777   0.0631   0.6469   10.2529
      female |   0.63403    2.756   0.006   0.3165   0.4673   0.2333    0.4992
-------------------------------------------------------------------------------
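As a side note, the probit predicted probability is simply the standard normal CDF evaluated at the linear predictor, so p2 can be reproduced by hand; a quick sketch (p2check is a variable name introduced here just for this check):

* Pr(honcomp=1 | x) is the normal CDF of the linear predictor
generate p2check = normal(_b[_cons] + _b[read]*read + _b[female]*female)
summarize p2 p2check

The two variables should have identical summary statistics.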
Note that although the coefficients themselves are different, the p-values associated with them are, in this instance, the same. We can see how close the predicted probabilities are by listing the first 30 of them. Remember, p1 was obtained from logit and p2 from probit.
list p1 p2 in 1/30

             p1         p2
  1.   .2018907   .2140139
  2.   .7915725   .7832198
  3.   .0372812   .0283618
  4.   .3755863   .3901446
  5.   .0563498   .0496128
  6.   .0372812   .0283618
  7.   .0843176   .0819907
  8.   .0090587    .002877
  9.   .3755863   .3901446
 10.   .2018907   .2140139
 11.   .2806132   .2960634
 12.   .2018907   .2140139
 13.   .7181559   .7180662
 14.   .1409265   .1470003
 15.   .0428232   .0343925
 16.   .0281952   .0189172
 17.   .0563498   .0496128
 18.   .2018907   .2140139
 19.   .5531741   .5592556
 20.   .1593262   .1675779
 21.   .3755863   .3901446
 22.   .3755863   .3901446
 23.   .0843176   .0819907
 24.   .2806132   .2960634
 25.   .0139005   .0061283
 26.   .0090587    .002877
 27.   .4453212    .457104
 28.   .0563498   .0496128
 29.   .0372812   .0283618
 30.   .1094523   .1111197
For the most part, the predicted probabilities are very close to one another.
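One way to quantify this over all 200 observations is to summarize the difference between the two sets of predictions (diff is a variable name introduced here):

generate diff = p1 - p2
summarize diff

The mean, minimum and maximum of diff give a quick sense of how far apart the two sets of predictions ever get.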
5.4 Complementary Log Log Estimation

The logit and probit links are both symmetric around a probability of 0.5. The complementary log-log (cloglog) link is asymmetric, and it is often used when the probability of the outcome of interest is very small or very large.
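For reference, the complementary log-log model relates the probability of the outcome to the linear predictor through

$$ \ln\{-\ln(1 - p)\} = \mathbf{x}'\beta, \qquad\text{equivalently}\qquad p = 1 - \exp\{-\exp(\mathbf{x}'\beta)\} $$

Unlike the logit and probit links, this function approaches 0 and 1 at different rates, which is the source of its asymmetry.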
5.4.1 cloglog Example
We will stick with the hsblog dataset and look at the same model we used in the previous section, predicting honcomp from read and female.
cloglog honcomp read female

Complementary log-log regression                  Number of obs    =       200
                                                  Zero outcomes    =       147
                                                  Nonzero outcomes =        53

                                                  LR chi2(2)       =     59.34
Log likelihood = -85.976021                       Prob > chi2      =    0.0000

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .1094953   .0161081     6.80   0.000      .077924    .1410666
      female |   .8820051   .3065146     2.88   0.004     .2812476    1.482763
       _cons |  -7.783623   1.029657    -7.56   0.000    -9.801713   -5.765533
------------------------------------------------------------------------------

predict p3

listcoef

cloglog (N=200): Unstandardized and Standardized Estimates

  Observed SD: .4424407

-------------------------------------------------------------
     honcomp |       b         z     P>|z|    bStdX    SDofX
-------------+-----------------------------------------------
        read |   0.10950    6.798   0.000   1.1226   10.2529
      female |   0.88201    2.878   0.004   0.4403    0.4992
-------------------------------------------------------------
Again, the coefficients are somewhat different from those in the previous analyses, but the p-values are fairly close. Let's take another look at the predicted probabilities.
list p1 p2 p3 in 1/30

             p1         p2          p3
  1.   .2018907   .2140139   .19254605
  2.   .7915725   .7832198   .82146906
  3.   .0372812   .0283618   .05021318
  4.   .3755863   .3901446   .33803425
  5.   .0563498   .0496128   .06905129
  6.   .0372812   .0283618   .05021318
  7.   .0843176   .0819907   .09459662
  8.   .0090587    .002877   .01708784
  9.   .3755863   .3901446   .33803425
 10.   .2018907   .2140139   .19254605
 11.   .2806132   .2960634   .25698217
 12.   .2018907   .2140139   .19254605
 13.   .7181559   .7180662   .70861328
 14.   .1409265   .1470003   .14271825
 15.   .0428232   .0343925   .05585836
 16.   .0281952   .0189172    .040541
 17.   .0563498   .0496128   .06905129
 18.   .2018907   .2140139   .19254605
 19.   .5531741   .5592556    .5099445
 20.   .1593262   .1675779   .15785867
 21.   .3755863   .3901446   .33803425
 22.   .3755863   .3901446   .33803425
 23.   .0843176   .0819907   .09459662
 24.   .2806132   .2960634   .25698217
 25.   .0139005   .0061283   .02365356
 26.   .0090587    .002877   .01708784
 27.   .4453212    .457104   .40162623
 28.   .0563498   .0496128   .06905129
 29.   .0372812   .0283618   .05021318
 30.   .1094523   .1111197   .11635828
Again, many of the probabilities are very close to one another.
Let's try another example. This time we will create a variable, scholar, indicating that a student who scores 70 or higher on the math test is eligible for a partial scholarship. Only about 5% of the students are eligible.
generate scholar = math >= 70

tabulate scholar

    scholar |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        189       94.50       94.50
          1 |         11        5.50      100.00
------------+-----------------------------------
      Total |        200      100.00

In this model we will try to predict scholar using read and science.
cloglog scholar read science

Complementary log-log regression                  Number of obs    =       200
                                                  Zero outcomes    =       189
                                                  Nonzero outcomes =        11

                                                  LR chi2(2)       =     31.83
Log likelihood = -26.679923                       Prob > chi2      =    0.0000

------------------------------------------------------------------------------
     scholar |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .1118547    .044388     2.52   0.012     .0248557    .1988537
     science |   .1445449   .0537736     2.69   0.007     .0391506    .2499392
       _cons |  -18.18081   3.801846    -4.78   0.000    -25.63229   -10.72933
------------------------------------------------------------------------------

listcoef

cloglog (N=200): Unstandardized and Standardized Estimates

  Observed SD: .22855236

-------------------------------------------------------------
     scholar |       b         z     P>|z|    bStdX    SDofX
-------------+-----------------------------------------------
        read |   0.11185    2.520   0.012   1.1468   10.2529
     science |   0.14454    2.688   0.007   1.4311    9.9009
-------------------------------------------------------------