5.1 Conditional Logistic Regression
There are two alternative approaches to maximum likelihood estimation in logistic regression: the unconditional estimation approach and the conditional estimation approach. In the previous chapters we have made use of the unconditional estimation approach. Unconditional estimation works best when the number of degrees of freedom for the model is small relative to the number of observations. When the model degrees of freedom become large relative to the number of cases, the conditional estimation approach is better.
Commonly, the model degrees of freedom become large when some type of matching is involved. Matching can include one-to-one (1:1) matching, one-to-k (1:k) matching, and even matching subjects to themselves in a repeated-measures design. The Stata command clogit, for conditional logistic regression, can be used in these situations.
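To see what conditioning buys us, consider how a single 1:1 matched pair enters the conditional likelihood: the pair contributes the probability that, of its two subjects, the observed case is the case, and any pair-specific intercept cancels out of this ratio. A minimal numeric sketch (Python rather than Stata, with made-up coefficients and scores):

```python
import math

def pair_contribution(x_case, x_control, b):
    """Conditional likelihood contribution of one 1:1 matched pair:
    P(the observed subject is the case | exactly one case in the pair).
    A pair-specific intercept would cancel out of this ratio."""
    eta_case = sum(bi * xi for bi, xi in zip(b, x_case))
    eta_control = sum(bi * xi for bi, xi in zip(b, x_control))
    return math.exp(eta_case) / (math.exp(eta_case) + math.exp(eta_control))

# Made-up read/math scores for a case and its matched control,
# and made-up coefficients
b = [0.066, 0.158]
print(round(pair_contribution([60, 65], [55, 50], b), 3))   # → 0.937
```

When the case and control have identical covariates the contribution is 0.5, so such pairs carry no information about b, which is exactly why concordant pairs drop out of a matched analysis.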
To illustrate clogit, we will use a variant of the high school and beyond dataset. In hsbcl, students in honors composition (honcomp) are randomly matched with a non-honors composition student based on gender (female) and program type (prog). The variable pid is the pair ID and is used to indicate which students are matched.
use https://stats.idre.ucla.edu/stat/stata/webbooks/logistic/hsbcl
describe
Contains data from hsbcl.dta
obs: 106 highschool & beyond
vars: 14 26 Jun 2001 10:57
size: 6,360 (99.1% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
id float %9.0g
female float %9.0g fl
race float %12.0g rl
ses float %9.0g sl
hises float %9.0g
prog float %9.0g sel type of program
academic float %9.0g
read float %9.0g reading score
write float %9.0g writing score
math float %9.0g math score
science float %9.0g science score
socst float %9.0g social studies score
honcomp float %9.0g
pid float %9.0g
-------------------------------------------------------------------------------
5.1.1 Ordinary Logistic Regression
We will begin by running an ordinary logistic regression, i.e., one that does not take into account the matching that was done. The covariates (predictors) will be read and math.
logit honcomp read math
Logit estimates Number of obs = 106
LR chi2(2) = 40.72
Prob > chi2 = 0.0000
Log likelihood = -53.115869 Pseudo R2 = 0.2771
------------------------------------------------------------------------------
honcomp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .0647649 .0332232 1.95 0.051 -.0003515 .1298812
math | .1215842 .0357131 3.40 0.001 .0515879 .1915805
_cons | -10.45958 2.13368 -4.90 0.000 -14.64151 -6.277639
------------------------------------------------------------------------------
Notice that the likelihood ratio chi-square has two degrees of freedom and that math is highly significant while read is just above the .05 level (p = 0.051).
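The coefficients translate into predicted probabilities via the inverse logit, p = 1/(1 + exp(-xb)). As a quick check, using the estimates above and a hypothetical student with read = 50 and math = 60:

```python
import math

# Coefficients from the logit output above
b_read, b_math, b_cons = 0.0647649, 0.1215842, -10.45958

# Hypothetical student: read = 50, math = 60
xb = b_read * 50 + b_math * 60 + b_cons
p = 1 / (1 + math.exp(-xb))     # inverse logit
print(round(p, 4))              # → 0.5184
```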
Let’s check out the fitstat for this model.
fitstat
Measures of Fit for logit of honcomp
Log-Lik Intercept Only: -73.474 Log-Lik Full Model: -53.116
D(103): 106.232 LR(2): 40.715
Prob > LR: 0.000
McFadden's R2: 0.277 McFadden's Adj R2: 0.236
Maximum Likelihood R2: 0.319 Cragg & Uhler's R2: 0.425
McKelvey and Zavoina's R2: 0.445 Efron's R2: 0.326
Variance of y*: 5.924 Variance of error: 3.290
Count R2: 0.736 Adj Count R2: 0.472
AIC: 1.059 AIC*n: 112.232
BIC: -374.102 BIC': -31.389
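Several of these measures can be reproduced by hand from the two log likelihoods (the formulas below match the scalings fitstat uses; k = 3 counts the two slopes plus the constant):

```python
import math

ll0, llf = -73.474, -53.116      # intercept-only and full-model log likelihoods
N, k = 106, 3                    # observations; estimated parameters

mcfadden = 1 - llf / ll0                     # McFadden's R2
deviance = -2 * llf                          # D(103)
lr = 2 * (llf - ll0)                         # LR chi2(2)
aic = (-2 * llf + 2 * k) / N                 # fitstat's per-observation AIC
bic = deviance - (N - k) * math.log(N)       # fitstat's BIC

print(round(mcfadden, 3), round(deviance, 3), round(aic, 3), round(bic, 3))
# → 0.277 106.232 1.059 -374.102
```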
We will be comparing these results to the ones obtained in the next model.
One way to try to model the matched nature of the data is to include indicator variables for the pairs as predictors in the model. We can do this in Stata using xi.
xi: logit honcomp read math i.pid
i.pid _Ipid_1-53 (naturally coded; _Ipid_1 omitted)
Logit estimates Number of obs = 106
LR chi2(54) = 78.53
Prob > chi2 = 0.0163
Log likelihood = -34.207961 Pseudo R2 = 0.5344
------------------------------------------------------------------------------
honcomp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .1316183 .0568454 2.32 0.021 .0202034 .2430332
math | .3156888 .0719129 4.39 0.000 .1747421 .4566354
_Ipid_2 | 2.644259 5.685677 0.47 0.642 -8.499464 13.78798
_Ipid_3 | 1.683836 4.469618 0.38 0.706 -7.076455 10.44413
_Ipid_4 | 5.393423 4.898562 1.10 0.271 -4.207581 14.99443
_Ipid_5 | 4.144025 4.445536 0.93 0.351 -4.569065 12.85711
_Ipid_6 | 2.657129 7.54067 0.35 0.725 -12.12231 17.43657
_Ipid_7 | 8.353859 4.923587 1.70 0.090 -1.296195 18.00391
_Ipid_8 | -.0262261 4.422338 -0.00 0.995 -8.69385 8.641398
_Ipid_9 | .0524522 11.19484 0.00 0.996 -21.88904 21.99394
_Ipid_10 | 2.512641 4.419353 0.57 0.570 -6.149132 11.17441
_Ipid_11 | -2.328571 4.713814 -0.49 0.621 -11.56748 6.910335
_Ipid_12 | 1.763002 6.856181 0.26 0.797 -11.67487 15.20087
_Ipid_13 | 3.31522 4.873365 0.68 0.496 -6.236401 12.86684
_Ipid_14 | .8946142 4.386668 0.20 0.838 -7.703097 9.492325
_Ipid_15 | 4.117799 5.281313 0.78 0.436 -6.233384 14.46898
_Ipid_16 | -1.89462 4.478988 -0.42 0.672 -10.67328 6.884034
_Ipid_17 | 4.406774 4.535259 0.97 0.331 -4.482171 13.29572
_Ipid_18 | 1.815454 10.13918 0.18 0.858 -18.05698 21.68789
_Ipid_19 | 4.867926 5.604981 0.87 0.385 -6.117635 15.85349
_Ipid_20 | 4.722463 4.589177 1.03 0.303 -4.272158 13.71708
_Ipid_21 | 7.182652 6.930279 1.04 0.300 -6.400445 20.76575
_Ipid_22 | 1.867907 34.54054 0.05 0.957 -65.83031 69.56613
_Ipid_23 | 3.275149 7.036681 0.47 0.642 -10.51649 17.06679
_Ipid_24 | 2.735807 5.601202 0.49 0.625 -8.242348 13.71396
_Ipid_25 | 4.775403 5.583181 0.86 0.392 -6.16743 15.71824
_Ipid_26 | -.7239006 8.063399 -0.09 0.928 -16.52787 15.08007
_Ipid_27 | 1.315207 4.674924 0.28 0.778 -7.847475 10.47789
_Ipid_28 | 2.986174 9.776721 0.31 0.760 -16.17585 22.14819
_Ipid_29 | -.1321061 5.459525 -0.02 0.981 -10.83258 10.56837
_Ipid_30 | -1.868394 4.728345 -0.40 0.693 -11.13578 7.398992
_Ipid_31 | .5522115 5.293526 0.10 0.917 -9.822909 10.92733
_Ipid_32 | 1.697193 6.284521 0.27 0.787 -10.62024 14.01463
_Ipid_33 | -.5393423 4.470319 -0.12 0.904 -9.301006 8.222322
_Ipid_34 | .553187 6.815574 0.08 0.935 -12.80509 13.91147
_Ipid_35 | 4.195989 6.9446 0.60 0.546 -9.415177 17.80716
_Ipid_36 | -.5660562 4.463285 -0.13 0.899 -9.313933 8.181821
_Ipid_37 | -.5526993 4.747311 -0.12 0.907 -9.857259 8.75186
_Ipid_38 | 2.196952 4.396783 0.50 0.617 -6.420583 10.81449
_Ipid_39 | -2.079179 5.531016 -0.38 0.707 -12.91977 8.761413
_Ipid_40 | .5522115 4.561023 0.12 0.904 -8.387229 9.491652
_Ipid_41 | 2.986174 6.129965 0.49 0.626 -9.028337 15.00069
_Ipid_42 | -.5917945 8.294294 -0.07 0.943 -16.84831 15.66472
_Ipid_43 | -.0791661 4.652938 -0.02 0.986 -9.198757 9.040425
_Ipid_44 | 1.341433 4.602993 0.29 0.771 -7.680268 10.36313
_Ipid_45 | 1.775384 4.513903 0.39 0.694 -7.071705 10.62247
_Ipid_46 | -.2632366 4.443416 -0.06 0.953 -8.972173 8.445699
_Ipid_47 | 1.920847 4.393384 0.44 0.662 -6.690027 10.53172
_Ipid_48 | 1.35479 6.284713 0.22 0.829 -10.96302 13.6726
_Ipid_49 | 7.787803 4.635252 1.68 0.093 -1.297125 16.87273
_Ipid_50 | -.3824735 4.679609 -0.08 0.935 -9.554338 8.789391
_Ipid_51 | -3.262768 4.439724 -0.73 0.462 -11.96447 5.438932
_Ipid_52 | 1.736776 4.428688 0.39 0.695 -6.943294 10.41685
_Ipid_53 | 1.512635 5.285563 0.29 0.775 -8.846879 11.87215
_cons | -26.75926 6.473695 -4.13 0.000 -39.44747 -14.07105
------------------------------------------------------------------------------
This time the degrees of freedom for the likelihood ratio chi-square is 54 and both read and math are statistically significant. These degrees of freedom are large relative to the number of observations (106). In this situation, there is concern that the estimates of the coefficients may be biased. This is a case for the use of conditional logistic regression.
We will follow this model with the fitstat command.
fitstat
Measures of Fit for logit of honcomp
Log-Lik Intercept Only: -73.474 Log-Lik Full Model: -34.208
D(51): 68.416 LR(54): 78.531
Prob > LR: 0.016
McFadden's R2: 0.534 McFadden's Adj R2: -0.214
Maximum Likelihood R2: 0.523 Cragg & Uhler's R2: 0.698
McKelvey and Zavoina's R2: 0.737 Efron's R2: 0.600
Variance of y*: 12.493 Variance of error: 3.290
Count R2: 0.868 Adj Count R2: 0.736
AIC: 1.683 AIC*n: 178.416
BIC: -169.419 BIC': 173.294
Note for future reference that the deviance is 68.416 versus 106.232 for the previous model, the AIC is 1.683 (vs 1.059) and the BIC is -169.419 (vs -374.102). The deviance favors this model, while both the AIC and the BIC favor the first model.
5.1.2 Conditional Logistic Regression
Next, we will estimate the model using Stata’s clogit command for conditional logistic regression.
clogit honcomp read math, group(pid)
Conditional (fixed-effects) logistic regression Number of obs = 106
LR chi2(2) = 39.27
Prob > chi2 = 0.0000
Log likelihood = -17.10398 Pseudo R2 = 0.5344
------------------------------------------------------------------------------
honcomp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .0658091 .0401959 1.64 0.102 -.0129735 .1445918
math | .1578444 .0508504 3.10 0.002 .0581794 .2575094
------------------------------------------------------------------------------
As you can see, the degrees of freedom are back down to two, and math is significant while read is not. To view the results as odds ratios, we can either run listcoef, in which the column labeled e^b contains the odds ratios, or rerun clogit with the or option.
listcoef
clogit (N=106): Factor Change in Odds
Odds of: 1 vs 0
--------------------------------------------------
honcomp | b z P>|z| e^b
-------------+------------------------------------
read | 0.06581 1.637 0.102 1.0680
math | 0.15784 3.104 0.002 1.1710
--------------------------------------------------
clogit, or
Conditional (fixed-effects) logistic regression Number of obs = 106
LR chi2(2) = 39.27
Prob > chi2 = 0.0000
Log likelihood = -17.10398 Pseudo R2 = 0.5344
------------------------------------------------------------------------------
honcomp | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | 1.068023 .0429302 1.64 0.102 .9871103 1.155568
math | 1.170984 .059545 3.10 0.002 1.059905 1.293704
------------------------------------------------------------------------------
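The odds ratios are simply the exponentiated coefficients; a quick check against the output above:

```python
import math

# clogit coefficients from the table above
b_read, b_math = 0.0658091, 0.1578444

print(round(math.exp(b_read), 4))   # → 1.068, the odds ratio for read
print(round(math.exp(b_math), 4))   # → 1.171, the odds ratio for math
```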
We can follow up on this estimation with the fitstat command.
fitstat
Measures of Fit for clogit of honcomp
Log-Lik Intercept Only: -36.737 Log-Lik Full Model: -17.104
D(51): 34.208 LR(2): 39.266
Prob > LR: 0.000
McFadden's R2: 0.534 McFadden's Adj R2: 0.480
Maximum Likelihood R2: 0.523 Cragg & Uhler's R2: 0.698
Count R2: 0.868
AIC: 0.721 AIC*n: 38.208
BIC: -168.277 BIC': -31.325
In comparing this fitstat to the previous one, we see that the deviance is smaller (34.208 vs 68.416) and the AIC is smaller (0.721 vs 1.683), while the BIC is very slightly larger (-168.277 vs -169.419). On balance, this suggests that the conditional logistic estimation approach is probably better.
The bottom line is that the coefficients, odds ratios and fit statistics can be interpreted in the same way as for ordinary logistic regression.
5.1.3 1:k Matching
The previous example used 1:1 matching. We will now use an example of 1:k matching, specifically 1:2 matching: two non-honors composition students are matched with each honors student. We artificially generated the data for the second matched student so that we could show that the analysis proceeds in the same manner as before.
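With 1:k matching the conditional likelihood generalizes naturally: each matched set contributes the probability that the observed case, rather than one of its k controls, is the case. A sketch with made-up coefficients and scores:

```python
import math

def set_contribution(x_case, x_controls, b):
    """Conditional likelihood contribution of a 1:k matched set:
    P(the observed case is the case | exactly one case in the set)."""
    eta = lambda x: sum(bi * xi for bi, xi in zip(b, x))
    num = math.exp(eta(x_case))
    den = num + sum(math.exp(eta(x)) for x in x_controls)
    return num / den

# Made-up read/math coefficients; one case matched with two controls
b = [0.09, 0.20]
print(round(set_contribution([60, 65], [[55, 50], [52, 58]], b), 3))   # → 0.868
```

With k = 1 this reduces to the matched-pair formula, and if all members of a set share the same covariates the contribution is 1/(k+1), carrying no information about b.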
The dataset hsbcl2 has our made up example matched on gender (female) and program type (prog).
use https://stats.idre.ucla.edu/stat/stata/webbooks/logistic/hsbcl2
describe
Contains data from hsbcl2.dta
obs: 159 highschool & beyond
vars: 7 26 Jun 2001 15:08
size: 5,088 (99.2% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
id float %9.0g
female float %9.0g fl
prog float %9.0g sel type of program
read float %9.0g reading score
math float %9.0g math score
honcomp float %9.0g
pid float %9.0g
-------------------------------------------------------------------------------
We will begin with an ordinary logistic regression, once again, ignoring the matching. We will also obtain the odds ratios and the fit statistics.
logit honcomp read math
Logit estimates Number of obs = 147
LR chi2(2) = 55.69
Prob > chi2 = 0.0000
Log likelihood = -68.253137 Pseudo R2 = 0.2898
------------------------------------------------------------------------------
honcomp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .077983 .0310376 2.51 0.012 .0171504 .1388156
math | .1209197 .0330634 3.66 0.000 .0561166 .1857228
_cons | -11.72204 1.983719 -5.91 0.000 -15.61006 -7.834021
------------------------------------------------------------------------------
logit, or
Logit estimates Number of obs = 147
LR chi2(2) = 55.69
Prob > chi2 = 0.0000
Log likelihood = -68.253137 Pseudo R2 = 0.2898
------------------------------------------------------------------------------
honcomp | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | 1.081104 .0335549 2.51 0.012 1.017298 1.148912
math | 1.128534 .0373132 3.66 0.000 1.057721 1.204088
------------------------------------------------------------------------------
fitstat
Measures of Fit for logit of honcomp
Log-Lik Intercept Only: -96.098 Log-Lik Full Model: -68.253
D(144): 136.506 LR(2): 55.691
Prob > LR: 0.000
McFadden's R2: 0.290 McFadden's Adj R2: 0.259
Maximum Likelihood R2: 0.315 Cragg & Uhler's R2: 0.432
McKelvey and Zavoina's R2: 0.464 Efron's R2: 0.331
Variance of y*: 6.144 Variance of error: 3.290
Count R2: 0.741 Adj Count R2: 0.283
AIC: 0.969 AIC*n: 142.506
BIC: -582.116 BIC': -45.710
Now for the conditional logistic regression.
clogit honcomp read math, group(pid)
Conditional (fixed-effects) logistic regression Number of obs = 147
LR chi2(2) = 59.54
Prob > chi2 = 0.0000
Log likelihood = -23.590129 Pseudo R2 = 0.5579
------------------------------------------------------------------------------
honcomp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .0917047 .0426049 2.15 0.031 .0082006 .1752088
math | .1957582 .0571106 3.43 0.001 .0838235 .3076929
------------------------------------------------------------------------------
clogit, or
Conditional (fixed-effects) logistic regression Number of obs = 147
LR chi2(2) = 59.54
Prob > chi2 = 0.0000
Log likelihood = -23.590129 Pseudo R2 = 0.5579
------------------------------------------------------------------------------
honcomp | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | 1.096041 .0466967 2.15 0.031 1.008234 1.191495
math | 1.216233 .0694598 3.43 0.001 1.087437 1.360283
------------------------------------------------------------------------------
fitstat
Measures of Fit for clogit of honcomp
Log-Lik Intercept Only: -53.361 Log-Lik Full Model: -23.590
D(51): 47.180 LR(2): 59.541
Prob > LR: 0.000
McFadden's R2: 0.558 McFadden's Adj R2: 0.520
Maximum Likelihood R2: 0.675 Cragg & Uhler's R2: 0.779
Count R2: 0.868
AIC: 0.966 AIC*n: 51.180
BIC: -155.305 BIC': -51.601
5.1.4 Conditional Logistic Regression using xtlogit
5.3 Probit Analysis
Logistic regression utilizing the logit transformation is not the only method for dealing with binary response variables. Probit regression, which replaces the logit link with the inverse standard normal (probit) link, provides an alternative. The differences between logistic and probit regression results are generally small. The graph below displays both the logistic and probit probabilities for a sample with 200 observations.
As you can see, the two curves don’t really differ by very much. Since the two approaches are so similar, how should users decide between one and the other? Some disciplines have historically used probit for their data analyses. However, logistic regression does have several small advantages: 1) the exponentiated form of the coefficient is meaningful and interpretable as an odds ratio, which is not the case for probit coefficients; 2) at the present time, logistic regression has more tools for diagnostics and evaluation of models. With these points in mind, let’s work through an example.
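The closeness of the two curves reflects a well-known property of the link functions: after rescaling the argument by roughly 1.7, the logistic CDF and the standard normal CDF never differ by more than about 0.01. A quick numerical check using only the standard library:

```python
import math

def logistic(x):
    return 1 / (1 + math.exp(-x))

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Compare logistic(x) with the probit curve, rescaled by ~1.7
grid = [i / 100 for i in range(-500, 501)]
max_diff = max(abs(logistic(x) - norm_cdf(x / 1.7)) for x in grid)
print(round(max_diff, 3))
```

The printed maximum difference is on the order of 0.01, which is why fitted logit and probit probabilities track each other so closely.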
5.3.1 Probit Regression Example
We will use the hsblog dataset, in which the response variable, honcomp, indicates whether or not students are in honors composition. We will begin with a logistic analysis.
use https://stats.idre.ucla.edu/stat/stata/webbooks/logistic/hsblog
logit honcomp read female
Logit estimates Number of obs = 200
LR chi2(2) = 60.40
Prob > chi2 = 0.0000
Log likelihood = -85.44372 Pseudo R2 = 0.2612
------------------------------------------------------------------------------
honcomp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .1443657 .0233337 6.19 0.000 .0986325 .1900989
female | 1.120926 .4081028 2.75 0.006 .321059 1.920793
_cons | -9.603365 1.426404 -6.73 0.000 -12.39906 -6.807665
------------------------------------------------------------------------------
predict p1
logit, or
Logit estimates Number of obs = 200
LR chi2(2) = 60.40
Prob > chi2 = 0.0000
Log likelihood = -85.44372 Pseudo R2 = 0.2612
------------------------------------------------------------------------------
honcomp | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | 1.155307 .0269576 6.19 0.000 1.103661 1.209369
female | 3.067693 1.251934 2.75 0.006 1.378587 6.826368
------------------------------------------------------------------------------
listcoef
logit (N=200): Factor Change in Odds
Odds of: 1 vs 0
----------------------------------------------------------------------
honcomp | b z P>|z| e^b e^bStdX SDofX
-------------+--------------------------------------------------------
read | 0.14437 6.187 0.000 1.1553 4.3937 10.2529
female | 1.12093 2.747 0.006 3.0677 1.7500 0.4992
----------------------------------------------------------------------
Both read (p-value .000) and female (p-value 0.006) are statistically significant. Now let’s try the equivalent analysis using the probit approach. Note that there is no probit equivalent of the logit, or option, since exponentiated probit coefficients are not odds ratios.
probit honcomp read female
Probit estimates Number of obs = 200
LR chi2(2) = 61.31
Prob > chi2 = 0.0000
Log likelihood = -84.990569 Pseudo R2 = 0.2651
------------------------------------------------------------------------------
honcomp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .0856048 .0130065 6.58 0.000 .0601126 .1110971
female | .6340312 .2300876 2.76 0.006 .1830678 1.084995
_cons | -5.672047 .7798022 -7.27 0.000 -7.200431 -4.143663
------------------------------------------------------------------------------
predict p2
listcoef
probit (N=200): Unstandardized and Standardized Estimates
Observed SD: .4424407
Latent SD: 1.3568531
-------------------------------------------------------------------------------
honcomp | b z P>|z| bStdX bStdY bStdXY SDofX
-------------+-----------------------------------------------------------------
read | 0.08560 6.582 0.000 0.8777 0.0631 0.6469 10.2529
female | 0.63403 2.756 0.006 0.3165 0.4673 0.2333 0.4992
-------------------------------------------------------------------------------
Note that although the coefficients themselves are different the p-values associated with them are, in this instance, the same. We can see how close the predicted probabilities are by comparing the first 30 predicted probabilities. Remember, p1 was obtained from logit and p2 from probit.
list p1 p2 in 1/30
p1 p2
1. .2018907 .2140139
2. .7915725 .7832198
3. .0372812 .0283618
4. .3755863 .3901446
5. .0563498 .0496128
6. .0372812 .0283618
7. .0843176 .0819907
8. .0090587 .002877
9. .3755863 .3901446
10. .2018907 .2140139
11. .2806132 .2960634
12. .2018907 .2140139
13. .7181559 .7180662
14. .1409265 .1470003
15. .0428232 .0343925
16. .0281952 .0189172
17. .0563498 .0496128
18. .2018907 .2140139
19. .5531741 .5592556
20. .1593262 .1675779
21. .3755863 .3901446
22. .3755863 .3901446
23. .0843176 .0819907
24. .2806132 .2960634
25. .0139005 .0061283
26. .0090587 .002877
27. .4453212 .457104
28. .0563498 .0496128
29. .0372812 .0283618
30. .1094523 .1111197
For the most part, the predicted probabilities are very close to one another.
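These values can be reproduced by hand from the two coefficient tables. Working backwards from the output, the first case in the listing is consistent with a male student (female = 0) with read = 57; those covariate values are an inference, not shown above:

```python
import math

read, female = 57, 0             # inferred covariates for the first listed case

# logit coefficients from above; inverse logit gives p1
xb_logit = 0.1443657 * read + 1.120926 * female - 9.603365
p1 = 1 / (1 + math.exp(-xb_logit))

# probit coefficients from above; the standard normal CDF gives p2
xb_probit = 0.0856048 * read + 0.6340312 * female - 5.672047
p2 = 0.5 * (1 + math.erf(xb_probit / math.sqrt(2)))

print(round(p1, 7), round(p2, 7))   # compare with .2018907 and .2140139 above
```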
5.4 Complementary Log Log Estimation
5.4.1 cloglog Example
We will stick with using the hsblog dataset. Let’s look at the same model that we used in the previous section, predicting honcomp using read and female.
cloglog honcomp read female
Complementary log-log regression Number of obs = 200
Zero outcomes = 147
Nonzero outcomes = 53
LR chi2(2) = 59.34
Log likelihood = -85.976021 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
honcomp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .1094953 .0161081 6.80 0.000 .077924 .1410666
female | .8820051 .3065146 2.88 0.004 .2812476 1.482763
_cons | -7.783623 1.029657 -7.56 0.000 -9.801713 -5.765533
------------------------------------------------------------------------------
predict p3
listcoef
cloglog (N=200): Unstandardized and Standardized Estimates
Observed SD: .4424407
-------------------------------------------------------------
honcomp | b z P>|z| bStdX SDofX
-------------+-----------------------------------------------
read | 0.10950 6.798 0.000 1.1226 10.2529
female | 0.88201 2.878 0.004 0.4403 0.4992
-------------------------------------------------------------
Again, the coefficients are somewhat different from the previous analyses but the p-values are pretty close. We will take another look at the predicted probabilities.
list p1 p2 p3 in 1/30
p1 p2 p3
1. .2018907 .2140139 .19254605
2. .7915725 .7832198 .82146906
3. .0372812 .0283618 .05021318
4. .3755863 .3901446 .33803425
5. .0563498 .0496128 .06905129
6. .0372812 .0283618 .05021318
7. .0843176 .0819907 .09459662
8. .0090587 .002877 .01708784
9. .3755863 .3901446 .33803425
10. .2018907 .2140139 .19254605
11. .2806132 .2960634 .25698217
12. .2018907 .2140139 .19254605
13. .7181559 .7180662 .70861328
14. .1409265 .1470003 .14271825
15. .0428232 .0343925 .05585836
16. .0281952 .0189172 .040541
17. .0563498 .0496128 .06905129
18. .2018907 .2140139 .19254605
19. .5531741 .5592556 .5099445
20. .1593262 .1675779 .15785867
21. .3755863 .3901446 .33803425
22. .3755863 .3901446 .33803425
23. .0843176 .0819907 .09459662
24. .2806132 .2960634 .25698217
25. .0139005 .0061283 .02365356
26. .0090587 .002877 .01708784
27. .4453212 .457104 .40162623
28. .0563498 .0496128 .06905129
29. .0372812 .0283618 .05021318
30. .1094523 .1111197 .11635828
Again, many of the probabilities are very close to one another.
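The cloglog probabilities come from inverting the complementary log-log link, p = 1 - exp(-exp(xb)). Using the same inferred covariates as the first listed case (female = 0 and read = 57, an inference rather than values shown above):

```python
import math

read, female = 57, 0             # inferred covariates for the first listed case

# cloglog coefficients from above; inverse complementary log-log gives p3
xb = 0.1094953 * read + 0.8820051 * female - 7.783623
p3 = 1 - math.exp(-math.exp(xb))

print(round(p3, 7))              # compare with .19254605 above
```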
Let’s try another example. This time we will create a variable, scholar, indicating that a student who scores 70 or higher on the math test is eligible for a partial scholarship. Only about 5.5% of the students qualify. (Note that in Stata missing values are treated as larger than any number, so the expression math >= 70 would also flag missing math scores; use it with care when data may be missing.)
generate scholar = math >= 70
tabulate scholar
scholar | Freq. Percent Cum.
------------+-----------------------------------
0 | 189 94.50 94.50
1 | 11 5.50 100.00
------------+-----------------------------------
Total | 200 100.00
In this model we will try to predict scholar using read and science.
cloglog scholar read science
Complementary log-log regression Number of obs = 200
Zero outcomes = 189
Nonzero outcomes = 11
LR chi2(2) = 31.83
Log likelihood = -26.679923 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
scholar | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .1118547 .044388 2.52 0.012 .0248557 .1988537
science | .1445449 .0537736 2.69 0.007 .0391506 .2499392
_cons | -18.18081 3.801846 -4.78 0.000 -25.63229 -10.72933
------------------------------------------------------------------------------
listcoef
cloglog (N=200): Unstandardized and Standardized Estimates
Observed SD: .22855236
-------------------------------------------------------------
scholar | b z P>|z| bStdX SDofX
-------------+-----------------------------------------------
read | 0.11185 2.520 0.012 1.1468 10.2529
science | 0.14454 2.688 0.007 1.4311 9.9009
-------------------------------------------------------------
