This chapter considers several extensions and alternatives to the ordinary logistic regression models of the previous chapters: conditional logistic regression for matched data, probit regression, and complementary log-log regression.
5.1 Conditional Logistic Regression
There are two alternative approaches to maximum likelihood estimation in logistic regression: the unconditional estimation approach and the conditional estimation approach. In the previous chapters we have used the unconditional estimation approach. Unconditional estimation works best when the number of degrees of freedom for the model is small relative to the number of observations. When the model degrees of freedom become large relative to the number of cases, the conditional estimation approach is preferable.
Commonly, the model degrees of freedom become large when some type of matching is involved. Matching can include one-to-one (1:1) matching, one-to-k (1:k) matching, and even matching subjects to themselves in a repeated measures design. The Stata command clogit, for conditional logistic regression, can be used in these situations.
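To see why conditioning helps, consider the 1:1 case. If each pair j has its own intercept, then, given that exactly one member of the pair is a case, the pair's contribution to the conditional likelihood is (a standard result, shown here only for intuition):

$$ L_j(\beta) = \frac{\exp(\mathbf{x}_{1j}'\beta)}{\exp(\mathbf{x}_{1j}'\beta) + \exp(\mathbf{x}_{0j}'\beta)} $$

where x_1j and x_0j are the covariate values for the case and the control. The pair-specific intercepts cancel out of this ratio, so they never need to be estimated; this is how conditional estimation avoids the bias that comes from estimating one nuisance parameter per pair.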
To illustrate clogit, we will use a variant of the high school and beyond dataset. In hsbcl, each student in honors composition (honcomp) is randomly matched with a non-honors composition student on gender (female) and program type (prog). The variable pid is the pair ID and is used to indicate which students are matched.
use https://stats.idre.ucla.edu/stat/stata/webbooks/logistic/hsbcl

describe

Contains data from hsbcl.dta
  obs:           106                          highschool & beyond
 vars:            14                          26 Jun 2001 10:57
 size:         6,360 (99.1% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g
female          float  %9.0g       fl
race            float  %12.0g      rl
ses             float  %9.0g       sl
hises           float  %9.0g
prog            float  %9.0g       sel        type of program
academic        float  %9.0g
read            float  %9.0g                  reading score
write           float  %9.0g                  writing score
math            float  %9.0g                  math score
science         float  %9.0g                  science score
socst           float  %9.0g                  social studies score
honcomp         float  %9.0g
pid             float  %9.0g
-------------------------------------------------------------------------------
5.1.1 Ordinary Logistic Regression
We will begin by running an ordinary logistic regression, i.e., one that does not take into account the matching that was done. The covariates (predictors) will be read and math.
logit honcomp read math

Logit estimates                                   Number of obs   =        106
                                                  LR chi2(2)      =      40.72
                                                  Prob > chi2     =     0.0000
Log likelihood = -53.115869                       Pseudo R2       =     0.2771

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0647649   .0332232     1.95   0.051    -.0003515    .1298812
        math |   .1215842   .0357131     3.40   0.001     .0515879    .1915805
       _cons |  -10.45958    2.13368    -4.90   0.000    -14.64151   -6.277639
------------------------------------------------------------------------------
Notice that the likelihood ratio chi-square has two degrees of freedom and that math is highly significant while read is barely over the .05 level.
Let's check out the fit statistics for this model using fitstat.
fitstat

Measures of Fit for logit of honcomp

Log-Lik Intercept Only:      -73.474     Log-Lik Full Model:        -53.116
D(103):                      106.232     LR(2):                      40.715
                                         Prob > LR:                   0.000
McFadden's R2:                 0.277     McFadden's Adj R2:           0.236
Maximum Likelihood R2:         0.319     Cragg & Uhler's R2:          0.425
McKelvey and Zavoina's R2:     0.445     Efron's R2:                  0.326
Variance of y*:                5.924     Variance of error:           3.290
Count R2:                      0.736     Adj Count R2:                0.472
AIC:                           1.059     AIC*n:                     112.232
BIC:                        -374.102     BIC':                      -31.389
We will be comparing these results to the ones obtained in the next model.
One way to try to model the matched nature of the data is to include the pairs as a predictor in the model. We can do this in Stata using xi.
xi: logit honcomp read math i.pid
i.pid             _Ipid_1-53          (naturally coded; _Ipid_1 omitted)

Logit estimates                                   Number of obs   =        106
                                                  LR chi2(54)     =      78.53
                                                  Prob > chi2     =     0.0163
Log likelihood = -34.207961                       Pseudo R2       =     0.5344

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .1316183   .0568454     2.32   0.021     .0202034    .2430332
        math |   .3156888   .0719129     4.39   0.000     .1747421    .4566354
     _Ipid_2 |   2.644259   5.685677     0.47   0.642    -8.499464    13.78798
     _Ipid_3 |   1.683836   4.469618     0.38   0.706    -7.076455    10.44413
     _Ipid_4 |   5.393423   4.898562     1.10   0.271    -4.207581    14.99443
     _Ipid_5 |   4.144025   4.445536     0.93   0.351    -4.569065    12.85711
     _Ipid_6 |   2.657129    7.54067     0.35   0.725    -12.12231    17.43657
     _Ipid_7 |   8.353859   4.923587     1.70   0.090    -1.296195    18.00391
     _Ipid_8 |  -.0262261   4.422338    -0.00   0.995     -8.69385    8.641398
     _Ipid_9 |   .0524522   11.19484     0.00   0.996    -21.88904    21.99394
    _Ipid_10 |   2.512641   4.419353     0.57   0.570    -6.149132    11.17441
    _Ipid_11 |  -2.328571   4.713814    -0.49   0.621    -11.56748    6.910335
    _Ipid_12 |   1.763002   6.856181     0.26   0.797    -11.67487    15.20087
    _Ipid_13 |    3.31522   4.873365     0.68   0.496    -6.236401    12.86684
    _Ipid_14 |   .8946142   4.386668     0.20   0.838    -7.703097    9.492325
    _Ipid_15 |   4.117799   5.281313     0.78   0.436    -6.233384    14.46898
    _Ipid_16 |   -1.89462   4.478988    -0.42   0.672    -10.67328    6.884034
    _Ipid_17 |   4.406774   4.535259     0.97   0.331    -4.482171    13.29572
    _Ipid_18 |   1.815454   10.13918     0.18   0.858    -18.05698    21.68789
    _Ipid_19 |   4.867926   5.604981     0.87   0.385    -6.117635    15.85349
    _Ipid_20 |   4.722463   4.589177     1.03   0.303    -4.272158    13.71708
    _Ipid_21 |   7.182652   6.930279     1.04   0.300    -6.400445    20.76575
    _Ipid_22 |   1.867907   34.54054     0.05   0.957    -65.83031    69.56613
    _Ipid_23 |   3.275149   7.036681     0.47   0.642    -10.51649    17.06679
    _Ipid_24 |   2.735807   5.601202     0.49   0.625    -8.242348    13.71396
    _Ipid_25 |   4.775403   5.583181     0.86   0.392     -6.16743    15.71824
    _Ipid_26 |  -.7239006   8.063399    -0.09   0.928    -16.52787    15.08007
    _Ipid_27 |   1.315207   4.674924     0.28   0.778    -7.847475    10.47789
    _Ipid_28 |   2.986174   9.776721     0.31   0.760    -16.17585    22.14819
    _Ipid_29 |  -.1321061   5.459525    -0.02   0.981    -10.83258    10.56837
    _Ipid_30 |  -1.868394   4.728345    -0.40   0.693    -11.13578    7.398992
    _Ipid_31 |   .5522115   5.293526     0.10   0.917    -9.822909    10.92733
    _Ipid_32 |   1.697193   6.284521     0.27   0.787    -10.62024    14.01463
    _Ipid_33 |  -.5393423   4.470319    -0.12   0.904    -9.301006    8.222322
    _Ipid_34 |    .553187   6.815574     0.08   0.935    -12.80509    13.91147
    _Ipid_35 |   4.195989     6.9446     0.60   0.546    -9.415177    17.80716
    _Ipid_36 |  -.5660562   4.463285    -0.13   0.899    -9.313933    8.181821
    _Ipid_37 |  -.5526993   4.747311    -0.12   0.907    -9.857259     8.75186
    _Ipid_38 |   2.196952   4.396783     0.50   0.617    -6.420583    10.81449
    _Ipid_39 |  -2.079179   5.531016    -0.38   0.707    -12.91977    8.761413
    _Ipid_40 |   .5522115   4.561023     0.12   0.904    -8.387229    9.491652
    _Ipid_41 |   2.986174   6.129965     0.49   0.626    -9.028337    15.00069
    _Ipid_42 |  -.5917945   8.294294    -0.07   0.943    -16.84831    15.66472
    _Ipid_43 |  -.0791661   4.652938    -0.02   0.986    -9.198757    9.040425
    _Ipid_44 |   1.341433   4.602993     0.29   0.771    -7.680268    10.36313
    _Ipid_45 |   1.775384   4.513903     0.39   0.694    -7.071705    10.62247
    _Ipid_46 |  -.2632366   4.443416    -0.06   0.953    -8.972173    8.445699
    _Ipid_47 |   1.920847   4.393384     0.44   0.662    -6.690027    10.53172
    _Ipid_48 |    1.35479   6.284713     0.22   0.829    -10.96302     13.6726
    _Ipid_49 |   7.787803   4.635252     1.68   0.093    -1.297125    16.87273
    _Ipid_50 |  -.3824735   4.679609    -0.08   0.935    -9.554338    8.789391
    _Ipid_51 |  -3.262768   4.439724    -0.73   0.462    -11.96447    5.438932
    _Ipid_52 |   1.736776   4.428688     0.39   0.695    -6.943294    10.41685
    _Ipid_53 |   1.512635   5.285563     0.29   0.775    -8.846879    11.87215
       _cons |  -26.75926   6.473695    -4.13   0.000    -39.44747   -14.07105
------------------------------------------------------------------------------
This time the likelihood ratio chi-square has 54 degrees of freedom, and both read and math are statistically significant. These degrees of freedom are large relative to the number of observations (106). In this situation, there is concern that the estimates of the coefficients may be biased. This is a case for the use of conditional logistic regression.
We will follow this model with the fitstat command.
fitstat

Measures of Fit for logit of honcomp

Log-Lik Intercept Only:      -73.474     Log-Lik Full Model:        -34.208
D(51):                        68.416     LR(54):                     78.531
                                         Prob > LR:                   0.016
McFadden's R2:                 0.534     McFadden's Adj R2:          -0.214
Maximum Likelihood R2:         0.523     Cragg & Uhler's R2:          0.698
McKelvey and Zavoina's R2:     0.737     Efron's R2:                  0.600
Variance of y*:               12.493     Variance of error:           3.290
Count R2:                      0.868     Adj Count R2:                0.736
AIC:                           1.683     AIC*n:                     178.416
BIC:                        -169.419     BIC':                      173.294
Note for future reference that the deviance is 68.416 versus 106.232 for the previous model, the AIC is 1.683 (vs 1.059), and the BIC is -169.419 (vs -374.102). The deviance favors this model, but both the AIC and the BIC were better (smaller) in the first model.
5.1.2 Conditional Logistic Regression
Next, we will estimate the model using Stata’s clogit command for conditional logistic regression.
clogit honcomp read math, group(pid)

Conditional (fixed-effects) logistic regression   Number of obs   =        106
                                                  LR chi2(2)      =      39.27
                                                  Prob > chi2     =     0.0000
Log likelihood = -17.10398                        Pseudo R2       =     0.5344

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0658091   .0401959     1.64   0.102    -.0129735    .1445918
        math |   .1578444   .0508504     3.10   0.002     .0581794    .2575094
------------------------------------------------------------------------------
As you can see, the degrees of freedom are back down to two, and math is significant while read is not. To view the results as odds ratios, we can either run listcoef, in which the column labeled e^b contains the odds ratios, or rerun clogit with the or option.
listcoef

clogit (N=106): Factor Change in Odds

  Odds of: 1 vs 0

--------------------------------------------------
     honcomp |       b         z     P>|z|     e^b
-------------+------------------------------------
        read |   0.06581    1.637   0.102   1.0680
        math |   0.15784    3.104   0.002   1.1710
--------------------------------------------------

clogit, or

Conditional (fixed-effects) logistic regression   Number of obs   =        106
                                                  LR chi2(2)      =      39.27
                                                  Prob > chi2     =     0.0000
Log likelihood = -17.10398                        Pseudo R2       =     0.5344

------------------------------------------------------------------------------
     honcomp | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.068023   .0429302     1.64   0.102     .9871103    1.155568
        math |   1.170984    .059545     3.10   0.002     1.059905    1.293704
------------------------------------------------------------------------------
We can follow up on this estimation with the fitstat command.
fitstat

Measures of Fit for clogit of honcomp

Log-Lik Intercept Only:      -36.737     Log-Lik Full Model:        -17.104
D(51):                        34.208     LR(2):                      39.266
                                         Prob > LR:                   0.000
McFadden's R2:                 0.534     McFadden's Adj R2:           0.480
Maximum Likelihood R2:         0.523     Cragg & Uhler's R2:          0.698
Count R2:                      0.868
AIC:                           0.721     AIC*n:                      38.208
BIC:                        -168.277     BIC':                      -31.325
In comparing these fit statistics to the previous ones, we see that the deviance is smaller (34.208 vs 68.416), the AIC is smaller (0.721 vs 1.683), and the BIC is nearly unchanged (-168.277 vs -169.419), suggesting that the conditional logistic estimation approach is probably better.
The bottom line is that the coefficients, odds ratios and fit statistics can be interpreted in the same way as for ordinary logistic regression.
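For example, since an odds ratio is simply the exponentiated coefficient, the odds ratio for math can be recovered by hand immediately after running clogit; a minimal check:

display exp(_b[math])

This should print 1.170984, matching the odds-ratio table above: a one-point increase in math multiplies the odds of being in honors composition by about 1.17, holding read constant.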
5.1.3 1:k Matching
The previous example used 1:1 matching. We will now work through an example of 1:k matching, specifically 1:2 matching, in which two non-honors composition students are matched with each honors student. We artificially generated the data for the second matched student so that we could show that the analysis proceeds in the same manner as the previous one.
The dataset hsbcl2 contains our made-up example matched on gender (female) and program type (prog).
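Conceptually, each value of pid now identifies a stratum of three records: one case and two controls. A single stratum might look like this (the values shown are invented purely for illustration):

  pid   honcomp   female   prog   read   math
    1         1        0      2     57     55    (honors case)
    1         0        0      2     52     49    (matched control 1)
    1         0        0      2     47     53    (matched control 2)

The group(pid) option of clogit uses these strata directly, so the extra control requires no special handling.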
use https://stats.idre.ucla.edu/stat/stata/webbooks/logistic/hsbcl2

describe

Contains data from hsbcl2.dta
  obs:           159                          highschool & beyond
 vars:             7                          26 Jun 2001 15:08
 size:         5,088 (99.2% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g
female          float  %9.0g       fl
prog            float  %9.0g       sel        type of program
read            float  %9.0g                  reading score
math            float  %9.0g                  math score
honcomp         float  %9.0g
pid             float  %9.0g
-------------------------------------------------------------------------------
We will begin with an ordinary logistic regression, once again, ignoring the matching. We will also obtain the odds ratios and the fit statistics.
logit honcomp read math

Logit estimates                                   Number of obs   =        147
                                                  LR chi2(2)      =      55.69
                                                  Prob > chi2     =     0.0000
Log likelihood = -68.253137                       Pseudo R2       =     0.2898

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |    .077983   .0310376     2.51   0.012     .0171504    .1388156
        math |   .1209197   .0330634     3.66   0.000     .0561166    .1857228
       _cons |  -11.72204   1.983719    -5.91   0.000    -15.61006   -7.834021
------------------------------------------------------------------------------

logit, or

Logit estimates                                   Number of obs   =        147
                                                  LR chi2(2)      =      55.69
                                                  Prob > chi2     =     0.0000
Log likelihood = -68.253137                       Pseudo R2       =     0.2898

------------------------------------------------------------------------------
     honcomp | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.081104   .0335549     2.51   0.012     1.017298    1.148912
        math |   1.128534   .0373132     3.66   0.000     1.057721    1.204088
------------------------------------------------------------------------------

fitstat

Measures of Fit for logit of honcomp

Log-Lik Intercept Only:      -96.098     Log-Lik Full Model:        -68.253
D(144):                      136.506     LR(2):                      55.691
                                         Prob > LR:                   0.000
McFadden's R2:                 0.290     McFadden's Adj R2:           0.259
Maximum Likelihood R2:         0.315     Cragg & Uhler's R2:          0.432
McKelvey and Zavoina's R2:     0.464     Efron's R2:                  0.331
Variance of y*:                6.144     Variance of error:           3.290
Count R2:                      0.741     Adj Count R2:                0.283
AIC:                           0.969     AIC*n:                     142.506
BIC:                        -582.116     BIC':                      -45.710
Now for the conditional logistic regression.
clogit honcomp read math, group(pid)

Conditional (fixed-effects) logistic regression   Number of obs   =        147
                                                  LR chi2(2)      =      59.54
                                                  Prob > chi2     =     0.0000
Log likelihood = -23.590129                       Pseudo R2       =     0.5579

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0917047   .0426049     2.15   0.031     .0082006    .1752088
        math |   .1957582   .0571106     3.43   0.001     .0838235    .3076929
------------------------------------------------------------------------------

clogit, or

Conditional (fixed-effects) logistic regression   Number of obs   =        147
                                                  LR chi2(2)      =      59.54
                                                  Prob > chi2     =     0.0000
Log likelihood = -23.590129                       Pseudo R2       =     0.5579

------------------------------------------------------------------------------
     honcomp | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.096041   .0466967     2.15   0.031     1.008234    1.191495
        math |   1.216233   .0694598     3.43   0.001     1.087437    1.360283
------------------------------------------------------------------------------

fitstat

Measures of Fit for clogit of honcomp

Log-Lik Intercept Only:      -53.361     Log-Lik Full Model:        -23.590
D(51):                        47.180     LR(2):                      59.541
                                         Prob > LR:                   0.000
McFadden's R2:                 0.558     McFadden's Adj R2:           0.520
Maximum Likelihood R2:         0.675     Cragg & Uhler's R2:          0.779
Count R2:                      0.868
AIC:                           0.966     AIC*n:                      51.180
BIC:                        -155.305     BIC':                      -51.601
5.1.4 Conditional Logistic Regression using xtlogit
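The same conditional fixed-effects model can also be estimated with Stata's xtlogit command using the fe option, with pid declared as the group (panel) identifier. A minimal sketch, which should reproduce the clogit estimates from section 5.1.2 (the xtset syntax shown is the modern form; older versions of Stata specified the group with an i(pid) option on xtlogit instead):

xtset pid
xtlogit honcomp read math, fe

The coefficients for read and math should match those from clogit honcomp read math, group(pid), since xtlogit with the fe option maximizes the same conditional likelihood.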
5.3 Probit Analysis
Logistic regression, utilizing the logit transformation, is not the only method for dealing with binary response variables; probit regression provides an alternative. The differences between logistic and probit regression results are generally quite subtle. The graph below displays both the logistic and probit probabilities for a sample with 200 observations.

[Graph: overlaid logistic and probit predicted probabilities, not reproduced here.]
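A similar picture can be drawn directly from the two link functions using Stata's built-in invlogit() and normal() functions; a minimal sketch (dividing the probit index by 1.7 is a conventional rescaling that makes the two curves nearly coincide):

twoway (function y = invlogit(x), range(-6 6)) (function y = normal(x/1.7), range(-6 6)), legend(order(1 "logistic" 2 "probit")) xtitle("linear predictor") ytitle("probability")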
As you can see, the two curves do not differ by very much. Since the two approaches are so similar, how should users decide between them? Some disciplines have historically used probit for their data analyses. However, logistic regression does have two small advantages: 1) the exponentiated coefficient is meaningful and interpretable as an odds ratio, which is not the case for probit coefficients; and 2) at the present time, logistic regression has more tools for model diagnostics and evaluation. With these points in mind, let's work through an example.
5.3.1 Probit Regression Example
We will use the hsblog dataset, in which the response variable honcomp indicates whether or not students are in honors composition. We will begin with a logistic analysis.
use https://stats.idre.ucla.edu/stat/stata/webbooks/logistic/hsblog

logit honcomp read female

Logit estimates                                   Number of obs   =        200
                                                  LR chi2(2)      =      60.40
                                                  Prob > chi2     =     0.0000
Log likelihood = -85.44372                        Pseudo R2       =     0.2612

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .1443657   .0233337     6.19   0.000     .0986325    .1900989
      female |   1.120926   .4081028     2.75   0.006      .321059    1.920793
       _cons |  -9.603365   1.426404    -6.73   0.000    -12.39906   -6.807665
------------------------------------------------------------------------------

predict p1

logit, or

Logit estimates                                   Number of obs   =        200
                                                  LR chi2(2)      =      60.40
                                                  Prob > chi2     =     0.0000
Log likelihood = -85.44372                        Pseudo R2       =     0.2612

------------------------------------------------------------------------------
     honcomp | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.155307   .0269576     6.19   0.000     1.103661    1.209369
      female |   3.067693   1.251934     2.75   0.006     1.378587    6.826368
------------------------------------------------------------------------------

listcoef

logit (N=200): Factor Change in Odds

  Odds of: 1 vs 0

----------------------------------------------------------------------
     honcomp |       b         z     P>|z|     e^b    e^bStdX    SDofX
-------------+--------------------------------------------------------
        read |   0.14437    6.187   0.000   1.1553    4.3937   10.2529
      female |   1.12093    2.747   0.006   3.0677    1.7500    0.4992
----------------------------------------------------------------------
Both read (p-value 0.000) and female (p-value 0.006) are statistically significant. Now let's try the equivalent analysis using the probit approach. Note that there is no probit equivalent of the logit command's or option, because exponentiated probit coefficients cannot be interpreted as odds ratios.
probit honcomp read female

Probit estimates                                  Number of obs   =        200
                                                  LR chi2(2)      =      61.31
                                                  Prob > chi2     =     0.0000
Log likelihood = -84.990569                       Pseudo R2       =     0.2651

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0856048   .0130065     6.58   0.000     .0601126    .1110971
      female |   .6340312   .2300876     2.76   0.006     .1830678    1.084995
       _cons |  -5.672047   .7798022    -7.27   0.000    -7.200431   -4.143663
------------------------------------------------------------------------------

predict p2

listcoef

probit (N=200): Unstandardized and Standardized Estimates

  Observed SD: .4424407
  Latent SD: 1.3568531

-------------------------------------------------------------------------------
     honcomp |       b         z     P>|z|    bStdX    bStdY   bStdXY    SDofX
-------------+-----------------------------------------------------------------
        read |   0.08560    6.582   0.000   0.8777   0.0631   0.6469   10.2529
      female |   0.63403    2.756   0.006   0.3165   0.4673   0.2333    0.4992
-------------------------------------------------------------------------------
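As a side note, the probit predicted probability is simply the standard normal CDF evaluated at the linear predictor, so p2 can be reproduced by hand; a quick sketch (p2check is a variable name introduced here just for this check):

* Pr(honcomp=1 | x) is the normal CDF of the linear predictor
generate p2check = normal(_b[_cons] + _b[read]*read + _b[female]*female)
summarize p2 p2check

The two variables should have identical summary statistics.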
Note that although the coefficients themselves are different, the p-values associated with them are, in this instance, the same. We can see how close the predicted probabilities are by listing the first 30 of them. Remember, p1 was obtained from logit and p2 from probit.
list p1 p2 in 1/30

             p1         p2
  1.   .2018907   .2140139
  2.   .7915725   .7832198
  3.   .0372812   .0283618
  4.   .3755863   .3901446
  5.   .0563498   .0496128
  6.   .0372812   .0283618
  7.   .0843176   .0819907
  8.   .0090587    .002877
  9.   .3755863   .3901446
 10.   .2018907   .2140139
 11.   .2806132   .2960634
 12.   .2018907   .2140139
 13.   .7181559   .7180662
 14.   .1409265   .1470003
 15.   .0428232   .0343925
 16.   .0281952   .0189172
 17.   .0563498   .0496128
 18.   .2018907   .2140139
 19.   .5531741   .5592556
 20.   .1593262   .1675779
 21.   .3755863   .3901446
 22.   .3755863   .3901446
 23.   .0843176   .0819907
 24.   .2806132   .2960634
 25.   .0139005   .0061283
 26.   .0090587    .002877
 27.   .4453212    .457104
 28.   .0563498   .0496128
 29.   .0372812   .0283618
 30.   .1094523   .1111197
For the most part, the predicted probabilities are very close to one another.
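One way to quantify this over all 200 observations is to summarize the difference between the two sets of predictions (diff is a variable name introduced here):

generate diff = p1 - p2
summarize diff

The mean, minimum and maximum of diff give a quick sense of how far apart the two sets of predictions ever get.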
5.4 Complementary Log Log Estimation

The logit and probit links are both symmetric around a probability of 0.5. The complementary log-log (cloglog) link is asymmetric, and it is often used when the probability of the outcome of interest is very small or very large.
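For reference, the complementary log-log model relates the probability of the outcome to the linear predictor through

$$ \ln\{-\ln(1 - p)\} = \mathbf{x}'\beta, \qquad\text{equivalently}\qquad p = 1 - \exp\{-\exp(\mathbf{x}'\beta)\} $$

Unlike the logit and probit links, this function approaches 0 and 1 at different rates, which is the source of its asymmetry.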
5.4.1 cloglog Example
We will stick with the hsblog dataset and look at the same model we used in the previous section, predicting honcomp from read and female.
cloglog honcomp read female

Complementary log-log regression                  Number of obs    =       200
                                                  Zero outcomes    =       147
                                                  Nonzero outcomes =        53

                                                  LR chi2(2)       =     59.34
Log likelihood = -85.976021                       Prob > chi2      =    0.0000

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .1094953   .0161081     6.80   0.000      .077924    .1410666
      female |   .8820051   .3065146     2.88   0.004     .2812476    1.482763
       _cons |  -7.783623   1.029657    -7.56   0.000    -9.801713   -5.765533
------------------------------------------------------------------------------

predict p3

listcoef

cloglog (N=200): Unstandardized and Standardized Estimates

  Observed SD: .4424407

-------------------------------------------------------------
     honcomp |       b         z     P>|z|    bStdX    SDofX
-------------+-----------------------------------------------
        read |   0.10950    6.798   0.000   1.1226   10.2529
      female |   0.88201    2.878   0.004   0.4403    0.4992
-------------------------------------------------------------
Again, the coefficients are somewhat different from those in the previous analyses, but the p-values are fairly close. Let's take another look at the predicted probabilities.
list p1 p2 p3 in 1/30

             p1         p2          p3
  1.   .2018907   .2140139   .19254605
  2.   .7915725   .7832198   .82146906
  3.   .0372812   .0283618   .05021318
  4.   .3755863   .3901446   .33803425
  5.   .0563498   .0496128   .06905129
  6.   .0372812   .0283618   .05021318
  7.   .0843176   .0819907   .09459662
  8.   .0090587    .002877   .01708784
  9.   .3755863   .3901446   .33803425
 10.   .2018907   .2140139   .19254605
 11.   .2806132   .2960634   .25698217
 12.   .2018907   .2140139   .19254605
 13.   .7181559   .7180662   .70861328
 14.   .1409265   .1470003   .14271825
 15.   .0428232   .0343925   .05585836
 16.   .0281952   .0189172    .040541
 17.   .0563498   .0496128   .06905129
 18.   .2018907   .2140139   .19254605
 19.   .5531741   .5592556    .5099445
 20.   .1593262   .1675779   .15785867
 21.   .3755863   .3901446   .33803425
 22.   .3755863   .3901446   .33803425
 23.   .0843176   .0819907   .09459662
 24.   .2806132   .2960634   .25698217
 25.   .0139005   .0061283   .02365356
 26.   .0090587    .002877   .01708784
 27.   .4453212    .457104   .40162623
 28.   .0563498   .0496128   .06905129
 29.   .0372812   .0283618   .05021318
 30.   .1094523   .1111197   .11635828
Again, many of the probabilities are very close to one another.
Let's try another example. This time we will create a variable, scholar, indicating that a student who scores 70 or higher on the math test is eligible for a partial scholarship. Only about 5% of the students are eligible.
generate scholar = math >= 70

tabulate scholar

    scholar |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        189       94.50       94.50
          1 |         11        5.50      100.00
------------+-----------------------------------
      Total |        200      100.00

In this model we will try to predict scholar using read and science.
cloglog scholar read science

Complementary log-log regression                  Number of obs    =       200
                                                  Zero outcomes    =       189
                                                  Nonzero outcomes =        11

                                                  LR chi2(2)       =     31.83
Log likelihood = -26.679923                       Prob > chi2      =    0.0000

------------------------------------------------------------------------------
     scholar |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .1118547    .044388     2.52   0.012     .0248557    .1988537
     science |   .1445449   .0537736     2.69   0.007     .0391506    .2499392
       _cons |  -18.18081   3.801846    -4.78   0.000    -25.63229   -10.72933
------------------------------------------------------------------------------

listcoef

cloglog (N=200): Unstandardized and Standardized Estimates

  Observed SD: .22855236

-------------------------------------------------------------
     scholar |       b         z     P>|z|    bStdX    SDofX
-------------+-----------------------------------------------
        read |   0.11185    2.520   0.012   1.1468   10.2529
     science |   0.14454    2.688   0.007   1.4311    9.9009
-------------------------------------------------------------