This chapter makes extensive use of the fitstat program, which is not part of base Stata. Before using the fitstat command, download it by typing search fitstat in the command line (see How can I use the search command to search for programs and get additional help? for more information about using search).
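As a small sketch of that step, the lines below first check whether fitstat is already installed and only start a search when it is not. Using `which` for the check is a choice made here for illustration, not something the original page prescribes.

```stata
* Check whether fitstat is already on the ado-path.
* which sets a nonzero return code when the command is not found.
capture which fitstat
if _rc {
    * Not installed: open the search results and follow the install link.
    search fitstat
}
```

The same pattern should work for the other user-written commands used later in this chapter (prvalue, xi3, postgr3, tablist).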
Figure 5.2, page 105, using the crab data set.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/crab, clear
gen a = ceil(width - 23.25) + 1
replace a = 1 if a<=0
replace a = 8 if a >8
sort a
egen wmean = mean(width), by(a)
* Stata 8 code (use these two lines in place of the total() lines below).
* egen ssatell = sum(y), by(a)
* egen sn = sum(n), by(a)
* Stata 9 code.
egen ssatell = total(y), by(a)
egen sn = total(n), by(a)
gen prop_s = ssatell/sn
graph twoway (lowess prop_s wmean) (scatter prop_s wmean) ///
    (scatter y width , mlab(marker) msymbol(none) legend(off))
logit y width, nolog

Logit estimates                                   Number of obs   =        173
                                                  LR chi2(1)      =      31.31
                                                  Prob > chi2     =     0.0000
Log likelihood = -97.226331                       Pseudo R2       =     0.1387

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       width |   .4972306   .1017361     4.89   0.000     .2978316    .6966297
       _cons |  -12.35082   2.628731    -4.70   0.000    -17.50304     -7.1986
------------------------------------------------------------------------------
Table 5.1 on page 106.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/crab, clear
gen a = ceil(width - 23.25) + 1
replace a = 1 if a<=0
replace a = 8 if a >8
sort a
quietly logit y width, nolog
predict p
collapse (mean) width p (sum) y p_count=p n , by(a)
gen prop = y/n
list

     +---------------------------------------------------------+
     | a      width          p    y    p_count    n       prop |
     |---------------------------------------------------------|
  1. | 1   22.69286   .2596734    5   3.635427   14   .3571429 |
  2. | 2   23.84286   .3789991    4   5.305987   14   .2857143 |
  3. | 3     24.775    .492058   17   13.77762   28   .6071429 |
  4. | 4   25.83846   .6212226   21   24.22768   39   .5384616 |
  5. | 5   26.79091   .7244455   15    15.9378   22   .6818182 |
     |---------------------------------------------------------|
  6. | 6    27.7375   .8076395   20   19.38335   24   .8333333 |
  7. | 7   28.66667   .8694543   15   15.65018   18   .8333333 |
  8. | 8   30.40714   .9344253   14   13.08195   14          1 |
     +---------------------------------------------------------+
Linear model approach on page 106.
reg y width

      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  1,   171) =   32.85
       Model |  6.40974521     1  6.40974521           Prob > F      =  0.0000
    Residual |  33.3706016   171  .195149717           R-squared     =  0.1611
-------------+------------------------------           Adj R-squared =  0.1562
       Total |  39.7803468   172  .231281086           Root MSE      =  .44176

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       width |   .0915308   .0159709     5.73   0.000     .0600052    .1230563
       _cons |  -1.765534   .4213581    -4.19   0.000    -2.597267   -.9338014
------------------------------------------------------------------------------
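To see what the linear fit implies at the edges of the data, the sketch below plugs two widths into the fitted line. The coefficients are copied from the reg output above, and the widths 21 and 33.5 are the smallest and largest values in the crab data as listed later on this page. The scalar names are chosen here for illustration only.

```stata
* Coefficients copied from the reg output above.
scalar b0 = -1.765534
scalar b1 = .0915308

* Fitted values of the linear probability model at the two extreme widths.
scalar fit_min = scalar(b0) + scalar(b1)*21
scalar fit_max = scalar(b0) + scalar(b1)*33.5

display "fitted value at width 21:   " scalar(fit_min)
display "fitted value at width 33.5: " scalar(fit_max)
```

The fitted value at width 33.5 comes out above 1, which is not a valid probability; this is one practical reason for returning to the logit model.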
Back to logit model on page 107 and Figure 5.1, page 104.
quietly logit y width
predict p
tablist width p, sort(v)

  +-------------------------+
  | width          p   Freq |
  |-------------------------|
  |    21    .129096      1 |
  |    22    .195959      1 |
  |  22.5   .2380991      3 |
  |  22.9   .2760306      3 |
  |    23    .286077      2 |
  |-------------------------|
  |  23.1   .2963393      3 |
  |  23.2   .3068116      1 |
  |  23.4   .3283577      1 |
  |  23.5   .3394157      1 |
  |  23.7   .3620558      3 |
  |-------------------------|
  |  23.8   .3736171      3 |
  |  23.9   .3853249      1 |
  |    24   .3971669      2 |
  |  24.1   .4091306      1 |
  |  24.2   .4212029      2 |
  |-------------------------|
  |  24.3   .4333699      2 |
  |  24.5   .4579326      7 |
  |  24.7   .4827014      5 |
  |  24.8   .4951253      1 |
  |  24.9   .5075554      3 |
  |-------------------------|
  |    25   .5199761      6 |
  |  25.1   .5323722      2 |
  |  25.2   .5447285      2 |
  |  25.3   .5570297      1 |
  |  25.4   .5692616      3 |
  |-------------------------|
  |  25.5   .5814095      3 |
  |  25.6   .5934595      2 |
  |  25.7   .6053981      6 |
  |  25.8   .6172119      7 |
  |  25.9   .6288891      1 |
  |-------------------------|
  |    26   .6404177      6 |
  |  26.1   .6517864      2 |
  |  26.2   .6629848      8 |
  |  26.3    .674003      1 |
  |  26.5   .6954646      6 |
  |-------------------------|
  |  26.7   .7161084      3 |
  |  26.8   .7261074      3 |
  |    27   .7454343      5 |
  |  27.1   .7547542      2 |
  |  27.2    .763841      2 |
  |-------------------------|
  |  27.3   .7726924      1 |
  |  27.4   .7813072      3 |
  |  27.5   .7896843      6 |
  |  27.6   .7978235      1 |
  |  27.7   .8057253      2 |
  |-------------------------|
  |  27.8   .8133904      2 |
  |  27.9   .8208204      2 |
  |    28   .8280171      3 |
  |  28.2   .8417205      4 |
  |  28.3   .8482328      3 |
  |-------------------------|
  |  28.4   .8545237      2 |
  |  28.5   .8605966      4 |
  |  28.7   .8721051      2 |
  |  28.9   .8827927      1 |
  |    29   .8878404      6 |
  |-------------------------|
  |  29.3   .9018577      2 |
  |  29.5   .9103148      1 |
  |  29.7   .9181093      1 |
  |  29.8   .9217708      1 |
  |    30   .9286477      3 |
  |-------------------------|
  |  30.2   .9349627      1 |
  |  30.3   .9379216      1 |
  |  30.5   .9434658      1 |
  |  31.7   .9680587      1 |
  |  31.9   .9709946      1 |
  |-------------------------|
  |  33.5   .9866974      1 |
  +-------------------------+

sum p

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           p |       173    .6416185    .1980444    .129096   .9866974

graph twoway line p width, ytitle("Probability") xlabel(20(2)34) sort
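The predicted probabilities in the listing follow directly from the fitted coefficients. As a worked check, using only the estimates from the logit output earlier on this page, the fitted curve and its value at the smallest width are:

```latex
\hat{\pi}(x) = \frac{\exp(-12.35082 + 0.4972306\,x)}{1 + \exp(-12.35082 + 0.4972306\,x)},
\qquad
\hat{\pi}(21) = \frac{\exp(-1.90898)}{1 + \exp(-1.90898)} \approx 0.1291 .
```

The fitted probability crosses 0.5 where the linear predictor is zero:

```latex
x_{0.5} = \frac{12.35082}{0.4972306} \approx 24.84 ,
```

which agrees with the listing, where the predicted probability moves from .4951 at width 24.8 to .5076 at width 24.9.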
Section 5.1.3, pages 107-108. Odds ratio interpretation.
Note: You may have to download the program prvalue from the internet. It belongs to a suite of postestimation programs written by J. Scott Long and Jeremy Freese (see How can I use the search command to search for programs and get additional help? for more information about using search).
logit y width, or nolog

Logit estimates                                   Number of obs   =        173
                                                  LR chi2(1)      =      31.31
                                                  Prob > chi2     =     0.0000
Log likelihood = -97.226331                       Pseudo R2       =     0.1387

------------------------------------------------------------------------------
           y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       width |   1.644162   .1672706     4.89   0.000     1.346935    2.006977
------------------------------------------------------------------------------

prvalue, x(width=26.3)

logit: Predictions for y

  Pr(y=1|x):          0.6740   95% ci: (0.5915,0.7470)
  Pr(y=0|x):          0.3260   95% ci: (0.2530,0.4085)

       width
x=      26.3

di .6740/.3260
2.0674847

prvalue, x(width=27.3)

logit: Predictions for y

  Pr(y=1|x):          0.7727   95% ci: (0.6830,0.8428)
  Pr(y=0|x):          0.2273   95% ci: (0.1572,0.3170)

       width
x=      27.3

di .7727/.2273
3.3994721

di 3.3994721/2.0674847
1.644255
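The two routes to the odds ratio shown above can be written side by side. This is a worked restatement of the numbers in the output, not an additional result:

```latex
e^{\hat{\beta}} = e^{0.4972306} \approx 1.6442,
\qquad
\frac{\text{odds}(27.3)}{\text{odds}(26.3)}
  = \frac{0.7727/0.2273}{0.6740/0.3260}
  = \frac{3.3995}{2.0675} \approx 1.6443 .
```

The small discrepancy in the fourth decimal place comes from using probabilities that prvalue rounds to four digits.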
Section 5.2.1, page 109. Confidence intervals for effects.
logit y width, nolog

Logit estimates                                   Number of obs   =        173
                                                  LR chi2(1)      =      31.31
                                                  Prob > chi2     =     0.0000
Log likelihood = -97.226331                       Pseudo R2       =     0.1387

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       width |   .4972306   .1017361     4.89   0.000     .2978316    .6966297
       _cons |  -12.35082   2.628731    -4.70   0.000    -17.50304     -7.1986
------------------------------------------------------------------------------

logit y width, or nolog

Logit estimates                                   Number of obs   =        173
                                                  LR chi2(1)      =      31.31
                                                  Prob > chi2     =     0.0000
Log likelihood = -97.226331                       Pseudo R2       =     0.1387

------------------------------------------------------------------------------
           y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       width |   1.644162   .1672706     4.89   0.000     1.346935    2.006977
------------------------------------------------------------------------------
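The interval on the odds-ratio scale is the exponentiated Wald interval for the coefficient. The sketch below rebuilds both intervals from the coefficient and standard error printed above; the scalar names are illustrative, and invnormal() is assumed to be available (Stata 9 or later).

```stata
* Estimate and standard error copied from the logit output above.
scalar b  = .4972306
scalar se = .1017361

* Wald 95% limits on the log-odds scale.
scalar z  = invnormal(.975)
scalar lo = scalar(b) - scalar(z)*scalar(se)
scalar hi = scalar(b) + scalar(z)*scalar(se)

* Exponentiate the limits to get the interval for the odds ratio.
display "log odds:   (" scalar(lo) ", " scalar(hi) ")"
display "odds ratio: (" exp(scalar(lo)) ", " exp(scalar(hi)) ")"
```

Up to rounding, the two displayed intervals should match the [95% Conf. Interval] columns of the two tables.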
Section 5.3. Model checking.
gen a = ceil(width - 23.25) + 1
replace a = 1 if a<=0
(2 real changes made)
replace a = 8 if a >8
(5 real changes made)
sort a
logit satell width, nolog

Logit estimates                                   Number of obs   =        173
                                                  LR chi2(1)      =      31.31
                                                  Prob > chi2     =     0.0000
Log likelihood = -97.226331                       Pseudo R2       =     0.1387

------------------------------------------------------------------------------
      satell |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       width |   .4972306   .1017361     4.89   0.000     .2978316    .6966297
       _cons |  -12.35082   2.628731    -4.70   0.000    -17.50304     -7.1986
------------------------------------------------------------------------------

predict p
(option p assumed; Pr(satell))
gen no=1-y
gen nop = 1-p
collapse (sum) yes=y no p nop, by(a)
list

     +------------------------------------+
     | a   yes   no          p        nop |
     |------------------------------------|
  1. | 1     5    9   3.635427   10.36457 |
  2. | 2     4   10   5.305987   8.694013 |
  3. | 3    17   11   13.77762   14.22238 |
  4. | 4    21   18   24.22768   14.77232 |
  5. | 5    15    7    15.9378     6.0622 |
     |------------------------------------|
  6. | 6    20    4   19.38335   4.616651 |
  7. | 7    15    3   15.65018   2.349822 |
  8. | 8    14    0   13.08195   .9180457 |
     +------------------------------------+

gen x2 = (yes-p)^2/p + (no-nop)^2/nop
* Stata 8 code (use in place of the total() line below).
* egen x2sum = sum(x2)
* Stata 9 code.
egen x2sum = total(x2)
gen g2 = 2*yes*log(yes/p) + 2*no*log(no/nop)
(1 missing value generated)
replace g2 = 2 if yes==0 | no==0
(1 real change made)
* Stata 8 code (use in place of the total() line below).
* egen g2sum=sum(g2)
* Stata 9 code.
egen g2sum=total(g2)
list

     +------------------------------------------------------------------------------+
     | a   yes   no          p        nop         x2    x2sum         g2      g2sum |
     |------------------------------------------------------------------------------|
  1. | 1     5    9   3.635427   10.36457   .6918539   5.3201   .6460713   6.280302 |
  2. | 2     4   10   5.305987   8.694013   .5176301   5.3201   .5386781   6.280302 |
  3. | 3    17   11   13.77762   14.22238   1.483761   5.3201   1.493428   6.280302 |
  4. | 4    21   18   24.22768   14.77232   1.135233   5.3201   1.109317   6.280302 |
  5. | 5    15    7    15.9378     6.0622   .2002557   5.3201   .1944201   6.280302 |
     |------------------------------------------------------------------------------|
  6. | 6    20    4   19.38335   4.616651   .1019846   5.3201   .1057136   6.280302 |
  7. | 7    15    3   15.65018   2.349822   .2069104   5.3201   .1926733   6.280302 |
  8. | 8    14    0   13.08195   .9180457   .9824709   5.3201          2   6.280302 |
     +------------------------------------------------------------------------------+
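For reference, the two generated variables correspond to the usual grouped-data Pearson and likelihood-ratio statistics. Writing y_i for yes, n_i - y_i for no, and using the summed fitted probabilities (columns p and nop) as fitted counts, a sketch of the formulas behind x2 and g2 is:

```latex
X^2 = \sum_{i=1}^{8} \left[
        \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i}
      + \frac{\bigl((n_i - y_i) - (n_i - \hat{\mu}_i)\bigr)^2}{n_i - \hat{\mu}_i}
      \right],
\qquad
G^2 = 2 \sum_{i=1}^{8} \left[
        y_i \log\frac{y_i}{\hat{\mu}_i}
      + (n_i - y_i) \log\frac{n_i - y_i}{n_i - \hat{\mu}_i}
      \right].
```

Here \hat{\mu}_i is notation introduced for this note, not taken from the page. In group 8 the second term of G^2 is undefined because no = 0; the code above handles this by setting g2 to 2 for that group, whereas the common convention 0 log 0 = 0 would leave only the first term, 2(14) log(14/13.08195). With 8 groups and 2 estimated parameters, both sums would be referred to a chi-squared distribution with 6 degrees of freedom.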
A simpler approach described on page 113.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/crab, clear
gen a = ceil(width - 23.25) + 1
replace a = 1 if a<=0
replace a = 8 if a >8
sort a
egen mwidth = mean(width), by(a)
logit y mwidth, nolog

Logit estimates                                   Number of obs   =        173
                                                  LR chi2(1)      =      28.08
                                                  Prob > chi2     =     0.0000
Log likelihood =  -98.84003                       Pseudo R2       =     0.1244

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mwidth |   .4654004   .0986921     4.72   0.000     .2719674    .6588334
       _cons |  -11.53299   2.552684    -4.52   0.000    -16.53616   -6.529821
------------------------------------------------------------------------------

* Stata 8 code (use in place of estat gof).
* lfit
* Stata 9 code and output.
estat gof

Logistic model for y, goodness-of-fit test

       number of observations =       173
 number of covariate patterns =         8
              Pearson chi2(6) =         5.02
                  Prob > chi2 =         0.5417
Model on ungrouped data:
logit y width, nolog

Logit estimates                                   Number of obs   =        173
                                                  LR chi2(1)      =      31.31
                                                  Prob > chi2     =     0.0000
Log likelihood = -97.226331                       Pseudo R2       =     0.1387

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       width |   .4972306   .1017361     4.89   0.000     .2978316    .6966297
       _cons |  -12.35082   2.628731    -4.70   0.000    -17.50304     -7.1986
------------------------------------------------------------------------------

* Stata 8 code (use in place of estat gof).
* lfit, group(10) table
* Stata 9 code and output.
estat gof, group(10) table

Logistic model for y, goodness-of-fit test

  (Table collapsed on quantiles of estimated probabilities)
  +--------------------------------------------------------+
  | Group |   Prob | Obs_1 | Exp_1 | Obs_0 | Exp_0 | Total |
  |-------+--------+-------+-------+-------+-------+-------|
  |     1 | 0.3621 |     5 |   5.4 |    14 |  13.6 |    19 |
  |     2 | 0.4579 |     8 |   7.6 |    10 |  10.4 |    18 |
  |     3 | 0.5200 |    10 |   7.6 |     5 |   7.4 |    15 |
  |     4 | 0.6054 |     9 |  11.0 |    10 |   8.0 |    19 |
  |     5 | 0.6518 |    11 |  10.1 |     5 |   5.9 |    16 |
  |-------+--------+-------+-------+-------+-------+-------|
  |     6 | 0.7161 |    11 |  12.3 |     7 |   5.7 |    18 |
  |     7 | 0.7897 |    16 |  16.8 |     6 |   5.2 |    22 |
  |     8 | 0.8417 |    12 |  11.5 |     2 |   2.5 |    14 |
  |     9 | 0.8878 |    15 |  15.7 |     3 |   2.3 |    18 |
  |    10 | 0.9867 |    14 |  13.1 |     0 |   0.9 |    14 |
  +--------------------------------------------------------+

       number of observations =       173
             number of groups =        10
      Hosmer-Lemeshow chi2(8) =         4.63
                  Prob > chi2 =         0.7963
Section 5.3.2, pages 114-115. Goodness of fit and likelihood-ratio model comparison tests:
logit y mwidth, nolog

Logit estimates                                   Number of obs   =        173
                                                  LR chi2(1)      =      28.08
                                                  Prob > chi2     =     0.0000
Log likelihood =  -98.84003                       Pseudo R2       =     0.1244

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mwidth |   .4654004   .0986921     4.72   0.000     .2719674    .6588334
       _cons |  -11.53299   2.552684    -4.52   0.000    -16.53616   -6.529821
------------------------------------------------------------------------------

fitstat

Measures of Fit for logit of y

Log-Lik Intercept Only:     -112.879     Log-Lik Full Model:          -98.840
D(171):                      197.680     LR(1):                        28.078
                                         Prob > LR:                     0.000
McFadden's R2:                 0.124     McFadden's Adj R2:             0.107
Maximum Likelihood R2:         0.150     Cragg & Uhler's R2:            0.206
McKelvey and Zavoina's R2:     0.219     Efron's R2:                    0.145
Variance of y*:                4.212     Variance of error:             3.290
Count R2:                      0.665     Adj Count R2:                  0.065
AIC:                           1.166     AIC*n:                       201.680
BIC:                        -683.533     BIC':                        -22.925
Section 5.3.3 on residuals for logit models.
Table 5.3, page 116.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/crab, clear
gen a = ceil(width - 23.25) + 1
replace a = 1 if a<=0
replace a = 8 if a >8
sort a
logit y
predict pind
egen mwidth = mean(width), by(a)
logit y mwidth, nolog
predict p
predict r, residuals
predict h, hat
gen aresid = r/sqrt(1-h)
collapse (mean) mwidth r aresid pi=pind (sum) y p pind (count) n, by(a)
gen rr= (y-pi*n)/sqrt(n*pi*(1-pi))
list mwidth n y pind rr p r aresid

     +------------------------------------------------------------------------------+
     |   mwidth    n    y       pind          rr          p           r      aresid |
     |------------------------------------------------------------------------------|
  1. | 22.69286   14    5   8.982659   -2.219718   3.843518    .6925753    .8564039 |
  2. | 23.84286   14    4   8.982659   -2.777064   5.496007   -.8187712   -.9297187 |
  3. |   24.775   28   17   17.96532   -.3804346   13.98114    1.141024    1.344962 |
  4. | 25.83846   39   21   25.02312   -1.343444   24.20473   -1.057578   -1.240055 |
  5. | 26.79091   22   15   14.11561    .3932084   15.80022   -.3792292   -.4173211 |
     |------------------------------------------------------------------------------|
  6. |  27.7375   24   20   15.39884     1.95862   19.16056    .4270666    .4948038 |
  7. | 28.66667   18   15   11.54913    1.696214   15.46522   -.3152464   -.3611885 |
  8. | 30.40714   14   14   8.982659    2.796394    13.0486    1.010328    1.136103 |
     +------------------------------------------------------------------------------+
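The residual columns can be read against the formulas they implement. In the notation used for this note (not taken from the page), with y_i successes out of n_i in group i, the variable rr is the Pearson residual under the intercept-only model, and aresid rescales the Pearson residual r from the mwidth model by its leverage h:

```latex
rr_i = \frac{y_i - n_i \bar{\pi}}{\sqrt{n_i \bar{\pi} (1 - \bar{\pi})}},
\qquad
\text{aresid}_i = \frac{r_i}{\sqrt{1 - h_i}},
\qquad
\bar{\pi} = \frac{111}{173} \approx 0.6416 .
```

For the first group, for example, n_1 \bar{\pi} = 14(0.6416185) = 8.982659 (the pind column), so

```latex
rr_1 = \frac{5 - 8.982659}{\sqrt{8.982659\,(1 - 0.6416185)}} \approx -2.2197 ,
```

which matches the first row of the listing.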
Figure 5.3, page 116.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/crab, clear
gen a = ceil(width - 23.25) + 1
replace a = 1 if a<=0
replace a = 8 if a >8
sort a
egen mwidth = mean(width), by(a)
logit y mwidth, nolog
predict p
collapse (mean) mwidth phat=p (sum) y p (count) n, by(a)
gen obp=y/n
graph twoway (scatter obp mwidth) (scatter phat mwidth, connect(l)), ///
    ylabel(0(.2)1) xlabel(22(2)32) ytitle("proportion")
Section 5.3.4 on diagnostic measures of influence.
Table 5.4 on page 118. For the model with the variable width as a predictor, we will use ungrouped data because it is easier to generate all the diagnostic statistics using the logit command. For the model with no predictors, we will have to group the data and use the glm command. Some further calculation is needed for creating the diagnostic statistics. The details are shown below.
use https://stats.idre.ucla.edu/stat/stata/examples/icda/crab, clear
gen a = ceil(width - 23.25) + 1
replace a = 1 if a<=0
replace a = 8 if a >8
sort a
egen mwidth = mean(width), by(a)
logit y mwidth, nolog
predict db, db
predict dx, dx
predict dd, dd
collapse (mean) width db dd dx (sum) y n , by(a)
glm y, fam(bin n)

Generalized linear models                          No. of obs      =         8
Optimization     : ML: Newton-Raphson              Residual df     =         7
                                                   Scale parameter =         1
Deviance         =  34.03404409                    (1/df) Deviance =  4.862006
Pearson          =  29.27657443                    (1/df) Pearson  =  4.182368

Variance function: V(u) = u*(1-u/n)                [Binomial]
Link function    : g(u) = ln(u/(n-u))              [Logit]
Standard errors  : OIM

Log likelihood   = -28.60784483                    AIC             =  7.401961
BIC              =  19.4779533

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .5823958   .1585498     3.67   0.000     .2716439    .8931477
------------------------------------------------------------------------------

predict din, d
predict h2, h
predict res, p
gen x2=res^2/(1-h2)
gen din2=din^2/(1-h2)
drop din h2 res
list width db dx dd x2 din2

     +-----------------------------------------------------------------+
     |    width         db         dx         dd         x2       din2 |
     |-----------------------------------------------------------------|
  1. | 22.69286   .3880239   .7334276   .6949906   5.360987    5.06951 |
  2. | 23.84286   .2501259   .8643769   .9014844   8.391136   7.966363 |
  3. |   24.775   .7044131   1.808922   1.822847   .1726785   .1704266 |
  4. | 25.83846   .5764279   1.537736   1.503042   2.330132   2.253074 |
  5. | 26.79091   .0367436   .1741569   .1699482   .1771392   .1803587 |
     |-----------------------------------------------------------------|
  6. |  27.7375   .0838247   .2448309   .2565225   4.454101   5.030672 |
  7. | 28.66667   .0407948   .1304572   .1243544   3.211263   3.626952 |
  8. | 30.40714   .3413671    1.29073   2.491689   8.508358   13.51937 |
     +-----------------------------------------------------------------+
Section 5.4 Logit Models for Qualitative Predictors
Table 5.5 on page 119 and model (5.4.1).
use https://stats.idre.ucla.edu/stat/stata/examples/icda/azt, clear
list

     +----------------------------+
     |  race   azt   symp   count |
     |----------------------------|
  1. | white   yes    yes      14 |
  2. | white   yes     no      93 |
  3. | white    no    yes      32 |
  4. | white    no     no      81 |
  5. | black   yes    yes      11 |
     |----------------------------|
  6. | black   yes     no      52 |
  7. | black    no    yes      12 |
  8. | black    no     no      43 |
     +----------------------------+

logit symp race azt [fw=count], nolog

Logit estimates                                   Number of obs   =        338
                                                  LR chi2(2)      =       6.97
                                                  Prob > chi2     =     0.0307
Log likelihood = -167.57559                       Pseudo R2       =     0.0204

------------------------------------------------------------------------------
        symp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |   .0554845   .2886132     0.19   0.848    -.5101869     .621156
         azt |  -.7194599   .2789791    -2.58   0.010    -1.266249   -.1726709
       _cons |  -1.073574   .2629407    -4.08   0.000    -1.588928   -.5582193
------------------------------------------------------------------------------

test azt

 ( 1)  azt = 0

           chi2(  1) =    6.65
         Prob > chi2 =    0.0099

* Stata 8 code (use in place of estat gof).
* lfit
* Stata 9 code and output.
estat gof

Logistic model for symp, goodness-of-fit test

       number of observations =       338
 number of covariate patterns =         4
              Pearson chi2(1) =         1.39
                  Prob > chi2 =         0.2382
Table 5.6 on page 121. We make use of the xi3 command written by Michael Mitchell. The xi3 command generalizes Stata’s xi command: it allows three-way interactions and supports coding schemes beyond indicator coding. You can download xi3 from within Stata by issuing the search xi3 command and then following the link (see How can I use the search command to search for programs and get additional help? for more information about using search).
xi3: logit symp i.race i.azt [fw=count], nolog
i.race            _Irace_0-1          (naturally coded; _Irace_0 omitted)
i.azt             _Iazt_0-1           (naturally coded; _Iazt_0 omitted)

Logit estimates                                   Number of obs   =        338
                                                  LR chi2(2)      =       6.97
                                                  Prob > chi2     =     0.0307
Log likelihood = -167.57559                       Pseudo R2       =     0.0204

------------------------------------------------------------------------------
        symp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Irace_1 |   .0554845   .2886132     0.19   0.848    -.5101869     .621156
     _Iazt_1 |  -.7194599   .2789791    -2.58   0.010    -1.266249   -.1726709
       _cons |  -1.073574   .2629407    -4.08   0.000    -1.588928   -.5582193
------------------------------------------------------------------------------

char azt[omit] 1
char race[omit] 1
xi3: logit symp i.race i.azt [fw=count], nolog
i.race            _Irace_0-1          (naturally coded; _Irace_1 omitted)
i.azt             _Iazt_0-1           (naturally coded; _Iazt_1 omitted)

Logit estimates                                   Number of obs   =        338
                                                  LR chi2(2)      =       6.97
                                                  Prob > chi2     =     0.0307
Log likelihood = -167.57559                       Pseudo R2       =     0.0204

------------------------------------------------------------------------------
        symp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Irace_0 |  -.0554845   .2886132    -0.19   0.848     -.621156    .5101869
     _Iazt_0 |   .7194599   .2789791     2.58   0.010     .1726709    1.266249
       _cons |  -1.737549   .2403847    -7.23   0.000    -2.208694   -1.266404
------------------------------------------------------------------------------

xi3: logit symp e.race e.azt [fw=count], nolog
e.race            _Irace_0-1          (naturally coded; _Irace_0 omitted)
e.azt             _Iazt_0-1           (naturally coded; _Iazt_0 omitted)

Logit estimates                                   Number of obs   =        338
                                                  LR chi2(2)      =       6.97
                                                  Prob > chi2     =     0.0307
Log likelihood = -167.57559                       Pseudo R2       =     0.0204

------------------------------------------------------------------------------
        symp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Irace_1 |   .0277423   .1443066     0.19   0.848    -.2550935     .310578
     _Iazt_1 |    -.35973   .1394895    -2.58   0.010    -.6331244   -.0863355
       _cons |  -1.405561   .1466849    -9.58   0.000    -1.693059   -1.118064
------------------------------------------------------------------------------
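The three parameterizations above describe the same fitted model, which can be checked arithmetically. The relations below are read off the printed coefficients; they are consistent with the e. prefix using -1/+1 codes for a two-level factor, which is an assumption made here rather than something stated on the page:

```latex
\hat{\beta}^{\,e}_{azt} = \tfrac{1}{2}\,\hat{\beta}^{\,i}_{azt}
  = \tfrac{1}{2}(-0.7194599) \approx -0.35973,
\qquad
\hat{\beta}^{\,e}_{race} = \tfrac{1}{2}(0.0554845) \approx 0.02774,
```

```latex
\hat{\alpha}^{\,e} = \hat{\alpha}^{\,i}
  + \tfrac{1}{2}\hat{\beta}^{\,i}_{race}
  + \tfrac{1}{2}\hat{\beta}^{\,i}_{azt}
  = -1.073574 + 0.027742 - 0.359730 \approx -1.40556 .
```

Changing the omitted category with char ...[omit] only flips the signs of the two slopes and moves the intercept to -1.073574 + 0.0554845 - 0.7194599 = -1.737549. The log likelihood (-167.57559) is identical in all three fits, as it should be for reparameterizations of one model.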
Section 5.5.1, pages 122-124. Horseshoe crab example using color and width predictors
use https://stats.idre.ucla.edu/stat/stata/examples/icda/crab, clear
char color[omit] 4
xi3: logit y i.color width
i.color           _Icolor_1-4         (naturally coded; _Icolor_4 omitted)

Logit estimates                                   Number of obs   =        173
                                                  LR chi2(4)      =      38.30
                                                  Prob > chi2     =     0.0000
Log likelihood = -93.728515                       Pseudo R2       =     0.1697

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Icolor_1 |   1.329919   .8525264     1.56   0.119    -.3410018     3.00084
   _Icolor_2 |   1.402336   .5484409     2.56   0.011     .3274116    2.477261
   _Icolor_3 |   1.106121   .5920835     1.87   0.062    -.0543408    2.266584
       width |    .467956   .1055464     4.43   0.000     .2610889    .6748231
       _cons |  -12.71511   2.761775    -4.60   0.000    -18.12809   -7.302133
------------------------------------------------------------------------------

prvalue , x(_Icolor_1=1 _Icolor_2=0 _Icolor_3=0)

logit: Predictions for y

  Pr(y=1|x):          0.7153   95% ci: (0.3916,0.9075)
  Pr(y=0|x):          0.2847   95% ci: (0.0925,0.6084)

    _Icolor_1  _Icolor_2  _Icolor_3      width
x=          1          0          0  26.298844
Figure 5.4 on page 124. This graph can be easily produced using the Stata program postgr3 written by Michael Mitchell. You can download the program through the internet (see How can I use the search command to search for programs and get additional help? for more information about using search).
postgr3 width, by(color) ytitle(" ")
Section 5.5.2, pages 124-125. Model comparison.
logit y width

Logit estimates                                   Number of obs   =        173
                                                  LR chi2(1)      =      31.31
                                                  Prob > chi2     =     0.0000
Log likelihood = -97.226331                       Pseudo R2       =     0.1387

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       width |   .4972306   .1017361     4.89   0.000     .2978316    .6966297
       _cons |  -12.35082   2.628731    -4.70   0.000    -17.50304     -7.1986
------------------------------------------------------------------------------

lrtest, saving(m0)
xi3: logit y width i.color
i.color           _Icolor_1-4         (naturally coded; _Icolor_4 omitted)

Logit estimates                                   Number of obs   =        173
                                                  LR chi2(4)      =      38.30
                                                  Prob > chi2     =     0.0000
Log likelihood = -93.728515                       Pseudo R2       =     0.1697

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       width |    .467956   .1055464     4.43   0.000     .2610889    .6748231
   _Icolor_1 |   1.329919   .8525264     1.56   0.119    -.3410018     3.00084
   _Icolor_2 |   1.402336   .5484409     2.56   0.011     .3274116    2.477261
   _Icolor_3 |   1.106121   .5920835     1.87   0.062    -.0543408    2.266584
       _cons |  -12.71511   2.761775    -4.60   0.000    -18.12809   -7.302133
------------------------------------------------------------------------------

lrtest, using(m0)

likelihood-ratio test                                  LR chi2(3)  =      7.00
(Assumption: LRTEST_m0 nested in .)                    Prob > chi2 =    0.0720
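The likelihood-ratio statistic can be reproduced by hand from the two log likelihoods printed above. The sketch below does that with scalars (names chosen here for illustration). The commented lines show the estimates store / lrtest form documented in newer Stata releases; they assume the crab data are in memory and are not part of the original page.

```stata
* Log likelihoods copied from the two logit outputs above.
scalar ll_reduced = -97.226331    // y on width
scalar ll_full    = -93.728515    // y on width and color indicators

* LR statistic and its p-value on 3 degrees of freedom (three color indicators).
scalar lr   = 2*(scalar(ll_full) - scalar(ll_reduced))
scalar lr_p = chi2tail(3, scalar(lr))
display "LR chi2(3) = " %5.2f scalar(lr) "   Prob > chi2 = " %6.4f scalar(lr_p)

* Equivalent workflow in newer Stata syntax (requires the data):
*   logit y width
*   estimates store m0
*   xi3: logit y width i.color
*   lrtest m0
```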
Section 5.5.3, pages 125-126. Quantitative treatment of ordinal predictor.
logit y width color

Logit estimates                                   Number of obs   =        173
                                                  LR chi2(2)      =      36.64
                                                  Prob > chi2     =     0.0000
Log likelihood = -94.560587                       Pseudo R2       =     0.1623

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       width |   .4583098   .1040194     4.41   0.000     .2544355     .662184
       color |  -.5090467   .2236827    -2.28   0.023    -.9474568   -.0706366
       _cons |  -10.07084   2.806862    -3.59   0.000    -15.57219   -4.569491
------------------------------------------------------------------------------

fitstat, saving(m0)

Measures of Fit for logit of y

Log-Lik Intercept Only:     -112.879     Log-Lik Full Model:          -94.561
D(170):                      189.121     LR(2):                        36.637
                                         Prob > LR:                     0.000
McFadden's R2:                 0.162     McFadden's Adj R2:             0.136
Maximum Likelihood R2:         0.191     Cragg & Uhler's R2:            0.262
McKelvey and Zavoina's R2:     0.285     Efron's R2:                    0.198
Variance of y*:                4.599     Variance of error:             3.290
Count R2:                      0.728     Adj Count R2:                  0.242
AIC:                           1.128     AIC*n:                       195.121
BIC:                        -686.938     BIC':                        -26.331

(Indices saved in matrix fs_m0)

xi3: logit y width i.color
i.color           _Icolor_1-4         (naturally coded; _Icolor_4 omitted)

Logit estimates                                   Number of obs   =        173
                                                  LR chi2(4)      =      38.30
                                                  Prob > chi2     =     0.0000
Log likelihood = -93.728515                       Pseudo R2       =     0.1697

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       width |    .467956   .1055464     4.43   0.000     .2610889    .6748231
   _Icolor_1 |   1.329919   .8525264     1.56   0.119    -.3410018     3.00084
   _Icolor_2 |   1.402336   .5484409     2.56   0.011     .3274116    2.477261
   _Icolor_3 |   1.106121   .5920835     1.87   0.062    -.0543408    2.266584
       _cons |  -12.71511   2.761775    -4.60   0.000    -18.12809   -7.302133
------------------------------------------------------------------------------

fitstat , using(m0)

Measures of Fit for logit of y

                             Current            Saved       Difference
Model:                         logit            logit
N:                               173              173                0
Log-Lik Intercept Only:     -112.879         -112.879            0.000
Log-Lik Full Model:          -93.729          -94.561            0.832
D:                      187.457(168)     189.121(170)         1.664(2)
LR:                        38.301(4)        36.637(2)         1.664(2)
Prob > LR:                     0.000            0.000            0.435
McFadden's R2:                 0.170            0.162            0.007
McFadden's Adj R2:             0.125            0.136           -0.010
Maximum Likelihood R2:         0.199            0.191            0.008
Cragg & Uhler's R2:            0.272            0.262            0.011
McKelvey and Zavoina's R2:     0.297            0.285            0.012
Efron's R2:                    0.204            0.198            0.007
Variance of y*:                4.677            4.599            0.078
Variance of error:             3.290            3.290            0.000
Count R2:                      0.734            0.728            0.006
Adj Count R2:                  0.258            0.242            0.016
AIC:                           1.141            1.128            0.014
AIC*n:                       197.457          195.121            2.336
BIC:                        -678.296         -686.938            8.642
BIC':                        -17.688          -26.331            8.642

Difference of    8.642 in BIC' provides strong support for saved model.

Note: p-value for difference in LR is only valid if models are nested.
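The Difference column for D and LR is the likelihood-ratio comparison of the two models: color as a single quantitative score (saved) against color as a four-level factor (current). As a worked check using the numbers in the fitstat output:

```latex
G^2 = 189.121 - 187.457 = 1.664
    = 2\,\bigl(-93.728515 - (-94.560587)\bigr),
\qquad
df = 170 - 168 = 2 .
```

For a chi-squared variable with 2 degrees of freedom the tail probability has the closed form P(\chi^2_2 > x) = e^{-x/2}, so

```latex
P(\chi^2_2 > 1.664) = e^{-0.832} \approx 0.435 ,
```

in agreement with the Prob > LR difference reported above. The simpler model with a linear color score is therefore not rejected in favor of the factor model.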
Section 5.5.4, pages 126-127. Model selection with several predictors
xi3: logit y width i.color i.spine weight
i.color           _Icolor_1-4         (naturally coded; _Icolor_4 omitted)
i.spine           _Ispine_1-3         (naturally coded; _Ispine_1 omitted)

Logit estimates                                   Number of obs   =        173
                                                  LR chi2(7)      =      40.56
                                                  Prob > chi2     =     0.0000
Log likelihood = -92.600999                       Pseudo R2       =     0.1796

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       width |    .263128   .1953012     1.35   0.178    -.1196553    .6459114
   _Icolor_1 |   1.608666   .9355408     1.72   0.086    -.2249604    3.442292
   _Icolor_2 |   1.505763   .5666724     2.66   0.008     .3951059    2.616421
   _Icolor_3 |   1.119802    .593296     1.89   0.059    -.0430372     2.28264
   _Ispine_2 |  -.0959809   .7033755    -0.14   0.891    -1.474571     1.28261
   _Ispine_3 |   .4002868    .502712     0.80   0.426    -.5850106    1.385584
      weight |     .82578   .7038361     1.17   0.241    -.5537134    2.205273
       _cons |  -9.673681    3.86463    -2.50   0.012    -17.24822   -2.099145
------------------------------------------------------------------------------
Section 5.5.5, page 128. Backward elimination of predictors. We will use the Stata command fitstat after each model to show the deviance, the degrees of freedom, the difference in deviance between models, and the correlation. Efron’s R2 is used here as the squared correlation between the observed response and the fitted probabilities, so its square root gives the correlation.
Model 1:
quietly xi3: logit y i.color*i.spine*width
fitstat, saving(m1)

Measures of Fit for logit of y

Log-Lik Intercept Only:     -111.848     Log-Lik Full Model:          -85.220
D(152):                      170.440     LR(19):                       53.255
                                         Prob > LR:                     0.000
McFadden's R2:                 0.238     McFadden's Adj R2:             0.059
Maximum Likelihood R2:         0.266     Cragg & Uhler's R2:            0.366
McKelvey and Zavoina's R2:     0.973     Efron's R2:                    0.269
Variance of y*:              122.792     Variance of error:             3.290
Count R2:                      0.756     Adj Count R2:                  0.311
AIC:                           1.223     AIC*n:                       210.440
BIC:                        -611.979     BIC':                         44.547

(Indices saved in matrix fs_m1)

di sqrt(.269)
.5186521
Model 2:
quietly xi3: logit y i.c*i.spine i.c*width i.spine*width
fitstat, using(m1) saving(m2)

Measures of Fit for logit of y

                             Current            Saved       Difference
Model:                         logit            logit
N:                               172              172                0
Log-Lik Intercept Only:     -111.848         -111.848            0.000
Log-Lik Full Model:          -86.837          -85.220           -1.617
D:                      173.674(155)     170.440(152)         3.233(3)
LR:                       50.022(16)       53.255(19)         3.233(3)
Prob > LR:                     0.000            0.000            0.357
McFadden's R2:                 0.224            0.238           -0.014
McFadden's Adj R2:             0.072            0.059            0.012
Maximum Likelihood R2:         0.252            0.266           -0.014
Cragg & Uhler's R2:            0.347            0.366           -0.019
McKelvey and Zavoina's R2:     0.824            0.973           -0.149
Efron's R2:                    0.256            0.269           -0.013
Variance of y*:               18.712          122.792         -104.080
Variance of error:             3.290            3.290            0.000
Count R2:                      0.762            0.756            0.006
Adj Count R2:                  0.328            0.311            0.016
AIC:                           1.207            1.223           -0.016
AIC*n:                       207.674          210.440           -2.767
BIC:                        -624.188         -611.979          -12.209
BIC':                         32.338           44.547          -12.209

Difference of   12.209 in BIC' provides very strong support for current model.

Note: p-value for difference in LR is only valid if models are nested.

(Indices saved in matrix fs_m2)
Model 3a:
quietly xi3: logit y i.c*i.spine i.spine*width
fitstat, using(m2)

Measures of Fit for logit of y

                             Current            Saved       Difference
Model:                         logit            logit
N:                               172              172                0
Log-Lik Intercept Only:     -111.848         -111.848            0.000
Log-Lik Full Model:          -88.668          -86.837           -1.831
D:                      177.336(158)     173.674(155)         3.662(3)
LR:                       46.360(13)       50.022(16)         3.662(3)
Prob > LR:                     0.000            0.000            0.300
McFadden's R2:                 0.207            0.224           -0.016
McFadden's Adj R2:             0.082            0.072            0.010
Maximum Likelihood R2:         0.236            0.252           -0.016
Cragg & Uhler's R2:            0.325            0.347           -0.022
McKelvey and Zavoina's R2:     0.816            0.824           -0.008
Efron's R2:                    0.241            0.256           -0.015
Variance of y*:               17.879           18.712           -0.834
Variance of error:             3.290            3.290            0.000
Count R2:                      0.733            0.762           -0.029
Adj Count R2:                  0.246            0.328           -0.082
AIC:                           1.194            1.207           -0.014
AIC*n:                       205.336          207.674           -2.338
BIC:                        -635.968         -624.188          -11.780
BIC':                         20.557           32.338          -11.780

Difference of   11.780 in BIC' provides very strong support for current model.

Note: p-value for difference in LR is only valid if models are nested.
Model 3b:
quietly xi3: logit y i.c*width i.spine*width
fitstat, using(m2) force

Measures of Fit for logit of y

Warning: N's do not match.

                             Current            Saved       Difference
Model:                         logit            logit
N:                               173              172                1
Log-Lik Intercept Only:     -112.879         -111.848           -1.031
Log-Lik Full Model:          -90.779          -86.837           -3.943
D:                      181.559(161)     173.674(155)         7.885(6)
LR:                       44.200(11)       50.022(16)         5.822(5)
Prob > LR:                     0.000            0.000            0.324
McFadden's R2:                 0.196            0.224           -0.028
McFadden's Adj R2:             0.089            0.072            0.018
Maximum Likelihood R2:         0.225            0.252           -0.027
Cragg & Uhler's R2:            0.309            0.347           -0.037
McKelvey and Zavoina's R2:     0.326            0.824           -0.498
Efron's R2:                    0.231            0.256           -0.025
Variance of y*:                4.881           18.712          -13.832
Variance of error:             3.290            3.290            0.000
Count R2:                      0.746            0.762           -0.016
Adj Count R2:                  0.290            0.328           -0.038
AIC:                           1.188            1.207           -0.019
AIC*n:                       205.559          207.674           -2.115
BIC:                        -648.121         -624.188          -23.933
BIC':                         12.487           32.338          -19.851

Note: p-value for difference in LR is only valid if models are nested.
Model 3c:
quietly xi3: logit y i.c*i.spine i.c*width
fitstat, using(m2) saving(m3c)

Measures of Fit for logit of y

                                   Current         Saved    Difference
Model:                               logit         logit
N:                                     172           172             0
Log-Lik Intercept Only:           -111.848      -111.848         0.000
Log-Lik Full Model:                -86.838       -86.837        -0.001
D:                            173.676(157)  173.674(155)      0.003(2)
LR:                             50.019(14)    50.022(16)      0.003(2)
Prob > LR:                           0.000         0.000         0.999
McFadden's R2:                       0.224         0.224        -0.000
McFadden's Adj R2:                   0.089         0.072         0.018
Maximum Likelihood R2:               0.252         0.252        -0.000
Cragg & Uhler's R2:                  0.347         0.347        -0.000
McKelvey and Zavoina's R2:           0.821         0.824        -0.003
Efron's R2:                          0.256         0.256        -0.000
Variance of y*:                     18.394        18.712        -0.318
Variance of error:                   3.290         3.290         0.000
Count R2:                            0.762         0.762         0.000
Adj Count R2:                        0.328         0.328         0.000
AIC:                                 1.184         1.207        -0.023
AIC*n:                             203.676       207.674        -3.997
BIC:                              -634.480      -624.188       -10.292
BIC':                               22.046        32.338       -10.292

Difference of 10.292 in BIC' provides very strong support for current model.
Note: p-value for difference in LR is only valid if models are nested.
(Indices saved in matrix fs_m3c)
Model 4a:
quietly xi3: logit y i.spine i.c*width
fitstat, using(m3c) force

Measures of Fit for logit of y

Warning: N's do not match.

                                   Current         Saved    Difference
Model:                               logit         logit
N:                                     173           172             1
Log-Lik Intercept Only:           -112.879      -111.848        -1.031
Log-Lik Full Model:                -90.819       -86.838        -3.980
D:                            181.637(163)  173.676(157)      7.961(6)
LR:                              44.122(9)    50.019(14)      5.898(5)
Prob > LR:                           0.000         0.000         0.316
McFadden's R2:                       0.195         0.224        -0.028
McFadden's Adj R2:                   0.107         0.089         0.017
Maximum Likelihood R2:               0.225         0.252        -0.027
Cragg & Uhler's R2:                  0.309         0.347        -0.038
McKelvey and Zavoina's R2:           0.323         0.821        -0.498
Efron's R2:                          0.231         0.256        -0.025
Variance of y*:                      4.863        18.394       -13.531
Variance of error:                   3.290         3.290         0.000
Count R2:                            0.740         0.762        -0.022
Adj Count R2:                        0.274         0.328        -0.054
AIC:                                 1.166         1.184        -0.019
AIC*n:                             201.637       203.676        -2.039
BIC:                              -658.350      -634.480       -23.869
BIC':                                2.258        22.046       -19.787

Note: p-value for difference in LR is only valid if models are nested.
Model 4b:
quietly xi3: logit y width i.c*i.spine
fitstat, using(m3c) saving(m4b)

Measures of Fit for logit of y

                                   Current         Saved    Difference
Model:                               logit         logit
N:                                     172           172             0
Log-Lik Intercept Only:           -111.848      -111.848         0.000
Log-Lik Full Model:                -88.798       -86.838        -1.960
D:                            177.597(160)  173.676(157)      3.920(3)
LR:                             46.099(11)    50.019(14)      3.920(3)
Prob > LR:                           0.000         0.000         0.270
McFadden's R2:                       0.206         0.224        -0.018
McFadden's Adj R2:                   0.099         0.089         0.009
Maximum Likelihood R2:               0.235         0.252        -0.017
Cragg & Uhler's R2:                  0.323         0.347        -0.024
McKelvey and Zavoina's R2:           0.822         0.821         0.001
Efron's R2:                          0.240         0.256        -0.016
Variance of y*:                     18.485        18.394         0.091
Variance of error:                   3.290         3.290         0.000
Count R2:                            0.738         0.762        -0.023
Adj Count R2:                        0.262         0.328        -0.066
AIC:                                 1.172         1.184        -0.012
AIC*n:                             201.597       203.676        -2.080
BIC:                              -646.002      -634.480       -11.522
BIC':                               10.523        22.046       -11.522

Difference of 11.522 in BIC' provides very strong support for current model.
Note: p-value for difference in LR is only valid if models are nested.
(Indices saved in matrix fs_m4b)
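The LR entry in the Difference column can be checked by hand from the two log likelihoods. The following is a minimal sketch using the rounded values printed for Model 4b (current) and Model 3c (saved); the scalar names are illustrative and are not created by fitstat.

```stata
* Log likelihoods copied (rounded) from the fitstat output for Model 4b vs. Model 3c
scalar ll_current = -88.798
scalar ll_saved   = -86.838

* LR statistic: twice the gap in log likelihood between the two nested models
scalar lr_stat = 2*(scalar(ll_saved) - scalar(ll_current))

* Degrees of freedom: difference in LR df between the models (14 - 11)
scalar df_diff = 14 - 11

* Upper-tail chi-squared probability for the LR statistic
display "LR = " scalar(lr_stat) ", df = " scalar(df_diff) ", p = " chi2tail(scalar(df_diff), scalar(lr_stat))
```

Because the inputs are rounded to three decimals, the result should agree with the reported 3.920 and 0.270 only up to rounding. As the fitstat note says, the p-value is meaningful only when the models are nested.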
Model 5:
quietly xi3: logit y i.color i.spine width
fitstat, using(m4b) saving(m5) force

Measures of Fit for logit of y

Warning: N's do not match.

                                   Current         Saved    Difference
Model:                               logit         logit
N:                                     173           172             1
Log-Lik Intercept Only:           -112.879      -111.848        -1.031
Log-Lik Full Model:                -93.306       -88.798        -4.508
D:                            186.612(166)  177.597(160)      9.015(6)
LR:                              39.147(6)    46.099(11)      6.953(5)
Prob > LR:                           0.000         0.000         0.224
McFadden's R2:                       0.173         0.206        -0.033
McFadden's Adj R2:                   0.111         0.099         0.013
Maximum Likelihood R2:               0.203         0.235        -0.033
Cragg & Uhler's R2:                  0.278         0.323        -0.045
McKelvey and Zavoina's R2:           0.298         0.822        -0.524
Efron's R2:                          0.208         0.240        -0.032
Variance of y*:                      4.689        18.485       -13.796
Variance of error:                   3.290         3.290         0.000
Count R2:                            0.740         0.738         0.002
Adj Count R2:                        0.274         0.262         0.012
AIC:                                 1.160         1.172        -0.012
AIC*n:                             200.612       201.597        -0.985
BIC:                              -668.835      -646.002       -22.832
BIC':                               -8.227        10.523       -18.750

Note: p-value for difference in LR is only valid if models are nested.
(Indices saved in matrix fs_m5)
Model 6a:
quietly xi3: logit y i.color i.spine
fitstat, using(m5)

Measures of Fit for logit of y

                                   Current         Saved    Difference
Model:                               logit         logit
N:                                     173           173             0
Log-Lik Intercept Only:           -112.879      -112.879         0.000
Log-Lik Full Model:               -104.417       -93.306       -11.111
D:                            208.834(167)  186.612(166)     22.222(1)
LR:                              16.925(5)     39.147(6)     22.222(1)
Prob > LR:                           0.005         0.000         0.000
McFadden's R2:                       0.075         0.173        -0.098
McFadden's Adj R2:                   0.022         0.111        -0.090
Maximum Likelihood R2:               0.093         0.203        -0.109
Cragg & Uhler's R2:                  0.128         0.278        -0.150
McKelvey and Zavoina's R2:           0.118         0.298        -0.180
Efron's R2:                          0.099         0.208        -0.109
Variance of y*:                      3.731         4.689        -0.958
Variance of error:                   3.290         3.290         0.000
Count R2:                            0.688         0.740        -0.052
Adj Count R2:                        0.129         0.274        -0.145
AIC:                                 1.276         1.160         0.117
AIC*n:                             220.834       200.612        20.222
BIC:                              -651.766      -668.835        17.069
BIC':                                8.842        -8.227        17.069

Difference of 17.069 in BIC' provides very strong support for saved model.
Note: p-value for difference in LR is only valid if models are nested.
Model 6b:
quietly xi3: logit y i.spine width
fitstat, using(m5)

Measures of Fit for logit of y

                                   Current         Saved    Difference
Model:                               logit         logit
N:                                     173           173             0
Log-Lik Intercept Only:           -112.879      -112.879         0.000
Log-Lik Full Model:                -97.212       -93.306        -3.906
D:                            194.425(169)  186.612(166)      7.813(3)
LR:                              31.334(3)     39.147(6)      7.813(3)
Prob > LR:                           0.000         0.000         0.050
McFadden's R2:                       0.139         0.173        -0.035
McFadden's Adj R2:                   0.103         0.111        -0.008
Maximum Likelihood R2:               0.166         0.203        -0.037
Cragg & Uhler's R2:                  0.227         0.278        -0.051
McKelvey and Zavoina's R2:           0.250         0.298        -0.048
Efron's R2:                          0.161         0.208        -0.046
Variance of y*:                      4.386         4.689        -0.303
Variance of error:                   3.290         3.290         0.000
Count R2:                            0.705         0.740        -0.035
Adj Count R2:                        0.177         0.274        -0.097
AIC:                                 1.170         1.160         0.010
AIC*n:                             202.425       200.612         1.813
BIC:                              -676.481      -668.835        -7.647
BIC':                              -15.874        -8.227        -7.647

Difference of 7.647 in BIC' provides strong support for current model.
Note: p-value for difference in LR is only valid if models are nested.
Model 6c:
quietly xi3: logit y i.color width
fitstat, using(m5) saving(m6c)

Measures of Fit for logit of y

                                   Current         Saved    Difference
Model:                               logit         logit
N:                                     173           173             0
Log-Lik Intercept Only:           -112.879      -112.879         0.000
Log-Lik Full Model:                -93.729       -93.306        -0.423
D:                            187.457(168)  186.612(166)      0.845(2)
LR:                              38.301(4)     39.147(6)      0.845(2)
Prob > LR:                           0.000         0.000         0.655
McFadden's R2:                       0.170         0.173        -0.004
McFadden's Adj R2:                   0.125         0.111         0.014
Maximum Likelihood R2:               0.199         0.203        -0.004
Cragg & Uhler's R2:                  0.272         0.278        -0.005
McKelvey and Zavoina's R2:           0.297         0.298        -0.002
Efron's R2:                          0.204         0.208        -0.003
Variance of y*:                      4.677         4.689        -0.011
Variance of error:                   3.290         3.290         0.000
Count R2:                            0.734         0.740        -0.006
Adj Count R2:                        0.258         0.274        -0.016
AIC:                                 1.141         1.160        -0.018
AIC*n:                             197.457       200.612        -3.155
BIC:                              -678.296      -668.835        -9.461
BIC':                              -17.688        -8.227        -9.461

Difference of 9.461 in BIC' provides strong support for current model.
Note: p-value for difference in LR is only valid if models are nested.
(Indices saved in matrix fs_m6c)

di sqrt(.204)
.45166359
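The square root above is taken of Efron's R2 for the current model. An alternative is to compute the correlation between the observed response and the fitted probabilities directly. The sketch below assumes the crab data set used earlier in the chapter. It substitutes Stata's built-in xi prefix for xi3, and phat is a hypothetical variable name. The two figures should be close but need not match exactly, since Efron's R2 is not defined as a squared correlation.

```stata
* Reload the crab data so the block is self-contained
use https://stats.idre.ucla.edu/stat/stata/examples/icda/crab, clear

* Refit Model 6c (color as a factor plus width), using built-in xi instead of xi3
quietly xi: logit y i.color width

* Fitted probabilities (the default prediction after logit)
predict phat

* Correlation between observed y and fitted probability; correlate leaves it in r(rho)
correlate y phat
display r(rho)
```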
Model 7a:
quietly xi3: logit y i.color
fitstat, using(m6c)

Measures of Fit for logit of y

                                   Current         Saved    Difference
Model:                               logit         logit
N:                                     173           173             0
Log-Lik Intercept Only:           -112.879      -112.879         0.000
Log-Lik Full Model:               -106.030       -93.729       -12.302
D:                            212.061(169)  187.457(168)     24.604(1)
LR:                              13.698(3)     38.301(4)     24.604(1)
Prob > LR:                           0.003         0.000         0.000
McFadden's R2:                       0.061         0.170        -0.109
McFadden's Adj R2:                   0.025         0.125        -0.100
Maximum Likelihood R2:               0.076         0.199        -0.122
Cragg & Uhler's R2:                  0.104         0.272        -0.168
McKelvey and Zavoina's R2:           0.095         0.297        -0.201
Efron's R2:                          0.081         0.204        -0.123
Variance of y*:                      3.636         4.677        -1.041
Variance of error:                   3.290         3.290         0.000
Count R2:                            0.688         0.734        -0.046
Adj Count R2:                        0.129         0.258        -0.129
AIC:                                 1.272         1.141         0.131
AIC*n:                             220.061       197.457        22.604
BIC:                              -658.845      -678.296        19.451
BIC':                                1.762       -17.688        19.451

Difference of 19.451 in BIC' provides very strong support for saved model.
Note: p-value for difference in LR is only valid if models are nested.

di sqrt(.081)
.28460499
Model 7b:
quietly logit y width
fitstat, using(m6c)

Measures of Fit for logit of y

                                   Current         Saved    Difference
Model:                               logit         logit
N:                                     173           173             0
Log-Lik Intercept Only:           -112.879      -112.879         0.000
Log-Lik Full Model:                -97.226       -93.729        -3.498
D:                            194.453(171)  187.457(168)      6.996(3)
LR:                              31.306(1)     38.301(4)      6.996(3)
Prob > LR:                           0.000         0.000         0.072
McFadden's R2:                       0.139         0.170        -0.031
McFadden's Adj R2:                   0.121         0.125        -0.004
Maximum Likelihood R2:               0.166         0.199        -0.033
Cragg & Uhler's R2:                  0.227         0.272        -0.045
McKelvey and Zavoina's R2:           0.251         0.297        -0.046
Efron's R2:                          0.161         0.204        -0.043
Variance of y*:                      4.390         4.677        -0.288
Variance of error:                   3.290         3.290         0.000
Count R2:                            0.705         0.734        -0.029
Adj Count R2:                        0.177         0.258        -0.081
AIC:                                 1.147         1.141         0.006
AIC*n:                             198.453       197.457         0.996
BIC:                              -686.760      -678.296        -8.464
BIC':                              -26.153       -17.688        -8.464

Difference of 8.464 in BIC' provides strong support for current model.
Note: p-value for difference in LR is only valid if models are nested.

di sqrt(.161)
.40124805
Model 8:
gen cdark = color==4
quietly logit y width cdark
fitstat, using(m6c) saving(m8)

Measures of Fit for logit of y

                                   Current         Saved    Difference
Model:                               logit         logit
N:                                     173           173             0
Log-Lik Intercept Only:           -112.879      -112.879         0.000
Log-Lik Full Model:                -93.979       -93.729        -0.250
D:                            187.958(170)  187.457(168)      0.501(2)
LR:                              37.801(2)     38.301(4)      0.501(2)
Prob > LR:                           0.000         0.000         0.778
McFadden's R2:                       0.167         0.170        -0.002
McFadden's Adj R2:                   0.141         0.125         0.015
Maximum Likelihood R2:               0.196         0.199        -0.002
Cragg & Uhler's R2:                  0.269         0.272        -0.003
McKelvey and Zavoina's R2:           0.294         0.297        -0.003
Efron's R2:                          0.200         0.204        -0.005
Variance of y*:                      4.658         4.677        -0.020
Variance of error:                   3.290         3.290         0.000
Count R2:                            0.728         0.734        -0.006
Adj Count R2:                        0.242         0.258        -0.016
AIC:                                 1.121         1.141        -0.020
AIC*n:                             193.958       197.457        -3.499
BIC:                              -688.102      -678.296        -9.806
BIC':                              -27.494       -17.688        -9.806

Difference of 9.806 in BIC' provides strong support for current model.
Note: p-value for difference in LR is only valid if models are nested.

di sqrt(.200)
.4472136
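The information criteria in the fitstat table can be reproduced from the deviance, the LR chi-squared, and the sample size. The sketch below does this for Model 8 using the rounded values printed above. It assumes the definitions AIC = (D + 2k)/N, BIC = D - (N - k) ln N, and BIC' = -LR + (k - 1) ln N, where k counts all estimated parameters including the constant. The scalar names are illustrative.

```stata
* Values copied (rounded) from the Current column of the Model 8 output
scalar nobs   = 173
scalar dev    = 187.958
scalar lrchi2 = 37.801
* Parameters in Model 8: constant, width, cdark
scalar k      = 3

* AIC as reported by fitstat is scaled by the sample size
scalar aic = (scalar(dev) + 2*scalar(k))/scalar(nobs)

* BIC based on the deviance and its residual degrees of freedom
scalar bic = scalar(dev) - (scalar(nobs) - scalar(k))*ln(scalar(nobs))

* BIC' based on the LR chi-squared and the number of regressors
scalar bicp = -scalar(lrchi2) + (scalar(k) - 1)*ln(scalar(nobs))

display "AIC = " scalar(aic) ", BIC = " scalar(bic) ", BIC' = " scalar(bicp)
```

Under these assumptions the three values agree with the table (1.121, -688.102, -27.494) up to rounding. The more negative BIC or BIC' indicates the preferred model, which is why a difference of 9.806 is read as strong support for the current model.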
Model 9:
quietly glm y, fam(bin)
di e(deviance) - 187.96
37.798523
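The subtraction above uses the Model 8 deviance typed in by hand and rounded. A hedged alternative is to store both deviances in scalars and let Stata take the difference, then look up the chi-squared tail probability on 2 degrees of freedom (width and cdark). This sketch assumes the crab data set from earlier in the chapter; dev_null and dev_m8 are hypothetical scalar names.

```stata
* Reload the data and rebuild the dark-color indicator so the block is self-contained
use https://stats.idre.ucla.edu/stat/stata/examples/icda/crab, clear
gen cdark = color==4

* Intercept-only model: its deviance is the null deviance
quietly glm y, fam(bin)
scalar dev_null = e(deviance)

* Model 8 fitted with glm so that e(deviance) is available
quietly glm y width cdark, fam(bin)
scalar dev_m8 = e(deviance)

* Deviance difference and its upper-tail chi-squared probability on 2 df
display scalar(dev_null) - scalar(dev_m8)
display chi2tail(2, scalar(dev_null) - scalar(dev_m8))
```

The difference should match the LR chi-squared of 37.801 reported by fitstat for Model 8, up to rounding.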
Section 5.6.1, page 130. Sample Size for Comparing Two Proportions
The Stata command sampsi gives a similar answer. It reports 412 per group rather than 389, mainly because sampsi applies a continuity correction by default.
di (invnorm(.975)+invnorm(.9))^2*(.2*.8+.3*.7)/(.2-.3)^2
388.77465

sampsi .2 .3

Estimated sample size for two-sample comparison of proportions

Test Ho: p1 = p2, where p1 is the proportion in population 1
                    and p2 is the proportion in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
         power =   0.9000
            p1 =   0.2000
            p2 =   0.3000
         n2/n1 =   1.00

Estimated required sample sizes:

            n1 =      412
            n2 =      412
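To see how much of the gap between the hand calculation and sampsi comes from the continuity correction, the correction can be switched off. The sketch below keeps the hand formula in a scalar (n_hand is an illustrative name), uses invnormal(), the current name for invnorm(), and then calls sampsi with its nocontinuity option. The last line uses power twoproportions, which assumes Stata 13 or later.

```stata
* Hand formula from the text: per-group sample size for alpha = .05 (two-sided), power = .90
scalar n_hand = (invnormal(.975) + invnormal(.9))^2*(.2*.8 + .3*.7)/(.2 - .3)^2

* Round up to the next whole observation
display ceil(scalar(n_hand))

* Same design in sampsi, without the default continuity correction
sampsi .2 .3, power(.9) nocontinuity

* Equivalent request with the newer power command (assumes Stata 13 or later)
power twoproportions .2 .3, power(.9)
```

Even without the correction, sampsi need not reproduce the hand formula exactly, because it uses a pooled variance under the null hypothesis.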