This page has been updated to Stata 15.1.
Introduction
The purpose of this seminar is to help you increase your skills in using logistic regression analysis with Stata. The seminar does not teach logistic regression, per se, but focuses on how to perform logistic regression analyses and interpret the results using Stata. It is assumed that you are familiar with ordinary least squares regression and logistic regression (e.g., have had a class in logistic regression or have read about logistic regression; see our Statistics Books for Loan for books you can borrow on logistic regression). Of course, we will not be discussing all aspects of logistic regression. This workshop will focus mostly on interpreting the output in these different metrics, rather than on other aspects of the analysis, such as model building, model diagnostics, receiver operating characteristic (ROC) curves, sensitivity and specificity.
Community-contributed commands
In addition to the built-in Stata commands, we will be demonstrating the use of a number of community-contributed (AKA user-written) ado-files, in particular, listcoef and fitstat. To find out more about these programs or to download them, type search followed by the program name in the Stata command window (example: search listcoef). These add-on programs ease the running and interpretation of logistic models. You can also download the complete spostado package by typing the following in the Stata command window:
net from http://www.indiana.edu/~jslsoc/stata/
net install spost9_ado
Although this is a presentation about logistic regression, we are going to start by talking briefly about ordinary least squares (OLS) regression. Probably the best way to learn about logistic regression is to get a good foundation in OLS regression, because most of its concepts carry over. Also, almost everything that you know about predictor variables in OLS regression (the variables on the right-hand side) is the same in logistic regression, except with respect to certain types of interaction terms, which we will discuss toward the end of this workshop. The difference between OLS regression and logistic regression is, of course, the dependent variable: In OLS regression, the dependent (also known as the outcome) variable is continuous, while in logistic regression it is binary. Please note that when we speak of logistic regression, we really mean binary logistic regression, as opposed to ordinal logistic regression or multinomial logistic regression. A binary variable refers to a variable that is coded as 0, 1 or missing; it cannot take on any value other than those three.
Theoretical treatments of the topic of logistic regression (both binary and ordinal logistic regression) assume that there is an unobserved, or latent, continuous outcome variable. However, we are able to observe only two states: 0 and 1. In other words, lower values on the latent continuous variable are observed as 0, while higher values on the latent continuous variable are observed as 1. A point called a threshold (or cutoff) separates the regions of the latent variable that are observed as 0 and 1. While this explanation helps to make logistic regression seem a little more like OLS regression, in a practical sense, it isn’t much help.
Consequences of a binary outcome variable
Because we observe 0s and 1s (and perhaps missing values) for the outcome variable in a logistic regression, let’s talk about the consequences of having such a variable as the outcome variable. First of all, let’s remember that we are modeling the 1s, which usually means success; 0 usually means failure. There are at least two critical consequences of having a binary outcome variable. The first is that it requires an increased sample size. We will discuss the reasons for this later, but for now, keep in mind that logistic regression requires a much larger sample size than OLS regression. Another important consequence is that we can no longer use an identity link to link our outcome variable with our predictors. Instead, we will need to use a logit link. This link allows the transformed outcome to have a linear relationship with the predictors; hence the phrase “linear in the logit.” This means that the coefficients are no longer in the original metric of the variable, as they are in OLS regression. Instead, the raw coefficients are in the metric of log odds. These log odds (also known as the log of the odds) can be exponentiated to give an odds ratio. The results can also be converted into predicted probabilities. So now there are at least three metrics in which the results can be discussed. Each has its own set of pros and cons.
Before continuing on, let’s visit FAQ: How do I interpret odds ratios in logistic regression? for a quick refresher on the relationship between probability, odds and log odds.
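As a quick numeric illustration of those relationships (using a hypothetical probability of .75), we can walk through the conversions with the display command:

display .75/(1 - .75)                         // odds = 3
display ln(.75/(1 - .75))                     // log odds = 1.0986123
display exp(1.0986123)/(1 + exp(1.0986123))   // back to the probability, .75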
The example data set
For the examples in this workshop, we will use the hsbdemo dataset with the binary response variable honors (enrolled in an honors English program). Other variables that will be used in example analyses will be read, which is the score on a reading test; science, which is the score on a science test; socst, which is the score on a social studies test; female, which indicates if the student is female (1 = female; 0 = male); and prog, which is the type of program in which the student is enrolled (1 = general; 2 = academic; 3 = vocational). Let’s get the dataset into Stata.
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear
The describe command gives basic information about variables in the dataset.
describe honors read science socst female prog

              storage   display    value
variable name   type    format     label      variable label
---------------------------------------------------------------------------
honors          float   %19.0g     honlab     honors english
read            float   %9.0g                 reading score
science         float   %9.0g                 science score
socst           float   %9.0g                 social studies score
female          float   %9.0g      fl
prog            float   %9.0g      sel        type of program
In the output above, we see that all of the variables are numeric (storage type is float). We also see that all three categorical variables (honors, female and prog) have value labels. These will be shown in the output to make it more meaningful.
The summarize command (which can be shortened to sum) is used to see basic descriptive information on these variables.
summarize honors female read science socst prog

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      honors |        200        .265    .4424407          0          1
      female |        200        .545    .4992205          0          1
        read |        200       52.23    10.25294         28         76
     science |        200       51.85    9.900891         26         74
       socst |        200      52.405    10.73579         26         71
-------------+---------------------------------------------------------
        prog |        200       2.025    .6904772          1          3
There are several important points to note in the output above. First, all of the variables have 200 observations, so we will not have issues with missing data. Second, as expected, the mean of honors is rather low because relatively few students are admitted to honors English. The mean of female is approximately 0.5, which means that approximately half of the students in this sample are female. The means of the continuous variables read, science and socst are similar, as are the ranges for these variables.
Below are one-way tabulations of the three categorical variables. We can use the numlabel, add command to add the numeric value of each category to the descriptive label. That way, you can see both the numeric value and the descriptive label in the output.
numlabel, add
tab1 honors female prog

-> tabulation of honors

 honors english |      Freq.     Percent        Cum.
----------------+-----------------------------------
0. not enrolled |        147       73.50       73.50
    1. enrolled |         53       26.50      100.00
----------------+-----------------------------------
          Total |        200      100.00

-> tabulation of female

     female |      Freq.     Percent        Cum.
------------+-----------------------------------
    0. male |         91       45.50       45.50
  1. female |        109       54.50      100.00
------------+-----------------------------------
      Total |        200      100.00

-> tabulation of prog

    type of |
    program |      Freq.     Percent        Cum.
------------+-----------------------------------
 1. general |         45       22.50       22.50
2. academic |        105       52.50       75.00
3. vocation |         50       25.00      100.00
------------+-----------------------------------
      Total |        200      100.00
In our logistic regression model, the binary variable honors will be the outcome variable. It is important that the outcome variable in a binary logistic regression is coded as 0 and 1 (and missing, if there are missing values on that variable). In most statistical software programs, values greater than 1 will be considered to be 1, which may not be what you intend. (In such situations, an ordered logistic regression or a multinomial logistic regression may be more appropriate. Those types of logistic regression will not be covered in this presentation.) Remember that we will be modeling the 1s, which means the 1s category will be compared to the 0 category.
Another point to mention is the distribution of the variable honors. It is distributed approximately 75% and 25%. This is not bad. In general, logistic regression will have the most power statistically when the outcome is distributed 50/50. Power will decrease as the distribution becomes more lopsided.
A quick note about running logistic regression in Stata. The output from the logit command will be in units of log odds. The or option can be added to get odds ratios. Alternatively, the logistic command can be used; the default output for the logistic command is odds ratios.
Models without interactions
A null model
Let’s start with a null model, which is a model without any predictor variables. In an equation, we are modeling
logit(p) = β0
logit honors

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -115.64441

Logistic regression                             Number of obs     =        200
                                                LR chi2(0)        =      -0.00
                                                Prob > chi2       =          .
Log likelihood = -115.64441                     Pseudo R2         =    -0.0000

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -1.020141   .1602206    -6.37   0.000    -1.334167    -.706114
------------------------------------------------------------------------------
This means log(p/(1-p)) = -1.020141. What is p here? It turns out that p is the overall probability of being in honors English (honors = 1). Let’s take a look at the frequency table for honors.
tab honors

 honors english |      Freq.     Percent        Cum.
----------------+-----------------------------------
0. not enrolled |        147       73.50       73.50
    1. enrolled |         53       26.50      100.00
----------------+-----------------------------------
          Total |        200      100.00
So p = 53/200 = .265. The odds are .265/(1-.265) = .3605442 and the log of the odds (logit) is log(.3605442) = -1.020141. In other words, the intercept from the model with no predictor variables is the estimated log odds of being in honors English for the whole population of interest. We can get this value from Stata using the logistic command (or logit, or).
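We can verify this chain of calculations with the display command:

display 53/200              // p = .265
display .265/(1 - .265)     // odds = .3605442
display ln(.3605442)        // log odds = -1.020141, matching _cons above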
logistic honors

Logistic regression                             Number of obs     =        200
                                                LR chi2(0)        =      -0.00
                                                Prob > chi2       =          .
Log likelihood = -115.64441                     Pseudo R2         =    -0.0000

------------------------------------------------------------------------------
      honors |       Odds   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .3605442   .0577666    -6.37   0.000     .2633774    .4935584
------------------------------------------------------------------------------
We can calculate the odds by hand from the frequencies in the table above.
display 53/147
.36054422
We can also transform the log of the odds back to a probability: p = exp(-1.020141)/(1+exp(-1.020141)) = .26499994, if we like. We can have Stata calculate this value for us by using the margins command.
margins

Warning: prediction constant over observations.

Predictive margins                              Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |       .265    .031207     8.49   0.000     .2038355    .3261645
------------------------------------------------------------------------------
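Alternatively, Stata’s built-in invlogit() function performs the same back-transformation directly:

display invlogit(-1.020141)   // .265, the overall probability of honors = 1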
One binary predictor
Let’s add one binary predictor, female. We will use the logit, or command to get output in terms of odds ratios. We will then see how the odds ratio can be calculated by hand.
logit(p) = β0 + β1*female
logit honors female, or

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -113.68907
Iteration 2:   log likelihood = -113.67691
Iteration 3:   log likelihood =  -113.6769

Logistic regression                             Number of obs     =        200
                                                LR chi2(1)        =       3.94
                                                Prob > chi2       =     0.0473
Log likelihood =  -113.6769                     Pseudo R2         =     0.0170

------------------------------------------------------------------------------
      honors | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.918168   .6400451     1.95   0.051     .9973827    3.689024
       _cons |   .2465754   .0648892    -5.32   0.000     .1472129    .4130033
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
In our dataset, what are the odds of a male being in honors English, and what are the odds of a female being in honors English? We can manually calculate these odds from the crosstab below: for males, the odds of being in the honors class are (18/91)/(73/91) = 18/73 = .24657534; and for females, the odds of being in the honors class are (35/109)/(74/109) = 35/74 = .47297297. The ratio of the odds for females to the odds for males is (35/74)/(18/73) = (73*35)/(74*18) = 1.9181682. So the odds for males are 18 to 73, the odds for females are 35 to 74, and the odds for females are about 92% higher than the odds for males.
logit honors female

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -113.68907
Iteration 2:   log likelihood = -113.67691
Iteration 3:   log likelihood =  -113.6769

Logistic regression                             Number of obs     =        200
                                                LR chi2(1)        =       3.94
                                                Prob > chi2       =     0.0473
Log likelihood =  -113.6769                     Pseudo R2         =     0.0170

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .6513706   .3336752     1.95   0.051    -.0026207    1.305362
       _cons |  -1.400088   .2631619    -5.32   0.000    -1.915875   -.8842998
------------------------------------------------------------------------------
Now we can relate the odds for males and females and the output from the logistic regression. The intercept of -1.40 is the log odds for males because male is the reference group (female = 0). Using the odds we calculated above for males, we can confirm this: log(.2465754) = -1.400088. The coefficient for female is the log of odds ratio between the female group and male group: log(1.918168) = .65137056. So we can get the odds ratio by exponentiating the coefficient for female.
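These relationships are easy to confirm with display, using the values reported above:

display ln(.2465754)     // -1.400088, the intercept (log odds for males)
display exp(.6513706)    // 1.918168, the odds ratio for female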
The odds ratio for the variable female is 1.918168. Let’s see how we could calculate this number from the crosstabulation of honors and female.
We can use the formula (a/c)/(b/d) or, equivalently, (a*d)/(b*c).
tab honors female, chi2

      honors |        female
     english |      male     female |     Total
-------------+----------------------+----------
not enrolled |        73         74 |       147
    enrolled |        18         35 |        53
-------------+----------------------+----------
       Total |        91        109 |       200

          Pearson chi2(1) =   3.8710   Pr = 0.049
We have (male-not enrolled/male-enrolled)/(female-not enrolled/female-enrolled).
display (73/18)/(74/35)
1.9181682
Alternatively, we could use (male-not enrolled*female-enrolled)/(female-not enrolled*male-enrolled).
display (73*35)/(74*18)
1.9181682
Looking back at the crosstabulation above, notice that all of the cells have a reasonable number of observations in them. It is good practice to do a crosstab of the outcome variable and all of the categorical predictors before running a logistic regression to check for empty or sparse cells. The possible consequences of having empty cells or cells with very few observations include the model not converging or the confidence intervals being very wide.
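One way to run these checks is to loop over the categorical predictors; a minimal sketch using foreach:

* crosstab the outcome against each categorical predictor to check for
* empty or sparse cells before fitting the model
foreach v of varlist female prog {
    tabulate honors `v'
}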
Let’s review the interpretation of both the odds ratio and the raw coefficient of this model. We will rerun each model for clarity.
logit honors female, or

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -113.68907
Iteration 2:   log likelihood = -113.67691
Iteration 3:   log likelihood =  -113.6769

Logistic regression                             Number of obs     =        200
                                                LR chi2(1)        =       3.94
                                                Prob > chi2       =     0.0473
Log likelihood =  -113.6769                     Pseudo R2         =     0.0170

------------------------------------------------------------------------------
      honors | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.918168   .6400451     1.95   0.051     .9973827    3.689024
       _cons |   .2465754   .0648892    -5.32   0.000     .1472129    .4130033
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
The general interpretation of an exponentiated logistic regression coefficient is this (Long and Freese, 2014, page 229): “For a unit change in xk, the odds are expected to change by a factor of exp(bk), holding all other variables constant.”
The interpretation of this odds ratio is that, for a one-unit increase in female (in other words, going from male to female), the odds of being enrolled in honors English increase by a factor of 1.9, holding all other variables constant.
One other thing to note concerns reporting odds ratios. When reporting odds ratios, you usually report the associated 95% confidence interval, rather than the standard error. This is because the odds ratio is a nonlinear transformation of the logit coefficient, so the confidence interval is asymmetric.
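To see this asymmetry, note that Stata obtains the odds-ratio confidence limits by exponentiating the endpoints of the (symmetric) confidence interval for the logit coefficient, which we can replicate by hand:

display exp(-.0026207)   // .9973827, lower limit of the odds ratio CI
display exp(1.305362)    // 3.689024, upper limit of the odds ratio CI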
logit honors female

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -113.68907
Iteration 2:   log likelihood = -113.67691
Iteration 3:   log likelihood =  -113.6769

Logistic regression                             Number of obs     =        200
                                                LR chi2(1)        =       3.94
                                                Prob > chi2       =     0.0473
Log likelihood =  -113.6769                     Pseudo R2         =     0.0170

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .6513706   .3336752     1.95   0.051    -.0026207    1.305362
       _cons |  -1.400088   .2631619    -5.32   0.000    -1.915875   -.8842998
------------------------------------------------------------------------------
The general interpretation of a logistic regression coefficient is this (Long and Freese, 2014, page 228): “For a unit change in xk, we expect the log of the odds of the outcome to change bk units, holding all other variables constant.”
For this example, we would say that for a one-unit increase in female (in other words, going from male to female), the expected log of the odds of being in honors English increases by 0.65, holding all other variables constant.
There are a couple of other points to discuss regarding the output from our first logistic regression. The model is given again below for ease of reference.
logit honors female, or

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -113.68907
Iteration 2:   log likelihood = -113.67691
Iteration 3:   log likelihood =  -113.6769

Logistic regression                             Number of obs     =        200
                                                LR chi2(1)        =       3.94
                                                Prob > chi2       =     0.0473
Log likelihood =  -113.6769                     Pseudo R2         =     0.0170

------------------------------------------------------------------------------
      honors | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.918168   .6400451     1.95   0.051     .9973827    3.689024
       _cons |   .2465754   .0648892    -5.32   0.000     .1472129    .4130033
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
First, while using the nolog option will shorten your output (by not displaying the iteration log) and is commonly used in examples, in real research, that part of the output can be an important source of information if there is a problem with your model. For example, sometimes logistic regression models don’t converge. If you don’t show the iteration log, you can’t see that problem.
Second, remember that logistic regression is a maximum likelihood procedure (you can see the log likelihood in the output). All maximum likelihood procedures require relatively large sample sizes because of the assumptions that they make. Also, the outcome variable in a logistic regression is binary, which means that it necessarily contains less information than other types of outcomes, such as a continuous outcome. Long (1997, page 54) states: “It is risky to use ML with samples smaller than 100, while samples over 500 seem adequate. These values should be raised depending on characteristics of the model and data.”
Third, let’s talk about the pseudo R-squared. The emphasis is on the term “pseudo”. A pseudo R-squared is not comparable to the R-squared that you would get from an ordinary least squares regression. This means that you cannot interpret it as the percentage of variance in the outcome that is accounted for by the model. Rather, this value is good for comparing the relative fit of two models, but it says nothing about the absolute fit of the models. Stata reports McFadden’s pseudo R-squared, but there are several others. Please see FAQ: What are pseudo R-squareds? for more information.
Fourth, notice that the p-value for the overall model is statistically significant, while the p-value for the variable female is not (p = 0.051). Why are they not the same? After all, the variable female is the only predictor in the model. The answer is that the test of the overall model is a likelihood ratio chi-square, while the test of the coefficient is a Wald chi-square. Asymptotically, these two tests are equivalent. However, with smaller sample sizes, a difference can be seen. It is rare that one test would be statistically significant while the other is not. Notice that the likelihood ratio test is “just barely” statistically significant, while the Wald chi-square is “just barely” not statistically significant. So, in reality, the results are not that different. Results like these should be interpreted with caution. Notice also that the chi-square test in the crosstab above has a p-value of 0.049. This is a Pearson chi-square, which is also asymptotically equivalent to the other two types of chi-square. As with the other p-values, this p-value is very close to the 0.05 cutoff.
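If you want to see the likelihood ratio test computed explicitly, you can fit the null model and the one-predictor model and compare them with Stata’s lrtest command; a minimal sketch:

quietly logit honors          // constant-only (null) model
estimates store null
quietly logit honors female   // model with female
lrtest null                   // reproduces LR chi2(1) = 3.94 from the header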
Interpreting negative coefficients
Let’s pause for a moment to make sure that we understand how to interpret a logistic regression coefficient that is negative. Let’s suppose that the logistic regression coefficient is -2. A negative coefficient means that the predictor variable has a negative relationship with the outcome variable: as one goes up, the other goes down. Using the standard interpretation, we could say that for a one-unit increase in the predictor, the log of the odds is expected to decrease by 2, holding all other variables constant. Now what about in the odds ratio metric? First,
exp(-2) = .13533528
Using the standard interpretation, we would say that for a one-unit increase in the predictor, the odds are expected to decrease by a factor of .14, holding all other variables constant.
This doesn’t seem like a big change, but remember that odds ratios are multiplicative coefficients. Long and Freese (2014) write on page 223:
“When interpreting odds ratios, remember that they are multiplicative. This means that 1 indicates no effect, positive effects are greater than 1, and negative effects are between 0 and 1. Magnitudes of positive and negative effects should be compared by taking the inverse of the negative effect, or vice versa. For example, an odds ratio of 2 has the same magnitude as an odds ratio of 0.5 = 1/2. Thus an odds ratio of 0.1 = 1/10 is much “larger” than the odds ratio of 2 = 1/0.5. Another consequence of the multiplicative scale is that to determine the effect on the odds of the event not occurring, you simply take the inverse of the effect on the odds of the event occurring.”
Let’s look at a table of coefficients and odds ratios of equivalent magnitudes. It shows the effect of compressing all of the negative coefficients into odds ratios that range from 0 to 1.
Coefficient      Positive      Negative
     0           1             1
     1           2.7182818     .36787944
     1.5         4.4816891     .22313016
     2           7.3890561     .13533528
     2.5         12.182494     .082085
     3           20.085537     .04978707
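This table is easy to reproduce in Stata; a small sketch using a foreach loop over the coefficient values:

* for each coefficient b, show exp(b) (positive) and exp(-b) (negative)
foreach b of numlist 0 1 1.5 2 2.5 3 {
    display %4.1f `b' "   " %10.0g exp(`b') "   " %10.0g exp(-`b')
}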
One multi-categorical predictor
Now let’s use a different categorical predictor variable. The variable prog has three levels; the lowest-numbered category will be used as the reference group by default. In this dataset, that level is called “general”.
We will start with a crosstab.
tab honors prog

                |          type of program
 honors english | 1. genera  2. academ  3. vocati |     Total
----------------+---------------------------------+----------
0. not enrolled |        38         65         44 |       147
    1. enrolled |         7         40          6 |        53
----------------+---------------------------------+----------
          Total |        45        105         50 |       200
While there are large differences in the number of observations in each cell, the frequencies are probably large enough to avoid any real problems.
logit honors i.prog

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood =  -107.7993
Iteration 2:   log likelihood = -107.57279
Iteration 3:   log likelihood =  -107.5719
Iteration 4:   log likelihood =  -107.5719

Logistic regression                             Number of obs     =        200
                                                LR chi2(2)        =      16.15
                                                Prob > chi2       =     0.0003
Log likelihood =  -107.5719                     Pseudo R2         =     0.0698

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        prog |
   academic  |   1.206168   .4577746     2.63   0.008     .3089465     2.10339
   vocation  |  -.3007541   .5988045    -0.50   0.615    -1.474389    .8728812
             |
       _cons |  -1.691676   .4113064    -4.11   0.000    -2.497822   -.8855303
------------------------------------------------------------------------------
In the output above, we can see that the overall model is statistically significant (p = 0.0003). In the output table, we can see that the “academic” level is statistically significantly different from “general”, while the “vocation” level is not.
We can use the contrast command to get the multi-degree-of-freedom test of the variable prog. This is a Wald chi-square test.
contrast prog

Contrasts of marginal linear predictions

Margins      : asbalanced

------------------------------------------------
             |         df        chi2     P>chi2
-------------+----------------------------------
        prog |          2       14.26     0.0008
------------------------------------------------
Again we see that the p-value for the overall model does not match that given for the variable prog, even though prog is the only predictor in the model. However, both tests lead to the same conclusion: the variable prog is a statistically significant predictor of honors. When writing about these results, you would say that the variable prog was a statistically significant predictor of the outcome variable honors, citing either the LR chi-square test or the Wald chi-square test, and that there was a statistically significant difference between the “academic” and “general” levels.
While that is important information to convey to your audience, you might want to include something a little more descriptive and potentially more practical. For this purpose, you can use the margins command. Used after a logistic regression, the margins command gives the average predicted probabilities of each group.
margins prog

Adjusted predictions                            Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        prog |
    general  |   .1555556   .0540284     2.88   0.004     .0496619    .2614493
   academic  |   .3809524   .0473917     8.04   0.000     .2880663    .4738385
   vocation  |        .12   .0459565     2.61   0.009     .0299269    .2100731
------------------------------------------------------------------------------
The average predicted probability for the reference level, “general”, is 0.156. This isn’t too different from the average predicted probability for the “vocation” level, 0.12. However, the “academic” level has an average predicted probability of 0.38. This is very different from the average predicted probability of 0.156 of the reference level “general” and explains why that comparison is statistically significant. The term “average predicted probability” means that, for example, if everyone in the dataset were treated as if they were in the “general” level, the average of their predicted probabilities would be 0.156.
It is important to remember that the predicted probabilities will change as the model changes. For example, if another predictor is added to the model, the predicted probabilities for each level of prog will change. When other predictor variables are included in the model, it is important to set those to informative values (or at least note the value), because predicted probabilities are a non-linear metric, which means that the value of the predicted probability depends on the level at which other variables in the model are held. We will see an example of this a little later.
Also, the p-values in this table test the null hypothesis that the predicted probability is 0. For many purposes, this is an uninteresting test, and so it is usually ignored.
Let’s return to our model to review the interpretation of the output.
logit honors i.prog

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood =  -107.7993
Iteration 2:   log likelihood = -107.57279
Iteration 3:   log likelihood =  -107.5719
Iteration 4:   log likelihood =  -107.5719

Logistic regression                             Number of obs     =        200
                                                LR chi2(2)        =      16.15
                                                Prob > chi2       =     0.0003
Log likelihood =  -107.5719                     Pseudo R2         =     0.0698

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        prog |
2. academic  |   1.206168   .4577746     2.63   0.008     .3089465     2.10339
3. vocation  |  -.3007541   .5988045    -0.50   0.615    -1.474389    .8728812
             |
       _cons |  -1.691676   .4113064    -4.11   0.000    -2.497822   -.8855303
------------------------------------------------------------------------------

logit honors i.prog, or

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood =  -107.7993
Iteration 2:   log likelihood = -107.57279
Iteration 3:   log likelihood =  -107.5719
Iteration 4:   log likelihood =  -107.5719

Logistic regression                             Number of obs     =        200
                                                LR chi2(2)        =      16.15
                                                Prob > chi2       =     0.0003
Log likelihood =  -107.5719                     Pseudo R2         =     0.0698

------------------------------------------------------------------------------
      honors | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        prog |
2. academic  |   3.340659   1.529269     2.63   0.008      1.36199    8.193899
3. vocation  |   .7402597   .4432709    -0.50   0.615     .2289184    2.393798
             |
       _cons |   .1842105    .075767    -4.11   0.000      .082264    .4124954
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
Assuming that the 2 df test of prog is statistically significant (it is), we can interpret the coefficient for “academic” as: being in the academic program rather than the general program increases the expected log of the odds of being in honors English by 1.2, holding all other variables constant.
Alternatively, we could say that being in the academic program compared to the general program increases the odds of being in honors English by a factor of 3.34, holding all other variables in the model constant.
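Again, the two metrics are linked by exponentiation, which we can verify from the coefficients above:

display exp(1.206168)     // 3.340659, odds ratio for academic vs. general
display exp(-.3007541)    // .7402597, odds ratio for vocation vs. general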
Logistic regression models with many categorical predictors
We are not going to run any models with multiple categorical predictor variables, but let’s pretend that we were. Let’s say that we were going to include both female and prog in our model. There are at least two commands that can be used to do this three-way crosstab. One is the built-in (AKA native to Stata) command table. The other is a community-contributed (AKA user-written) command tablist. Of course, both give the same information; the difference is in the way the information is presented.
table honors female prog

--------------------------------------------------------------------------------------
                |                    type of program and female
                |  ---- 1. general ----   ---- 2. academic ---   ---- 3. vocation ---
 honors english |   0. male   1. female    0. male   1. female    0. male   1. female
----------------+---------------------------------------------------------------------
0. not enrolled |        19          19         32          33         22          22
    1. enrolled |         2           5         15          25          1           5
--------------------------------------------------------------------------------------

tablist honors female prog

+---------------------------------------------------------------------------------+
|          honors      female          prog   _Freq_   _Perc_   _CFreq_   _CPerc_ |
|---------------------------------------------------------------------------------|
| 0. not enrolled   1. female   2. academic       33    16.50        33     16.50 |
| 0. not enrolled     0. male   2. academic       32    16.00        65     32.50 |
|     1. enrolled   1. female   2. academic       25    12.50        90     45.00 |
| 0. not enrolled   1. female   3. vocation       22    11.00       112     56.00 |
| 0. not enrolled     0. male   3. vocation       22    11.00       134     67.00 |
|---------------------------------------------------------------------------------|
| 0. not enrolled     0. male    1. general       19     9.50       153     76.50 |
| 0. not enrolled   1. female    1. general       19     9.50       172     86.00 |
|     1. enrolled     0. male   2. academic       15     7.50       187     93.50 |
|     1. enrolled   1. female   3. vocation        5     2.50       192     96.00 |
|     1. enrolled   1. female    1. general        5     2.50       197     98.50 |
|---------------------------------------------------------------------------------|
|     1. enrolled     0. male    1. general        2     1.00       199     99.50 |
|     1. enrolled     0. male   3. vocation        1     0.50       200    100.00 |
+---------------------------------------------------------------------------------+
Both of these commands can be modified to include more categorical variables. For example, let’s add ses and schtyp.
table honors female prog, by(ses schtyp)

--------------------------------------------------------------------------------------
ses, type of    |                    type of program and female
school and      |  ---- 1. general ----   ---- 2. academic ---   ---- 3. vocation ---
 honors english |   0. male   1. female    0. male   1. female    0. male   1. female
----------------+---------------------------------------------------------------------
1. low          |
1. public       |
0. not enrolled |         7           5          3           9          4           6
    1. enrolled |                     2          1           6                      2
----------------+---------------------------------------------------------------------
1. low          |
2. private      |
0. not enrolled |                     2
    1. enrolled |
----------------+---------------------------------------------------------------------
2. middle       |
1. public       |
0. not enrolled |         9           6          9          12         13          13
    1. enrolled |                     2          5           4          1           2
----------------+---------------------------------------------------------------------
2. middle       |
2. private      |
0. not enrolled |         1           2          8           4          1           1
    1. enrolled |                                            2
----------------+---------------------------------------------------------------------
3. high         |
1. public       |
0. not enrolled |         2           3          9           7          4           2
    1. enrolled |         2           1          8           8                      1
----------------+---------------------------------------------------------------------
3. high         |
2. private      |
0. not enrolled |                     1          3           1
    1. enrolled |                                1           5
--------------------------------------------------------------------------------------

tablist honors female prog ses schtyp

+----------------------------------------------------------------------------------------------------------+
|          honors      female          prog         ses       schtyp   _Freq_   _Perc_   _CFreq_   _CPerc_ |
|----------------------------------------------------------------------------------------------------------|
| 0. not enrolled     0. male   3. vocation   2. middle    1. public       13     6.50        13      6.50 |
| 0. not enrolled   1. female   3. vocation   2. middle    1. public       13     6.50        26     13.00 |
| 0. not enrolled   1. female   2. academic   2. middle    1. public       12     6.00        38     19.00 |
| 0. not enrolled     0. male   2. academic   2. middle    1. public        9     4.50        47     23.50 |
| 0. not enrolled     0. male    1. general   2. middle    1. public        9     4.50        56     28.00 |
|----------------------------------------------------------------------------------------------------------|
| 0. not enrolled     0. male   2. academic     3. high    1. public        9     4.50        65     32.50 |
| 0. not enrolled   1. female   2. academic      1. low    1. public        9     4.50        74     37.00 |
|     1. enrolled   1. female   2. academic     3. high    1. public        8     4.00        82     41.00 |
|     1. enrolled     0. male   2. academic     3. high    1. public        8     4.00        90     45.00 |
| 0. not enrolled     0. male   2. academic   2. middle   2. private        8     4.00        98     49.00 |
|----------------------------------------------------------------------------------------------------------|
| 0. not enrolled   1. female   2. academic     3. high    1. public        7     3.50       105     52.50 |
| 0. not enrolled     0. male    1. general      1. low    1. public        7     3.50       112     56.00 |
|     1. enrolled   1. female   2. academic      1. low    1. public        6     3.00       118     59.00 |
| 0. not enrolled   1. female    1. general   2. middle    1. public        6     3.00       124     62.00 |
| 0. not enrolled   1. female   3. vocation      1. low    1. public        6     3.00       130     65.00 |
|----------------------------------------------------------------------------------------------------------|
| 0. not enrolled   1. female    1. general      1. low    1. public        5     2.50       135     67.50 |
|     1. enrolled   1. female   2. academic     3. high   2. private        5     2.50       140     70.00 |
|     1. enrolled     0. male   2. academic   2. middle    1. public        5     2.50       145     72.50 |
| 0. not enrolled   1. female   2. academic   2. middle   2. private        4     2.00       149     74.50 |
|     1. enrolled   1. female   2. academic   2. middle    1. public        4     2.00       153     76.50 |
|----------------------------------------------------------------------------------------------------------|
| 0. not enrolled     0. male   3. vocation      1. low    1. public        4     2.00       157     78.50 |
| 0. not enrolled     0. male   3. vocation     3. high    1. public        4     2.00       161     80.50 |
| 0. not enrolled   1. female    1. general     3. high    1. public        3     1.50       164     82.00 |
| 0. not enrolled     0. male   2. academic     3. high   2. private        3     1.50       167     83.50 |
| 0. not enrolled     0. male   2. academic      1. low    1. public        3     1.50       170     85.00 |
|----------------------------------------------------------------------------------------------------------|
|     1. enrolled   1. female    1. general      1. low    1. public        2     1.00       172     86.00 |
| 0. not enrolled   1. female   3. vocation     3. high    1. public        2     1.00       174     87.00 |
| 0. not enrolled     0. male    1. general     3. high    1. public        2     1.00       176     88.00 |
| 0. not enrolled   1. female    1. general   2. middle   2. private        2     1.00       178     89.00 |
| 0. not enrolled   1. female    1. general      1. low   2. private        2     1.00       180     90.00 |
|----------------------------------------------------------------------------------------------------------|
|     1. enrolled   1. female   2. academic   2. middle   2. private        2     1.00       182     91.00 |
|     1. enrolled   1. female   3. vocation   2. middle    1. public        2     1.00       184     92.00 |
|     1. enrolled   1. female    1. general   2. middle    1. public        2     1.00       186     93.00 |
|     1. enrolled     0. male    1. general     3. high    1. public        2     1.00       188     94.00 |
|     1. enrolled   1. female   3. vocation      1. low    1. public        2     1.00       190     95.00 |
|----------------------------------------------------------------------------------------------------------|
| 0. not enrolled     0. male    1. general   2. middle   2. private        1     0.50       191     95.50 |
|     1. enrolled   1. female   3. vocation     3. high    1. public        1     0.50       192     96.00 |
| 0. not enrolled   1. female   3. vocation   2. middle   2. private        1     0.50       193     96.50 |
| 0. not enrolled   1. female    1. general     3. high   2. private        1     0.50       194     97.00 |
|     1. enrolled     0. male   2. academic      1. low    1. public        1     0.50       195     97.50 |
|----------------------------------------------------------------------------------------------------------|
| 0. not enrolled     0. male   3. vocation   2. middle   2. private        1     0.50       196     98.00 |
|     1. enrolled   1. female    1. general     3. high    1. public        1     0.50       197     98.50 |
| 0. not enrolled   1. female   2. academic     3. high   2. private        1     0.50       198     99.00 |
|     1. enrolled     0. male   2. academic     3. high   2. private        1     0.50       199     99.50 |
|     1. enrolled     0. male   3. vocation   2. middle    1. public        1     0.50       200    100.00 |
+----------------------------------------------------------------------------------------------------------+
As you can see, this is getting crazy. Notice that there are 72 combinations of the levels of the variables. The empty cells are easy to see in the output from the table command, but they are not shown in the tablist output.
We are not going to talk about issues regarding complete separation (AKA perfect prediction) or quasi-complete separation, but these issues can arise when data become sparse. For information on these topics, please see FAQ What is complete or quasi-complete separation in logistic regression and what are some strategies to deal with the issue?
One continuous predictor
Now let’s use a single continuous predictor, such as read.
logit(p) = β0 + β1*read
logit honors read

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -91.119404
Iteration 2:   log likelihood = -89.540759
Iteration 3:   log likelihood = -89.531488
Iteration 4:   log likelihood = -89.531487

Logistic regression                             Number of obs     =        200
                                                LR chi2(1)        =      52.23
                                                Prob > chi2       =     0.0000
Log likelihood = -89.531487                     Pseudo R2         =     0.2258

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .1325727   .0216695     6.12   0.000     .0901013    .1750442
       _cons |  -8.300192   1.246108    -6.66   0.000    -10.74252   -5.857865
------------------------------------------------------------------------------
The interpretation of the coefficient is the same as when the predictor was categorical. For a one-unit increase in the variable read, the expected log of the odds of honors increases by 0.1325727, holding all other variables in the model constant.
We can also show the results in terms of odds ratios.
logit, or

Logistic regression                             Number of obs     =        200
                                                LR chi2(1)        =      52.23
                                                Prob > chi2       =     0.0000
Log likelihood = -89.531487                     Pseudo R2         =     0.2258

------------------------------------------------------------------------------
      honors | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.141762   .0247414     6.12   0.000     1.094285    1.191299
       _cons |   .0002485   .0003096    -6.66   0.000     .0000216    .0028573
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
For a one unit change in read, the odds are expected to increase by a factor of 1.141762, holding all other variables in the model constant.
In this case, the estimated coefficient for the intercept is the log odds of a student with a reading score of zero being in honors English. In other words, the odds of being in honors English when the reading score is zero are exp(-8.300192) = .00024847. These odds are very low, but if we look at the distribution of the variable read, we will see that no one in the sample has a reading score lower than 28. In fact, all of the test scores in the dataset were standardized around a mean of 50 and a standard deviation of 10. So the intercept in this model corresponds to the log odds of being in honors English when read is at the hypothetical value of zero.
How do we interpret the coefficient for read? The coefficient and intercept estimates give us the following equation:
log(p/(1-p)) = logit(p) = -8.300192 + .1325727*read
Let’s fix read at some value. We will use 54. Then the conditional logit of being in honors English when the reading score is held at 54 is
log(p/(1-p))(read=54) = -8.300192 + .1325727*54.
We can examine the effect of a one-unit increase in reading score. When the reading score is held at 55, the conditional logit of being in honors English is
log(p/(1-p))(read=55) = -8.300192 + .1325727*55.
Taking the difference of the two equations, we have the following:
log(p/(1-p))(read = 55) – log(p/(1-p))(read = 54) = .1325727.
We can say now that the coefficient for read is the difference in the log odds. In other words, for a one-unit increase in the reading score, the expected change in log odds is .1325727. Can we translate this change in log odds to the change in odds? Indeed, we can. Recall that logarithm converts multiplication and division to addition and subtraction. Its inverse, the exponentiation converts addition and subtraction back to multiplication and division. If we exponentiate both sides of our last equation, we have the following:
exp[log(p/(1-p))(read = 55) – log(p/(1-p))(read = 54)] = exp(log(p/(1-p))(read = 55)) / exp(log(p/(1-p))(read = 54)) = odds(read = 55)/odds(read = 54) = exp(.1325727) = 1.141762.
So we can say for a one-unit increase in reading score, we expect to see about 14% increase in the odds of being in honors English. This 14% of increase does not depend on the value at which read is held. This is why, when we interpret the coefficients, we can say “holding all other variables constant” and we do not specify the value at which they are held. As we will see shortly, when we talk about predicted probabilities, the values at which other variables are held will alter the value of the predicted probabilities.
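We can confirm this with display, computing the odds at read = 54 and read = 55 from the fitted equation:

display exp(-8.300192 + .1325727*54)    // odds of honors when read = 54
display exp(-8.300192 + .1325727*55)    // odds of honors when read = 55
display exp(-8.300192 + .1325727*55)/exp(-8.300192 + .1325727*54)
                                        // their ratio: exp(.1325727) = 1.141762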
While the interpretations above are accurate, they may not be terribly helpful or meaningful to members of the audience. The margins command can help with that. We have seen the margins command used with categorical predictors, so now let’s see what can be done with continuous predictors. In times past, the recommendation was that continuous variables should be evaluated at the mean, one standard deviation below the mean and one standard deviation above the mean. There is certainly nothing wrong with doing this, but those values may not be useful in a practical setting. Because the purpose is to provide easily-understandable values that are meaningful in the real world, we suggest that you select values that have real-world meaning. In our example, we will pretend that those values for the variable read are 30, 50 and 70.
margins, at(read=(30 50 70)) vsquish

Adjusted predictions                            Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()
1._at        : read            =          30
2._at        : read            =          50
3._at        : read            =          70

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |   .0130864    .007892     1.66   0.097    -.0023817    .0285545
          2  |   .1582169   .0317186     4.99   0.000     .0960496    .2203843
          3  |   .7270883   .0674916    10.77   0.000     .5948071    .8593694
------------------------------------------------------------------------------
The output above indicates that if a student receives a low score on the reading test (say a score of 30), that student’s predicted probability of being enrolled in honors English is also low (0.013). If a student scores well on the reading test (such as a score of 70), that student’s predicted probability of being in honors English is relatively high, 0.727.
Two categorical predictors
Now let’s run a model with two categorical predictors, female and prog, along with the continuous predictor read.
logit honors read i.female i.prog

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -86.845312
Iteration 2:   log likelihood = -84.560995
Iteration 3:   log likelihood = -84.542357
Iteration 4:   log likelihood = -84.542348
Iteration 5:   log likelihood = -84.542348

Logistic regression                             Number of obs     =        200
                                                LR chi2(4)        =      62.20
                                                Prob > chi2       =     0.0000
Log likelihood = -84.542348                     Pseudo R2         =     0.2689

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .1352861   .0242218     5.59   0.000     .0878123      .18276
             |
      female |
     female  |    1.08343   .4094357     2.65   0.008     .2809511    1.885909
             |
        prog |
   academic  |   .5559416   .5053125     1.10   0.271    -.4344527    1.546336
   vocation  |   .0016408   .6611702     0.00   0.998    -1.294229     1.29751
             |
       _cons |   -9.41691   1.481922    -6.35   0.000    -12.32142   -6.512397
------------------------------------------------------------------------------
We can use the contrast command to determine if the variable prog is statistically significant.
contrast prog

Contrasts of marginal linear predictions

Margins      : asbalanced

------------------------------------------------
             |         df        chi2     P>chi2
-------------+----------------------------------
        prog |          2        1.78     0.4101
------------------------------------------------
The p-value is 0.4101, which is not statistically significant at the 0.05 level. It is up to the researcher to determine if the variable should remain in the model. In general, if the researcher’s hypothesis says that the variable should be included in the model, the variable should remain in the model regardless of the p-value.
We can also use the test command.
test 2.prog 3.prog

 ( 1)  [honors]2.prog = 0
 ( 2)  [honors]3.prog = 0

           chi2(  2) =    1.78
         Prob > chi2 =    0.4101
The listcoef command can also be used to display the results. The listcoef command is part of the spost package by Long and Freese. We will include the help option, which is very useful.
listcoef, help

logit (N=200): Factor change in odds

  Odds of: enrolled vs not enrolled

------------------------------------------------------------------------
             |         b         z     P>|z|      e^b    e^bStdX    SDofX
-------------+----------------------------------------------------------
        read |    0.1353     5.585     0.000    1.145      4.003   10.253
             |
      female |
     female  |    1.0834     2.646     0.008    2.955      1.718    0.499
             |
        prog |
   academic  |    0.5559     1.100     0.271    1.744      1.321    0.501
   vocation  |    0.0016     0.002     0.998    1.002      1.001    0.434
             |
    constant |   -9.4169    -6.355     0.000        .          .        .
------------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in odds for unit increase in X
 e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
   SDofX = standard deviation of X
The percent option can be added to see the results as a percent change in odds. Some researchers find that discussing their results as a percent change is very useful. The formula that listcoef uses to convert the values in the e^b column in the table above to the values in the % column in the table below is simple: percent change in odds = 100[exp(bk × δ) − 1], where δ is the amount of change in xk (1 for a unit change).
listcoef, percent help

logit (N=200): Percentage change in odds

  Odds of: enrolled vs not enrolled

------------------------------------------------------------------------
             |         b         z     P>|z|        %     %StdX    SDofX
-------------+----------------------------------------------------------
        read |    0.1353     5.585     0.000     14.5     300.3   10.253
             |
      female |
     female  |    1.0834     2.646     0.008    195.5      71.8    0.499
             |
        prog |
   academic  |    0.5559     1.100     0.271     74.4      32.1    0.501
   vocation  |    0.0016     0.002     0.998      0.2       0.1    0.434
             |
    constant |   -9.4169    -6.355     0.000        .         .        .
------------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
       % = percent change in odds for unit increase in X
   %StdX = percent change in odds for SD increase in X
   SDofX = standard deviation of X
The percent change can be calculated as (OR – 1)*100. So for the variable read, the odds ratio is 1.145. To get the percent change, (1.145 -1)*100 = 14.5.
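We can reproduce both the unit change and the standard-deviation change shown by listcoef, using the coefficient and SDofX values from the table above:

display (exp(.1352861) - 1)*100          // 14.5, percent change per point of read
display (exp(.1352861*10.253) - 1)*100   // 300.3, percent change per SD of read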
We can interpret the percent change for the variable read as:
For each additional point on the reading test, the odds of being in honors English increase by 14.5%, holding all other variables constant.
A one standard deviation increase in read (about 10.25 points) increases the odds of being in honors English by about 300%, holding all other variables constant.
Changing the reference group
Changing the reference group in Stata is super easy. First, decide which category you want to use as the reference, or base, category, and then include the letter b (for base) followed by the level number in the factor-variable prefix. Let’s say that we want to use level 2 of prog as the reference group; we specify ib2.prog.
logit honors read i.female ib2.prog

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -86.845312
Iteration 2:   log likelihood = -84.560995
Iteration 3:   log likelihood = -84.542357
Iteration 4:   log likelihood = -84.542348
Iteration 5:   log likelihood = -84.542348

Logistic regression                             Number of obs     =        200
                                                LR chi2(4)        =      62.20
                                                Prob > chi2       =     0.0000
Log likelihood = -84.542348                     Pseudo R2         =     0.2689

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .1352861   .0242218     5.59   0.000     .0878123      .18276
             |
      female |
  1. female  |    1.08343   .4094357     2.65   0.008     .2809511    1.885909
             |
        prog |
 1. general  |  -.5559416   .5053125    -1.10   0.271    -1.546336    .4344527
3. vocation  |  -.5543008    .559331    -0.99   0.322    -1.650569    .5419677
             |
       _cons |  -8.860968   1.510375    -5.87   0.000    -11.82125   -5.900688
------------------------------------------------------------------------------
Of course, the 2 df test of prog would be the same regardless of which level was used as the reference, as would the predicted probabilities.
contrast prog

Contrasts of marginal linear predictions

Margins      : asbalanced

------------------------------------------------
             |         df        chi2     P>chi2
-------------+----------------------------------
        prog |          2        1.78     0.4101
------------------------------------------------

margins prog

Predictive margins                              Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        prog |
 1. general  |   .2181091   .0562519     3.88   0.000     .1078574    .3283608
2. academic  |   .2949988   .0364807     8.09   0.000      .223498    .3664996
3. vocation  |   .2183165   .0636938     3.43   0.001      .093479    .3431539
------------------------------------------------------------------------------
You can also have Stata determine which level has the most observations and use that as the reference.
logit honors read i.female ib(freq).prog

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -86.845312
Iteration 2:   log likelihood = -84.560995
Iteration 3:   log likelihood = -84.542357
Iteration 4:   log likelihood = -84.542348
Iteration 5:   log likelihood = -84.542348

Logistic regression                             Number of obs     =        200
                                                LR chi2(4)        =      62.20
                                                Prob > chi2       =     0.0000
Log likelihood = -84.542348                     Pseudo R2         =     0.2689

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .1352861   .0242218     5.59   0.000     .0878123      .18276
             |
      female |
  1. female  |    1.08343   .4094357     2.65   0.008     .2809511    1.885909
             |
        prog |
 1. general  |  -.5559416   .5053125    -1.10   0.271    -1.546336    .4344527
3. vocation  |  -.5543008    .559331    -0.99   0.322    -1.650569    .5419677
             |
       _cons |  -8.860968   1.510375    -5.87   0.000    -11.82125   -5.900688
------------------------------------------------------------------------------
Let’s see how the margins command can be used to help with interpretation of the results. We will rerun the last model just so that we can see the results.
logit honors read i.female i.prog

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -86.845312
Iteration 2:   log likelihood = -84.560995
Iteration 3:   log likelihood = -84.542357
Iteration 4:   log likelihood = -84.542348
Iteration 5:   log likelihood = -84.542348

Logistic regression                             Number of obs     =        200
                                                LR chi2(4)        =      62.20
                                                Prob > chi2       =     0.0000
Log likelihood = -84.542348                     Pseudo R2         =     0.2689

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .1352861   .0242218     5.59   0.000     .0878123      .18276
             |
      female |
     female  |    1.08343   .4094357     2.65   0.008     .2809511    1.885909
             |
        prog |
   academic  |   .5559416   .5053125     1.10   0.271    -.4344527    1.546336
   vocation  |   .0016408   .6611702     0.00   0.998    -1.294229     1.29751
             |
       _cons |   -9.41691   1.481922    -6.35   0.000    -12.32142   -6.512397
------------------------------------------------------------------------------
We are going to spend some time looking at various ways to specify the margins command to get the output that you want. Using the margins command after a logistic regression is completely optional, although it is often very helpful.
First we will get the predicted probabilities for the variable female.
margins female

Predictive margins                              Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |
        male |   .1888093   .0345352     5.47   0.000     .1211215    .2564971
      female |   .3344859   .0382269     8.75   0.000     .2595625    .4094092
------------------------------------------------------------------------------
The results show that the predicted probability is higher for females than for males, which makes sense because the coefficient for the variable female is positive.
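As a check on what margins is doing here, these predictive margins can be reproduced by setting female to each of its values for every observation, predicting, and averaging. A minimal sketch (this modifies the data, so it is wrapped in preserve/restore):

quietly logit honors read i.female i.prog
preserve
replace female = 0
quietly predict p0
summarize p0    // the mean should match the margin for males, about .1888
replace female = 1
quietly predict p1
summarize p1    // the mean should match the margin for females, about .3345
restore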
Now we will get the predicted probabilities for the variable prog.
margins prog

Predictive margins                              Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        prog |
     general |   .2181091   .0562519     3.88   0.000     .1078574    .3283608
    academic |   .2949988   .0364807     8.09   0.000      .223498    .3664996
    vocation |   .2183165   .0636938     3.43   0.001      .093479    .3431539
------------------------------------------------------------------------------
The predicted probability of being in the honors English class is highest for those in the “academic” program, and it is about equal for those in the “general” and “vocation” programs.
The predicted probabilities for both female and prog can be obtained with a single margins command.
margins female prog

Predictive margins                              Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |
        male |   .1888093   .0345352     5.47   0.000     .1211215    .2564971
      female |   .3344859   .0382269     8.75   0.000     .2595625    .4094092
             |
        prog |
     general |   .2181091   .0562519     3.88   0.000     .1078574    .3283608
    academic |   .2949988   .0364807     8.09   0.000      .223498    .3664996
    vocation |   .2183165   .0636938     3.43   0.001      .093479    .3431539
------------------------------------------------------------------------------
Now we will get the predicted probabilities for female at specific levels of read, only for program type 2, which is the “academic” program.
margins female, at(read=(30 50 70) prog=2) vsquish

Adjusted predictions                            Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()
1._at        : read            =          30
               prog            =           2
2._at        : read            =          50
               prog            =           2
3._at        : read            =          70
               prog            =           2

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _at#female |
      1#male |   .0081433   .0066016     1.23   0.217    -.0047956    .0210822
    1#female |   .0236848   .0160775     1.47   0.141    -.0078265     .055196
      2#male |   .1094218   .0412597     2.65   0.008     .0285542    .1902894
    2#female |   .2663481   .0632006     4.21   0.000     .1424772     .390219
      3#male |   .6477268   .0912298     7.10   0.000     .4689197    .8265339
    3#female |   .8445516   .0579974    14.56   0.000     .7308788    .9582245
------------------------------------------------------------------------------
In the command above, we specified the three levels at which the variable read should be held. We can also specify the interval by which Stata should increment when calculating the predicted probabilities. In the example below, we specify an interval of 20. Stata will start at the first number given, increment by the second number given, and end with the third number given. Hence, the predicted probabilities will be calculated for read = 30, read = 50 and read = 70.
margins female, at(read=(30(20)70) prog=2) vsquish

Adjusted predictions                            Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()
1._at        : read            =          30
               prog            =           2
2._at        : read            =          50
               prog            =           2
3._at        : read            =          70
               prog            =           2

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _at#female |
      1#male |   .0081433   .0066016     1.23   0.217    -.0047956    .0210822
    1#female |   .0236848   .0160775     1.47   0.141    -.0078265     .055196
      2#male |   .1094218   .0412597     2.65   0.008     .0285542    .1902894
    2#female |   .2663481   .0632006     4.21   0.000     .1424772     .390219
      3#male |   .6477268   .0912298     7.10   0.000     .4689197    .8265339
    3#female |   .8445516   .0579974    14.56   0.000     .7308788    .9582245
------------------------------------------------------------------------------
In the margins command below, we request the predicted probabilities for female at three levels of read, for specific values of prog.
margins female, at(read=(30 50 70) prog=(1 3)) vsquish

Adjusted predictions                            Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()
1._at        : read            =          30
               prog            =           1
2._at        : read            =          30
               prog            =           3
3._at        : read            =          50
               prog            =           1
4._at        : read            =          50
               prog            =           3
5._at        : read            =          70
               prog            =           1
6._at        : read            =          70
               prog            =           3

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _at#female |
      1#male |   .0046867   .0038796     1.21   0.227    -.0029172    .0122906
    1#female |   .0137226    .009905     1.39   0.166    -.0056908     .033136
      2#male |   .0046944   .0038948     1.21   0.228    -.0029393     .012328
    2#female |   .0137448   .0096791     1.42   0.156    -.0052258    .0327154
      3#male |   .0658288   .0322765     2.04   0.041      .002568    .1290896
    3#female |   .1723344   .0683062     2.52   0.012     .0384568    .3062121
      4#male |   .0659298   .0353531     1.86   0.062     -.003361    .1352205
    4#female |   .1725686   .0720298     2.40   0.017     .0313927    .3137445
      5#male |    .513277     .14321     3.58   0.000     .2325905    .7939635
    5#female |   .7570458   .1152195     6.57   0.000     .5312198    .9828718
      6#male |   .5136869    .164499     3.12   0.002     .1912747     .836099
    6#female |   .7573474   .1257846     6.02   0.000     .5108141    1.003881
------------------------------------------------------------------------------
In the margins command below, we request the predicted probabilities for prog at specific levels of read only for females.
margins prog, at(read=(20(10)70) female=1) vsquish

Adjusted predictions                            Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()
1._at        : read            =          20
               female          =           1
2._at        : read            =          30
               female          =           1
3._at        : read            =          40
               female          =           1
4._at        : read            =          50
               female          =           1
5._at        : read            =          60
               female          =           1
6._at        : read            =          70
               female          =           1

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _at#prog |
   1#general |   .0035837   .0033206     1.08   0.280    -.0029246    .0100921
  1#academic |   .0062319   .0057058     1.09   0.275    -.0049512    .0174151
  1#vocation |   .0035896    .003226     1.11   0.266    -.0027332    .0099124
   2#general |   .0137226    .009905     1.39   0.166    -.0056908     .033136
  2#academic |   .0236848   .0160775     1.47   0.141    -.0078265     .055196
  2#vocation |   .0137448   .0096791     1.42   0.156    -.0052258    .0327154
   3#general |    .051075   .0275788     1.85   0.064    -.0029784    .1051285
  3#academic |   .0857952   .0380499     2.25   0.024     .0112186    .1603717
  3#vocation |   .0511546   .0276057     1.85   0.064    -.0029516    .1052608
   4#general |   .1723344   .0683062     2.52   0.012     .0384568    .3062121
  4#academic |   .2663481   .0632006     4.21   0.000     .1424772     .390219
  4#vocation |   .1725686   .0720298     2.40   0.017     .0313927    .3137445
   5#general |   .4461322    .124094     3.60   0.000     .2029123     .689352
  5#academic |   .5841008   .0733682     7.96   0.000     .4403019    .7278998
  5#vocation |   .4465376   .1359971     3.28   0.001     .1799882    .7130871
   6#general |   .7570458   .1152195     6.57   0.000     .5312198    .9828718
  6#academic |   .8445516   .0579974    14.56   0.000     .7308788    .9582245
  6#vocation |   .7573474   .1257846     6.02   0.000     .5108141    1.003881
------------------------------------------------------------------------------
The marginsplot command will graph the last margins output. The line for “general” is difficult to see because it is underneath the line for “vocation”.
marginsplot, legend(rows(1))
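If the overlapping confidence intervals make the graph hard to read, they can be suppressed with the noci option:

marginsplot, noci legend(rows(1))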
Before moving on to interactions, let’s revisit an important point: the values of the covariates really matter when calculating predicted probabilities. In the example below, we will first get the predicted probabilities for female for program type 1 (“general”) when the variable read is held at 30, 50 and 70. In the next example, the values of read will be held at 31, 52 and 73. Notice the difference in the predicted probabilities in the two output tables. While there are no “correct” values at which to hold any predictor variable, the values at which the predictors are held determine the predicted probabilities that are calculated.
margins female, at(read=(30 50 70) prog=1) vsquish

Adjusted predictions                            Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()
1._at        : read            =          30
               prog            =           1
2._at        : read            =          50
               prog            =           1
3._at        : read            =          70
               prog            =           1

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _at#female |
      1#male |   .0046867   .0038796     1.21   0.227    -.0029172    .0122906
    1#female |   .0137226    .009905     1.39   0.166    -.0056908     .033136
      2#male |   .0658288   .0322765     2.04   0.041      .002568    .1290896
    2#female |   .1723344   .0683062     2.52   0.012     .0384568    .3062121
      3#male |    .513277     .14321     3.58   0.000     .2325905    .7939635
    3#female |   .7570458   .1152195     6.57   0.000     .5312198    .9828718
------------------------------------------------------------------------------

margins female, at(read=(31 52 73) prog=1) vsquish

Adjusted predictions                            Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()
1._at        : read            =          31
               prog            =           1
2._at        : read            =          52
               prog            =           1
3._at        : read            =          73
               prog            =           1

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _at#female |
      1#male |    .005362   .0043319     1.24   0.216    -.0031284    .0138525
    1#female |   .0156793   .0110124     1.42   0.155    -.0059045    .0372632
      2#male |   .0845533   .0394435     2.14   0.032     .0072454    .1618611
    2#female |   .2144007   .0798098     2.69   0.007     .0579765     .370825
      3#male |   .6127719   .1457039     4.21   0.000     .3271974    .8983464
    3#female |   .8238147     .09816     8.39   0.000     .6314247    1.016205
------------------------------------------------------------------------------
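As a sanity check, any one of these adjusted predictions can be computed directly from the coefficients with the invlogit() function. A sketch for a female in the “general” program with read = 50, using the model with “general” as the reference (the result should be about .1723, matching the table above):

quietly logit honors read i.female i.prog
di invlogit(_b[_cons] + _b[read]*50 + _b[1.female])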
Models with interactions
Categorical by categorical interactions
We will start with a categorical-by-categorical interaction with the variables female and prog. First, let’s do a three-way crosstab.
table honors female prog

--------------------------------------------------------------------------------------
                |                     type of program and female
                | ---- 1. general ----   ---- 2. academic ---   ---- 3. vocation ---
 honors english |  0. male   1. female    0. male   1. female    0. male   1. female
----------------+---------------------------------------------------------------------
0. not enrolled |       19          19         32          33         22          22
    1. enrolled |        2           5         15          25          1           5
--------------------------------------------------------------------------------------
Notice that some of the cells have very few observations.
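If you want a closer look at how thin the cells are, one option is to compare the observed and expected frequencies within each program (the expected option of tabulate; output not shown):

bysort prog: tabulate honors female, expected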
logit honors i.female##i.prog

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -105.80227
Iteration 2:   log likelihood = -105.03687
Iteration 3:   log likelihood = -105.01869
Iteration 4:   log likelihood = -105.01869

Logistic regression                             Number of obs     =        200
                                                LR chi2(5)        =      21.25
                                                Prob > chi2       =     0.0007
Log likelihood = -105.01869                     Pseudo R2         =     0.0919

----------------------------------------------------------------------------------
          honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
          female |
          female |   .9162888   .8973641     1.02   0.307    -.8425125     2.67509
                 |
            prog |
        academic |   1.493604   .8065652     1.85   0.064    -.0872346    3.074443
        vocation |  -.8397523   1.264154    -0.66   0.507    -3.317448    1.637944
                 |
     female#prog |
 female#academic |  -.4362348   .9866519    -0.44   0.658    -2.370037    1.497567
 female#vocation |   .6931488   1.447816     0.48   0.632    -2.144518    3.530816
                 |
           _cons |   -2.25129   .7433913    -3.03   0.002     -3.70831   -.7942696
----------------------------------------------------------------------------------
While the overall model is statistically significant (p = 0.0007), none of the individual coefficients is.
We will use the contrast command to get the multi-degree-of-freedom test of the interaction term, which will have 2 degrees of freedom (1*2 = 2).
contrast female#prog

Contrasts of marginal linear predictions

Margins      : asbalanced

------------------------------------------------
             |         df        chi2     P>chi2
-------------+----------------------------------
 female#prog |          2        0.97     0.6150
------------------------------------------------
The p-value for the omnibus test is 0.6150, which is well above 0.05, so the interaction term is not statistically significant. Despite these results, we will continue to look at the interaction as if it were of interest.
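The same 2 df Wald test can be obtained with the test command on the two interaction coefficients, using the names Stata gives them in this model:

quietly logit honors i.female##i.prog
test 1.female#2.prog 1.female#3.prog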
We can use the margins command to give us the predicted probabilities for each combination of values of female and prog.
margins female#prog

Adjusted predictions                            Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()

----------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
     female#prog |
    male#general |   .0952383   .0640565     1.49   0.137    -.0303102    .2207867
   male#academic |   .3191489   .0679945     4.69   0.000     .1858822    .4524157
   male#vocation |   .0434783   .0425226     1.02   0.307    -.0398645     .126821
  female#general |   .2083333   .0828982     2.51   0.012     .0458559    .3708108
 female#academic |   .4310345   .0650257     6.63   0.000     .3035865    .5584825
 female#vocation |   .1851852   .0747568     2.48   0.013     .0386646    .3317058
----------------------------------------------------------------------------------
The predicted probabilities are rather similar for each combination of levels of the variables, which corresponds to the fact that the interaction term is not statistically significant. We can graph the interaction with the marginsplot command.
marginsplot

Variables that uniquely identify margins: female prog
Despite the fact that the interaction is not statistically significant, we will show how some of the post-estimation commands can be used to explore the interaction. We can get all pairwise comparisons with the pwcompare command. With no options, we get the contrast coefficient, its standard error and its unadjusted 95% confidence interval.
pwcompare female#prog

Pairwise comparisons of marginal linear predictions

Margins      : asbalanced

-----------------------------------------------------------------------------------------------------
                                                    |                       Unadjusted
                                                    |   Contrast   Std. Err.     [95% Conf. Interval]
----------------------------------------------------+------------------------------------------------
honors                                              |
female#prog                                         |
     (0. male#2. academic) vs (0. male#1. general)  |   .5559416   .5053125     -.4344527    1.546336
     (0. male#3. vocation) vs (0. male#1. general)  |   .0016408   .6611702     -1.294229     1.29751
    (1. female#1. general) vs (0. male#1. general)  |    1.08343   .4094357      .2809511    1.885909
   (1. female#2. academic) vs (0. male#1. general)  |   1.639372   .6280644      .4083882    2.870356
   (1. female#3. vocation) vs (0. male#1. general)  |   1.085071   .7588425     -.4022329    2.572375
    (0. male#3. vocation) vs (0. male#2. academic)  |  -.5543008    .559331     -1.650569    .5419677
   (1. female#1. general) vs (0. male#2. academic)  |   .5274886   .6719313     -.7894726     1.84445
  (1. female#2. academic) vs (0. male#2. academic)  |    1.08343   .4094357      .2809511    1.885909
  (1. female#3. vocation) vs (0. male#2. academic)  |   .5291294    .692864      -.828859    1.887118
   (1. female#1. general) vs (0. male#3. vocation)  |   1.081789   .7960686     -.4784762    2.642055
  (1. female#2. academic) vs (0. male#3. vocation)  |   1.637731   .6934817      .2785318     2.99693
  (1. female#3. vocation) vs (0. male#3. vocation)  |    1.08343   .4094357      .2809511    1.885909
 (1. female#2. academic) vs (1. female#1. general)  |   .5559416   .5053125     -.4344527    1.546336
 (1. female#3. vocation) vs (1. female#1. general)  |   .0016408   .6611702     -1.294229     1.29751
(1. female#3. vocation) vs (1. female#2. academic)  |  -.5543008    .559331     -1.650569    .5419677
-----------------------------------------------------------------------------------------------------
We can add the pveffects option to get the z test statistic and the unadjusted p-value.
pwcompare female#prog, pveffects

Pairwise comparisons of marginal linear predictions

Margins      : asbalanced

--------------------------------------------------------------------------------------------
                                                    |                  Unadjusted
                                                    |   Contrast   Std. Err.      z    P>|z|
----------------------------------------------------+---------------------------------------
honors                                              |
female#prog                                         |
     (0. male#2. academic) vs (0. male#1. general)  |   .5559416   .5053125     1.10   0.271
     (0. male#3. vocation) vs (0. male#1. general)  |   .0016408   .6611702     0.00   0.998
    (1. female#1. general) vs (0. male#1. general)  |    1.08343   .4094357     2.65   0.008
   (1. female#2. academic) vs (0. male#1. general)  |   1.639372   .6280644     2.61   0.009
   (1. female#3. vocation) vs (0. male#1. general)  |   1.085071   .7588425     1.43   0.153
    (0. male#3. vocation) vs (0. male#2. academic)  |  -.5543008    .559331    -0.99   0.322
   (1. female#1. general) vs (0. male#2. academic)  |   .5274886   .6719313     0.79   0.432
  (1. female#2. academic) vs (0. male#2. academic)  |    1.08343   .4094357     2.65   0.008
  (1. female#3. vocation) vs (0. male#2. academic)  |   .5291294    .692864     0.76   0.445
   (1. female#1. general) vs (0. male#3. vocation)  |   1.081789   .7960686     1.36   0.174
  (1. female#2. academic) vs (0. male#3. vocation)  |   1.637731   .6934817     2.36   0.018
  (1. female#3. vocation) vs (0. male#3. vocation)  |    1.08343   .4094357     2.65   0.008
 (1. female#2. academic) vs (1. female#1. general)  |   .5559416   .5053125     1.10   0.271
 (1. female#3. vocation) vs (1. female#1. general)  |   .0016408   .6611702     0.00   0.998
(1. female#3. vocation) vs (1. female#2. academic)  |  -.5543008    .559331    -0.99   0.322
--------------------------------------------------------------------------------------------
We can use the mcompare option to correct for multiple tests. In the example below, we request a Bonferroni correction. Other possible corrections are sidak, scheffe and snk (Student-Newman-Keuls).
pwcompare female#prog, pveffects mcompare(bon)

Pairwise comparisons of marginal linear predictions

Margins      : asbalanced

---------------------------
             |    Number of
             |  Comparisons
-------------+-------------
honors       |
 female#prog |           15
---------------------------

--------------------------------------------------------------------------------------------
                                                    |                  Bonferroni
                                                    |   Contrast   Std. Err.      z    P>|z|
----------------------------------------------------+---------------------------------------
honors                                              |
female#prog                                         |
     (0. male#2. academic) vs (0. male#1. general)  |   1.493604   .8065652     1.85   0.961
     (0. male#3. vocation) vs (0. male#1. general)  |  -.8397523   1.264154    -0.66   1.000
    (1. female#1. general) vs (0. male#1. general)  |   .9162888   .8973641     1.02   1.000
   (1. female#2. academic) vs (0. male#1. general)  |   1.973658   .7892615     2.50   0.186
   (1. female#3. vocation) vs (0. male#1. general)  |   .7696853   .8933562     0.86   1.000
    (0. male#3. vocation) vs (0. male#2. academic)  |  -2.333356   1.069285    -2.18   0.436
   (1. female#1. general) vs (0. male#2. academic)  |  -.5773154   .5920711    -0.98   1.000
  (1. female#2. academic) vs (0. male#2. academic)  |    .480054   .4101459     1.17   1.000
  (1. female#3. vocation) vs (0. male#2. academic)  |  -.7239188   .5859788    -1.24   1.000
   (1. female#1. general) vs (0. male#3. vocation)  |   1.756041   1.139336     1.54   1.000
  (1. female#2. academic) vs (0. male#3. vocation)  |    2.81341   1.056294     2.66   0.116
  (1. female#3. vocation) vs (0. male#3. vocation)  |   1.609438   1.136182     1.42   1.000
 (1. female#2. academic) vs (1. female#1. general)  |   1.057369   .5682734     1.86   0.942
 (1. female#3. vocation) vs (1. female#1. general)  |  -.1466035   .7057522    -0.21   1.000
(1. female#3. vocation) vs (1. female#2. academic)  |  -1.203973   .5619231    -2.14   0.482
--------------------------------------------------------------------------------------------
In the example below, we will use the margins command to see if female is statistically significant at each level of prog.
* contrast of adjusted predictions
margins female@prog

Contrasts of adjusted predictions
Model VCE    : OIM

Expression   : Pr(honors), predict()

------------------------------------------------
             |         df        chi2     P>chi2
-------------+----------------------------------
 female@prog |
     general |          1        1.17     0.2804
    academic |          1        1.41     0.2344
    vocation |          1        2.71     0.0994
       Joint |          3        5.29     0.1515
------------------------------------------------
If you want to make specific comparisons, you need to access the values stored either by the model or by margins. We will start by asking if prog level 2 is different from prog level 1 for females only.
We will start by using the output from margins with the lincom command. Before we do this, let’s quietly rerun our logistic regression model. In “Stata speak”, to run something quietly means that the model will run but no output will be shown. This is useful when you need to be sure that the correct model is in memory but don’t need to see the output. Next, we will run the margins command with the coeflegend and post options. The coeflegend option is super useful and works with many estimation commands. It shows you what Stata calls each of the estimates, so that you can use those names in post-estimation commands. The post option posts the results to Stata’s memory so that they can be used in further calculations; you must use it if you want to refer to the margins estimates in later commands such as lincom.
quietly logit honors i.female##i.prog
margins female#prog, coeflegend post

Adjusted predictions                            Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()

----------------------------------------------------------------------------------
                 |     Margin   Legend
-----------------+----------------------------------------------------------------
     female#prog |
    male#general |   .0952383   _b[0bn.female#1bn.prog]
   male#academic |   .3191489   _b[0bn.female#2.prog]
   male#vocation |   .0434783   _b[0bn.female#3.prog]
  female#general |   .2083333   _b[1.female#1bn.prog]
 female#academic |   .4310345   _b[1.female#2.prog]
 female#vocation |   .1851852   _b[1.female#3.prog]
----------------------------------------------------------------------------------
lincom _b[1.female#2.prog] - _b[1.female#1bn.prog]

 ( 1)  - 1.female#1bn.prog + 1.female#2.prog = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .2227011   .1053587     2.11   0.035      .016202    .4292003
------------------------------------------------------------------------------
Just to be sure that Stata did what we wanted, we can use the display command to calculate the value ourselves.
di .4310345 - .2083333
.2227012
The mlincom command is a convenience command that works after the margins command and is part of the spost ado package. Instead of specifying the labels Stata assigned to each estimate, you can use the number of the estimate. First, let’s look at the matrix of stored estimates with the matlist command.
matlist e(b)

             |  0.female#   0.female#   0.female#   1.female#   1.female#   1.female#
             |     1.prog      2.prog      3.prog      1.prog      2.prog      3.prog
-------------+------------------------------------------------------------------------
          y1 |   .0952383    .3191489    .0434783    .2083333    .4310345    .1851852

mlincom 5 - 4, all

             |   lincom        se    zvalue    pvalue        ll        ul
-------------+------------------------------------------------------------
           1 |    0.223     0.105     2.114     0.035     0.016     0.429
The listcoef command can also be used. We will quietly rerun the model.
quietly logit honors i.female##i.prog
listcoef, help

logit (N=200): Factor change in odds

Odds of: 1. enrolled vs 0. not enrolled

------------------------------------------------------------------------
             |         b        z    P>|z|      e^b   e^bStdX    SDofX
-------------+----------------------------------------------------------
female       |
   1. female |    0.9163    1.021    0.307    2.500     1.580    0.499
             |
prog         |
 2. academic |    1.4936    1.852    0.064    4.453     2.112    0.501
 3. vocation |   -0.8398   -0.664    0.507    0.432     0.695    0.434
             |
female#c.2~g |
   1. female |   -0.4362   -0.442    0.658    0.646     0.820    0.455
             |
female#c.3~g |
   1. female |    0.6931    0.479    0.632    2.000     1.268    0.343
             |
    constant |   -2.2513   -3.028    0.002        .         .        .
------------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in odds for unit increase in X
 e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
   SDofX = standard deviation of X
Categorical by continuous interactions
For this example, we will interact the binary variable female with the continuous variable socst. Let’s use the summarize command to get some descriptive statistics on our variables.
sum i.female##c.socst

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+----------------------------------------------------------
      female |
   1. female |        200        .545    .4992205          0          1
       socst |        200      52.405    10.73579         26         71
             |
      female#|
     c.socst |
   1. female |        200       28.84    27.47231          0         71
This output is useful for many reasons. One reason is that you need to know the minimum and maximum of variables when you run the margins command.
logit honors i.female##c.socst

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -97.381142
Iteration 2:   log likelihood = -94.500287
Iteration 3:   log likelihood = -94.374577
Iteration 4:   log likelihood = -94.374049
Iteration 5:   log likelihood = -94.374049

Logistic regression                             Number of obs     =        200
                                                LR chi2(3)        =      42.54
                                                Prob > chi2       =     0.0000
Log likelihood = -94.374049                     Pseudo R2         =     0.1839

--------------------------------------------------------------------------------
        honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
        female |
        female |   6.406617   3.162921     2.03   0.043     .2074054    12.60583
         socst |   .1774272   .0468628     3.79   0.000     .0855779    .2692765
               |
female#c.socst |
        female |  -.0974514    .052999    -1.84   0.066    -.2013275    .0064248
               |
         _cons |   -11.5045   2.838154    -4.05   0.000    -17.06718   -5.941824
--------------------------------------------------------------------------------
The overall model is statistically significant (p = 0.0000), but the interaction is not (p = 0.066). However, we are going to pretend that it is and explore ways to understand the interaction. Because the interaction term has only 1 degree of freedom, running the contrast command on the interaction is unnecessary. However, it is shown below so that you can see how to specify a continuous variable in the command. Notice that there is only one # and the c. before the variable socst.
contrast female#c.socst

Contrasts of marginal linear predictions

Margins      : asbalanced

--------------------------------------------------
               |         df        chi2     P>chi2
---------------+----------------------------------
female#c.socst |          1        3.38     0.0660
--------------------------------------------------
The margins command can be used to get predicted probabilities for female at the desired values of socst.
margins female, at(socst=(30(20)70)) vsquish

Adjusted predictions                            Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()
1._at        : socst           =          30
2._at        : socst           =          50
3._at        : socst           =          70

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _at#female |
      1#male |   .0020627   .0029825     0.69   0.489    -.0037829    .0079083
    1#female |   .0630553    .039731     1.59   0.113     -.014816    .1409267
      2#male |   .0670355   .0356569     1.88   0.060    -.0028507    .1369217
    2#female |   .2499094   .0486082     5.14   0.000     .1546392    .3451797
      3#male |   .7141036   .1132276     6.31   0.000     .4921816    .9360257
    3#female |   .6225583   .0975773     6.38   0.000     .4313102    .8138064
------------------------------------------------------------------------------
The values in this table can be graphed with the marginsplot command.
marginsplot

Variables that uniquely identify margins: socst female
As before, we can make comparisons between the values calculated by margins. Notice, however, that socst is still a continuous variable in the model, even though we test differences at particular values of it.
margins female, at(socst=(30(20)70)) coeflegend post vsquish

Adjusted predictions                            Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()
1._at        : socst           =          30
2._at        : socst           =          50
3._at        : socst           =          70

------------------------------------------------------------------------------
             |     Margin   Legend
-------------+----------------------------------------------------------------
  _at#female |
      1#male |   .0020627   _b[1bn._at#0bn.female]
    1#female |   .0630553   _b[1bn._at#1.female]
      2#male |   .0670355   _b[2._at#0bn.female]
    2#female |   .2499094   _b[2._at#1.female]
      3#male |   .7141036   _b[3._at#0bn.female]
    3#female |   .6225583   _b[3._at#1.female]
------------------------------------------------------------------------------
Let’s test the difference between females and males when the social studies score is 50.
lincom _b[2._at#1.female] - _b[2._at#0bn.female]

 ( 1)  - 2._at#0bn.female + 2._at#1.female = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    .182874   .0602841     3.03   0.002     .0647194    .3010285
------------------------------------------------------------------------------
This difference is statistically significant. Now let’s do the same test when the social studies score is 30.
lincom _b[1._at#1.female] - _b[1._at#0bn.female]

 ( 1)  - 1bn._at#0bn.female + 1bn._at#1.female = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .0609926   .0398428     1.53   0.126    -.0170978    .1390831
------------------------------------------------------------------------------
At this value of socst, the difference between females and males is not statistically significant.
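Rather than testing the female-male difference one value of socst at a time, the dydx() option asks margins for the discrete change for female at each value of socst, and marginsplot can then show where zero falls inside the confidence bounds. A sketch:

quietly logit honors i.female##c.socst
margins, dydx(female) at(socst=(30(10)70)) vsquish
marginsplot, yline(0)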
Before moving on to continuous by continuous interactions, let’s stop and add one more continuous predictor to our model. We will add the variable read and show how the predicted probabilities change when read is held at different values.
logit honors i.female##c.socst read

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -86.134978
Iteration 2:   log likelihood = -82.031517
Iteration 3:   log likelihood = -81.860765
Iteration 4:   log likelihood = -81.859756
Iteration 5:   log likelihood = -81.859756

Logistic regression                             Number of obs     =        200
                                                LR chi2(4)        =      67.57
                                                Prob > chi2       =     0.0000
Log likelihood = -81.859756                     Pseudo R2         =     0.2921

--------------------------------------------------------------------------------
        honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
        female |
     1. female |    7.12505   3.113835     2.29   0.022     1.022045    13.22805
         socst |   .1084125   .0469643     2.31   0.021     .0163642    .2004608
               |
female#c.socst |
     1. female |  -.1049667   .0525745    -2.00   0.046    -.2080107   -.0019226
               |
          read |     .12548   .0277186     4.53   0.000     .0711525    .1798076
         _cons |  -14.76649   2.955381    -5.00   0.000    -20.55893   -8.974047
--------------------------------------------------------------------------------
This output looks good. Now let’s use the margins command and include only the at option to specify levels of socst. By default, Stata holds the other variables in the model at their observed values. The asobserved option can be added to produce the same results.
margins female, at(socst=(30(20)70))

Predictive margins                              Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()
1._at        : socst           =          30
2._at        : socst           =          50
3._at        : socst           =          70

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _at#female |
   1#0. male |   .0150025   .0208116     0.72   0.471    -.0257876    .0557925
 1#1. female |    .32096    .1268867     2.53   0.011     .0722666    .5696534
   2#0. male |   .1010581   .0439717     2.30   0.022     .0148752     .187241
 2#1. female |   .3322728   .0470899     7.06   0.000     .2399783    .4245674
   3#0. male |   .3819052   .1132221     3.37   0.001     .1599939    .6038165
 3#1. female |   .3437552   .0826705     4.16   0.000      .181724    .5057863
------------------------------------------------------------------------------

margins female, at(socst=(30(20)70)) asobserved

Predictive margins                              Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()
1._at        : socst           =          30
2._at        : socst           =          50
3._at        : socst           =          70

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _at#female |
   1#0. male |   .0150025   .0208116     0.72   0.471    -.0257876    .0557925
 1#1. female |    .32096    .1268867     2.53   0.011     .0722666    .5696534
   2#0. male |   .1010581   .0439717     2.30   0.022     .0148752     .187241
 2#1. female |   .3322728   .0470899     7.06   0.000     .2399783    .4245674
   3#0. male |   .3819052   .1132221     3.37   0.001     .1599939    .6038165
 3#1. female |   .3437552   .0826705     4.16   0.000      .181724    .5057863
------------------------------------------------------------------------------
Now let’s set the value of read to its mean.
margins female, at(socst=(30(20)70) read = (52.23))

Adjusted predictions                            Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()
1._at        : socst           =          30
               read            =       52.23
2._at        : socst           =          50
               read            =       52.23
3._at        : socst           =          70
               read            =       52.23

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _at#female |
   1#0. male |   .0069621   .0096717     0.72   0.472    -.0119941    .0259182
 1#1. female |   .2720513   .1463304     1.86   0.063     -.014751    .5588535
   2#0. male |   .0577558   .0308495     1.87   0.061     -.002708    .1182197
 2#1. female |   .2859118   .0551228     5.19   0.000     .1778731    .3939505
   3#0. male |   .3489201    .152427     2.29   0.022     .0501687    .6476715
 3#1. female |   .3001873   .1109357     2.71   0.007     .0827573    .5176174
------------------------------------------------------------------------------
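The same table should be produced by the atmeans option, which holds every covariate not otherwise specified (here, just read) at its mean:

margins female, at(socst=(30(20)70)) atmeans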
Now let’s set the value of read to 60.
margins female, at(socst=(30(20)70) read = (60))

Adjusted predictions                            Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()
1._at        : socst           =          30
               read            =          60
2._at        : socst           =          50
               read            =          60
3._at        : socst           =          70
               read            =          60

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _at#female |
   1#0. male |   .0182475   .0255591     0.71   0.475    -.0318473    .0683423
 1#1. female |    .497685   .2124644     2.34   0.019     .0812624    .9141075
   2#0. male |    .139787   .0673932     2.07   0.038     .0076987    .2718754
 2#1. female |   .5149098   .0878009     5.86   0.000     .3428232    .6869964
   3#0. male |   .5869062   .1449033     4.05   0.000      .302901    .8709114
 3#1. female |   .5320993   .1124178     4.73   0.000     .3117644    .7524342
------------------------------------------------------------------------------
The output in the last two tables is different, even though the variable read was not included in the interaction. This is why we say that the values of the covariates matter when calculating predicted probabilities. Of course, in the metric of log odds, the coefficients of the other variables do not depend on the value at which read is held; it is only the predicted probabilities that change.
Continuous by continuous interactions
For this example, we will interact the variables read and science. First, let’s look at some descriptive statistics.
sum c.read##c.science

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+----------------------------------------------------------
        read |        200       52.23    10.25294         28         76
     science |        200       51.85    9.900891         26         74
             |
      c.read#|
   c.science |        200    2771.775    952.0686        986       5092
Now we will run our model.
logit honors c.read##c.science

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -89.3918
Iteration 2:   log likelihood = -85.086047
Iteration 3:   log likelihood = -84.668677
Iteration 4:   log likelihood = -84.666101
Iteration 5:   log likelihood = -84.666101

Logistic regression                             Number of obs     =        200
                                                LR chi2(3)        =      61.96
                                                Prob > chi2       =     0.0000
Log likelihood = -84.666101                     Pseudo R2         =     0.2679

----------------------------------------------------------------------------------
          honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
            read |   .3965415   .1728687     2.29   0.022     .0577252    .7353579
         science |   .3480727   .1646122     2.11   0.034     .0254387    .6707068
                 |
c.read#c.science |  -.0051357   .0029119    -1.76   0.078    -.0108429    .0005715
                 |
           _cons |  -26.28497   9.642589    -2.73   0.006     -45.1841   -7.385845
----------------------------------------------------------------------------------
By now you should be feeling pretty comfortable with the basics of the output above. So let’s start with a seemingly easy question: Is the interaction term statistically significant? Many people would say no because the observed p-value of 0.078 is greater than our alpha level of 0.05. However, this is one of the places where logistic regression and OLS regression are not similar at all.
Here is a quote from Norton, Wang and Ai (2004): “Unfortunately, the intuition from linear regression models does not extend to nonlinear models. The marginal effect of a change in both interacted variables is not equal to the marginal effect of changing just the interaction term. More surprisingly, the sign may be different for different observations. The statistical significance cannot be determined from the z-statistic reported in the regression output. The odds-ratio interpretation of logit coefficients cannot be used for interaction terms.” (page 154)
“There are four important implications of this equation for nonlinear models. First, the interaction effect could be nonzero, even if β12 = 0. Second, the statistical significance of the interaction effect cannot be tested with a simple t test on the coefficient of the interaction term β12. Instead, the statistical significance of the entire cross derivative must be calculated. Third, the interaction effect is conditional on the independent variables, unlike the interaction effect in linear models. (It is well known that the marginal effect of a single, uninteracted variable in a nonlinear model is conditional on the independent variables.) Fourth, because there are two additive terms, each of which can be positive or negative, the interaction effect may have different signs for different values of covariates. Therefore, the sign of β12 does not necessarily indicate the sign of the interaction effect.” (page 156)
What this means for reporting your results is that you should not simply state whether the interaction is statistically significant based on the p-value given in the output table. Rather, you will need to identify the range (or ranges) of covariate values over which the interaction effect is statistically significant.
We will use Norton et al.’s inteff command to examine the interaction. The inteff command requires that you create the interaction term manually and run the logit command with that interaction term before running inteff.
gen readsci = read*science
quietly logit honors read science readsci
inteff honors read science readsci

Logit with two continuous variables interacted

    Variable |        Obs        Mean    Std. Dev.        Min         Max
-------------+------------------------------------------------------------
   _logit_ie |        200   -.0002656    .0007018   -.0013207    .0005851
   _logit_se |        200    .0004389    .0002959    .0000202    .0011699
    _logit_z |        200    .6022096    2.153007   -2.710142    4.456493
The z-statistics saved in _logit_z range from about -2.71 to 4.46, so there are two regions, one below -1.96 and one above 1.96, where the interaction effect is statistically significant; a graph makes these regions easy to see. Another community-contributed command, inteff3, can be used for a binary-by-binary-by-binary interaction (difference-in-difference-in-differences).
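A graph of this sort can be sketched by plotting the z-statistics saved by inteff against the predicted probabilities, with reference lines at ±1.96 (this assumes the _logit_z variable created above is still in memory; phat is a name we make up here):

quietly logit honors read science readsci
predict phat if e(sample)
scatter _logit_z phat, yline(-1.96 1.96) ytitle("z-statistic of interaction effect")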
There are a couple of articles that provide helpful examples of correctly interpreting interactions in nonlinear models. One is by Maarten Buis (referenced below), and another is a post by Vince Wiggins of StataCorp.
Despite the difficulties of knowing if or where the interaction term is statistically significant, and not being able to interpret the odds ratio of the interaction term, we can still use the margins command to get some descriptive information about the interaction. This can be done because we are not talking about statistical significance; rather, we are only looking at descriptive values based on the current model. We will quietly rerun the model in a way that margins will understand.
quietly logit honors c.read##c.science
margins, at(read=(30(20)70) science=(30 50 70)) vsquish

Adjusted predictions                            Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(honors), predict()
1._at        : read            =          30
               science         =          30
2._at        : read            =          30
               science         =          50
3._at        : read            =          30
               science         =          70
4._at        : read            =          50
               science         =          30
5._at        : read            =          50
               science         =          50
6._at        : read            =          50
               science         =          70
7._at        : read            =          70
               science         =          30
8._at        : read            =          70
               science         =          50
9._at        : read            =          70
               science         =          70

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
           1 |   .0001899   .0004304     0.44   0.659    -.0006536    .0010334
           2 |   .0091155   .0081189     1.12   0.262    -.0067972    .0250283
           3 |   .3081947    .267849     1.15   0.250    -.2167798    .8331691
           4 |   .0236781   .0196095     1.21   0.227    -.0147557     .062112
           5 |   .1308492   .0349136     3.75   0.000     .0624198    .1992785
           6 |    .483081   .1363859     3.54   0.000     .2157696    .7503924
           7 |   .7558718   .2854873     2.65   0.008      .196327    1.315417
           8 |   .7112934   .1261598     5.64   0.000     .4640247    .9585621
           9 |   .6622125   .1319474     5.02   0.000     .4036003    .9208247
------------------------------------------------------------------------------
marginsplot, legend(rows(1))

Variables that uniquely identify margins: read science
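For a complementary descriptive view, the average marginal effect of read can be computed at several values of science; again, this describes the current model rather than testing significance. A sketch:

quietly logit honors c.read##c.science
margins, dydx(read) at(science=(30 50 70)) vsquish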
Let’s look at one last example. This time we will use the square of the reading score as the interaction term; in other words, read is interacted with itself. In the command below, read is listed twice so that the command matches the four-variable form that inteff expects; Stata drops the duplicate because of collinearity.
*logit honors c.read##c.read
gen read2 = read*read
logit honors read read read2

note: read omitted because of collinearity

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -91.768787
Iteration 2:   log likelihood = -88.71512
Iteration 3:   log likelihood = -88.475651
Iteration 4:   log likelihood = -88.474528
Iteration 5:   log likelihood = -88.474527

Logistic regression                             Number of obs     =        200
                                                LR chi2(2)        =      54.34
                                                Prob > chi2       =     0.0000
Log likelihood = -88.474527                     Pseudo R2         =     0.2349

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .5116124   .2741212     1.87   0.062    -.0256554     1.04888
        read |          0  (omitted)
       read2 |   -.003321   .0023626    -1.41   0.160    -.0079516    .0013096
       _cons |  -18.88234   7.849826    -2.41   0.016    -34.26771   -3.496961
------------------------------------------------------------------------------

inteff honors read read read2

Logit with two continuous variables interacted

    Variable |        Obs        Mean    Std. Dev.        Min         Max
-------------+------------------------------------------------------------
   _logit_ie |        200    .0039721    .0055833   -.0064616    .0106083
   _logit_se |        200    .0018632    .0008679    .0002302    .0041644
    _logit_z |        200    3.559618    4.169944   -2.229903    10.73907
As before, the p-value in the logistic regression output suggests that the interaction term is not statistically significant, yet the z-statistics from inteff indicate that the interaction effect is statistically significant in some regions. This is why interaction terms are so difficult to interpret in logistic regression.
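Note that the commented-out factor-variable version fits the same quadratic model without creating read2 by hand, and it keeps margins informed that the squared term is a function of read. A sketch of how the curvature could then be traced (output not shown):

logit honors c.read##c.read
margins, at(read=(30(10)70)) vsquish
marginsplot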
References
Buis, M. L. (2010). Stata Tip 87: Interpretation of interactions in nonlinear models. The Stata Journal, 10(2), pages 305-308.
Hosmer, D. W., Lemeshow, S. and Sturdivant, R. X. (2013). Applied Logistic Regression, Third Edition. Hoboken, New Jersey: Wiley.
Long, J. S. and Freese, J. (2014). Regression Models for Categorical Dependent Variables Using Stata, Third Edition. College Station, TX: Stata Press.
Norton, E. C., Wang, H., and Ai, C. (2004). Computing interaction effects and standard errors in logit and probit models. The Stata Journal, 4(2), pages 154-167.
Williams, R. (2012). Using the margins command to estimate and interpret adjusted predictions and marginal effects. The Stata Journal, 12(2), pages 308-331.