Introduction
On this page, we discuss how to interpret a regression model when some
variables in the model have been log transformed.
The example data can be downloaded here (the file is in .csv format). The
variables in the data set are writing, reading, and math scores (write,
read and math) and a gender indicator (female). Below are the arithmetic,
geometric and harmonic means of the writing score.
Variable     |  Type          Obs        Mean     [95% Conf. Interval]
-------------+----------------------------------------------------------
write        |  Arithmetic    200      52.775      51.45332    54.09668
             |  Geometric     200      51.8496     50.46854    53.26845
             |  Harmonic      200      50.84403    49.40262    52.37208
------------------------------------------------------------------------
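As a quick sketch of what these three kinds of means are, here is a plain Python version (the page itself uses Stata; the scores below are made up for illustration, not taken from the data set):

```python
import math

def arithmetic_mean(xs):
    # sum of the values divided by the count
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # exp of the mean of the logs; requires strictly positive values
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean(xs):
    # reciprocal of the mean of the reciprocals
    return len(xs) / sum(1.0 / x for x in xs)

scores = [41, 52, 57, 62, 48]  # made-up writing scores for illustration
print(arithmetic_mean(scores), geometric_mean(scores), harmonic_mean(scores))
```

For any set of unequal positive values the three means are ordered: arithmetic > geometric > harmonic, which matches the ordering in the table above.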
Outcome variable is log transformed
Very often, a linear relationship is hypothesized between a log transformed outcome variable and a group of predictor variables. Written mathematically, the relationship follows the equation

    log(y_i) = b_0 + b_1*x_1i + ... + b_k*x_ki + e_i,

where y_i is the outcome variable, x_1i, ..., x_ki are the predictor variables, and e_i is the error term.
Let’s start with the intercept-only model.
------------------------------------------------------------------------------
     lgwrite |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   intercept |   3.948347   .0136905   288.40   0.000      3.92135    3.975344
------------------------------------------------------------------------------
We can say that the expected value of lgwrite, the log of the writing score, is 3.948347. Exponentiating it, exp(3.948347) ≈ 51.85, recovers the geometric mean of the writing score shown in the summary table above. In other words, when the dependent variable is log transformed, the exponentiated intercept is the expected geometric mean of the original outcome variable.
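A minimal numeric check (plain Python, with the numbers copied from the regression and ameans output above) that the exponentiated intercept recovers the geometric mean:

```python
import math

intercept = 3.948347        # intercept from the intercept-only model above
geom_mean_write = 51.8496   # geometric mean of write from the summary table

# exponentiating the mean of log(write) gives back the geometric mean of write
print(math.exp(intercept))  # close to 51.85
assert abs(math.exp(intercept) - geom_mean_write) < 0.01
```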
Now let’s move on to a model with a single binary predictor variable.
------------------------------------------------------------------------------
     lgwrite |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .1032614   .0265669     3.89   0.000      .050871    .1556518
   intercept |    3.89207   .0196128   198.45   0.000     3.853393    3.930747
------------------------------------------------------------------------------
Before diving into the interpretation of these parameters, let's get the
means of our dependent variable, write, for males and females separately.
males

Variable     |  Type          Obs        Mean     [95% Conf. Interval]
-------------+----------------------------------------------------------
write        |  Arithmetic     91     50.12088    47.97473    52.26703
             |  Geometric      91     49.01222    46.8497     51.27457
             |  Harmonic       91     47.85388    45.6903     50.23255
------------------------------------------------------------------------

females

Variable     |  Type          Obs        Mean     [95% Conf. Interval]
-------------+----------------------------------------------------------
write        |  Arithmetic    109     54.99083    53.44658    56.53507
             |  Geometric     109     54.34383    52.73513    56.0016
             |  Harmonic      109     53.64236    51.96389    55.43289
------------------------------------------------------------------------
Now we can map the parameter estimates to the geometric means for the two
groups. The intercept of 3.89207 is the expected mean of lgwrite for the male students (the reference group), so exp(3.89207) ≈ 49.01 is the geometric mean of write for males. The coefficient of female, .1032614, is the expected difference in lgwrite between female and male students. Exponentiating it, exp(.1032614) ≈ 1.109, gives the ratio of the two geometric means: 54.34383 / 49.01222 ≈ 1.109. In other words, the geometric mean of the writing score is about 10.9% higher for female students than for male students.
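The mapping between the regression output and the group geometric means can be verified numerically. A small sketch in plain Python, with the numbers copied from the regression and ameans output above:

```python
import math

coef_female = 0.1032614   # coefficient of female from the regression output
intercept = 3.89207       # intercept (males are the reference group)
geom_males = 49.01222     # geometric mean of write for males
geom_females = 54.34383   # geometric mean of write for females

# exp(intercept) recovers the male geometric mean
assert abs(math.exp(intercept) - geom_males) < 0.01
# exp(coef) is the ratio of female to male geometric means, about 1.109
assert abs(math.exp(coef_female) - geom_females / geom_males) < 0.001
```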
Last, let’s look at a model with multiple predictor variables.
------------------------------------------------------------------------------
     lgwrite |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |    .114718   .0195341     5.87   0.000      .076194     .153242
        read |   .0066305   .0012689     5.23   0.000     .0041281    .0091329
        math |   .0076792   .0013873     5.54   0.000     .0049432    .0104152
   intercept |   3.135243   .0598109    52.42   0.000     3.017287    3.253198
------------------------------------------------------------------------------
The exponentiated coefficient exp(.114718) ≈ 1.122 for female is the ratio of the expected geometric means of the writing score between female and male students, holding read and math constant: female students' scores are about 12% higher. For the continuous predictors, exp(.0066305) ≈ 1.0067 means that a one-point increase in reading score multiplies the expected geometric mean of the writing score by about 1.0067, a 0.7% increase, holding the other variables constant; the coefficient of math is interpreted analogously.
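The exponentiation step is a one-liner; the sketch below uses the coefficients from the regression output above:

```python
import math

b_female = 0.114718   # coefficient of female from the output above
b_read = 0.0066305    # coefficient of read from the output above

# ratio of geometric means, female vs. male, holding read and math fixed
print(math.exp(b_female))   # about 1.12, i.e. roughly 12% higher
# multiplicative change in the geometric mean of write per reading point
print(math.exp(b_read))     # about 1.0067, i.e. roughly 0.7% higher
```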
The intercept becomes less interesting when the predictor variables are not centered and are continuous. In this particular model, the intercept is the expected
mean of lgwrite for male students (female = 0) whose reading and math scores are both zero, values far outside the range of the observed data.
In summary, when the outcome variable is log transformed, it is natural to interpret the exponentiated regression coefficients. These values correspond to changes in the ratio of the expected geometric means of the original outcome variable.
Some (not all) predictor variables are log transformed
Occasionally, we also log transform some of the predictor variables. In this section, we take a look at an example where some predictor variables are log transformed but the outcome variable is in its original scale.
------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   5.388777   .9307948     5.79   0.000     3.553118    7.224436
      lgmath |   20.94097   3.430907     6.10   0.000     14.17473     27.7072
      lgread |   16.85218   3.063376     5.50   0.000     10.81076    22.89359
   intercept |  -99.16397   10.80406    -9.18   0.000    -120.4711   -77.85685
------------------------------------------------------------------------------
Written as an equation, we have

    write = b_0 + b_1*female + b_2*log(math) + b_3*log(read) + e.
Since this is an OLS regression, the interpretation of the regression
coefficients for the non-transformed variables is unchanged from an OLS
regression without any transformed variables. For example, the
expected mean difference in writing scores between female and male students is about 5.39 points, holding the other variables constant.
How do we interpret the coefficient of lgmath, 20.94097? If we multiply the math score by a factor c, the expected writing score changes by 20.94097 × log(c), holding the other variables constant. For instance, a 10% increase in math score (c = 1.10) raises the expected writing score by about 20.94097 × log(1.10) ≈ 2 points.
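The arithmetic behind this interpretation, as a small Python sketch using the lgmath coefficient from the output above:

```python
import math

b_lgmath = 20.94097   # coefficient of lgmath from the output above

def write_change(ratio):
    # expected change in write when math is multiplied by `ratio`,
    # holding the other predictors constant
    return b_lgmath * math.log(ratio)

print(write_change(1.10))   # 10% higher math: about +2 points
print(write_change(2.00))   # doubling math: about +14.5 points
```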
Note: Recalling the Taylor expansion of the function log(1 + x) around x = 0, we have log(1 + x) ≈ x for small x. Therefore, for a small percentage change p in the math score, the change in the expected writing score is approximately 20.94097 × p/100; a 1% increase in math corresponds to roughly a 0.21-point increase in write.
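How good is this approximation? A quick comparison of the exact change, 20.94097 × log(1 + x), against the Taylor shortcut 20.94097 × x, for a few percentage changes:

```python
import math

b_lgmath = 20.94097   # coefficient of lgmath from the output above

for pct in (0.01, 0.05, 0.10):
    exact = b_lgmath * math.log(1 + pct)   # exact change in expected write
    approx = b_lgmath * pct                # Taylor approximation log(1+x) ~ x
    print(pct, exact, approx)
```

The two agree closely for a 1% change and drift apart slowly as the percentage change grows, since log(1 + x) < x for x > 0.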
Both the outcome variable and some predictor variables are log transformed
What happens when both the outcome variable and predictor variables are log transformed? We can combine the two previously described situations into one. Here is an example of such a model.
------------------------------------------------------------------------------
     lgwrite |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .1142399   .0194712     5.87   0.000       .07584    .1526399
      lgmath |   .4085369   .0720791     5.67   0.000     .2663866    .5506872
        read |   .0066086   .0012561     5.26   0.000     .0041313    .0090859
   intercept |   1.928101   .2469391     7.81   0.000     1.441102    2.415099
------------------------------------------------------------------------------
Written as an equation, we can describe the model:

    log(write) = b_0 + b_1*female + b_2*log(math) + b_3*read + e.
For variables that are not transformed, such as female and read, the interpretation is the same as in the earlier log-outcome models: exp(.1142399) ≈ 1.121 is the ratio of the expected geometric means of the writing score between female and male students, and exp(.0066086) ≈ 1.0066 is the multiplicative change in the geometric mean of write for each additional reading point, holding the other variables constant.
Now, let’s focus on the effect of math. Take two math scores, m1 and m2. The difference in the expected values of lgwrite is .4085369 × (log(m2) − log(m1)) = .4085369 × log(m2/m1), so the ratio of the expected geometric means of the writing score simplifies to

    (m2/m1)^.4085369.

This tells us that as long as the ratio of the
two math scores, m2/m1, stays the same, the expected ratio of the writing scores stays the same. For example, any 10% increase in math score (m2/m1 = 1.10) multiplies the expected geometric mean of the writing score by 1.10^.4085369 ≈ 1.04, about a 4% increase, regardless of the starting score.
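This elasticity-style interpretation is easy to check numerically with the lgmath coefficient from the output above:

```python
b_lgmath = 0.4085369   # coefficient of lgmath from the log-log model above

def write_ratio(math_ratio):
    # ratio of expected geometric means of write when the math score
    # is multiplied by `math_ratio`, holding the other predictors fixed
    return math_ratio ** b_lgmath

print(write_ratio(1.10))   # about 1.04: 10% more math -> ~4% more write
print(write_ratio(0.90))   # about 0.96: 10% less math -> ~4% less write
```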
Note: Here also we can use an approximation. Since (1 + x)^b ≈ 1 + bx for small x, a p% change in the math score corresponds approximately to a (.4085369 × p)% change in the geometric mean of the writing score; for example, a 1% increase in math yields roughly a 0.41% increase in write.
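As before, the quality of the approximation can be checked by comparing the exact proportional change, (1 + x)^b − 1, against the linear shortcut bx:

```python
b_lgmath = 0.4085369   # coefficient of lgmath from the log-log model above

for pct in (0.01, 0.05, 0.10):
    exact = (1 + pct) ** b_lgmath - 1   # exact proportional change in write
    approx = b_lgmath * pct             # linear approximation (1+x)^b ~ 1+bx
    print(pct, exact, approx)
```

For a 1% change the two are nearly identical; by a 10% change the approximation overstates the exact change slightly, since (1 + x)^b < 1 + bx for 0 < b < 1 and x > 0.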