How is the 95% CI of the variance component in a mixed model calculated?

Below is a mixed model in which female is used to predict mathach. The model includes a random intercept, with the level-2 units defined by the variable id.

use https://stats.idre.ucla.edu/stat/stata/faq/hsb, clear
mixed mathach female || id:

Performing EM optimization ...

Performing gradient-based optimization: 
Iteration 0:  Log likelihood =  -23526.66  
Iteration 1:  Log likelihood =  -23526.66  

Computing standard errors ...

Mixed-effects ML regression                          Number of obs    =  7,185
Group variable: id                                   Number of groups =    160
                                                     Obs per group:
                                                                  min =     14
                                                                  avg =   44.9
                                                                  max =     67
                                                     Wald chi2(1)     =  62.89
Log likelihood =  -23526.66                          Prob > chi2      = 0.0000

------------------------------------------------------------------------------
     mathach | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      female |   -1.35939   .1714111    -7.93   0.000    -1.695349    -1.02343
       _cons |   13.34526   .2539356    52.55   0.000     12.84756    13.84297
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   8.109025   1.018281      6.339865    10.37187
-----------------------------+------------------------------------------------
               var(Residual) |   38.84481   .6555315      37.58101    40.15111
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 936.66        Prob >= chibar2 = 0.0000

At the bottom of the output is the table of variance estimates. The variance of the random intercept is displayed on the line beginning var(_cons) and is estimated as 8.11 for this model. Besides the estimate, the table includes its standard error and the 95% confidence interval (CI). You may notice that the lower bound of the CI is not equal to 8.109025 - 1.96*1.018281 (= 6.11). The upper bound also does not equal 8.109025 + 1.96*1.018281 (= 10.10). Why isn't the CI calculated in the usual way (i.e., b +/- 1.96*se)? The CI actually is calculated in the usual way. The displayed estimate and standard error simply aren't the values used in that calculation.
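As a quick check of that arithmetic, here is a small sketch in Python. Python is used only as a calculator here; the FAQ itself works in Stata. The estimate and standard error are copied from the var(_cons) row of the output:

```python
# Naive Wald interval for the random-intercept variance: b +/- 1.96*se.
# Both inputs are copied from the var(_cons) row of the mixed output.
var_hat = 8.109025   # estimated variance of the random intercept
se_var = 1.018281    # its displayed standard error

naive_lower = var_hat - 1.96 * se_var
naive_upper = var_hat + 1.96 * se_var

print(f"naive CI:    {naive_lower:.2f} to {naive_upper:.2f}")   # 6.11 to 10.10
print("reported CI: 6.339865 to 10.37187")
```

The naive interval is symmetric around 8.11, while the reported one is shifted upward. This is the first hint that the reported interval is built on a different scale.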

To understand why the displayed values are not used to calculate the 95% CI, it is important to know something about how Stata estimates the random effects. Stata doesn't estimate the variance (or the standard deviation) of the random effects directly. Instead, it estimates the natural log of the standard deviation (i.e., ln(sd)), which ensures that the standard deviation, and therefore the variance, will always be positive. Stata then transforms the estimate back to the original scale. What you see is therefore the variance (or the standard deviation, if you ask Stata to display standard deviations instead). The standard error in the table is likewise transformed to that scale (via the delta method). Knowing this, we can see that the CI is computed in the usual way on the log scale, using the estimate of ln(sd) and its standard error. The bounds are then transformed back:

CI for sd  = exp(ln(sd) +/- 1.96*se_ln)
CI for var = exp(2*(ln(sd) +/- 1.96*se_ln))

Here se_ln is the standard error of the estimate of ln(sd). It is not the log of the standard error shown in the output; by the delta method, se_ln = se(var)/(2*var). Calculating the CI this way ensures that the lower bound can never fall below zero. It also produces CIs that are not symmetric around the estimate of the variance or the standard deviation.

We can use the results that Stata stores after the model is run to calculate the CI. The estimate of ln(sd) for the random intercept is stored in e(b) and can be referenced with the rather odd-looking name _b[lns1_1_1:_cons]. Its standard error can be referenced as _se[lns1_1_1:_cons]. We can use those values, along with the display command, to calculate the lower and upper bounds of the CI. The two lines of code below do just that:

display exp(_b[lns1_1_1:_cons] - 1.96*_se[lns1_1_1:_cons])
2.5263929
display exp(_b[lns1_1_1:_cons] + 1.96*_se[lns1_1_1:_cons])
3.233295

Because lns1_1_1 is on the log-standard-deviation scale, these are the bounds of the CI for the standard deviation of the random intercept. To get the CI for the variance, square both bounds, the same way that you square the standard deviation to get the variance. There may be some (very) small differences from the output because Stata uses a more precise value of the z critical value for a 95% CI (1.959964…) than we used (1.96).
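The whole calculation can also be reproduced from the two numbers printed in the var(_cons) row alone. Below is a minimal sketch, again in Python purely as a calculator. It assumes the displayed standard error is the delta-method transform of the log-scale one, so it recovers se[ln(sd)] as se(var)/(2*var). Working with log SD or log variance gives the same variance interval, since ln(var) = 2*ln(sd). The helper name var_ci is hypothetical, not part of Stata.

```python
import math
from statistics import NormalDist

# Inputs copied from the var(_cons) row of the mixed output.
var_hat = 8.109025
se_var = 1.018281

# Log-SD scale: ln(sd) = 0.5*ln(var); assumed delta method: se[ln(sd)] = se(var)/(2*var).
ln_sd = 0.5 * math.log(var_hat)
se_ln_sd = se_var / (2 * var_hat)

def var_ci(z):
    """Hypothetical helper: symmetric CI for ln(sd), exponentiated, then squared."""
    lower_ln = ln_sd - z * se_ln_sd
    upper_ln = ln_sd + z * se_ln_sd
    return math.exp(2 * lower_ln), math.exp(2 * upper_ln)

z_exact = NormalDist().inv_cdf(0.975)          # about 1.959964
ci_196 = var_ci(1.96)
ci_exact = var_ci(z_exact)
sd_ci = tuple(math.sqrt(b) for b in ci_exact)  # SD bounds = square roots of variance bounds

print(f"variance CI, z = 1.96:      {ci_196[0]:.6f} to {ci_196[1]:.5f}")
print(f"variance CI, z = {z_exact:.6f}: {ci_exact[0]:.6f} to {ci_exact[1]:.5f}")
print(f"SD CI, z = {z_exact:.6f}:       {sd_ci[0]:.4f} to {sd_ci[1]:.4f}")
```

With the exact z-value, the variance bounds should match the reported 6.339865 and 10.37187 up to rounding of the displayed inputs. Using 1.96 instead changes them by less than 0.001.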