Please note: The following example is for illustrative purposes only. The data presented is not meant to recommend or encourage the estimation of random effects on categorical variables with very few unique levels.
Consider the following nested experiment. A study was conducted measuring the thickness of the oxide layer on silicon wafers. The wafers were produced on two different machines (source). Four lots of wafers were selected at random from each machine. From each lot, three wafers were selected at random to be measured. Finally, on each wafer, three positions were selected, giving 2 × 4 × 3 × 3 = 72 measurements. So we have position nested in wafer, wafer nested in lot, and lot nested in source. The primary concern of this experiment is to determine whether the two machines (source) differ in the thickness of their oxide layers.
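In symbols, this design can be written as follows. The notation is introduced here for illustration and is not part of the original example. Let i = 1, 2 index source, j = 1, ..., 4 index lot within source, k = 1, 2, 3 index wafer within lot, and l = 1, 2, 3 index position within wafer. Here α_i is the fixed effect of machine, and the three random terms are assumed mutually independent:

$$
y_{ijkl} = \mu + \alpha_i + u_{j(i)} + w_{k(ij)} + \varepsilon_{l(ijk)}
$$
$$
u_{j(i)} \sim N(0, \sigma^2_{\mathrm{lot}}), \qquad
w_{k(ij)} \sim N(0, \sigma^2_{\mathrm{wafer}}), \qquad
\varepsilon_{l(ijk)} \sim N(0, \sigma^2_{e})
$$

Position has no term of its own: its variability is the residual ε. The research question is the hypothesis α_1 = α_2.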
Let’s load the data and look at our sample.
use https://stats.idre.ucla.edu/stat/data/thickness, clear
list in 1/10
     +--------------------------------------------+
     | source   lot   wafer   position   thickn~s |
     |--------------------------------------------|
  1. |      1     1       1          1       2006 |
  2. |      1     1       1          2       1999 |
  3. |      1     1       1          3       2007 |
  4. |      1     1       2          1       1980 |
  5. |      1     1       2          2       1988 |
     |--------------------------------------------|
  6. |      1     1       2          3       1982 |
  7. |      1     1       3          1       2000 |
  8. |      1     1       3          2       1998 |
  9. |      1     1       3          3       2007 |
 10. |      1     2       1          1       1991 |
     +--------------------------------------------+

tabstat thickness, by(source) stat(n mean sd)
Summary for variables: thickness
     by categories of: source

  source |         N      mean        sd
---------+------------------------------
       1 |        36  1995.111  7.531943
       2 |        36  2005.194  14.86668
---------+------------------------------
   Total |        72  2000.153  12.75518
----------------------------------------
Next, we will need to create a variable that indicates lot nested in source. We will do this using the egen command with the group function.
egen lotinsource = group(lot source), label
tab lotinsource
  group(lot |
    source) |      Freq.     Percent        Cum.
------------+-----------------------------------
        1 1 |          9       12.50       12.50
        1 2 |          9       12.50       25.00
        2 1 |          9       12.50       37.50
        2 2 |          9       12.50       50.00
        3 1 |          9       12.50       62.50
        3 2 |          9       12.50       75.00
        4 1 |          9       12.50       87.50
        4 2 |          9       12.50      100.00
------------+-----------------------------------
      Total |         72      100.00
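Conceptually, group() gives consecutive integer codes to the distinct combinations of its variables, taken in sorted order. The Python sketch below mimics that idea on a hypothetical layout with no thickness data. It is not Stata's implementation. It shows that ignoring source leaves only 4 lot labels, while the lot-by-source combinations identify 8 separate lots.

```python
from itertools import product

# Hypothetical layout mirroring the design: lot numbers 1-4 are reused
# within each of the two sources. No thickness data is involved.
pairs = [(lot, source) for source, lot in product((1, 2), (1, 2, 3, 4))]


def group_codes(combos):
    """Number each distinct combination 1, 2, ... in sorted order,
    similar in spirit to Stata's egen group()."""
    return {c: code for code, c in enumerate(sorted(set(combos)), start=1)}


codes = group_codes(pairs)
lot_labels = {lot for lot, _ in pairs}

print("distinct lot labels:", len(lot_labels))        # lot, ignoring source
print("distinct lot-in-source groups:", len(codes))   # what lotinsource holds
for (lot, source), code in sorted(codes.items(), key=lambda kv: kv[1]):
    print(f"lotinsource = {code}: lot {lot}, source {source}")
```

The printed codes follow the same order as the tab output above: lot 1/source 1, lot 1/source 2, lot 2/source 1, and so on.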
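From the table above, it looks as though lot is crossed with source, because the lot numbers 1 through 4 are reused within each source. This is not the case. A lot drawn from source 1 is a different lot from one drawn from source 2, which is why lotinsource gives each of the eight lot-by-source combinations its own code. Wafer numbers are likewise reused within each lot. Fortunately, mixed will be able to sort this out for us: listing wafer after lotinsource in the random-effects specification tells mixed that wafers are nested within lots. Here is one way to parameterize this model.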
mixed thickness i.source || lotinsource: || wafer:, var
Performing gradient-based optimization:

Iteration 0:   log likelihood = -228.43197
Iteration 1:   log likelihood = -228.43197

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =         72

-------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
    lotinsource |          8          9        9.0          9
          wafer |         24          3        3.0          3
-------------------------------------------------------------

                                                Wald chi2(1)      =       2.03
Log likelihood = -228.43197                     Prob > chi2       =     0.1537

------------------------------------------------------------------------------
   thickness |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    2.source |   10.08333   7.068711     1.43   0.154    -3.771085    23.93775
       _cons |   1995.111   4.998333   399.16   0.000     1985.315    2004.908
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
lotinsource: Identity        |
                  var(_cons) |   86.58149    50.1892      27.79739    269.6783
-----------------------------+------------------------------------------------
wafer: Identity              |
                  var(_cons) |   35.86577   14.18759      16.51834    77.87428
-----------------------------+------------------------------------------------
               var(Residual) |   12.56944   2.565726      8.424908    18.75282
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 100.65                Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.
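The standard error of 2.source in the output above depends on the three variance components. Each source mean averages 36 measurements, so the lot effect is averaged over 4 lots, the wafer effect over 12 wafers, and the residual over 36 positions. The Python sketch below recomputes the standard error, the Wald statistic and its normal-theory p-value from the printed estimates. It is a hand check of the arithmetic, not a reimplementation of mixed. Up to rounding, the results should agree with Std. Err. = 7.068711, Wald chi2(1) = 2.03 and Prob > chi2 = 0.1537.

```python
import math

# Estimates copied from the mixed output above (ML fit).
var_lot, var_wafer, var_resid = 86.58149, 35.86577, 12.56944
diff = 10.08333                      # coefficient on 2.source
lots, wafers, positions = 4, 3, 3    # per source, per lot, per wafer

# Variance of one source's mean thickness (36 measurements): the lot effect is
# averaged over 4 lots, the wafer effect over 12 wafers, the residual over 36.
var_mean = (var_lot / lots
            + var_wafer / (lots * wafers)
            + var_resid / (lots * wafers * positions))

se_diff = math.sqrt(2 * var_mean)        # the two source means are independent
z = diff / se_diff
p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value from the normal

print(f"SE = {se_diff:.6f}  z = {z:.2f}  chi2 = {z * z:.2f}  p = {p:.4f}")
```

The term var_lot / 4 makes up most of var_mean. The precision of the machine comparison is therefore driven mainly by the number of lots, not by the number of wafers or positions.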
Note that the test for differences in source is not significant (z = 1.43, p = 0.154). Also note that the variable position does not appear in the model. That is because variability due to position is accounted for by the residual variance. In the output above, lot nested in source (lotinsource) has a variance of 86.58, wafer has a variance of 35.87, and position (residual) has a variance of 12.57.
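These three variances can also be expressed as shares of the total variance. They also imply a correlation between two measurements that share a lot, or that share both a lot and a wafer. Stata's estat icc, run after mixed, should report intraclass correlations of this kind with confidence intervals. The plain-Python sketch below only does the point-estimate arithmetic on the numbers printed above.

```python
# Variance components from the mixed output above (ML estimates).
components = {
    "lot (lotinsource)": 86.58149,
    "wafer": 35.86577,
    "position (residual)": 12.56944,
}
total = sum(components.values())

for name, var in components.items():
    print(f"{name:20s} {var:9.3f} {var / total:7.1%}")

# Correlation between two measurements implied by the nested model.
rho_lot = components["lot (lotinsource)"] / total
rho_wafer = (components["lot (lotinsource)"] + components["wafer"]) / total
print(f"same lot, different wafer:      {rho_lot:.3f}")
print(f"same wafer, different position: {rho_wafer:.3f}")
```

About two thirds of the variability is between lots. Two positions on the same wafer are therefore very highly correlated (about 0.91).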
There is an alternative way to parameterize this model that is somewhat more efficient.
mixed thickness i.source || lotinsource: || _all: R.wafer, var
Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -228.43197
Iteration 1:   log likelihood = -228.43197

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =         72

-------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
    lotinsource |          8          9        9.0          9
           _all |          8          9        9.0          9
-------------------------------------------------------------

                                                Wald chi2(1)      =       2.03
Log likelihood = -228.43197                     Prob > chi2       =     0.1537

------------------------------------------------------------------------------
   thickness |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    2.source |   10.08333   7.068711     1.43   0.154    -3.771085    23.93775
       _cons |   1995.111   4.998333   399.16   0.000     1985.315    2004.908
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
lotinsource: Identity        |
                  var(_cons) |   86.58149    50.1892      27.79739    269.6783
-----------------------------+------------------------------------------------
_all: Identity               |
                var(R.wafer) |   35.86577   14.18759      16.51834    77.87427
-----------------------------+------------------------------------------------
               var(Residual) |   12.56944   2.565726      8.424908    18.75282
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 100.65                Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.
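All of the results are the same as in our first model; only some of the labels differ. The wafer-level variance is reported as _all: var(R.wafer) instead of wafer: var(_cons), and the group table lists 8 _all groups (one per lot) instead of 24 wafer groups.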
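The two fits agree because both parameterizations imply the same covariance structure for the nine measurements within a lot. Every pair shares the lot effect, pairs on the same wafer also share a wafer effect, and each measurement has its own residual. With wafer:, each wafer gets a random intercept. With _all: R.wafer, each lot's three wafer indicators get independent random coefficients with a common variance. The Python sketch below builds that 9 × 9 matrix from the reported estimates as an illustration of the model, not of how mixed computes it. Measurements from different lots are independent under the model and are not shown.

```python
# Estimates from either fit above; both parameterizations gave the same values.
var_lot, var_wafer, var_resid = 86.58149, 35.86577, 12.56944

# The 9 measurements from one lot: 3 wafers x 3 positions. wafer_of[i] is the
# wafer that observation i came from (what the R.wafer indicators encode).
wafer_of = [w for w in (1, 2, 3) for _ in range(3)]


def cov(a, b):
    """Model-implied covariance between observations a and b in the same lot."""
    c = var_lot                    # every pair in the lot shares the lot effect
    if wafer_of[a] == wafer_of[b]:
        c += var_wafer             # same wafer: also shares the wafer effect
    if a == b:
        c += var_resid             # an observation's own residual variance
    return c


V = [[cov(a, b) for b in range(9)] for a in range(9)]
for row in V:
    print(" ".join(f"{x:7.2f}" for x in row))
```

The matrix has three values: about 135.02 on the diagonal, 122.45 for two positions on the same wafer, and 86.58 for measurements on different wafers in the same lot.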
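Because this design is completely balanced, the same model can also be fit with the anova command, using lot|source as the error term for source. The residual and wafer variance components implied by the anova mean squares agree with the mixed estimates. However, the results are not fully identical. By default, mixed fits by maximum likelihood, which yields a smaller lot variance. It also reports a large-sample Wald test for source (chi2(1) = 2.03, p = 0.1537), whereas anova reports an F test with 1 and 6 degrees of freedom.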
anova thickness source / lot|source wafer|lot|source
                     Number of obs =         72    R-squared     =  0.9478
                     Root MSE      =    3.54534    Adj R-squared =  0.9227

          Source |  Partial SS         df          MS        F   Prob > F
-----------------+-------------------------------------------------------
           Model |  10947.9861         23  475.999396    37.87     0.0000
                 |
          source |    1830.125          1    1830.125     1.53     0.2629
      lot|source |  7195.19444          6  1199.19907
-----------------+-------------------------------------------------------
wafer|lot|source |  1922.66667         16  120.166667     9.56     0.0000
                 |
        Residual |  603.333333         48  12.5694444
-----------------+-------------------------------------------------------
           Total |  11551.3194         71   162.69464
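The link between the two sets of results can be checked by hand. In a balanced nested design, each mean square has a known expected value in terms of the variance components. For example, the wafer mean square estimates σ²_e + 3σ²_wafer, so the components can be solved for directly. The Python sketch below does this with numbers copied from the output above. The residual and wafer components reproduce the mixed estimates. The classical ANOVA estimate of the lot component is about 119.9; in a balanced design like this one, that is what mixed with the reml option would be expected to give. The ML value of 86.58 is recovered, up to rounding, when the lot sum of squares is divided by the number of lots (8) instead of its degrees of freedom (6). That divisor is derived here from the likelihood for this balanced layout and is not taken from Stata's documentation.

```python
# Quantities copied from the anova output above.
ms_resid = 12.5694444      # Residual (positions within wafers), 48 df
ms_wafer = 120.166667      # wafer|lot|source, 16 df
ss_lot = 7195.19444        # lot|source sum of squares
ms_lot = 1199.19907        # lot|source, 6 df
ms_source = 1830.125       # source, 1 df
positions, wafers, n_lots = 3, 3, 8

var_resid = ms_resid
var_wafer = (ms_wafer - ms_resid) / positions
# Expected-mean-square (ANOVA) estimate: lot SS divided by its 6 df.
var_lot_anova = (ms_lot - ms_wafer) / (positions * wafers)
# ML-type estimate: lot SS divided by the number of lots (8) instead.
var_lot_ml = (ss_lot / n_lots - ms_wafer) / (positions * wafers)
f_source = ms_source / ms_lot   # F test for source with 1 and 6 df

print(f"residual               {var_resid:.5f}")
print(f"wafer                  {var_wafer:.5f}")
print(f"lot (ANOVA/REML-type)  {var_lot_anova:.5f}")
print(f"lot (ML-type)          {var_lot_ml:.5f}")
print(f"F for source           {f_source:.2f}")
```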
For more information on nested models, see “Multilevel and Longitudinal Modeling Using Stata” by Sophia Rabe-Hesketh and Anders Skrondal (2012).