Randomized block designs are relatively common in certain fields. Balanced randomized block designs can be analyzed using traditional anova and regression methods, but unbalanced designs require maximum likelihood methods.
We will begin by analyzing a balanced design with four levels of variable a and eight subjects, denoted s, on response variable y using traditional anova.
use https://stats.idre.ucla.edu/stat/stata/faq/randomblock, clear

describe

Contains data from https://stats.idre.ucla.edu/stat/stata/faq/randomblock.dta
  obs:            32
 vars:             3                          1 May 2002 09:01
 size:           224 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
s               byte   %8.0g
a               byte   %8.0g
y               byte   %8.0g
-------------------------------------------------------------------------------
Sorted by:

/* balanced design -- traditional anova */

anova y a s

                           Number of obs =      32     R-squared     =  0.7318
                           Root MSE      = 1.18523     Adj R-squared =  0.6041

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |        80.5    10         8.05       5.73    0.0004
                         |
                       a |          49     3   16.3333333      11.63    0.0001
                       s |        31.5     7          4.5       3.20    0.0180
                         |
                Residual |        29.5    21    1.4047619
              -----------+----------------------------------------------------
                   Total |         110    31    3.5483871

anova, regress

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F( 10,    21) =    5.73
       Model |        80.5    10         8.05           Prob > F      =  0.0004
    Residual |        29.5    21    1.4047619           R-squared     =  0.7318
-------------+------------------------------           Adj R-squared =  0.6041
       Total |         110    31    3.5483871           Root MSE      =  1.1852

------------------------------------------------------------------------------
           y        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
       _cons            8   .6949006     11.51   0.000     6.554875   9.445125
           a
           1        -3.25   .5926133     -5.48   0.000    -4.482407  -2.017593
           2        -2.75   .5926133     -4.64   0.000    -3.982407  -1.517593
           3           -2   .5926133     -3.37   0.003    -3.232407  -.7675933
           4    (dropped)
           s
           1         -2.5   .8380817     -2.98   0.007    -4.242886  -.7571137
           2        -2.25   .8380817     -2.68   0.014    -3.992886  -.5071137
           3         -2.5   .8380817     -2.98   0.007    -4.242886  -.7571137
           4         -2.5   .8380817     -2.98   0.007    -4.242886  -.7571137
           5         -2.5   .8380817     -2.98   0.007    -4.242886  -.7571137
           6         -1.5   .8380817     -1.79   0.088    -3.242886   .2428863
           7         -.25   .8380817     -0.30   0.768    -1.992886   1.492886
           8    (dropped)
------------------------------------------------------------------------------
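If you would like to see the raw treatment means that the a coefficients above summarize, one quick way (an optional check, not part of the analysis itself) is a one-way tabulation of y by a:

/* optional check -- mean of y at each level of a */
tabulate a, summarize(y)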
Next, we will set two of the observations to missing, making the design unbalanced.
replace y=. if a==1 & s==2

replace y=. if a==2 & s==3

generate complete = 1

replace complete = 0 if s==2 | s==3

tabulate a, gen(a)

          a |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          7       23.33       23.33
          2 |          7       23.33       46.67
          3 |          8       26.67       73.33
          4 |          8       26.67      100.00
------------+-----------------------------------
      Total |         30      100.00

tabulate s if complete

          s |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          4       16.67       16.67
          4 |          4       16.67       33.33
          5 |          4       16.67       50.00
          6 |          4       16.67       66.67
          7 |          4       16.67       83.33
          8 |          4       16.67      100.00
------------+-----------------------------------
      Total |         24      100.00
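As a further quick check (not part of the original commands), you can list the two cells that are now missing:

/* optional check -- which observations were set to missing */
list s a y if missing(y)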
In traditional anova, particularly in procedures that work with wide data, only subjects with complete data are included in the analysis. In this case, only the six subjects with complete data would be used.
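With only two affected subjects, the complete indicator above was easy to create by hand. With more missing values a programmatic flag may be less error-prone; the sketch below (using a hypothetical variable name complete2) counts the missing y values within each subject in Stata 9 or later:

/* sketch -- flag subjects with no missing y values */
bysort s: egen nmiss = total(missing(y))     // number of missing y's per subject
generate byte complete2 = (nmiss == 0)       // 1 if the subject has complete data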
/* unbalanced design -- traditional anova */

anova y a s if complete

                           Number of obs =      24     R-squared     =  0.7105
                           Root MSE      = 1.33229     Adj R-squared =  0.5560

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  65.3333333     8   8.16666667       4.60    0.0054
                         |
                       a |      38.125     3   12.7083333       7.16    0.0033
                       s |  27.2083333     5   5.44166667       3.07    0.0420
                         |
                Residual |      26.625    15        1.775
              -----------+----------------------------------------------------
                   Total |  91.9583333    23   3.99818841

anova, regress

      Source |       SS       df       MS              Number of obs =      24
-------------+------------------------------           F(  8,    15) =    4.60
       Model |  65.3333333     8   8.16666667           Prob > F      =  0.0054
    Residual |      26.625    15        1.775           R-squared     =  0.7105
-------------+------------------------------           Adj R-squared =  0.5560
       Total |  91.9583333    23   3.99818841           Root MSE      =  1.3323

------------------------------------------------------------------------------
           y        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
       _cons     8.041667   .8158584      9.86   0.000     6.302706   9.780628
           a
           1    -3.166667   .7691987     -4.12   0.001    -4.806175  -1.527158
           2           -3   .7691987     -3.90   0.001    -4.639508  -1.360492
           3           -2   .7691987     -2.60   0.020    -3.639508  -.3604917
           4    (dropped)
           s
           1         -2.5   .9420722     -2.65   0.018    -4.507979  -.4920207
           4         -2.5   .9420722     -2.65   0.018    -4.507979  -.4920207
           5         -2.5   .9420722     -2.65   0.018    -4.507979  -.4920207
           6         -1.5   .9420722     -1.59   0.132    -3.507979   .5079793
           7         -.25   .9420722     -0.27   0.794    -2.257979   1.757979
           8    (dropped)
------------------------------------------------------------------------------
If you have Stata 9, you can use xtmixed. We will run it using both the default REML estimation and full maximum likelihood. With the mixed-model approach we can use all of the observations, not just those from subjects with complete data.
xtmixed y a1 a2 a3 || s:, variance

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log restricted-likelihood = -49.98054
Iteration 1:   log restricted-likelihood = -49.98054

Computing standard errors:

Mixed-effects REML regression                   Number of obs      =        30
Group variable: s                               Number of groups   =         8

                                                Obs per group: min =         3
                                                               avg =       3.8
                                                               max =         4

                                                Wald chi2(3)       =     29.29
Log restricted-likelihood = -49.98054           Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          a1 |  -3.135557   .6416073    -4.89   0.000    -4.393084    -1.87803
          a2 |  -2.753089   .6416073    -4.29   0.000    -4.010616   -1.495562
          a3 |         -2   .6157501    -3.25   0.001    -3.206848   -.7931519
       _cons |       6.25   .5327192    11.73   0.000      5.20589     7.29411
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
s: Identity                  |
                  var(_cons) |   .7537252   .6242815      .1486605     3.82147
-----------------------------+------------------------------------------------
               var(Residual) |   1.516593   .4889405      .8062073    2.852931
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =     3.43 Prob >= chibar2 = 0.0320

test a1 a2 a3

 ( 1)  [y]a1 = 0
 ( 2)  [y]a2 = 0
 ( 3)  [y]a3 = 0

           chi2(  3) =   29.29
         Prob > chi2 =    0.0000

/* approximate F-test */

display r(chi2)/r(df)
9.7619818

/* next with full maximum likelihood */

xtmixed y a1 a2 a3 || s:, variance mle

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -50.864974
Iteration 1:   log likelihood = -50.864974

Computing standard errors:

Mixed-effects ML regression                     Number of obs      =        30
Group variable: s                               Number of groups   =         8

                                                Obs per group: min =         3
                                                               avg =       3.8
                                                               max =         4

                                                Wald chi2(3)       =     33.86
Log likelihood = -50.864974                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          a1 |   -3.13568   .5967199    -5.25   0.000    -4.305229    -1.96613
          a2 |  -2.753432   .5967199    -4.61   0.000    -3.922982   -1.583883
          a3 |         -2    .572654    -3.49   0.000    -3.122381   -.8776188
       _cons |       6.25    .496394    12.59   0.000     5.277086    7.222914
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
s: Identity                  |
                  var(_cons) |   .6595255   .5088202      .1453899    2.991774
-----------------------------+------------------------------------------------
               var(Residual) |    1.31173    .393547      .7285611    2.361691
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =     3.99 Prob >= chibar2 = 0.0229

test a1 a2 a3

 ( 1)  [y]a1 = 0
 ( 2)  [y]a2 = 0
 ( 3)  [y]a3 = 0

           chi2(  3) =   33.86
         Prob > chi2 =    0.0000

/* approximate F-test */

display r(chi2)/r(df)
11.287767
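In Stata 13 and later, xtmixed was renamed mixed, and factor-variable notation removes the need for the hand-made a1-a3 dummies. A sketch of roughly equivalent commands for newer versions of Stata:

/* same models in newer Stata -- xtmixed is now mixed */
mixed y i.a || s:, reml      // restricted maximum likelihood (REML)
contrast a                   // joint test of the a effects
mixed y i.a || s:, mle       // full maximum likelihood
contrast a

In Stata 14 and later, mixed also offers the dfmethod() option, which provides small-sample denominator degrees of freedom in place of the approximate F-test computed by hand above.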
If you don’t have Stata 9 (and therefore don’t have xtmixed), you can use xtreg with the mle (maximum likelihood) option.
/* unbalanced design -- maximum likelihood using xtreg */

xtreg y a1 a2 a3, i(s) mle

Random-effects ML regression                    Number of obs      =        30
Group variable (i): s                           Number of groups   =         8

Random effects u_i ~ Gaussian                   Obs per group: min =         3
                                                               avg =       3.8
                                                               max =         4

                                                LR chi2(3)         =     20.35
Log likelihood  = -50.864974                    Prob > chi2        =    0.0001

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          a1 |   -3.13568   .5967916    -5.25   0.000     -4.30537    -1.96599
          a2 |  -2.753432   .5972855    -4.61   0.000     -3.92409   -1.582774
          a3 |         -2   .5726538    -3.49   0.000    -3.122381   -.8776192
       _cons |       6.25   .4963939    12.59   0.000     5.277086    7.222914
-------------+----------------------------------------------------------------
    /sigma_u |   .8121118   .3132698     2.59   0.010     .1981143    1.426109
    /sigma_e |   1.145308   .1718083     6.67   0.000       .80857    1.482046
-------------+----------------------------------------------------------------
         rho |   .3345713   .1958727                      .0692199    .7346624
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)=    3.99 Prob>=chibar2 = 0.023

test a1 a2 a3

 ( 1)  [y]a1 = 0
 ( 2)  [y]a2 = 0
 ( 3)  [y]a3 = 0

           chi2(  3) =   33.83
         Prob > chi2 =    0.0000

/* approximate F-test */

display r(chi2)/r(df)
11.277428
As you can see, the maximum likelihood solutions are much closer to the balanced-design results than is the analysis using only the subjects with complete data.
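If you want to see the fixed-effect estimates from the different approaches side by side, one possibility (a sketch that assumes the models above are re-fit in the same session; the stored names reml, ml, and xtreg_ml are arbitrary) is to store and tabulate them:

/* sketch -- store and compare the maximum likelihood fits */
quietly xtmixed y a1 a2 a3 || s:, variance
estimates store reml
quietly xtmixed y a1 a2 a3 || s:, variance mle
estimates store ml
quietly xtreg y a1 a2 a3, i(s) mle
estimates store xtreg_ml
estimates table reml ml xtreg_ml, b(%9.3f) se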