Randomized block designs are relatively common in a number of fields. Balanced randomized block designs can be analyzed using traditional anova and regression methods, but unbalanced designs require maximum likelihood methods.
We will begin by analyzing a balanced design with four levels of factor a and eight subjects (denoted s) measured on response variable y, using traditional anova.
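For reference, the randomized block model being fit here can be written as
y_ij = mu + a_i + s_j + e_ij,   i = 1,...,4 (levels of a),   j = 1,...,8 (subjects)
where mu is the overall mean, a_i is the fixed effect of level i of a, s_j is the effect of subject (block) j, and e_ij is the residual error. In the mixed-model analyses later on, the subject effects s_j are treated as random.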
use https://stats.idre.ucla.edu/stat/stata/faq/randomblock, clear
describe
Contains data from https://stats.idre.ucla.edu/stat/stata/faq/randomblock.dta
obs: 32
vars: 3 1 May 2002 09:01
size: 224 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
s               byte   %8.0g
a               byte   %8.0g
y               byte   %8.0g
-------------------------------------------------------------------------------
Sorted by:
/* balanced design -- traditional anova */
anova y a s
Number of obs = 32 R-squared = 0.7318
Root MSE = 1.18523 Adj R-squared = 0.6041
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 80.5 10 8.05 5.73 0.0004
|
a | 49 3 16.3333333 11.63 0.0001
s | 31.5 7 4.5 3.20 0.0180
|
Residual | 29.5 21 1.4047619
-----------+----------------------------------------------------
Total | 110 31 3.5483871
anova, regress
Source | SS df MS Number of obs = 32
-------------+------------------------------ F( 10, 21) = 5.73
Model | 80.5 10 8.05 Prob > F = 0.0004
Residual | 29.5 21 1.4047619 R-squared = 0.7318
-------------+------------------------------ Adj R-squared = 0.6041
Total | 110 31 3.5483871 Root MSE = 1.1852
------------------------------------------------------------------------------
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------------------------------------------------------------------
_cons 8 .6949006 11.51 0.000 6.554875 9.445125
a
1 -3.25 .5926133 -5.48 0.000 -4.482407 -2.017593
2 -2.75 .5926133 -4.64 0.000 -3.982407 -1.517593
3 -2 .5926133 -3.37 0.003 -3.232407 -.7675933
4 (dropped)
s
1 -2.5 .8380817 -2.98 0.007 -4.242886 -.7571137
2 -2.25 .8380817 -2.68 0.014 -3.992886 -.5071137
3 -2.5 .8380817 -2.98 0.007 -4.242886 -.7571137
4 -2.5 .8380817 -2.98 0.007 -4.242886 -.7571137
5 -2.5 .8380817 -2.98 0.007 -4.242886 -.7571137
6 -1.5 .8380817 -1.79 0.088 -3.242886 .2428863
7 -.25 .8380817 -0.30 0.768 -1.992886 1.492886
8 (dropped)
------------------------------------------------------------------------------
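If you are using Stata 11 or later, one way to see the estimated marginal means of y at each level of a after the anova is the margins command; a minimal sketch:
/* sketch (Stata 11+): marginal means of y for the levels of a */
margins a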
Next, we will set two of the observations to missing and create an indicator variable, complete, that flags the subjects who still have complete data.
replace y=. if a==1 & s==2
replace y=. if a==2 & s==3
generate complete = 1
replace complete = 0 if s==2 | s==3
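As a quick check, you can list the design cells that are now missing to confirm that subjects 2 and 3 are the incomplete ones; a minimal sketch:
/* sketch: confirm which cells were set to missing */
list s a if missing(y)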
tabulate a, gen(a)
a | Freq. Percent Cum.
------------+-----------------------------------
1 | 7 23.33 23.33
2 | 7 23.33 46.67
3 | 8 26.67 73.33
4 | 8 26.67 100.00
------------+-----------------------------------
Total | 30 100.00
tabulate s if complete
s | Freq. Percent Cum.
------------+-----------------------------------
1 | 4 16.67 16.67
4 | 4 16.67 33.33
5 | 4 16.67 50.00
6 | 4 16.67 66.67
7 | 4 16.67 83.33
8 | 4 16.67 100.00
------------+-----------------------------------
Total | 24 100.00
In traditional anova, particularly in procedures that work with wide data, only subjects with complete data are included in the analysis. In this case, only the six subjects with complete data would be used.
/* unbalanced design -- traditional anova */
anova y a s if complete
Number of obs = 24 R-squared = 0.7105
Root MSE = 1.33229 Adj R-squared = 0.5560
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 65.3333333 8 8.16666667 4.60 0.0054
|
a | 38.125 3 12.7083333 7.16 0.0033
s | 27.2083333 5 5.44166667 3.07 0.0420
|
Residual | 26.625 15 1.775
-----------+----------------------------------------------------
Total | 91.9583333 23 3.99818841
anova, regress
Source | SS df MS Number of obs = 24
-------------+------------------------------ F( 8, 15) = 4.60
Model | 65.3333333 8 8.16666667 Prob > F = 0.0054
Residual | 26.625 15 1.775 R-squared = 0.7105
-------------+------------------------------ Adj R-squared = 0.5560
Total | 91.9583333 23 3.99818841 Root MSE = 1.3323
------------------------------------------------------------------------------
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------------------------------------------------------------------
_cons 8.041667 .8158584 9.86 0.000 6.302706 9.780628
a
1 -3.166667 .7691987 -4.12 0.001 -4.806175 -1.527158
2 -3 .7691987 -3.90 0.001 -4.639508 -1.360492
3 -2 .7691987 -2.60 0.020 -3.639508 -.3604917
4 (dropped)
s
1 -2.5 .9420722 -2.65 0.018 -4.507979 -.4920207
4 -2.5 .9420722 -2.65 0.018 -4.507979 -.4920207
5 -2.5 .9420722 -2.65 0.018 -4.507979 -.4920207
6 -1.5 .9420722 -1.59 0.132 -3.507979 .5079793
7 -.25 .9420722 -0.27 0.794 -2.257979 1.757979
8 (dropped)
------------------------------------------------------------------------------
If you have Stata 9, you can use xtmixed. We will run it using both the default REML estimation and full maximum likelihood. With maximum likelihood methods we can run the analysis on all of the observations, not just those from subjects with complete data.
xtmixed y a1 a2 a3 || s:, variance
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log restricted-likelihood = -49.98054
Iteration 1: log restricted-likelihood = -49.98054
Computing standard errors:
Mixed-effects REML regression Number of obs = 30
Group variable: s Number of groups = 8
Obs per group: min = 3
avg = 3.8
max = 4
Wald chi2(3) = 29.29
Log restricted-likelihood = -49.98054 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
a1 | -3.135557 .6416073 -4.89 0.000 -4.393084 -1.87803
a2 | -2.753089 .6416073 -4.29 0.000 -4.010616 -1.495562
a3 | -2 .6157501 -3.25 0.001 -3.206848 -.7931519
_cons | 6.25 .5327192 11.73 0.000 5.20589 7.29411
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
s: Identity |
var(_cons) | .7537252 .6242815 .1486605 3.82147
-----------------------------+------------------------------------------------
var(Residual) | 1.516593 .4889405 .8062073 2.852931
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 3.43 Prob >= chibar2 = 0.0320
test a1 a2 a3
( 1) [y]a1 = 0
( 2) [y]a2 = 0
( 3) [y]a3 = 0
chi2( 3) = 29.29
Prob > chi2 = 0.0000
/* approximate F-test: Wald chi2 divided by its degrees of freedom */
display r(chi2)/r(df)
9.7619818
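If you want a rough p-value to go with this approximate F statistic, one option is to refer it to an F distribution; the denominator degrees of freedom are only approximate in the unbalanced case, and the sketch below simply borrows the 21 residual degrees of freedom from the balanced anova:
/* sketch: rough p-value for the approximate F, assuming 3 and 21 df */
display Ftail(3, 21, r(chi2)/r(df))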
/* next with full maximum likelihood */
xtmixed y a1 a2 a3 || s:, variance mle
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -50.864974
Iteration 1: log likelihood = -50.864974
Computing standard errors:
Mixed-effects ML regression Number of obs = 30
Group variable: s Number of groups = 8
Obs per group: min = 3
avg = 3.8
max = 4
Wald chi2(3) = 33.86
Log likelihood = -50.864974 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
a1 | -3.13568 .5967199 -5.25 0.000 -4.305229 -1.96613
a2 | -2.753432 .5967199 -4.61 0.000 -3.922982 -1.583883
a3 | -2 .572654 -3.49 0.000 -3.122381 -.8776188
_cons | 6.25 .496394 12.59 0.000 5.277086 7.222914
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
s: Identity |
var(_cons) | .6595255 .5088202 .1453899 2.991774
-----------------------------+------------------------------------------------
var(Residual) | 1.31173 .393547 .7285611 2.361691
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 3.99 Prob >= chibar2 = 0.0229
test a1 a2 a3
( 1) [y]a1 = 0
( 2) [y]a2 = 0
( 3) [y]a3 = 0
chi2( 3) = 33.86
Prob > chi2 = 0.0000
/* approximate F-test */
display r(chi2)/r(df)
11.287767
If you don't have Stata 9 (and therefore don't have xtmixed), you can use xtreg with the mle (maximum likelihood) option.
/* unbalanced design -- maximum likelihood using xtreg */
xtreg y a1 a2 a3, i(s) mle
Random-effects ML regression Number of obs = 30
Group variable (i): s Number of groups = 8
Random effects u_i ~ Gaussian Obs per group: min = 3
avg = 3.8
max = 4
LR chi2(3) = 20.35
Log likelihood = -50.864974 Prob > chi2 = 0.0001
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
a1 | -3.13568 .5967916 -5.25 0.000 -4.30537 -1.96599
a2 | -2.753432 .5972855 -4.61 0.000 -3.92409 -1.582774
a3 | -2 .5726538 -3.49 0.000 -3.122381 -.8776192
_cons | 6.25 .4963939 12.59 0.000 5.277086 7.222914
-------------+----------------------------------------------------------------
/sigma_u | .8121118 .3132698 2.59 0.010 .1981143 1.426109
/sigma_e | 1.145308 .1718083 6.67 0.000 .80857 1.482046
-------------+----------------------------------------------------------------
rho | .3345713 .1958727 .0692199 .7346624
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)= 3.99 Prob>=chibar2 = 0.023
test a1 a2 a3
( 1) [y]a1 = 0
( 2) [y]a2 = 0
( 3) [y]a3 = 0
chi2( 3) = 33.83
Prob > chi2 = 0.0000
/* approximate F-test */
display r(chi2)/r(df)
11.277428
As you can see, the maximum likelihood solutions are much closer to the balanced-design results than is the analysis using only the subjects with complete data.
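As a side note, in Stata 13 and later xtmixed was renamed mixed, and factor-variable notation removes the need for the tabulate-generated dummies. A minimal sketch, assuming the same y, a, and s variables (ib4.a makes level 4 the reference category, matching the dummy coding above):
/* sketch for Stata 13+: mixed replaces xtmixed */
mixed y ib4.a || s:, reml    /* REML; omit the reml option for full maximum likelihood */
testparm i.a                 /* joint Wald test of the a effects */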
