Loss of subjects in a repeated measures ANOVA due to
missing data can be a serious problem. If you use **proc glm** to perform your
analysis, it will omit observations **listwise**, meaning that if any of the
measurements for a subject is missing, the entire subject will be omitted from the
analysis. Consider the data file below, based on an example from **Design and
Analysis** by G. Keppel, pages 414-416. This example contains 8 subjects (sub) with
one between-subjects IV with 2 levels (group) and one within-subjects IV with 4 levels
(stored as dv1 through dv4). We have inserted 4 missing values to illustrate the impact
of missing data in this kind of design.

    DATA wide;
      INPUT sub group dv1 dv2 dv3 dv4;
    CARDS;
    1 1  3  4  7  3
    2 1  6  . 12  9
    3 1  7 13 11 11
    4 1  0  3  .  6
    5 2  5  6 11  7
    6 2 10 12 18  .
    7 2 10 15 15 14
    8 2  5  . 11  9
    ;
    RUN;

    PROC PRINT DATA=wide ;
    RUN;

    OBS    SUB    GROUP    DV1    DV2    DV3    DV4

      1      1        1      3      4      7      3
      2      2        1      6      .     12      9
      3      3        1      7     13     11     11
      4      4        1      0      3      .      6
      5      5        2      5      6     11      7
      6      6        2     10     12     18      .
      7      7        2     10     15     15     14
      8      8        2      5      .     11      9

We start by showing how to perform a standard 2 by 4
(between / within) ANOVA using **proc glm**.

    PROC GLM DATA=wide;
      CLASS group;
      MODEL dv1-dv4 = group / NOUNI ;
      REPEATED trial 4;
    RUN;

Note that only four observations are available for analysis; the other four (subjects 2, 4, 6, and 8) have been omitted because each has one missing value. The results of this analysis are shown below.

    General Linear Models Procedure
    Class Level Information

    Class    Levels    Values
    GROUP         2    1 2

    Number of observations in data set = 8

    NOTE: Observations with missing values will not be included in this analysis.
          Thus, only 4 observations can be used in this analysis.

    General Linear Models Procedure
    Repeated Measures Analysis of Variance
    Repeated Measures Level Information

    Dependent Variable    DV1    DV2    DV3    DV4
    Level of TRIAL          1      2      3      4

    General Linear Models Procedure
    Repeated Measures Analysis of Variance
    Tests of Hypotheses for Between Subjects Effects

    Source    DF     Type III SS     Mean Square    F Value    Pr > F
    GROUP      1     36.00000000     36.00000000       0.46    0.5673
    Error      2    156.25000000     78.12500000

    General Linear Models Procedure
    Repeated Measures Analysis of Variance
    Univariate Tests of Hypotheses for Within Subject Effects

    Source: TRIAL
                                                                Adj Pr > F
    DF    Type III SS    Mean Square    F Value    Pr > F     G - G     H - F
     3    47.25000000    15.75000000       5.32    0.0397    0.1430    0.0629

    Source: TRIAL*GROUP
                                                                Adj Pr > F
    DF    Type III SS    Mean Square    F Value    Pr > F     G - G     H - F
     3     2.50000000     0.83333333       0.28    0.8371    0.6556    0.7898

    Source: Error(TRIAL)

    DF    Type III SS    Mean Square
     6    17.75000000     2.95833333

    Greenhouse-Geisser Epsilon = 0.3474
           Huynh-Feldt Epsilon = 0.7547
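To see exactly which subjects listwise deletion removes, you can count the missing repeated measures for each subject before running **proc glm**. The step below is a minimal sketch that assumes the **wide** data set created above; the data set name misscount and the variables n_missing and complete are illustrative names, not part of the original example.

```sas
/* Flag subjects that listwise deletion would drop.                 */
/* misscount, n_missing and complete are hypothetical names.        */
DATA misscount;
  SET wide;
  n_missing = NMISS(OF dv1-dv4);  /* missing values among dv1..dv4  */
  complete  = (n_missing = 0);    /* 1 = subject has all 4 measures */
RUN;

/* List the per-subject counts */
PROC PRINT DATA=misscount;
  VAR sub group n_missing complete;
RUN;

/* Complete versus incomplete subjects within each group */
PROC FREQ DATA=misscount;
  TABLES group*complete / NOROW NOCOL NOPERCENT;
RUN;
```

With the data shown above, subjects 1, 3, 5, and 7 are the only ones with complete = 1, two in each group. That matches the four observations **proc glm** reports and the 2 error degrees of freedom in the between-subjects test.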

Now, we will illustrate how you can perform this same
analysis in **proc mixed**. First, we need to reshape the data into
the shape expected by **proc mixed**. **proc glm** expects the
data to be in a **wide** format, where each observation corresponds to a
subject. By contrast, **proc mixed** expects the data to be in a **long**
format, where each observation corresponds to a **trial**. In this case,
**proc mixed** expects four observations per subject, with each
observation holding the measurement from one of the four trials. Below we show
how you can reshape the data for analysis in **proc mixed**.

    DATA long ;
      SET wide;
      dv = dv1; trial = 1; OUTPUT;
      dv = dv2; trial = 2; OUTPUT;
      dv = dv3; trial = 3; OUTPUT;
      dv = dv4; trial = 4; OUTPUT;
      DROP dv1 - dv4 ;
    RUN;

    PROC PRINT DATA=long ;
    RUN;

You can compare the **proc print** for **wide**
with the **proc print** for **long** to verify that
the data were properly reshaped.

    OBS    SUB    GROUP     DV    TRIAL

      1      1        1      3        1
      2      1        1      4        2
      3      1        1      7        3
      4      1        1      3        4
      5      2        1      6        1
      6      2        1      .        2
      7      2        1     12        3
      8      2        1      9        4
      9      3        1      7        1
     10      3        1     13        2
     11      3        1     11        3
     12      3        1     11        4
     13      4        1      0        1
     14      4        1      3        2
     15      4        1      .        3
     16      4        1      6        4
     17      5        2      5        1
     18      5        2      6        2
     19      5        2     11        3
     20      5        2      7        4
     21      6        2     10        1
     22      6        2     12        2
     23      6        2     18        3
     24      6        2      .        4
     25      7        2     10        1
     26      7        2     15        2
     27      7        2     15        3
     28      7        2     14        4
     29      8        2      5        1
     30      8        2      .        2
     31      8        2     11        3
     32      8        2      9        4
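The reshaping step can also be written with an array and a loop, which avoids repeating one line per trial and scales more easily to designs with many repeated measures. This is an alternative sketch, not the method used above; the data set name long2 and the array name dvs are illustrative, and the **proc compare** step assumes the **long** data set has already been created.

```sas
/* Array-based wide-to-long reshape.                         */
/* long2 and dvs are hypothetical names for illustration.    */
DATA long2;
  SET wide;
  ARRAY dvs{4} dv1-dv4;   /* the four repeated measures       */
  DO trial = 1 TO 4;
    dv = dvs{trial};      /* measurement for this trial       */
    OUTPUT;               /* one output row per subject-trial */
  END;
  DROP dv1-dv4;
RUN;

/* Check that both reshaping approaches give the same values */
PROC COMPARE BASE=long COMPARE=long2;
RUN;
```

**Proc compare** matches variables by name, so the different column order in long2 (trial is created before dv) does not affect the comparison.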

Now that the data are in the proper shape, we can analyze them with **proc
mixed**. **Proc mixed** does not delete missing data **listwise**.
It analyzes all of the data that are present. For the analysis to be valid, it
is assumed that the data are missing at random. Rarely, however, are data truly missing at random. To the
extent that systematic factors led to the data being missing, the analysis
will not be valid. When using this kind of analysis, we recommend that you report
the reasons for the missing data and assess the extent
to which the missingness was non-random.

    PROC MIXED DATA=long;
      CLASS sub group trial;
      MODEL dv = group trial group*trial;
      REPEATED trial / SUBJECT=sub TYPE=CS;
    RUN;

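As you see below, **proc mixed** analyzed
all eight of the subjects, using 28 of the 32 observations in the long file. Far less
data were lost than in the analysis with **proc glm**, which used only the 16
measurements from the four complete subjects.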

    The MIXED Procedure

    Class Level Information

    Class    Levels    Values
    SUB           8    1 2 3 4 5 6 7 8
    GROUP         2    1 2
    TRIAL         4    1 2 3 4

    REML Estimation Iteration History

    Iteration    Evaluations      Objective     Criterion
            0              1    81.93159646
            1              3    63.43970119    0.00138808
            2              1    63.39025490    0.00006552
            3              1    63.38810898    0.00000018
            4              1    63.38810333    0.00000000

    Convergence criteria met.

    Covariance Parameter Estimates (REML)

    Cov Parm    Subject       Estimate
    CS          SUB        10.83244625
    Residual                2.29522110

    Model Fitting Information for DV

    Description                          Value
    Observations                       28.0000
    Res Log Likelihood                -50.0728
    Akaike's Information Criterion    -52.0728
    Schwarz's Bayesian Criterion      -53.0686
    -2 Res Log Likelihood             100.1456
    Null Model LRT Chi-Square          18.5435
    Null Model LRT DF                   1.0000
    Null Model LRT P-Value              0.0000

    Tests of Fixed Effects

    Source         NDF    DDF    Type III F    Pr > F
    GROUP            1      6          2.37    0.1748
    TRIAL            3     14         17.04    0.0001
    GROUP*TRIAL      3     14          0.40    0.7556
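As a first step toward the recommendation above to describe the missing data, you can tabulate how many measurements are present and missing in each group-by-trial cell. The sketch below assumes the **long** data set created earlier. It only describes where values are missing; it does not test whether they are missing at random.

```sas
/* Non-missing count (N), missing count (NMISS) and mean of dv */
/* for each combination of group and trial.                    */
PROC MEANS DATA=long N NMISS MEAN;
  CLASS group trial;
  VAR dv;
RUN;
```

Across all cells, the N column should sum to the 28 observations that **proc mixed** reports, and the NMISS column to the 4 missing values inserted into the data.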

**Proc mixed** is much more powerful than **proc glm**, and that power makes it more complex to use. This FAQ just
scratches the surface of what **proc mixed** can do.